* [PATCH v6 0/6] generate full callchain cursor entries for inlined frames
@ 2017-10-18 18:53 Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
                   ` (6 more replies)
  0 siblings, 7 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC
  To: acme, jolsa, namhyung; +Cc: Linux-kernel, linux-perf-users, Milian Wolff


This series of patches completely reworks the way inline frames are handled.
Instead of querying for the inline nodes on-demand in the individual tools,
we now create proper callchain nodes for inlined frames. The advantages this
approach brings are numerous:

- less duplicated code in the individual browsers
- aggregated cost for inlined frames for the --children top-down list
- various bug fixes that arose from querying for a srcline/symbol based on
  the IP of a sample, which will always point to the last inlined frame
  instead of the corresponding non-inlined frame
- overall much better support for visualizing cost for heavily-inlined C++
  code, which was simply confusing and unreliable before
- srcline honors the global setting as to whether full paths or basenames
  should be shown
- caches for inlined frames and srcline information, which allow us to
  enable inline frame handling by default
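
To make the core idea concrete, here is a minimal sketch in plain C. It is
not perf code: `struct frame` and `expand_frames` are hypothetical names used
only for illustration. It shows one sampled IP being expanded into one entry
per inlined frame (innermost first) followed by the real, non-inlined symbol,
with every entry repeating the same IP, which is the ordering visible in the
"after" `perf script` output of this cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one callchain cursor entry. */
struct frame {
	uint64_t ip;
	const char *sym;
	bool inlined;
};

/*
 * Expand one sampled IP into full entries: the inlined frames first
 * (innermost first), then the non-inlined symbol. Every entry repeats
 * the same IP. Returns the number of entries written, or 0 if out[]
 * is too small to hold them all.
 */
size_t expand_frames(uint64_t ip, const char *real_sym,
		     const char *const *inlined, size_t nr_inlined,
		     struct frame *out, size_t out_len)
{
	size_t i;

	assert(out != NULL);

	if (out_len < nr_inlined + 1)
		return 0;

	for (i = 0; i < nr_inlined; i++)
		out[i] = (struct frame){ ip, inlined[i], true };
	out[nr_inlined] = (struct frame){ ip, real_sym, false };

	return nr_inlined + 1;
}
```

Because each inlined function becomes an ordinary entry, code that walks the
chain can aggregate or print it without a separate per-tool inline lookup.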

For comparison, the output of `perf script` and `perf report` before and
after this series is listed below. The example file I used to generate the
perf data is:

~~~~~
$ cat inlining.cpp
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
$ g++ -O2 -g -o inlining inlining.cpp
$ perf record --call-graph dwarf ./inlining
~~~~~

Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
of my comments that outline the various issues I'm trying to solve by this
patch series.

~~~~~
$ perf script --inline
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                         std::__complex_abs
                         std::abs<double>
                         std::_Norm_helper<true>::_S_do_it<double>
                         std::norm<double>
                         main
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
# NOTE: the above inlined stack is confusing: the a4a is an address into main,
#       which is the non-inlined symbol. the entry with the address should be
#       at the end of the stack, where it's actually duplicated once more but
#       there it's missing the address
...
$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
...
             --38.86%--_start
                       __libc_start_main
                       |
                       |--15.68%--main random.tcc:3326
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
                       |
                       |--10.36%--main random.h:143
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:143 (inline)
                       |
                       |--5.66%--main random.tcc:3332
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
...
# NOTE: the grouping is totally off because the first and last frame of the
        inline nodes is completely bogus, since the IP is used to find the sym/srcline
        which is different from the actual inlined sym/srcline.
        also, the code currently displays either the inlined function name or
        the corresponding filename (but in full length, instead of just the basename).

$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--main random.tcc:3326
            |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
            |          __libc_start_main
            |          _start
...
# NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
        i.e. what is displayed in the line below (see also perf script issue above)
~~~~~

Afterwards, all of the above issues are resolved (and inlined frames are
displayed by default):

~~~~~
$ perf script
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
...
# NOTE: only one main entry, at the correct position.
        we do display the (repeated) instruction pointer as that ensures
        interoperability with e.g. the stackcollapse-perf.pl script

$ perf report -s sym -g srcline -i perf.inlining.data --stdio
...
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664 (inlined)
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
            |          std::abs<double> complex:597 (inlined)
            |          std::__complex_abs complex:589 (inlined)
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
            |
             --38.86%--_start
                       __libc_start_main
                       |
                       |--38.19%--main inlining.cpp:14
                       |          |
                       |          |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
                       |          |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
                       |          |          |
                       |          |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
                       |          |                     |
                       |          |                     |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
                       |          |                     |          |
                       |          |                     |           --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
                       |          |                     |                     std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
                       |          |                     |                     |
                       |          |                     |                     |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
                       |          |                     |                     |
                       |          |                     |                      --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
                       |          |                     |
                       |          |                     |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
                       |          |                     |
                       |          |                      --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
                       |          |
                       |           --1.99%--std::norm<double> complex:664 (inlined)
                       |                     std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
                       |                     std::abs<double> complex:597 (inlined)
                       |                     std::__complex_abs complex:589 (inlined)
                       |
                        --0.67%--main inlining.cpp:13
...

# NOTE: still somewhat confusing due to the _start and __libc_start_main frames
        that actually are *above* the main frame. But at least the stuff below
        properly splits up and shows that multiple functions got inlined into
        inlining.cpp:14, not just one as before.

$ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
            |          main inlining.cpp:14
            |          __libc_start_main
            |          _start
...
# NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
        both function and srcline are shown, as well as the (inlined) suffix
        only the basename of the srcline is shown
~~~~~

v6 rebases against the partial merge of this patch series in acme/perf/core

v5 attends to Namhyung's code review. Most notably, it fixes a use-after-free
   crash. Additionally, srcline resolution for hist entries now also works
   correctly, when inline frame resolution is disabled.

v4 splits the patch to create full callchain nodes for inline frames up further
   as suggested by Jiri. It also removes C99 comments and initializes the
   rb_root properly.

v3 splits the initial patch up into two to simplify reviewing. It also adds a
   comment to clarify the lifetime handling of fake symbols and aliased non-fake
   symbols, based on the feedback by Namhyung.

v2 fixes some issues reported by Namhyung or found by me in further
testing, adds caching and enables inline frames by default.


Milian Wolff (6):
  perf report: properly handle branch count in match_chain
  perf report: cache failed lookups of inlined frames
  perf report: cache srclines for callchain nodes
  perf report: use srcline from callchain for hist entries
  perf util: enable handling of inlined frames by default
  perf util: use correct IP mapping to find srcline for hist entry

 tools/perf/Documentation/perf-report.txt |   3 +-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/util/callchain.c              | 130 ++++++++++++++++---------------
 tools/perf/util/dso.c                    |   2 +
 tools/perf/util/dso.h                    |   1 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/hist.c                   |   2 +
 tools/perf/util/machine.c                |  32 +++++---
 tools/perf/util/sort.c                   |   2 +-
 tools/perf/util/srcline.c                |  82 +++++++++++++++----
 tools/perf/util/srcline.h                |   7 ++
 tools/perf/util/symbol.c                 |   1 +
 tools/perf/util/symbol.h                 |   1 +
 13 files changed, 176 insertions(+), 91 deletions(-)

-- 
2.14.2


* [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-18 22:41   ` Andi Kleen
                     ` (2 more replies)
  2017-10-18 18:53 ` [PATCH v6 2/6] perf report: cache failed lookups of inlined frames Milian Wolff
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

Some of the code paths I introduced before returned too early
without running the code to handle a node's branch count.
By refactoring match_chain to only have one exit point, this
can be remedied.
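
The pattern can be sketched in isolation as follows. This is a simplified,
hypothetical example rather than the perf implementation: `struct entry`,
`match_names` and `match_entry` are made-up names, and `hits` stands in for
the branch bookkeeping. Each comparison strategy falls through to the next
one on failure, and the bookkeeping runs at the single exit point, so no
strategy can skip it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum match_result { MATCH_ERROR = -1, MATCH_EQ, MATCH_LT, MATCH_GT };
enum key { KEY_NAME, KEY_ADDRESS };

struct entry {
	const char *name;	/* may be NULL if unknown */
	uint64_t ip;
	unsigned int hits;	/* bookkeeping that must run on every match */
};

/* Compare by name; report MATCH_ERROR if either name is missing. */
enum match_result match_names(const char *left, const char *right)
{
	int cmp;

	if (!left || !right)
		return MATCH_ERROR;

	cmp = strcmp(left, right);
	if (cmp == 0)
		return MATCH_EQ;
	return cmp < 0 ? MATCH_LT : MATCH_GT;
}

enum match_result match_entry(enum key key, struct entry *have,
			      const struct entry *incoming)
{
	enum match_result match = MATCH_ERROR;

	assert(have && incoming);

	switch (key) {
	case KEY_NAME:
		match = match_names(have->name, incoming->name);
		if (match != MATCH_ERROR)
			break;
		/* fall through */
	case KEY_ADDRESS:
	default:
		if (have->ip == incoming->ip)
			match = MATCH_EQ;
		else
			match = have->ip < incoming->ip ? MATCH_LT : MATCH_GT;
		break;
	}

	/* Single exit point: the bookkeeping cannot be bypassed. */
	if (match == MATCH_EQ)
		have->hits++;

	return match;
}
```

With early returns inside the `KEY_NAME` branch, the `hits` update would be
skipped for name-based matches, which is the class of bug this patch fixes
for the branch count.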

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 129 +++++++++++++++++++++++---------------------
 1 file changed, 67 insertions(+), 62 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..ac767957fd9c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,83 +666,88 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
+static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
+					   struct dso *right_dso, u64 right_ip)
+{
+	if (left_dso == right_dso && left_ip == right_ip)
+		return MATCH_EQ;
+	else if (left_ip < right_ip)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
+
 static enum match_result match_chain(struct callchain_cursor_node *node,
 				     struct callchain_list *cnode)
 {
-	struct symbol *sym = node->sym;
-	u64 left, right;
-	struct dso *left_dso = NULL;
-	struct dso *right_dso = NULL;
-
-	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
-
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
+	enum match_result match = MATCH_ERROR;
 
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
+		match = match_chain_strings(cnode->srcline, node->srcline);
 		if (match != MATCH_ERROR)
-			return match;
-
+			break;
+		/* otherwise fall-back to symbol-based comparison below */
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/*
+			 * Compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. Otherwise do a faster comparison based
+			 * on the symbol start address.
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined)
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+			else
+				match = match_address_dso(cnode->ms.map->dso,
+							  cnode->ms.sym->start,
+							  node->map->dso,
+							  node->sym->start);
+			if (match != MATCH_ERROR)
+				break;
+		}
 		/* otherwise fall-back to IP-based comparison below */
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_address_dso(cnode->ms.map->dso, cnode->ip,
+					  node->map->dso, node->ip);
+		break;
 	}
 
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/*
-		 * Compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.sym->start;
-		right = sym->start;
-		left_dso = cnode->ms.map->dso;
-		right_dso = node->map->dso;
-	} else {
-		left = cnode->ip;
-		right = node->ip;
-	}
-
-	if (left == right && left_dso == right_dso) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*
-- 
2.14.2


* [PATCH v6 2/6] perf report: cache failed lookups of inlined frames
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 3/6] perf report: cache srclines for callchain nodes Milian Wolff
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

When no inlined frames could be found for a given address,
we did not store this information anywhere. That means we
potentially do the costly inliner lookup repeatedly for
cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a
valid inline_node. It will be empty when no inliners are
found. This enables us to cache the empty list in the DSO,
thereby improving the performance when many addresses
fail to find the inliners.
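
The effect of caching negative results can be shown with a small standalone
sketch. Everything here is hypothetical and deliberately simplistic: a fixed
array replaces the per-DSO rb-tree, and `slow_count_inlines` merely pretends
to be the costly inliner lookup. The point is only that an empty answer is
stored like any other answer, so the slow path runs once per address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64

struct cached_lookup {
	uint64_t addr;
	int nr_inlines;		/* 0 is a valid, cached answer */
};

struct cached_lookup cache[CACHE_SLOTS];
size_t cache_len;
unsigned int slow_lookups;	/* how often the costly path ran */

/* Hypothetical costly lookup: pretend only even addresses have inliners. */
int slow_count_inlines(uint64_t addr)
{
	slow_lookups++;
	return (addr % 2 == 0) ? 2 : 0;
}

int count_inlines(uint64_t addr)
{
	size_t i;
	int nr;

	assert(cache_len <= CACHE_SLOTS);

	for (i = 0; i < cache_len; i++)
		if (cache[i].addr == addr)
			return cache[i].nr_inlines;

	nr = slow_count_inlines(addr);

	/* Store the result even when it is empty (nr == 0). */
	if (cache_len < CACHE_SLOTS) {
		cache[cache_len].addr = addr;
		cache[cache_len].nr_inlines = nr;
		cache_len++;
	}

	return nr;
}
```

Calling `count_inlines()` twice for an address without inliners triggers the
slow lookup only once; without the negative entry it would run on every call.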

For my trivial example, the performance impact is already
quite significant:

Before:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
             5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
     2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
     4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
       939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
        11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )

       0.596246531 seconds time elapsed                                          ( +-  0.07% )
~~~~~

After:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                 0      cpu-migrations            #    0.000 K/sec
             5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
       432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
       670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
       141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
         2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )

       0.114222393 seconds time elapsed                                          ( +-  1.19% )
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/machine.c | 15 +++++++--------
 tools/perf/util/srcline.c | 16 +---------------
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3d049cb313ac..177c1d4088f8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2115,9 +2115,10 @@ static int append_inlines(struct callchain_cursor *cursor,
 	struct inline_node *inline_node;
 	struct inline_list *ilist;
 	u64 addr;
+	int ret = 1;
 
 	if (!symbol_conf.inline_name || !map || !sym)
-		return 1;
+		return ret;
 
 	addr = map__rip_2objdump(map, ip);
 
@@ -2125,22 +2126,20 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!inline_node) {
 		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
 		if (!inline_node)
-			return 1;
-
+			return ret;
 		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
 	}
 
 	list_for_each_entry(ilist, &inline_node->val, list) {
-		int ret = callchain_cursor_append(cursor, ip, map,
-						  ilist->symbol, false,
-						  NULL, 0, 0, 0,
-						  ilist->srcline);
+		ret = callchain_cursor_append(cursor, ip, map,
+					      ilist->symbol, false,
+					      NULL, 0, 0, 0, ilist->srcline);
 
 		if (ret != 0)
 			return ret;
 	}
 
-	return 0;
+	return ret;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 8bea6621d657..fc3888664b20 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -353,17 +353,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
-		goto out_free_inline_node;
-
-	if (list_empty(&node->val))
-		goto out_free_inline_node;
-
+	addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
 	return node;
-
-out_free_inline_node:
-	inline_node__delete(node);
-	return NULL;
 }
 
 #else /* HAVE_LIBBFD_SUPPORT */
@@ -480,11 +471,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 out:
 	pclose(fp);
 
-	if (list_empty(&node->val)) {
-		inline_node__delete(node);
-		return NULL;
-	}
-
 	return node;
 }
 
-- 
2.14.2


* [PATCH v6 3/6] perf report: cache srclines for callchain nodes
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 2/6] perf report: cache failed lookups of inlined frames Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 4/6] perf report: use srcline from callchain for hist entries Milian Wolff
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

On the one hand, this ensures that the memory is properly freed when
the DSO gets freed. On the other hand, it significantly speeds up
the processing of the callchain nodes when lots of srclines are
requested. For example, for one of my data files:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsed
Processed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. With `--inline`, we use the inliner cache instead and thus
do not run this code path as often.
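
The ownership model of such an address-keyed cache can be sketched outside of
perf as follows. This is an illustration under stated assumptions, not the
code from this patch: a plain unbalanced binary search tree replaces the
kernel rb-tree, and `struct srcline_entry`, `cache_insert`, `cache_find` and
`cache_delete` are hypothetical names. The tree takes ownership of each
string on insert and frees all of them in one place on delete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct srcline_entry {
	uint64_t addr;
	char *srcline;			/* owned by the tree */
	struct srcline_entry *left, *right;
};

/* Takes ownership of srcline. Returns 0 on success, -1 on allocation failure. */
int cache_insert(struct srcline_entry **root, uint64_t addr, char *srcline)
{
	struct srcline_entry **p = root;
	struct srcline_entry *node;

	assert(root != NULL);

	node = calloc(1, sizeof(*node));
	if (!node)
		return -1;
	node->addr = addr;
	node->srcline = srcline;

	/* Walk down to the empty slot where the new node belongs. */
	while (*p)
		p = addr < (*p)->addr ? &(*p)->left : &(*p)->right;
	*p = node;

	return 0;
}

/* Returns the cached string for addr, or NULL if nothing is cached. */
char *cache_find(struct srcline_entry *n, uint64_t addr)
{
	while (n) {
		if (addr < n->addr)
			n = n->left;
		else if (addr > n->addr)
			n = n->right;
		else
			return n->srcline;
	}

	return NULL;
}

/* Frees every node and the string it owns, then resets the root. */
void cache_delete(struct srcline_entry **root)
{
	struct srcline_entry *n = *root;

	if (!n)
		return;
	cache_delete(&n->left);
	cache_delete(&n->right);
	free(n->srcline);
	free(n);
	*root = NULL;
}
```

A caller would try `cache_find()` first and only compute and `cache_insert()`
the string on a miss. Since this sketch does not rebalance, its recursive
delete can go deep for sorted input; a balanced tree avoids that.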

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons
of srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative
would be to remove the formatting options and handle that on a
different level - i.e. print the sym/addr on demand wherever we
actually output something. And the unwind_inlines could be moved into
a separate function that does not return the srcline.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/dso.c     |  2 ++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 17 +++++++++---
 tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  7 +++++
 5 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250b3b8a..3192b608e91b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT;
+		dso->srclines = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
 
 	/* free inlines first, as they reference symbols */
 	inlines__tree_delete(&dso->inlined_nodes);
+	srcline__tree_delete(&dso->srclines);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0d242d..821b16c67030 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	struct rb_root	 inlined_nodes;
+	struct rb_root	 srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4088f8..94d8f1ccedd9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 
 static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
 {
+	char *srcline = NULL;
+
 	if (!map || callchain_param.key == CCKEY_FUNCTION)
-		return NULL;
+		return srcline;
+
+	srcline = srcline__tree_find(&map->dso->srclines, ip);
+	if (!srcline) {
+		bool show_sym = false;
+		bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+		srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+				      sym, show_sym, show_addr);
+		srcline__tree_insert(&map->dso->srclines, ip, srcline);
+	}
 
-	return get_srcline(map->dso, map__rip_2objdump(map, ip),
-			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+	return srcline;
 }
 
 struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc3888664b20..c143c3bc1ef8 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
+struct srcline_node {
+	u64			addr;
+	char			*srcline;
+	struct rb_node		rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	struct srcline_node *i, *node;
+
+	node = zalloc(sizeof(struct srcline_node));
+	if (!node) {
+		perror("not enough memory for the srcline node");
+		return;
+	}
+
+	node->addr = addr;
+	node->srcline = srcline;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct srcline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&node->rb_node, parent, p);
+	rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct srcline_node *i = rb_entry(n, struct srcline_node,
+						  rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i->srcline;
+	}
+
+	return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+	struct srcline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct srcline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		free_srcline(pos->srcline);
+		zfree(&pos);
+	}
+}
+
 struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 					    struct symbol *sym)
 {
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd22294..1c4d6210860b 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		  bool show_sym, bool show_addr, bool unwind_inlines);
 void free_srcline(char *srcline);
 
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {
-- 
2.14.2

* [PATCH v6 4/6] perf report: use srcline from callchain for hist entries
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (2 preceding siblings ...)
  2017-10-18 18:53 ` [PATCH v6 3/6] perf report: cache srclines for callchain nodes Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 5/6] perf util: enable handling of inlined frames by default Milian Wolff
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

This also removes the symbol name from the srcline column,
more on this below.

This ensures we use the correct srcline, which could originate
from a potentially inlined function. The hist entries used to
query for the srcline based purely on the IP, which leads to
wrong results for inlined entries.
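
The ownership rule behind the hist.c hunks can be sketched in isolation.
This is a minimal illustration, not the perf code: the *_sketch structs
and the three helpers are hypothetical stand-ins. It assumes that the
callchain node only borrows a string owned by the DSO's srcline cache,
and that the hist entry frees its own copy when it is deleted.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct cursor_node_sketch {
	const char *srcline;	/* owned by a cache, may be NULL */
};

struct addr_location_sketch {
	const char *srcline;	/* borrowed from the callchain node */
};

struct hist_entry_sketch {
	char *srcline;		/* private copy (assumed: freed with the entry) */
};

/* Like the fill_callchain_info() hunk: borrow the node's srcline. */
static void fill_location(struct addr_location_sketch *al,
			  const struct cursor_node_sketch *node)
{
	al->srcline = node->srcline;
}

/* Like the hist.c hunks: duplicate so the entry owns its string. */
static void init_entry(struct hist_entry_sketch *he,
		       const struct addr_location_sketch *al)
{
	he->srcline = al->srcline ? strdup(al->srcline) : NULL;
}

/* Release the entry's private copy; the cached original is untouched. */
static void free_entry(struct hist_entry_sketch *he)
{
	free(he->srcline);
	he->srcline = NULL;
}
```

Duplicating the string keeps the lifetime of a hist entry independent of
the cache, at the cost of one allocation per entry that has a srcline.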

Before:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ..................................................................................................................................
#
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    44.58%     0.00%  main+100
    44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  std::__complex_abs+100
    44.58%     0.00%  std::abs<double>+100
    44.58%     0.00%  std::norm<double>+100
    36.01%     0.00%  hypot+18446603487892193300
    25.81%     0.00%  main+41
    25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.75%    25.75%  random.h:143
    18.39%     0.00%  main+57
    18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  random.tcc:3330
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ...........................................
#
    94.30%     1.19%  main.cpp:39
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    48.44%     1.70%  random.h:1823
    48.44%     0.00%  random.h:1814
    46.74%     2.53%  random.h:185
    44.68%     0.10%  complex:589
    44.68%     0.00%  complex:597
    44.68%     0.00%  complex:654
    44.68%     0.00%  complex:664
    40.61%    13.80%  random.tcc:3330
    36.01%     0.00%  hypot+18446603487892193300
    26.81%     0.00%  random.h:151
    26.81%     0.00%  random.h:332
    25.75%    25.75%  random.h:143
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line
hist column. Users who want this information should request it
explicitly, i.e. run this command instead:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ...........................................
#
    94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
    46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
    44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
    44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
    44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
    44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
    39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
    26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
    25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
    25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Previously, we printed the symbol name in the srcline column even
when the sym column was explicitly requested, i.e. the output was:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ..................................................................................................................................
#
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    44.58%     0.00%  [.] main                                                                                                                             main+100
    44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
    44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
    44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    25.81%     0.00%  [.] main                                                                                                                             main+41
    25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
    18.39%     0.00%  [.] main                                                                                                                             main+57
    18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 1 +
 tools/perf/util/event.c     | 1 +
 tools/perf/util/hist.c      | 2 ++
 tools/perf/util/symbol.h    | 1 +
 4 files changed, 5 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index ac767957fd9c..6d03f23096f4 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1079,6 +1079,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 {
 	al->map = node->map;
 	al->sym = node->sym;
+	al->srcline = node->srcline;
 	if (node->map)
 		al->addr = node->map->map_ip(node->map, node->ip);
 	else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 47eff4767edb..3c411e7e36aa 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1604,6 +1604,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	al->sym = NULL;
 	al->cpu = sample->cpu;
 	al->socket = -1;
+	al->srcline = NULL;
 
 	if (al->cpu >= 0) {
 		struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b0fa9c217e1c..25d143053ab5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -596,6 +596,7 @@ __hists__add_entry(struct hists *hists,
 			.map	= al->map,
 			.sym	= al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.socket	 = al->socket,
 		.cpu	 = al->cpu,
 		.cpumode = al->cpumode,
@@ -950,6 +951,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			.map = al->map,
 			.sym = al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.parent = iter->parent,
 		.raw_data = sample->raw_data,
 		.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d880a059babb..d548ea5cb418 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -209,6 +209,7 @@ struct addr_location {
 	struct thread *thread;
 	struct map    *map;
 	struct symbol *sym;
+	const char    *srcline;
 	u64	      addr;
 	char	      level;
 	u8	      filtered;
-- 
2.14.2

* [PATCH v6 5/6] perf util: enable handling of inlined frames by default
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (3 preceding siblings ...)
  2017-10-18 18:53 ` [PATCH v6 4/6] perf report: use srcline from callchain for hist entries Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-18 18:53 ` [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
  2017-10-18 22:43 ` [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Andi Kleen
  6 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ingo Molnar

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable
this useful option by default.
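
The find-or-compute pattern that makes this affordable can be sketched
as follows. This is an illustration under stated assumptions, not the
perf implementation: it uses a plain unbalanced binary search tree
instead of the kernel rbtree, and slow_resolve(), cached_resolve() and
cache_free() are hypothetical names. slow_resolve() stands in for an
expensive debug-info lookup.

```c
#include <stdint.h>
#include <stdlib.h>

struct cache_node {
	uint64_t addr;
	const char *srcline;
	struct cache_node *left, *right;
};

/* Counts how often the (hypothetical) expensive resolver runs. */
static unsigned int nr_slow_lookups;

/* Stand-in for an expensive debug-info lookup. */
static const char *slow_resolve(uint64_t addr)
{
	nr_slow_lookups++;
	return (addr & 1) ? "odd.c:1" : "even.c:2";
}

/* Return the cached srcline for addr, resolving and caching on a miss. */
static const char *cached_resolve(struct cache_node **root, uint64_t addr)
{
	struct cache_node **p = root;

	while (*p) {
		if (addr < (*p)->addr)
			p = &(*p)->left;
		else if (addr > (*p)->addr)
			p = &(*p)->right;
		else
			return (*p)->srcline;	/* hit: no slow lookup */
	}

	*p = calloc(1, sizeof(**p));
	if (!*p)
		return slow_resolve(addr);	/* out of memory: skip caching */

	(*p)->addr = addr;
	(*p)->srcline = slow_resolve(addr);
	return (*p)->srcline;
}

/* Free all nodes; the strings here are literals and are not freed. */
static void cache_free(struct cache_node *n)
{
	if (!n)
		return;
	cache_free(n->left);
	cache_free(n->right);
	free(n);
}
```

Samples tend to hit the same few addresses over and over, so after the
first miss per address every further lookup is a tree walk only.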

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d992ed..ddde2b54af57 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry is function name or file/line.
+	will be printed. Each entry is function name or file/line. Enabled by
+	default, disable with --no-inline.
 
 include::callchain-overhead-calculation.txt[]
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba35a2d8..25e677344728 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry has function name and file/line.
+	will be printed. Each entry has function name and file/line. Enabled by
+	default, disable with --no-inline.
 
 SEE ALSO
 --------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38aa4063..ce6993bebf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
 	.show_hist_headers	= true,
 	.symfs			= "",
 	.event_group		= true,
+	.inline_name		= true,
 };
 
 static enum dso_binary_type binary_type_symtab[] = {
-- 
2.14.2

* [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (4 preceding siblings ...)
  2017-10-18 18:53 ` [PATCH v6 5/6] perf util: enable handling of inlined frames by default Milian Wolff
@ 2017-10-18 18:53 ` Milian Wolff
  2017-10-19 10:54   ` Milian Wolff
  2017-10-18 22:43 ` [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Andi Kleen
  6 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-18 18:53 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa

When inline frame resolution is disabled, a bogus srcline is obtained
for hist entries:

~~~~~
$ perf report -s sym,srcline --no-inline --stdio -g none
    95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+18446603358170398953
    95.21%     0.00%  [.] _start                                                                                                              _start+18446650082411225129
    46.67%     0.00%  [.] main                                                                                                                main+18446650082411225208
    38.75%     0.00%  [.] hypot                                                                                                               hypot+18446603358164312084
    23.75%     0.00%  [.] main                                                                                                                main+18446650082411225151
    20.83%    20.83%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
    18.12%     0.00%  [.] main                                                                                                                main+18446650082411225165
    13.12%    13.12%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
     4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+163
     4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
     4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312227
     4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.92%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+11
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+24
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312075
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312088
~~~~~

Note how we get very large offsets to main and cannot see any srcline
from one of the complex or random headers, even though the instruction
pointers actually lie in code inlined from there.
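
One plausible reading of these offsets (an assumption, the patch does
not spell it out) is unsigned wraparound: all of them lie just below
2^64 = 18446744073709551616, which is what a u64 subtraction yields
when the address handed to the lookup is smaller than the symbol start
it is compared against. A minimal sketch with hypothetical helper names
and made-up addresses:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of addr within a symbol, in unsigned 64-bit arithmetic. */
static uint64_t sym_offset(uint64_t addr, uint64_t sym_start)
{
	return addr - sym_start;	/* wraps modulo 2^64 if addr < sym_start */
}

/* Format "name+offset", similar in shape to the output above. */
static int format_sym_offset(char *buf, size_t len, const char *name,
			     uint64_t addr, uint64_t sym_start)
{
	return snprintf(buf, len, "%s+%" PRIu64, name,
			sym_offset(addr, sym_start));
}
```

With both values in the same address space the result is a small
offset; mixing address spaces produces a huge number instead of an
error, which is why the bug shows up only in the printed srcline.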

This patch fixes the mapping to use map__objdump_2mem instead of
map__rip_2objdump in hist_entry__get_srcline. This fixes the srcline
values for me when inline resolution is disabled:

~~~~~
$ perf report -s sym,srcline --no-inline --stdio -g none
    95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+233
    95.21%     0.00%  [.] _start                                                                                                              _start+41
    46.88%     0.00%  [.] main                                                                                                                complex:589
    43.96%     0.00%  [.] main                                                                                                                random.h:185
    38.75%     0.00%  [.] hypot                                                                                                               hypot+20
    20.83%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
    13.12%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
     4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239715
     4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+163
     4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
     2.92%     2.92%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239563
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239576
     2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+11
~~~~~

Note how most of the large offset values are now gone. Most notably,
we get proper srcline resolution for the random.h and complex headers.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 006d10a0dc96..6f3d109078a3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
 	if (!map)
 		return SRCLINE_UNKNOWN;
 
-	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
+	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
 			   he->ms.sym, true, true);
 }
 
-- 
2.14.2

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-18 22:41   ` Andi Kleen
  2017-10-19 10:59     ` Milian Wolff
  2017-10-20 15:22   ` Arnaldo Carvalho de Melo
  2017-10-25 17:20   ` [tip:perf/core] perf report: Properly handle branch count in match_chain() tip-bot for Milian Wolff
  2 siblings, 1 reply; 50+ messages in thread
From: Andi Kleen @ 2017-10-18 22:41 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

Milian Wolff <milian.wolff@kdab.com> writes:
>  
> +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> +					   struct dso *right_dso, u64 right_ip)
> +{
> +	if (left_dso == right_dso && left_ip == right_ip)
> +		return MATCH_EQ;
> +	else if (left_ip < right_ip)
> +		return MATCH_LT;
> +	else
> +		return MATCH_GT;
> +}

So why does only the first case check the dso? Does it not matter
for the others?

It should be checked either by none or by all.
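
For illustration only, a sketch of what "checked by all" could look
like; this is not part of the patch, and the struct, enum and function
below are simplified stand-ins with hypothetical names. In the quoted
version, two entries with the same ip but different dsos compare as
MATCH_GT in both directions. Ordering by dso first and by ip second
avoids that:

```c
#include <stdint.h>

/* Stand-in type; only the pointer identity is compared. */
struct dso {
	int unused;
};

/* Stand-in for perf's match result values. */
enum match_result {
	MATCH_EQ,
	MATCH_LT,
	MATCH_GT,
};

/* Order by dso pointer first, then by ip, so every branch sees the dso. */
static enum match_result
match_address_dso_sketch(const struct dso *left_dso, uint64_t left_ip,
			 const struct dso *right_dso, uint64_t right_ip)
{
	uintptr_t l = (uintptr_t)left_dso;
	uintptr_t r = (uintptr_t)right_dso;

	if (l != r)
		return l < r ? MATCH_LT : MATCH_GT;
	if (left_ip != right_ip)
		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
	return MATCH_EQ;
}
```

Swapping the arguments now always swaps MATCH_LT and MATCH_GT, which a
sorted tree needs in order to find entries again after inserting them.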


> +	case CCKEY_FUNCTION:
> +		if (node->sym && cnode->ms.sym) {
> +			/*
> +			 * Compare inlined frames based on their symbol name
> +			 * because different inlined frames will have the same
> +			 * symbol start. Otherwise do a faster comparison based
> +			 * on the symbol start address.
> +			 */
> +			if (cnode->ms.sym->inlined || node->sym->inlined)
> +				match = match_chain_strings(cnode->ms.sym->name,
> +							    node->sym->name);

So what happens when there are multiple symbols with the same name?

(e.g. local for a DSO or local in a file)

> +					  node->ip);
> +		} else {
> +			/*
> +			 * It's "from" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = false;
> +			cnode->cycles_count += node->branch_flags.cycles;
> +			cnode->iter_count += node->nr_loop_iter;
> +			cnode->iter_cycles += node->iter_cycles;

I assume you tested the cycle accounting still works?

-Andi

* Re: [PATCH v6 0/6] generate full callchain cursor entries for inlined frames
  2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (5 preceding siblings ...)
  2017-10-18 18:53 ` [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
@ 2017-10-18 22:43 ` Andi Kleen
  2017-10-20 15:43   ` Arnaldo Carvalho de Melo
  6 siblings, 1 reply; 50+ messages in thread
From: Andi Kleen @ 2017-10-18 22:43 UTC (permalink / raw)
  To: Milian Wolff; +Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users

Milian Wolff <milian.wolff@kdab.com> writes:

> This series of patches completely reworks the way inline frames are handled.
> Instead of querying for the inline nodes on-demand in the individual tools,
> we now create proper callchain nodes for inlined frames. The advantages this
> approach brings are numerous:

Except for the comments on the one patch the other patches all look
good to me.

Reviewed-by: Andi Kleen <ak@linux.intel.com>

-Andi

* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-18 18:53 ` [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
@ 2017-10-19 10:54   ` Milian Wolff
  2017-10-20  5:15     ` Namhyung Kim
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 10:54 UTC (permalink / raw)
  To: acme
  Cc: jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa

On Wednesday, 18 October 2017 20:53:50 CEST Milian Wolff wrote:
> When inline frame resolution is disabled, a bogus srcline is obtained
> for hist entries:
> 
> ~~~~~
> $ perf report -s sym,srcline --no-inline --stdio -g none
>     95.21%     0.00%  [.] __libc_start_main  __libc_start_main+18446603358170398953
>     95.21%     0.00%  [.] _start             _start+18446650082411225129
>     46.67%     0.00%  [.] main               main+18446650082411225208
>     38.75%     0.00%  [.] hypot              hypot+18446603358164312084
>     23.75%     0.00%  [.] main               main+18446650082411225151
>     20.83%    20.83%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
>     18.12%     0.00%  [.] main               main+18446650082411225165
>     13.12%    13.12%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
>      4.17%     4.17%  [.] __hypot_finite     __hypot_finite+163
>      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
>      4.17%     0.00%  [.] __hypot_finite     __hypot_finite+18446603358164312227
>      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.92%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] __hypot_finite     __hypot_finite+11
>      2.50%     2.50%  [.] __hypot_finite     __hypot_finite+24
>      2.50%     0.00%  [.] __hypot_finite     __hypot_finite+18446603358164312075
>      2.50%     0.00%  [.] __hypot_finite     __hypot_finite+18446603358164312088
> ~~~~~
> 
> Note how we get very large offsets to main and cannot see any srcline
> from one of the complex or random headers, even though the instruction
> pointers actually lie in code inlined from there.
> 
> This patch fixes the mapping to use map__objdump_2mem instead of
> map__rip_2objdump in hist_entry__get_srcline. This fixes the srcline
> values for me when inline resolution is disabled:
> 
> ~~~~~
> $ perf report -s sym,srcline --no-inline --stdio -g none
>     95.21%     0.00%  [.] __libc_start_main  __libc_start_main+233
>     95.21%     0.00%  [.] _start             _start+41
>     46.88%     0.00%  [.] main               complex:589
>     43.96%     0.00%  [.] main               random.h:185
>     38.75%     0.00%  [.] hypot              hypot+20
>     20.83%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
>     13.12%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
>      4.17%     4.17%  [.] __hypot_finite     __hypot_finite+140715545239715
>      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      4.17%     0.00%  [.] __hypot_finite     __hypot_finite+163
>      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
>      2.92%     2.92%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] __hypot_finite     __hypot_finite+140715545239563
>      2.50%     2.50%  [.] __hypot_finite     __hypot_finite+140715545239576
>      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     0.00%  [.] __hypot_finite     __hypot_finite+11
> ~~~~~
> 
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Yao Jin <yao.jin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> 
> Note how most of the large offset values are now gone. Most notably,
> we get proper srcline resolution for the random.h and complex headers.
> ---
>  tools/perf/util/sort.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 006d10a0dc96..6f3d109078a3 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
>  	if (!map)
>  		return SRCLINE_UNKNOWN;
> 
> -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
>  			   he->ms.sym, true, true);
>  }

Sorry, this patch was declined by Namhyung before, please discard it. I
forgot to drop it before resending v6.

Bye

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-18 22:41   ` Andi Kleen
@ 2017-10-19 10:59     ` Milian Wolff
  2017-10-19 13:55       ` Andi Kleen
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 10:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Donnerstag, 19. Oktober 2017 00:41:04 CEST Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> > +static enum match_result match_address_dso(struct dso *left_dso, u64
> > left_ip, +					   struct dso *right_dso, u64 right_ip)
> > +{
> > +	if (left_dso == right_dso && left_ip == right_ip)
> > +		return MATCH_EQ;
> > +	else if (left_ip < right_ip)
> > +		return MATCH_LT;
> > +	else
> > +		return MATCH_GT;
> > +}
> 
> So why does only the first case check the dso? Does it not matter
> for the others?
> 
> Either should be checked by none or by all.

I don't see why it should be checked in the other cases. The DSO comparison is
only required to prevent two addresses from being considered equal when they
are not. So only that one check is required; otherwise we return either LT or GT.

Am I missing something?
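
To make the discussion concrete, here is a minimal standalone sketch of the comparator quoted above. It is not the perf code itself: `struct dso` is reduced to a hypothetical stub, the enum values are illustrative, and `match_address_dso_selfcheck` is a helper invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the real definitions live in tools/perf/util. */
enum match_result { MATCH_ERROR = -1, MATCH_EQ, MATCH_LT, MATCH_GT };

struct dso {
	const char *name;	/* hypothetical stub, only identity matters here */
};

/* Same decision logic as match_address_dso() in the quoted hunk. */
enum match_result match_address_dso(const struct dso *left_dso, uint64_t left_ip,
				    const struct dso *right_dso, uint64_t right_ip)
{
	if (left_dso == right_dso && left_ip == right_ip)
		return MATCH_EQ;
	else if (left_ip < right_ip)
		return MATCH_LT;
	else
		return MATCH_GT;
}

/* Hypothetical self-check walking through the three outcomes. */
void match_address_dso_selfcheck(void)
{
	struct dso a = { "a.so" };
	struct dso b = { "b.so" };

	/* Same DSO, same IP: the only way to get MATCH_EQ. */
	assert(match_address_dso(&a, 0x10, &a, 0x10) == MATCH_EQ);
	/* Different IPs: the DSO is never consulted. */
	assert(match_address_dso(&a, 0x08, &b, 0x10) == MATCH_LT);
	/* Same IP, different DSO: not equal, so it falls through to MATCH_GT. */
	assert(match_address_dso(&a, 0x10, &b, 0x10) == MATCH_GT);
}
```

The last assertion is the case the DSO check exists for. In this sketch it yields MATCH_GT regardless of argument order, so the result for equal IPs in different DSOs is "not equal" rather than a meaningful ordering.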

> > +	case CCKEY_FUNCTION:
> > +		if (node->sym && cnode->ms.sym) {
> > +			/*
> > +			 * Compare inlined frames based on their symbol name
> > +			 * because different inlined frames will have the same
> > +			 * symbol start. Otherwise do a faster comparison based
> > +			 * on the symbol start address.
> > +			 */
> > +			if (cnode->ms.sym->inlined || node->sym->inlined)
> > +				match = match_chain_strings(cnode->ms.sym->name,
> 
> node->sym->name);
> 
> So what happens when there are multiple symbols with the same name?
> 
> (e.g. local for a DSO or local in a file)
> 
> > +					  node->ip);
> > +		} else {
> > +			/*
> > +			 * It's "from" of a branch
> > +			 */
> > +			cnode->brtype_stat.branch_to = false;
> > +			cnode->cycles_count += node->branch_flags.cycles;
> > +			cnode->iter_count += node->nr_loop_iter;
> > +			cnode->iter_cycles += node->iter_cycles;
> 
> I assume you tested the cycle accounting still works?

I did test it back then, but that was a long time ago, when I originally wrote
this patch. I just tested it again, and indeed something crashes now. I will fix
it and resend v7.

Sorry for that.

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
@ 2017-10-19 11:38 Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 1/5] perf report: properly handle branch count in match_chain Milian Wolff
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung; +Cc: Linux-kernel, linux-perf-users, Milian Wolff

This series of patches completely reworks the way inline frames are handled.
Instead of querying for the inline nodes on-demand in the individual tools,
we now create proper callchain nodes for inlined frames. The advantages this
approach brings are numerous:

- less duplicated code in the individual browsers
- aggregated cost for inlined frames for the --children top-down list
- various bug fixes that arose from querying for a srcline/symbol based on
  the IP of a sample, which will always point to the last inlined frame
  instead of the corresponding non-inlined frame
- overall much better support for visualizing cost for heavily-inlined C++
  code, which was simply confusing and unreliable before
- srcline honors the global setting as to whether full paths or basenames
  should be shown
- caches for inlined frames and srcline information, which allow us to
  enable inline frame handling by default
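
As a rough illustration of the core idea (one callchain entry per inlined frame instead of an on-demand lookup in each tool), the following sketch expands a single sampled IP into a run of entries. All names here (`struct frame`, `append_frames`) are hypothetical and are not the perf data structures; the ordering mirrors the "after" `perf script` output shown later in this cover letter, where every inlined frame repeats the IP of its enclosing symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified callchain entry. */
struct frame {
	uint64_t ip;		/* instruction pointer, shared by all frames below */
	const char *name;	/* function name */
	bool inlined;		/* true for frames that only exist in debug info */
};

/*
 * Expand one sampled IP into callchain entries: the inlined frames come
 * first (innermost first), followed by the real, non-inlined symbol.
 * Returns the number of entries written, never more than cap.
 */
size_t append_frames(struct frame *out, size_t cap, uint64_t ip,
		     const char *const *inlined, size_t nr_inlined,
		     const char *symbol)
{
	size_t n = 0;
	size_t i;

	assert(out != NULL);

	for (i = 0; i < nr_inlined && n < cap; i++)
		out[n++] = (struct frame){ ip, inlined[i], true };

	if (n < cap)
		out[n++] = (struct frame){ ip, symbol, false };

	return n;
}
```

Called with ip 0xa4a, the four inlined `std::` frames from the example and the symbol `main`, this yields five entries that all carry 0xa4a, with `main` last and not marked as inlined.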

For comparison, the output of `perf script` and `perf report` before and
after this series is listed below. The example file I used to generate the
perf data is:

~~~~~
$ cat inlining.cpp
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
$ g++ -O2 -g -o inlining inlining.cpp
$ perf record --call-graph dwarf ./inlining
~~~~~

Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
of my comments that outline the various issues I'm trying to solve by this
patch series.

~~~~~
$ perf script --inline
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                         std::__complex_abs
                         std::abs<double>
                         std::_Norm_helper<true>::_S_do_it<double>
                         std::norm<double>
                         main
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
# NOTE: the above inlined stack is confusing: the a4a is an address into main,
#       which is the non-inlined symbol. the entry with the address should be
#       at the end of the stack, where it's actually duplicated once more but
#       there it's missing the address
...
$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
...
             --38.86%--_start
                       __libc_start_main
                       |
                       |--15.68%--main random.tcc:3326
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
                       |
                       |--10.36%--main random.h:143
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:143 (inline)
                       |
                       |--5.66%--main random.tcc:3332
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
...
# NOTE: the grouping is totally off because the first and last frame of the
        inline nodes is completely bogus, since the IP is used to find the sym/srcline
        which is different from the actual inlined sym/srcline.
        also, the code currently displays either the inlined function name or
        the corresponding filename (but in full length, instead of just the basename).

$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--main random.tcc:3326
            |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
            |          __libc_start_main
            |          _start
...
# NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
        i.e. what is displayed in the line below (see also perf script issue above)
~~~~~

Afterwards, all of the above issues are resolved (and inlined frames are
displayed by default):

~~~~~
$ perf script
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
...
# NOTE: only one main entry, at the correct position.
        we do display the (repeated) instruction pointer as that ensures
        interoperability with e.g. the stackcollapse-perf.pl script

$ perf report -s sym -g srcline -i perf.inlining.data --stdio
...
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664 (inlined)
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
            |          std::abs<double> complex:597 (inlined)
            |          std::__complex_abs complex:589 (inlined)
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
            |
             --38.86%--_start
                       __libc_start_main
                       |
                       |--38.19%--main inlining.cpp:14
                       |          |
                       |          |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
                       |          |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
                       |          |          |
                       |          |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
                       |          |                     |
                       |          |                     |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
                       |          |                     |          |
                       |          |                     |           --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
                       |          |                     |                     std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
                       |          |                     |                     |
                       |          |                     |                     |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
                       |          |                     |                     |
                       |          |                     |                      --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
                       |          |                     |
                       |          |                     |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
                       |          |                     |
                       |          |                      --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
                       |          |
                       |           --1.99%--std::norm<double> complex:664 (inlined)
                       |                     std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
                       |                     std::abs<double> complex:597 (inlined)
                       |                     std::__complex_abs complex:589 (inlined)
                       |
                        --0.67%--main inlining.cpp:13
...

# NOTE: still somewhat confusing due to the _start and __libc_start_main frames
        that actually are *above* the main frame. But at least the stuff below
        properly splits up and shows that multiple functions got inlined into
        inlining.cpp:14, not just one as before.

$ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
            |          main inlining.cpp:14
            |          __libc_start_main
            |          _start
...
# NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
        both function and srcline are shown, as well as the (inlined) suffix
        only the basename of the srcline is shown

v7 fixes a crash in match_chain, when a map with invalid dso is encountered.
   It also drops the commit from v5 that tried to fix the srcline resolution.

v6 rebases against the partial merge of this patch series in acme/perf/core

v5 attends to Namhyung's code review. Most notably, it fixes a use-after-free
   crash. Additionally, srcline resolution for hist entries now also works
   correctly, when inline frame resolution is disabled.

v4 splits the patch to create full callchain nodes for inline frames up further
   as suggested by Jiri. It also removes C99 comments and initializes the
   rb_root properly.

v3 splits the initial patch up into two to simplify reviewing. It also adds a
   comment to clarify the lifetime handling of fake symbols and aliased non-fake
   symbols, based on the feedback by Namhyung.

v2 fixes some issues reported by Namhyung or found by me in further
testing, adds caching and enables inline frames by default.


Milian Wolff (5):
  perf report: properly handle branch count in match_chain
  perf report: cache failed lookups of inlined frames
  perf report: cache srclines for callchain nodes
  perf report: use srcline from callchain for hist entries
  perf util: enable handling of inlined frames by default

 tools/perf/Documentation/perf-report.txt |   3 +-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/util/callchain.c              | 133 +++++++++++++++++--------------
 tools/perf/util/dso.c                    |   2 +
 tools/perf/util/dso.h                    |   1 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/hist.c                   |   2 +
 tools/perf/util/machine.c                |  32 +++++---
 tools/perf/util/srcline.c                |  82 +++++++++++++++----
 tools/perf/util/srcline.h                |   7 ++
 tools/perf/util/symbol.c                 |   1 +
 tools/perf/util/symbol.h                 |   1 +
 12 files changed, 178 insertions(+), 90 deletions(-)

-- 
2.14.2


* [PATCH v7 1/5] perf report: properly handle branch count in match_chain
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
@ 2017-10-19 11:38 ` Milian Wolff
  2017-10-19 11:42   ` Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 2/5] perf report: cache failed lookups of inlined frames Milian Wolff
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

Some of the code paths I introduced before returned too early
without running the code to handle a node's branch count.
By refactoring match_chain to only have one exit point, this
can be remedied.
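
The shape of the refactoring can be sketched in isolation as follows. This is a simplified, hypothetical model and not the code from the diff below: `struct entry`, `enum chain_key`, `compare_strings` and `match_entry` are made-up names, and the string comparison is reduced to plain `strcmp`. The point is only that every comparison path leaves the `switch` via `break`, so the branch accounting after it runs for every MATCH_EQ result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum match_result { MATCH_ERROR = -1, MATCH_EQ, MATCH_LT, MATCH_GT };
enum chain_key { KEY_SRCLINE, KEY_ADDRESS };	/* hypothetical, simplified */

struct entry {
	const char *srcline;		/* may be NULL when unknown */
	uint64_t ip;
	bool branch;
	unsigned int branch_count;
};

/* Simplified string comparison: MATCH_ERROR when either side is missing. */
enum match_result compare_strings(const char *left, const char *right)
{
	int ret;

	if (!left || !right)
		return MATCH_ERROR;

	ret = strcmp(left, right);
	if (ret == 0)
		return MATCH_EQ;
	return ret < 0 ? MATCH_LT : MATCH_GT;
}

enum match_result match_entry(enum chain_key key, const struct entry *node,
			      struct entry *cnode)
{
	enum match_result match = MATCH_ERROR;

	assert(node != NULL && cnode != NULL);

	switch (key) {
	case KEY_SRCLINE:
		match = compare_strings(cnode->srcline, node->srcline);
		if (match != MATCH_ERROR)
			break;
		/* fall through */
	case KEY_ADDRESS:
	default:
		if (cnode->ip == node->ip)
			match = MATCH_EQ;
		else
			match = cnode->ip < node->ip ? MATCH_LT : MATCH_GT;
		break;
	}

	/* Single exit point: no comparison path can skip the accounting. */
	if (match == MATCH_EQ && node->branch)
		cnode->branch_count++;

	return match;
}
```

With an early `return match;` inside the KEY_SRCLINE case, a srcline match would bypass the `branch_count` update, which is the class of bug described above.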

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 132 +++++++++++++++++++++++---------------------
 1 file changed, 70 insertions(+), 62 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..8901a95f2880 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,83 +666,91 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
-static enum match_result match_chain(struct callchain_cursor_node *node,
-				     struct callchain_list *cnode)
+static enum match_result match_address_dso(struct map *left_map, u64 left_ip,
+					   struct map *right_map, u64 right_ip)
 {
-	struct symbol *sym = node->sym;
-	u64 left, right;
-	struct dso *left_dso = NULL;
-	struct dso *right_dso = NULL;
+	struct dso *left_dso = left_map ? left_map->dso : NULL;
+	struct dso *right_dso = right_map ? right_map->dso : NULL;
 
-	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
+	if (left_dso == right_dso && left_ip == right_ip)
+		return MATCH_EQ;
+	else if (left_ip < right_ip)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
 
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
+static enum match_result match_chain(struct callchain_cursor_node *node,
+				     struct callchain_list *cnode)
+{
+	enum match_result match = MATCH_ERROR;
 
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
+		match = match_chain_strings(cnode->srcline, node->srcline);
 		if (match != MATCH_ERROR)
-			return match;
-
+			break;
+		/* otherwise fall-back to symbol-based comparison below */
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/*
+			 * Compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. Otherwise do a faster comparison based
+			 * on the symbol start address.
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined)
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+			else
+				match = match_address_dso(cnode->ms.map,
+							  cnode->ms.sym->start,
+							  node->map,
+							  node->sym->start);
+			if (match != MATCH_ERROR)
+				break;
+		}
 		/* otherwise fall-back to IP-based comparison below */
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_address_dso(cnode->ms.map, cnode->ip,
+					  node->map, node->ip);
+		break;
 	}
 
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/*
-		 * Compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.sym->start;
-		right = sym->start;
-		left_dso = cnode->ms.map->dso;
-		right_dso = node->map->dso;
-	} else {
-		left = cnode->ip;
-		right = node->ip;
-	}
-
-	if (left == right && left_dso == right_dso) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*
-- 
2.14.2


* [PATCH v7 2/5] perf report: cache failed lookups of inlined frames
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 1/5] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-19 11:38 ` Milian Wolff
  2017-10-25 17:20   ` [tip:perf/core] perf report: Cache " tip-bot for Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 3/5] perf report: cache srclines for callchain nodes Milian Wolff
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

When no inlined frames could be found for a given address,
we did not store this information anywhere. That means we
potentially do the costly inliner lookup repeatedly for
cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a
valid inline_node. It will be empty when no inliners are
found. This enables us to cache the empty list in the DSO,
thereby improving the performance when many addresses
fail to find the inliners.
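
The caching idea can be illustrated with a small standalone sketch. It makes several assumptions that do not come from the patch: the cache is a plain linked list rather than whatever the DSO uses internally, `slow_count_inlined_frames` is a made-up stand-in for the costly debug-info lookup, and all type and function names are hypothetical. What it shows is that an empty result is stored like any other result, so a second query for the same address never reaches the slow path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified cache entry keyed by address. */
struct inline_node {
	uint64_t addr;
	size_t nr_frames;	/* 0 means: looked up, nothing inlined here */
	struct inline_node *next;
};

struct inline_cache {
	struct inline_node *head;
	unsigned long slow_lookups;	/* how often the costly path ran */
};

/* Stand-in for the expensive lookup: pretend only even addresses have inliners. */
static size_t slow_count_inlined_frames(uint64_t addr)
{
	return (addr % 2 == 0) ? 2 : 0;
}

/* Always returns a node (or NULL on allocation failure), even for "no inliners". */
struct inline_node *cache_lookup(struct inline_cache *cache, uint64_t addr)
{
	struct inline_node *node;

	assert(cache != NULL);

	for (node = cache->head; node; node = node->next)
		if (node->addr == addr)
			return node;	/* hit, possibly an empty node */

	node = calloc(1, sizeof(*node));
	if (!node)
		return NULL;

	cache->slow_lookups++;
	node->addr = addr;
	node->nr_frames = slow_count_inlined_frames(addr);
	node->next = cache->head;
	cache->head = node;
	return node;
}

void cache_free(struct inline_cache *cache)
{
	while (cache->head) {
		struct inline_node *next = cache->head->next;

		free(cache->head);
		cache->head = next;
	}
}
```

If the failed lookup returned NULL and stored nothing, `slow_lookups` would grow with every sample hitting an address without inliners, which is the repeated cost this patch removes.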

For my trivial example, the performance impact is already
quite significant:

Before:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
             5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
     2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
     4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
       939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
        11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )

       0.596246531 seconds time elapsed                                          ( +-  0.07% )
~~~~~

After:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                 0      cpu-migrations            #    0.000 K/sec
             5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
       432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
       670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
       141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
         2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )

       0.114222393 seconds time elapsed                                          ( +-  1.19% )
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/machine.c | 15 +++++++--------
 tools/perf/util/srcline.c | 16 +---------------
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3d049cb313ac..177c1d4088f8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2115,9 +2115,10 @@ static int append_inlines(struct callchain_cursor *cursor,
 	struct inline_node *inline_node;
 	struct inline_list *ilist;
 	u64 addr;
+	int ret = 1;
 
 	if (!symbol_conf.inline_name || !map || !sym)
-		return 1;
+		return ret;
 
 	addr = map__rip_2objdump(map, ip);
 
@@ -2125,22 +2126,20 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!inline_node) {
 		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
 		if (!inline_node)
-			return 1;
-
+			return ret;
 		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
 	}
 
 	list_for_each_entry(ilist, &inline_node->val, list) {
-		int ret = callchain_cursor_append(cursor, ip, map,
-						  ilist->symbol, false,
-						  NULL, 0, 0, 0,
-						  ilist->srcline);
+		ret = callchain_cursor_append(cursor, ip, map,
+					      ilist->symbol, false,
+					      NULL, 0, 0, 0, ilist->srcline);
 
 		if (ret != 0)
 			return ret;
 	}
 
-	return 0;
+	return ret;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 8bea6621d657..fc3888664b20 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -353,17 +353,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
-		goto out_free_inline_node;
-
-	if (list_empty(&node->val))
-		goto out_free_inline_node;
-
+	addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
 	return node;
-
-out_free_inline_node:
-	inline_node__delete(node);
-	return NULL;
 }
 
 #else /* HAVE_LIBBFD_SUPPORT */
@@ -480,11 +471,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 out:
 	pclose(fp);
 
-	if (list_empty(&node->val)) {
-		inline_node__delete(node);
-		return NULL;
-	}
-
 	return node;
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v7 3/5] perf report: cache srclines for callchain nodes
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 1/5] perf report: properly handle branch count in match_chain Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 2/5] perf report: cache failed lookups of inlined frames Milian Wolff
@ 2017-10-19 11:38 ` Milian Wolff
  2017-10-25 17:20   ` [tip:perf/core] perf report: Cache " tip-bot for Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 4/5] perf report: use srcline from callchain for hist entries Milian Wolff
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

On one hand this ensures that the memory is properly freed when
the DSO gets freed. On the other hand this significantly speeds up
the processing of the callchain nodes when lots of srclines are
requested. For example, for one of my data files:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsed

Processed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. When it is passed, we use the inliner cache instead and
thus do not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons
of srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative
would be to remove the formatting options and handle that on a
different level - i.e. print the sym/addr on demand wherever we
actually output something. And the unwind_inlines could be moved into
a separate function that does not return the srcline.
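
To make that alternative a bit more concrete, here is a minimal sketch
of what "format on demand" could look like. It is an assumption-laden
toy, not a proposal for the actual API: struct toy_srcline,
toy_format_srcline() and the exact fallback strings are invented for
this example. The cache would only hold the raw file:line per address,
and the sym/addr decoration is applied when printing:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_SRCLINE_UNKNOWN "??:0"

/* Hypothetical cached entry: one per address, no formatting baked in. */
struct toy_srcline {
	uint64_t	addr;
	const char	*file_line;	/* e.g. "random.h:143", or NULL */
};

/*
 * Format a cached entry at output time. The fallback rules here are
 * illustrative only:
 *  - known file:line wins,
 *  - else "sym+offset" if a symbol was requested and is available,
 *  - else "[addr]" if the address was requested,
 *  - else the unknown marker.
 */
static int toy_format_srcline(char *buf, size_t size,
			      const struct toy_srcline *sl,
			      const char *sym_name, uint64_t sym_start,
			      bool show_sym, bool show_addr)
{
	assert(buf != NULL && sl != NULL);

	if (sl->file_line && strcmp(sl->file_line, TOY_SRCLINE_UNKNOWN) != 0)
		return snprintf(buf, size, "%s", sl->file_line);
	if (show_sym && sym_name)
		return snprintf(buf, size, "%s+%" PRIu64, sym_name,
				sl->addr - sym_start);
	if (show_addr)
		return snprintf(buf, size, "[%" PRIx64 "]", sl->addr);
	return snprintf(buf, size, "%s", TOY_SRCLINE_UNKNOWN);
}
```

With something along these lines, a single cache keyed by address
could serve all callers, since show_sym/show_addr no longer change
what gets stored.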

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/dso.c     |  2 ++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 17 +++++++++---
 tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  7 +++++
 5 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250b3b8a..3192b608e91b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT;
+		dso->srclines = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
 
 	/* free inlines first, as they reference symbols */
 	inlines__tree_delete(&dso->inlined_nodes);
+	srcline__tree_delete(&dso->srclines);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0d242d..821b16c67030 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	struct rb_root	 inlined_nodes;
+	struct rb_root	 srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4088f8..94d8f1ccedd9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 
 static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
 {
+	char *srcline = NULL;
+
 	if (!map || callchain_param.key == CCKEY_FUNCTION)
-		return NULL;
+		return srcline;
+
+	srcline = srcline__tree_find(&map->dso->srclines, ip);
+	if (!srcline) {
+		bool show_sym = false;
+		bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+		srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+				      sym, show_sym, show_addr);
+		srcline__tree_insert(&map->dso->srclines, ip, srcline);
+	}
 
-	return get_srcline(map->dso, map__rip_2objdump(map, ip),
-			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+	return srcline;
 }
 
 struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc3888664b20..c143c3bc1ef8 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
+struct srcline_node {
+	u64			addr;
+	char			*srcline;
+	struct rb_node		rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	struct srcline_node *i, *node;
+
+	node = zalloc(sizeof(struct srcline_node));
+	if (!node) {
+		perror("not enough memory for the srcline node");
+		return;
+	}
+
+	node->addr = addr;
+	node->srcline = srcline;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct srcline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&node->rb_node, parent, p);
+	rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct srcline_node *i = rb_entry(n, struct srcline_node,
+						  rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i->srcline;
+	}
+
+	return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+	struct srcline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct srcline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		free_srcline(pos->srcline);
+		zfree(&pos);
+	}
+}
+
 struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 					    struct symbol *sym)
 {
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd22294..1c4d6210860b 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		  bool show_sym, bool show_addr, bool unwind_inlines);
 void free_srcline(char *srcline);
 
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v7 4/5] perf report: use srcline from callchain for hist entries
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
                   ` (2 preceding siblings ...)
  2017-10-19 11:38 ` [PATCH v7 3/5] perf report: cache srclines for callchain nodes Milian Wolff
@ 2017-10-19 11:38 ` Milian Wolff
  2017-10-25 17:21   ` [tip:perf/core] perf report: Use " tip-bot for Milian Wolff
  2017-10-19 11:38 ` [PATCH v7 5/5] perf util: enable handling of inlined frames by default Milian Wolff
  2017-10-20 16:15 ` [PATCH v7 0/5] generate full callchain cursor entries for inlined frames Arnaldo Carvalho de Melo
  5 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin

This also removes the symbol name from the srcline column,
more on this below.

This ensures we use the correct srcline, which could originate
from a potentially inlined function. The hist entries used to
query for the srcline based purely on the IP, which leads to
wrong results for inlined entries.

Before:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ..................................................................................................................................
#
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    44.58%     0.00%  main+100
    44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  std::__complex_abs+100
    44.58%     0.00%  std::abs<double>+100
    44.58%     0.00%  std::norm<double>+100
    36.01%     0.00%  hypot+18446603487892193300
    25.81%     0.00%  main+41
    25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.75%    25.75%  random.h:143
    18.39%     0.00%  main+57
    18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  random.tcc:3330
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ...........................................
#
    94.30%     1.19%  main.cpp:39
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    48.44%     1.70%  random.h:1823
    48.44%     0.00%  random.h:1814
    46.74%     2.53%  random.h:185
    44.68%     0.10%  complex:589
    44.68%     0.00%  complex:597
    44.68%     0.00%  complex:654
    44.68%     0.00%  complex:664
    40.61%    13.80%  random.tcc:3330
    36.01%     0.00%  hypot+18446603487892193300
    26.81%     0.00%  random.h:151
    26.81%     0.00%  random.h:332
    25.75%    25.75%  random.h:143
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line
hist column. Users who want this information should query for
it explicitly, i.e. run this command instead:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ...........................................
#
    94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
    46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
    44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
    44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
    44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
    44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
    39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
    26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
    25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
    25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Previously, we printed the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ..................................................................................................................................
#
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    44.58%     0.00%  [.] main                                                                                                                             main+100
    44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
    44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
    44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    25.81%     0.00%  [.] main                                                                                                                             main+41
    25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
    18.39%     0.00%  [.] main                                                                                                                             main+57
    18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 1 +
 tools/perf/util/event.c     | 1 +
 tools/perf/util/hist.c      | 2 ++
 tools/perf/util/symbol.h    | 1 +
 4 files changed, 5 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8901a95f2880..bb5f98a2bf9a 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1082,6 +1082,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 {
 	al->map = node->map;
 	al->sym = node->sym;
+	al->srcline = node->srcline;
 	if (node->map)
 		al->addr = node->map->map_ip(node->map, node->ip);
 	else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 47eff4767edb..3c411e7e36aa 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1604,6 +1604,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	al->sym = NULL;
 	al->cpu = sample->cpu;
 	al->socket = -1;
+	al->srcline = NULL;
 
 	if (al->cpu >= 0) {
 		struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b0fa9c217e1c..25d143053ab5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -596,6 +596,7 @@ __hists__add_entry(struct hists *hists,
 			.map	= al->map,
 			.sym	= al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.socket	 = al->socket,
 		.cpu	 = al->cpu,
 		.cpumode = al->cpumode,
@@ -950,6 +951,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			.map = al->map,
 			.sym = al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.parent = iter->parent,
 		.raw_data = sample->raw_data,
 		.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d880a059babb..d548ea5cb418 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -209,6 +209,7 @@ struct addr_location {
 	struct thread *thread;
 	struct map    *map;
 	struct symbol *sym;
+	const char    *srcline;
 	u64	      addr;
 	char	      level;
 	u8	      filtered;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v7 5/5] perf util: enable handling of inlined frames by default
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
                   ` (3 preceding siblings ...)
  2017-10-19 11:38 ` [PATCH v7 4/5] perf report: use srcline from callchain for hist entries Milian Wolff
@ 2017-10-19 11:38 ` Milian Wolff
  2017-10-25 17:21   ` [tip:perf/core] perf util: Enable " tip-bot for Milian Wolff
  2017-10-20 16:15 ` [PATCH v7 0/5] generate full callchain cursor entries for inlined frames Arnaldo Carvalho de Melo
  5 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:38 UTC (permalink / raw)
  To: acme, jolsa, namhyung
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ingo Molnar

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable
this useful option by default.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d992ed..ddde2b54af57 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry is function name or file/line.
+	will be printed. Each entry is function name or file/line. Enabled by
+	default, disable with --no-inline.
 
 include::callchain-overhead-calculation.txt[]
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba35a2d8..25e677344728 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry has function name and file/line.
+	will be printed. Each entry has function name and file/line. Enabled by
+	default, disable with --no-inline.
 
 SEE ALSO
 --------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38aa4063..ce6993bebf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
 	.show_hist_headers	= true,
 	.symfs			= "",
 	.event_group		= true,
+	.inline_name		= true,
 };
 
 static enum dso_binary_type binary_type_symtab[] = {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v7 1/5] perf report: properly handle branch count in match_chain
  2017-10-19 11:38 ` [PATCH v7 1/5] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-19 11:42   ` Milian Wolff
  2017-10-23 15:15     ` Andi Kleen
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-19 11:42 UTC (permalink / raw)
  To: acme
  Cc: jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Donnerstag, 19. Oktober 2017 13:38:32 CEST Milian Wolff wrote:
> Some of the code paths I introduced before returned too early
> without running the code to handle a node's branch count.
> By refactoring match_chain to only have one exit point, this
> can be remedied.

Note: I tested this with some of the code I have available, but I'm unsure I'm 
doing it right. On my system, I never get avg_cycles != 0. I tried:

perf record -b --call-graph dwarf <some binary>
perf report --branch-history --no-children --stdio

I see predicted and iter values as before, so I think nothing is breaking. But 
I'm somewhat unsure. Can someone paste an example source code and the perf 
commands to get some meaningful avg_cycles? Or does this depend on a newer 
Intel CPU? I currently only have an Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
available.

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-19 10:59     ` Milian Wolff
@ 2017-10-19 13:55       ` Andi Kleen
  2017-10-19 15:01         ` Namhyung Kim
  0 siblings, 1 reply; 50+ messages in thread
From: Andi Kleen @ 2017-10-19 13:55 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > +					   struct dso *right_dso, u64 right_ip)
> > > +{
> > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > +		return MATCH_EQ;
> > > +	else if (left_ip < right_ip)
> > > +		return MATCH_LT;
> > > +	else
> > > +		return MATCH_GT;
> > > +}
> > 
> > So why does only the first case check the dso? Does it not matter
> > for the others?
> > 
> > Either should be checked by none or by all.
> 
> I don't see why it should be checked. It is only required to prevent two 
> addresses to be considered equal while they are not. So only the one check is 
> required, otherwise we return either LT or GT.

When the comparison is always in the same process (which I think
is not the case) just checking the addresses is sufficient. If they are not then you
always need to check the DSO and only compare inside the same DSO.

-Andi

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-19 13:55       ` Andi Kleen
@ 2017-10-19 15:01         ` Namhyung Kim
  2017-10-20 10:21           ` Milian Wolff
  0 siblings, 1 reply; 50+ messages in thread
From: Namhyung Kim @ 2017-10-19 15:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Milian Wolff, acme, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria, kernel-team

Hi Andi,

On Thu, Oct 19, 2017 at 06:55:19AM -0700, Andi Kleen wrote:
> On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> > On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > > +					   struct dso *right_dso, u64 right_ip)
> > > > +{
> > > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > > +		return MATCH_EQ;
> > > > +	else if (left_ip < right_ip)
> > > > +		return MATCH_LT;
> > > > +	else
> > > > +		return MATCH_GT;
> > > > +}
> > > 
> > > So why does only the first case check the dso? Does it not matter
> > > for the others?
> > > 
> > > Either should be checked by none or by all.
> > 
> > I don't see why it should be checked. It is only required to prevent two 
> > addresses to be considered equal while they are not. So only the one check is 
> > required, otherwise we return either LT or GT.
> 
> When the comparison is always in the same process (which I think
> is not the case) just checking the addresses is sufficient. If they are not then you
> always need to check the DSO and only compare inside the same DSO.

As far as I know, the node->ip is a relative address (inside a DSO).
So it should compare the dso as well even in the same process.

Thanks,
Namhyung

* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-19 10:54   ` Milian Wolff
@ 2017-10-20  5:15     ` Namhyung Kim
  2017-10-24  8:51       ` Milian Wolff
  2017-11-03 14:21       ` [tip:perf/core] perf callchain: Fix double mapping al->addr for children without self period tip-bot for Namhyung Kim
  0 siblings, 2 replies; 50+ messages in thread
From: Namhyung Kim @ 2017-10-20  5:15 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa, kernel-team

Hi Milian,

On Thu, Oct 19, 2017 at 12:54:18PM +0200, Milian Wolff wrote:
> On Wednesday, 18 October 2017 20:53:50 CEST Milian Wolff wrote:
> > When inline frame resolution is disabled, a bogus srcline is obtained
> > for hist entries:
> > 
> > ~~~~~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> >     95.21%     0.00%  [.] __libc_start_main  __libc_start_main+18446603358170398953
> >     95.21%     0.00%  [.] _start  _start+18446650082411225129
> >     46.67%     0.00%  [.] main  main+18446650082411225208
> >     38.75%     0.00%  [.] hypot  hypot+18446603358164312084
> >     23.75%     0.00%  [.] main  main+18446650082411225151
> >     20.83%    20.83%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
> >     18.12%     0.00%  [.] main  main+18446650082411225165
> >     13.12%    13.12%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
> >      4.17%     4.17%  [.] __hypot_finite  __hypot_finite+163
> >      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
> >      4.17%     0.00%  [.] __hypot_finite  __hypot_finite+18446603358164312227
> >      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.92%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] __hypot_finite  __hypot_finite+11
> >      2.50%     2.50%  [.] __hypot_finite  __hypot_finite+24
> >      2.50%     0.00%  [.] __hypot_finite  __hypot_finite+18446603358164312075
> >      2.50%     0.00%  [.] __hypot_finite  __hypot_finite+18446603358164312088
> > ~~~~~
> > 
> > Note how we get very large offsets to main and cannot see any srcline
> > from one of the complex or random headers, even though the instruction
> > pointers actually lie in code inlined from there.
> > 
> > This patch fixes the mapping to use map__objdump_2mem instead of
> > map__rip_2objdump in hist_entry__get_srcline. This fixes the srcline
> > values for me when inline resolution is disabled:
> > 
> > ~~~~~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> >     95.21%     0.00%  [.] __libc_start_main  __libc_start_main+233
> >     95.21%     0.00%  [.] _start  _start+41
> >     46.88%     0.00%  [.] main  complex:589
> >     43.96%     0.00%  [.] main  random.h:185
> >     38.75%     0.00%  [.] hypot  hypot+20
> >     20.83%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
> >     13.12%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
> >      4.17%     4.17%  [.] __hypot_finite  __hypot_finite+140715545239715
> >      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      4.17%     0.00%  [.] __hypot_finite  __hypot_finite+163
> >      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
> >      2.92%     2.92%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] __hypot_finite  __hypot_finite+140715545239563
> >      2.50%     2.50%  [.] __hypot_finite  __hypot_finite+140715545239576
> >      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     0.00%  [.] __hypot_finite  __hypot_finite+11
> > ~~~~~
> > 
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Yao Jin <yao.jin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > 
> > Note how most of the large offset values are now gone. Most notably,
> > we get proper srcline resolution for the random.h and complex headers.
> > ---
> >  tools/perf/util/sort.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index 006d10a0dc96..6f3d109078a3 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
> >  	if (!map)
> >  		return SRCLINE_UNKNOWN;
> > 
> > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> >  			   he->ms.sym, true, true);
> >  }
> 
> Sorry, this patch was declined by Namhyung before, please discard it - I
> forgot to do that before resending v6.

I looked into it and found a bug in the handling of cumulative (children)
entries.  For children entries that have no self period, the al->addr
(and so he->ip) ends up being a doubly-mapped address.

It seems to be there from the beginning but only affects entries that
have no srclines - finding srcline itself is done using a different
address but it will show the invalid address if no srcline was found.
I think we should fix the commit c7405d85d7a3 ("perf tools: Update
cpumode for each cumulative entry").
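
To illustrate the double mapping, here is a minimal sketch.  It assumes
the usual form of a map's address translation, ip - start + pgoff;
struct toy_map and toy_map_ip() are hypothetical stand-ins rather than
the real struct map API, and the addresses are made up.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mapped region; not perf's struct map. */
struct toy_map {
	uint64_t start;	/* address at which the region is mapped */
	uint64_t pgoff;	/* file offset backing the region */
};

/* Translate an absolute address into a map-relative one. */
static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}
```

With start = 0x400000 and pgoff = 0, the address 0x4005d0 maps to 0x5d0.
Mapping that result a second time computes 0x5d0 - 0x400000, which wraps
around in unsigned 64-bit arithmetic to 0xffffffffffc005d0, the same kind
of huge value as the bogus offsets in the report.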

Could you please test whether the following patch works for you?


diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..d18cdcc8d132 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1074,10 +1074,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 {
        al->map = node->map;
        al->sym = node->sym;
-       if (node->map)
-               al->addr = node->map->map_ip(node->map, node->ip);
-       else
-               al->addr = node->ip;
+       al->addr = node->ip;
 
        if (al->sym == NULL) {
                if (hide_unresolved)

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-19 15:01         ` Namhyung Kim
@ 2017-10-20 10:21           ` Milian Wolff
  2017-10-20 11:38             ` Milian Wolff
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-20 10:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, acme, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria, kernel-team

On Thursday, 19 October 2017 17:01:08 CEST Namhyung Kim wrote:
> Hi Andi,
> 
> On Thu, Oct 19, 2017 at 06:55:19AM -0700, Andi Kleen wrote:
> > On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> > > On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > > > +					   struct dso *right_dso, u64 right_ip)
> > > > > +{
> > > > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > > > +		return MATCH_EQ;
> > > > > +	else if (left_ip < right_ip)
> > > > > +		return MATCH_LT;
> > > > > +	else
> > > > > +		return MATCH_GT;
> > > > > +}
> > > > 
> > > > So why does only the first case check the dso? Does it not matter
> > > > for the others?
> > > > 
> > > > Either should be checked by none or by all.
> > > 
> > > I don't see why it should be checked. It is only required to prevent two
> > > addresses to be considered equal while they are not. So only the one
> > > check is required, otherwise we return either LT or GT.
> > 
> > When the comparison is always in the same process (which I think
> > is not the case) just checking the addresses is sufficient. If they are
> > not then you always need to check the DSO and only compare inside the
> > same DSO.
>
> As far as I know, the node->ip is a relative address (inside a DSO).
> So it should compare the dso as well even in the same process.

Sorry guys, I seem to be slow at understanding your review comments.

match_address_dso should impose a sort order on two relative addresses. The 
order should ensure that relative addresses in a different DSO are not 
considered equal. But if the DSOs are different, it doesn't matter whether we
return LT or GT, does it?

Put differently, how would you write this function to take care of the DSO in 
the other two branches? I.e. what to return if the DSOs are different - a 
MATCH_ERROR?

Bye

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-20 10:21           ` Milian Wolff
@ 2017-10-20 11:38             ` Milian Wolff
  2017-10-20 13:39               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-20 11:38 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Namhyung Kim, Andi Kleen, acme, jolsa, Linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, David Ahern,
	Peter Zijlstra, Yao Jin, Ravi Bangoria, kernel-team

On Friday, 20 October 2017 12:21:35 CEST Milian Wolff wrote:
> On Thursday, 19 October 2017 17:01:08 CEST Namhyung Kim wrote:
> > Hi Andi,
> > 
> > On Thu, Oct 19, 2017 at 06:55:19AM -0700, Andi Kleen wrote:
> > > On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> > > > On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > > > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > > > > +					   struct dso *right_dso, u64 right_ip)
> > > > > > +{
> > > > > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > > > > +		return MATCH_EQ;
> > > > > > +	else if (left_ip < right_ip)
> > > > > > +		return MATCH_LT;
> > > > > > +	else
> > > > > > +		return MATCH_GT;
> > > > > > +}
> > > > > 
> > > > > So why does only the first case check the dso? Does it not matter
> > > > > for the others?
> > > > > 
> > > > > Either should be checked by none or by all.
> > > > 
> > > > I don't see why it should be checked. It is only required to prevent
> > > > two
> > > > addresses to be considered equal while they are not. So only the one
> > > > check is required, otherwise we return either LT or GT.
> > > 
> > > When the comparison is always in the same process (which I think
> > > is not the case) just checking the addresses is sufficient. If they are
> > > not then you always need to check the DSO and only compare inside the
> > > same DSO.
> > 
> > As far as I know, the node->ip is a relative address (inside a DSO).
> > So it should compare the dso as well even in the same process.
> 
> Sorry guys, I seem to be slow at understanding your review comments.
> 
> match_address_dso should impose a sort order on two relative addresses. The
> order should ensure that relative addresses in a different DSO are not
> considered equal. But if the DSOs are different, it doesn't matter whether
> we return LT or GT - or?
> 
> Put differently, how would you write this function to take care of the DSO
> in the other two branches? I.e. what to return if the DSOs are different -
> a MATCH_ERROR?

Thinking a bit more about this: are you guys maybe hinting at my
implementation breaking the strict weak ordering rules (is that the right
term?), i.e. that neither a < b nor b < a holds iff a == b? My implementation
would potentially break this assumption when the relative IPs are the same but
the DSOs are different.

So is this what you want:

+static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
+					   struct dso *right_dso, u64 right_ip)
+{
+	if (left_dso == right_dso && left_ip == right_ip)
+		return MATCH_EQ;
+	else if (left_dso < right_dso || left_ip < right_ip)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
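
For illustration only, here is a sketch of a comparator that stays
consistent when the DSOs differ: order by DSO first and look at the IP only
when both sides are in the same DSO. With a condition of the form
"left_dso < right_dso || left_ip < right_ip", two entries can each compare as
less than the other (lower DSO pointer but higher IP). The names
toy_match_result and toy_match_address_dso are hypothetical, and the DSO is
passed as an opaque pointer that is compared via uintptr_t.

```c
#include <stdint.h>

enum toy_match_result { TOY_MATCH_EQ, TOY_MATCH_LT, TOY_MATCH_GT };

/* Lexicographic order on (dso, ip): the IP only breaks ties within a DSO. */
static enum toy_match_result
toy_match_address_dso(const void *left_dso, uint64_t left_ip,
		      const void *right_dso, uint64_t right_ip)
{
	uintptr_t l = (uintptr_t)left_dso;
	uintptr_t r = (uintptr_t)right_dso;

	if (l != r)
		return l < r ? TOY_MATCH_LT : TOY_MATCH_GT;
	if (left_ip != right_ip)
		return left_ip < right_ip ? TOY_MATCH_LT : TOY_MATCH_GT;
	return TOY_MATCH_EQ;
}
```

Swapping the two arguments of such a comparator always flips LT and GT, which
is what a sorted tree lookup relies on.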

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-20 11:38             ` Milian Wolff
@ 2017-10-20 13:39               ` Arnaldo Carvalho de Melo
  2017-10-23  5:19                 ` Namhyung Kim
  0 siblings, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-20 13:39 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Namhyung Kim, Andi Kleen, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria, kernel-team

On Fri, Oct 20, 2017 at 01:38:23PM +0200, Milian Wolff wrote:
> On Friday, 20 October 2017 12:21:35 CEST Milian Wolff wrote:
> > On Thursday, 19 October 2017 17:01:08 CEST Namhyung Kim wrote:
> > > Hi Andi,
> > > 
> > > On Thu, Oct 19, 2017 at 06:55:19AM -0700, Andi Kleen wrote:
> > > > On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> > > > > On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > > > > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > > > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > > > > > +					   struct dso *right_dso, u64 right_ip)
> > > > > > > +{
> > > > > > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > > > > > +		return MATCH_EQ;
> > > > > > > +	else if (left_ip < right_ip)
> > > > > > > +		return MATCH_LT;
> > > > > > > +	else
> > > > > > > +		return MATCH_GT;
> > > > > > > +}
> > > > > > 
> > > > > > So why does only the first case check the dso? Does it not matter
> > > > > > for the others?
> > > > > > 
> > > > > > Either should be checked by none or by all.
> > > > > 
> > > > > I don't see why it should be checked. It is only required to prevent
> > > > > two
> > > > > addresses to be considered equal while they are not. So only the one
> > > > > check is required, otherwise we return either LT or GT.
> > > > 
> > > > When the comparison is always in the same process (which I think
> > > > is not the case) just checking the addresses is sufficient. If they are
> > > > not then you always need to check the DSO and only compare inside the
> > > > same DSO.
> > > 
> > > As far as I know, the node->ip is a relative address (inside a DSO).
> > > So it should compare the dso as well even in the same process.
> > 
> > Sorry guys, I seem to be slow at understanding your review comments.
> > 
> > match_address_dso should impose a sort order on two relative addresses. The
> > order should ensure that relative addresses in a different DSO are not
> > considered equal. But if the DSOs are different, it doesn't matter whether
> > we return LT or GT - or?
> > 
> > Put differently, how would you write this function to take care of the DSO
> > in the other two branches? I.e. what to return if the DSOs are different -
> > a MATCH_ERROR?
> 
> Thinking a bit more about this. Are you guys maybe hinting at my 
> implementation breaking the strict ordering rules (is that the right word?). 
> I.e. a < b && b > a iff a == b ? Potentially my implementation would break 
> this assumption when the relative IPs are the same, but the DSO is different.
> 
> So is this what you want:
> 
> +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> +					   struct dso *right_dso, u64 right_ip)
> +{
> +	if (left_dso == right_dso && left_ip == right_ip)
> +		return MATCH_EQ;
> +	else if (left_dso < right_dso || left_ip < right_ip)
> +		return MATCH_LT;
> +	else
> +		return MATCH_GT;
> +}

Why not do all in terms of absolute addresses? Comparing relative
addresses seems nonsensical anyway. Perhaps something like the patch
below, and note that cnode->ip and node->ip already are absolute
addresses.
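
A rough sketch of the idea, assuming that unmapping is the inverse of the
map translation (rel + start - pgoff). struct toy_map, toy_unmap_ip() and
toy_cmp_sym_start() are hypothetical stand-ins for the real map callbacks,
and the addresses are invented.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mapped region; not perf's struct map. */
struct toy_map {
	uint64_t start;	/* address at which the region is mapped */
	uint64_t pgoff;	/* file offset backing the region */
};

/* Translate a map-relative address back into an absolute one. */
static uint64_t toy_unmap_ip(const struct toy_map *map, uint64_t rel)
{
	return rel + map->start - map->pgoff;
}

/* Compare two symbol starts by absolute address: <0, 0 or >0. */
static int toy_cmp_sym_start(const struct toy_map *lmap, uint64_t lstart,
			     const struct toy_map *rmap, uint64_t rstart)
{
	uint64_t left = toy_unmap_ip(lmap, lstart);
	uint64_t right = toy_unmap_ip(rmap, rstart);

	return (left > right) - (left < right);
}
```

Two maps at 0x400000 and 0x7f0000000000 with the same relative symbol start
0x1000 then compare as different without looking at the DSO at all.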

Ravi?

- Arnaldo

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..1ac3f4a5afab 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -671,8 +671,6 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 {
 	struct symbol *sym = node->sym;
 	u64 left, right;
-	struct dso *left_dso = NULL;
-	struct dso *right_dso = NULL;
 
 	if (callchain_param.key == CCKEY_SRCLINE) {
 		enum match_result match = match_chain_strings(cnode->srcline,
@@ -698,16 +696,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			return match_chain_strings(cnode->ms.sym->name,
 						   node->sym->name);
 
-		left = cnode->ms.sym->start;
-		right = sym->start;
-		left_dso = cnode->ms.map->dso;
-		right_dso = node->map->dso;
+		left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start);
+		right = node->map->unmap_ip(node->map, sym->start);
 	} else {
 		left = cnode->ip;
 		right = node->ip;
 	}
 
-	if (left == right && left_dso == right_dso) {
+	if (left == right) {
 		if (node->branch) {
 			cnode->branch_count++;
 

* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
  2017-10-18 22:41   ` Andi Kleen
@ 2017-10-20 15:22   ` Arnaldo Carvalho de Melo
  2017-10-20 19:52     ` Milian Wolff
  2017-10-25 17:20   ` [tip:perf/core] perf report: Properly handle branch count in match_chain() tip-bot for Milian Wolff
  2 siblings, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-20 15:22 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Wed, Oct 18, 2017 at 08:53:45PM +0200, Milian Wolff wrote:
> Some of the code paths I introduced before returned too early
> without running the code to handle a node's branch count.
> By refactoring match_chain to only have one exit point, this
> can be remedied.
> +	case CCKEY_FUNCTION:
> +		if (node->sym && cnode->ms.sym) {
> +			/*
> +			 * Compare inlined frames based on their symbol name
> +			 * because different inlined frames will have the same
> +			 * symbol start. Otherwise do a faster comparison based
> +			 * on the symbol start address.
> +			 */
> +			if (cnode->ms.sym->inlined || node->sym->inlined)
> +				match = match_chain_strings(cnode->ms.sym->name,
> +							    node->sym->name);
> +			else
> +				match = match_address_dso(cnode->ms.map->dso,
> +							  cnode->ms.sym->start,
> +							  node->map->dso,
> +							  node->sym->start);
> +			if (match != MATCH_ERROR)
> +				break;
> +		}
>  		/* otherwise fall-back to IP-based comparison below */
> +		__fallthrough;

If we take this __fallthrough because node->sym or cnode->ms.sym is
NULL, then cnode->ms.map may be NULL if we got a sample for which we
somehow couldn't find a map.

And we don't really need to deal with DSOs, just with MAPs, to go from
relative to absolute when we _have_ a symbol resolved, cnode->ip and
node->ip are already absolute.

> +	case CCKEY_ADDRESS:
> +	default:
> +		match = match_address_dso(cnode->ms.map->dso, cnode->ip,
> +					  node->map->dso, node->ip);

Ok, below is this patch updated on top of my previous patch, please take
a look, I'll be adding all this to my tmp.perf/core branch, holler if
you disagree on moving it to perf/core, which I'd like to do soon.

- Arnaldo


commit ab950c4f4a262af1afd8cfb02c0f71acfc4eafe9
Author: Milian Wolff <milian.wolff@kdab.com>
Date:   Fri Oct 20 12:14:47 2017 -0300

    perf report: Properly handle branch count in match_chain()
    
    Some of the code paths I introduced before returned too early without
    running the code to handle a node's branch count.  By refactoring
    match_chain to only have one exit point, this can be remedied.
    
    Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
    [ Fixed up wrt always using absolute addresses ]
    Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 1ac3f4a5afab..eac1c9bc9d5b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,79 +666,88 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
+static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
+{
+	if (left_ip == right_ip)
+		return MATCH_EQ;
+	else if (left_ip < right_ip)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
+
 static enum match_result match_chain(struct callchain_cursor_node *node,
 				     struct callchain_list *cnode)
 {
-	struct symbol *sym = node->sym;
-	u64 left, right;
-
-	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
-
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
+	enum match_result match = MATCH_ERROR;
 
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
+		match = match_chain_strings(cnode->srcline, node->srcline);
 		if (match != MATCH_ERROR)
-			return match;
+			break;
+		/* otherwise fall-back to symbol-based comparison below */
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/*
+			 * Compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. Otherwise do a faster comparison based
+			 * on the symbol start address.
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined) {
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+				if (match != MATCH_ERROR)
+					break;
+			} else {
+				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
+				    right = node->map->unmap_ip(node->map, node->sym->start);
 
+				match = match_chain_addresses(left, right);
+				break;
+			}
+		}
 		/* otherwise fall-back to IP-based comparison below */
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_chain_addresses(cnode->ip, node->ip);
+		break;
 	}
 
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/*
-		 * Compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start);
-		right = node->map->unmap_ip(node->map, sym->start);
-	} else {
-		left = cnode->ip;
-		right = node->ip;
-	}
-
-	if (left == right) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*


* Re: [PATCH v6 0/6] generate full callchain cursor entries for inlined frames
  2017-10-18 22:43 ` [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Andi Kleen
@ 2017-10-20 15:43   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-20 15:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Milian Wolff, jolsa, namhyung, Linux-kernel, linux-perf-users

On Wed, Oct 18, 2017 at 03:43:27PM -0700, Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> 
> > This series of patches completely reworks the way inline frames are handled.
> > Instead of querying for the inline nodes on-demand in the individual tools,
> > we now create proper callchain nodes for inlined frames. The advantages this
> > approach brings are numerous:
> 
> Except for the comments on the one patch the other patches all look
> good to me.

I think I addressed this concern by always using absolute addresses,
right?
 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>

So I'm adding this to the patches, ok?

- Arnaldo


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
                   ` (4 preceding siblings ...)
  2017-10-19 11:38 ` [PATCH v7 5/5] perf util: enable handling of inlined frames by default Milian Wolff
@ 2017-10-20 16:15 ` Arnaldo Carvalho de Melo
  2017-10-20 20:21   ` Milian Wolff
  5 siblings, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-20 16:15 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

On Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff wrote:
> This series of patches completely reworks the way inline frames are handled.
> Instead of querying for the inline nodes on-demand in the individual tools,
> we now create proper callchain nodes for inlined frames. The advantages this
> approach brings are numerous:
> 
> - less duplicated code in the individual browser
> - aggregated cost for inlined frames for the --children top-down list
> - various bug fixes that arose from querying for a srcline/symbol based on
>   the IP of a sample, which will always point to the last inlined frame
>   instead of the corresponding non-inlined frame
> - overall much better support for visualizing cost for heavily-inlined C++
>   code, which simply was confusing and unreliable before
> - srcline honors the global setting as to whether full paths or basenames
>   should be shown
> - caches for inlined frames and srcline information, which allow us to
>   enable inline frame handling by default
> 
> For comparison, below lists the output before and after for `perf script`
> and `perf report`. The example file I used to generate the perf data is:

So, please check my tmp.perf/core branch, it has this patchset + the fix
I proposed for the match_chain() to always use absolute addresses.

- Arnaldo
 
> ~~~~~
> $ cat inlining.cpp
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> $ g++ -O2 -g -o inlining inlining.cpp
> $ perf record --call-graph dwarf ./inlining
> ~~~~~
> 
> Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
> of my comments that outline the various issues I'm trying to solve by this
> patch series.
> 
> ~~~~~
> $ perf script --inline
> ...
> inlining 11083 97459.356656:      33680 cycles:
>                    214f7 __hypot_finite (/usr/lib/libm-2.25.so)
>                     ace3 hypot (/usr/lib/libm-2.25.so)
>                      a4a main (/home/milian/projects/src/perf-tests/inlining)
>                          std::__complex_abs
>                          std::abs<double>
>                          std::_Norm_helper<true>::_S_do_it<double>
>                          std::norm<double>
>                          main
>                    20510 __libc_start_main (/usr/lib/libc-2.25.so)
>                      bd9 _start (/home/milian/projects/src/perf-tests/inlining)
> # NOTE: the above inlined stack is confusing: the a4a is an address into main,
> #       which is the non-inlined symbol. the entry with the address should be
> #       at the end of the stack, where it's actually duplicated once more but
> #       there it's missing the address
> ...
> $ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
> ...
>              --38.86%--_start
>                        __libc_start_main
>                        |
>                        |--15.68%--main random.tcc:3326
>                        |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
>                        |
>                        |--10.36%--main random.h:143
>                        |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:143 (inline)
>                        |
>                        |--5.66%--main random.tcc:3332
>                        |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>                        |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
> ...
> # NOTE: the grouping is totally off because the first and last frame of the
>         inline nodes is completely bogus, since the IP is used to find the sym/srcline
>         which is different from the actual inlined sym/srcline.
>         also, the code currently displays either the inlined function name or
>         the corresponding filename (but in full length, instead of just the basename).
> 
> $ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
> ...
>     38.86%  [.] main
>             |
>             |--15.68%--main random.tcc:3326
>             |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>             |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
>             |          __libc_start_main
>             |          _start
> ...
> # NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
>         i.e. what is displayed in the line below (see also perf script issue above)
> ~~~~~
> 
> Afterwards, all of the above issues are resolved (and inlined frames are
> displayed by default):
> 
> ~~~~~
> $ perf script
> ...
> inlining 11083 97459.356656:      33680 cycles:
>                    214f7 __hypot_finite (/usr/lib/libm-2.25.so)
>                     ace3 hypot (/usr/lib/libm-2.25.so)
>                      a4a std::__complex_abs (inlined)
>                      a4a std::abs<double> (inlined)
>                      a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
>                      a4a std::norm<double> (inlined)
>                      a4a main (/home/milian/projects/src/perf-tests/inlining)
>                    20510 __libc_start_main (/usr/lib/libc-2.25.so)
>                      bd9 _start (/home/milian/projects/src/perf-tests/inlining)
> ...
> # NOTE: only one main entry, at the correct position.
>         we do display the (repeated) instruction pointer as that ensures
>         interoperability with e.g. the stackcollapse-perf.pl script
> 
> $ perf report -s sym -g srcline -i perf.inlining.data --stdio
> ...
>    100.00%    38.86%  [.] main
>             |
>             |--61.14%--main inlining.cpp:14
>             |          std::norm<double> complex:664 (inlined)
>             |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
>             |          std::abs<double> complex:597 (inlined)
>             |          std::__complex_abs complex:589 (inlined)
>             |          |
>             |          |--60.29%--hypot
>             |          |          |
>             |          |           --56.03%--__hypot_finite
>             |          |
>             |           --0.85%--cabs
>             |
>              --38.86%--_start
>                        __libc_start_main
>                        |
>                        |--38.19%--main inlining.cpp:14
>                        |          |
>                        |          |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
>                        |          |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
>                        |          |          |
>                        |          |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
>                        |          |                     |
>                        |          |                     |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
>                        |          |                     |          |
>                        |          |                     |           --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
>                        |          |                     |                     std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
>                        |          |                     |                     |
>                        |          |                     |                     |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
>                        |          |                     |                     |
>                        |          |                     |                      --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
>                        |          |                     |
>                        |          |                     |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
>                        |          |                     |
>                        |          |                      --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
>                        |          |
>                        |           --1.99%--std::norm<double> complex:664 (inlined)
>                        |                     std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
>                        |                     std::abs<double> complex:597 (inlined)
>                        |                     std::__complex_abs complex:589 (inlined)
>                        |
>                         --0.67%--main inlining.cpp:13
> ...
> 
> # NOTE: still somewhat confusing due to the _start and __libc_start_main frames
>         that actually are *above* the main frame. But at least the stuff below
>         properly splits up and shows that multiple functions got inlined into
>         inlining.cpp:14, not just one as before.
> 
> $ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
> ...
>     38.86%  [.] main
>             |
>             |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
>             |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
>             |          main inlining.cpp:14
>             |          __libc_start_main
>             |          _start
> ...
> # NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
>         both function and srcline is shown, as well as the (inlined) suffix
>         only the basename of the srcline is shown
> 
> v7 fixes a crash in match_chain, when a map with invalid dso is encountered.
>    It also drops the commit from v5 that tried to fix the srcline resolution.
> 
> v6 rebases against the partial merge of this patch series in acme/perf/core
> 
> v5 attends to Namhyung's code review. Most notably, it fixes a use-after-free
>    crash. Additionally, srcline resolution for hist entries now also works
>    correctly, when inline frame resolution is disabled.
> 
> v4 splits the patch to create full callchain nodes for inline frames up further
>    as suggested by Jiri. It also removes C99 comments and initializes the
>    rb_root properly.
> 
> v3 splits the initial patch up into two to simplify reviewing. It also adds a
>    comment to clarify the lifetime handling of fake symbols and aliased non-fake
>    symbols, based on the feedback by Namhyung.
> 
> v2 fixes some issues reported by Namhyung or found by me in further
> testing, adds caching and enables inline frames by default.
> 
> 
> Milian Wolff (5):
>   perf report: properly handle branch count in match_chain
>   perf report: cache failed lookups of inlined frames
>   perf report: cache srclines for callchain nodes
>   perf report: use srcline from callchain for hist entries
>   perf util: enable handling of inlined frames by default
> 
>  tools/perf/Documentation/perf-report.txt |   3 +-
>  tools/perf/Documentation/perf-script.txt |   3 +-
>  tools/perf/util/callchain.c              | 133 +++++++++++++++++--------------
>  tools/perf/util/dso.c                    |   2 +
>  tools/perf/util/dso.h                    |   1 +
>  tools/perf/util/event.c                  |   1 +
>  tools/perf/util/hist.c                   |   2 +
>  tools/perf/util/machine.c                |  32 +++++---
>  tools/perf/util/srcline.c                |  82 +++++++++++++++----
>  tools/perf/util/srcline.h                |   7 ++
>  tools/perf/util/symbol.c                 |   1 +
>  tools/perf/util/symbol.h                 |   1 +
>  12 files changed, 178 insertions(+), 90 deletions(-)
> 
> -- 
> 2.14.2


* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-20 15:22   ` Arnaldo Carvalho de Melo
@ 2017-10-20 19:52     ` Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-20 19:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Friday, 20 October 2017 17:22:22 CEST Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 18, 2017 at 08:53:45PM +0200, Milian Wolff wrote:
> > Some of the code paths I introduced before returned too early
> > without running the code to handle a node's branch count.
> > By refactoring match_chain to only have one exit point, this
> > can be remedied.
> > +	case CCKEY_FUNCTION:
> > +		if (node->sym && cnode->ms.sym) {
> > +			/*
> > +			 * Compare inlined frames based on their symbol name
> > +			 * because different inlined frames will have the same
> > +			 * symbol start. Otherwise do a faster comparison based
> > +			 * on the symbol start address.
> > +			 */
> > +			if (cnode->ms.sym->inlined || node->sym->inlined)
> > +				match = match_chain_strings(cnode->ms.sym->name,
> > +							    node->sym->name);
> > +			else
> > +				match = match_address_dso(cnode->ms.map->dso,
> > +							  cnode->ms.sym->start,
> > +							  node->map->dso,
> > +							  node->sym->start);
> > +			if (match != MATCH_ERROR)
> > +				break;
> > +		}
> > 
> >  		/* otherwise fall-back to IP-based comparison below */
> > 
> > +		__fallthrough;
> 
> If we take this __fallthrough because node->sym or cnode->ms.sym is
> NULL, then cnode->ms.map may be NULL if we got a sample for which we
> somehow couldn't find a map.

Yes, that was fixed in v7.

> And we don't really need to deal with DSOs, just with MAPs, to go from
> relative to absolute when we _have_ a symbol resolved, cnode->ip and
> node->ip are already absolute.

That's confusing, can you rephrase? Either we have a MAP/DSO and the ip can be 
relative or absolute. Or we don't, and then we don't have a symbol and the ip 
will remain absolute as we cannot remap it to the relative address. So is the 
sentence above maybe missing a negation somewhere? I.e. "when we _have *not*_ 
resolved a symbol, cnode->ip and node->ip are already absolute"?
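
To make the relative/absolute distinction concrete, here is a minimal
standalone sketch. It assumes the usual convention that a map translates
addresses using its start address and page offset; `struct toy_map` and the
`toy_*` helpers are hypothetical stand-ins, not the real `struct map` API.
Without a map there is nothing to translate with, so the address is passed
through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a map; only the fields needed here. */
struct toy_map {
	uint64_t start;	/* absolute address where the mapping begins */
	uint64_t pgoff;	/* file offset at which the mapping starts */
};

/* absolute -> relative (DSO-relative) address */
static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/* relative -> absolute; without a map the address is left untouched */
static uint64_t toy_unmap_ip(const struct toy_map *map, uint64_t ip)
{
	return map ? ip + map->start - map->pgoff : ip;
}

/* Small usage check with made-up addresses. */
static void toy_map_self_check(void)
{
	struct toy_map m = { .start = 0x400000, .pgoff = 0 };

	assert(toy_map_ip(&m, 0x400a4a) == 0xa4a);
	assert(toy_unmap_ip(&m, 0xa4a) == 0x400a4a);
	assert(toy_unmap_ip(NULL, 0xa4a) == 0xa4a);
}
```

The two translations are inverses of each other as long as a map exists,
which is why the question above boils down to which of the two forms
cnode->ip and node->ip hold at comparison time.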

> > +	case CCKEY_ADDRESS:
> > +	default:
> > +		match = match_address_dso(cnode->ms.map->dso, cnode->ip,
> > +					  node->map->dso, node->ip);
> 
> Ok, below is this patch updated on top of my previous patch, please take
> a look, I'll be adding all this to my tmp.perf/core branch, holler if
> you disagree on moving it to perf/core, which I'd like to do soon.

I'll have a look at tmp.perf/core now, thanks.

> commit ab950c4f4a262af1afd8cfb02c0f71acfc4eafe9
> Author: Milian Wolff <milian.wolff@kdab.com>
> Date:   Fri Oct 20 12:14:47 2017 -0300
> 
>     perf report: Properly handle branch count in match_chain()
> 
>     Some of the code paths I introduced before returned too early without
>     running the code to handle a node's branch count.  By refactoring
>     match_chain to only have one exit point, this can be remedied.
> 
>     Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
>     Cc: David Ahern <dsahern@gmail.com>
>     Cc: Jin Yao <yao.jin@linux.intel.com>
>     Cc: Namhyung Kim <namhyung@kernel.org>
>     Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>     [ Fixed up wrt always using absolute addresses ]
>     Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 1ac3f4a5afab..eac1c9bc9d5b 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -666,79 +666,88 @@ static enum match_result match_chain_strings(const char *left,
>  	return ret;
>  }
> 
> +static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
> +{
> +	if (left_ip == right_ip)
> +               return MATCH_EQ;
> +       else if (left_ip < right_ip)
> +               return MATCH_LT;
> +       else
> +               return MATCH_GT;
> +}
> +
>  static enum match_result match_chain(struct callchain_cursor_node *node,
>  				     struct callchain_list *cnode)
>  {
> -	struct symbol *sym = node->sym;
> -	u64 left, right;
> -
> -	if (callchain_param.key == CCKEY_SRCLINE) {
> -		enum match_result match = match_chain_strings(cnode->srcline,
> -							      node->srcline);
> -
> -		/* if no srcline is available, fallback to symbol name */
> -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> -			match = match_chain_strings(cnode->ms.sym->name,
> -						    node->sym->name);
> +	enum match_result match = MATCH_ERROR;
> 
> +	switch (callchain_param.key) {
> +	case CCKEY_SRCLINE:
> +		match = match_chain_strings(cnode->srcline, node->srcline);
>  		if (match != MATCH_ERROR)
> -			return match;
> +			break;
> +		/* otherwise fall-back to symbol-based comparison below */
> +		__fallthrough;
> +	case CCKEY_FUNCTION:
> +		if (node->sym && cnode->ms.sym) {
> +			/*
> +			 * Compare inlined frames based on their symbol name
> +			 * because different inlined frames will have the same
> +			 * symbol start. Otherwise do a faster comparison based
> +			 * on the symbol start address.
> +			 */
> +			if (cnode->ms.sym->inlined || node->sym->inlined) {
> +				match = match_chain_strings(cnode->ms.sym->name,
> +							    node->sym->name);
> +				if (match != MATCH_ERROR)
> +					break;
> +			} else {
> +				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
> +				    right = node->map->unmap_ip(node->map, node->sym->start);
> 
> +				match = match_chain_addresses(left, right);
> +				break;
> +			}
> +		}
>  		/* otherwise fall-back to IP-based comparison below */
> +		__fallthrough;
> +	case CCKEY_ADDRESS:
> +	default:
> +		match = match_chain_addresses(cnode->ip, node->ip);
> +		break;
>  	}
> 
> -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> -		/*
> -		 * Compare inlined frames based on their symbol name because
> -		 * different inlined frames will have the same symbol start
> -		 */
> -		if (cnode->ms.sym->inlined || node->sym->inlined)
> -			return match_chain_strings(cnode->ms.sym->name,
> -						   node->sym->name);
> -
> -		left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start);
> -		right = node->map->unmap_ip(node->map, sym->start);
> -	} else {
> -		left = cnode->ip;
> -		right = node->ip;
> -	}
> -
> -	if (left == right) {
> -		if (node->branch) {
> -			cnode->branch_count++;
> +	if (match == MATCH_EQ && node->branch) {
> +		cnode->branch_count++;
> 
> -			if (node->branch_from) {
> -				/*
> -				 * It's "to" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = true;
> +		if (node->branch_from) {
> +			/*
> +			 * It's "to" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = true;
> 
> -				if (node->branch_flags.predicted)
> -					cnode->predicted_count++;
> +			if (node->branch_flags.predicted)
> +				cnode->predicted_count++;
> 
> -				if (node->branch_flags.abort)
> -					cnode->abort_count++;
> +			if (node->branch_flags.abort)
> +				cnode->abort_count++;
> 
> -				branch_type_count(&cnode->brtype_stat,
> -						  &node->branch_flags,
> -						  node->branch_from,
> -						  node->ip);
> -			} else {
> -				/*
> -				 * It's "from" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = false;
> -				cnode->cycles_count +=
> -					node->branch_flags.cycles;
> -				cnode->iter_count += node->nr_loop_iter;
> -				cnode->iter_cycles += node->iter_cycles;
> -			}
> +			branch_type_count(&cnode->brtype_stat,
> +					  &node->branch_flags,
> +					  node->branch_from,
> +					  node->ip);
> +		} else {
> +			/*
> +			 * It's "from" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = false;
> +			cnode->cycles_count += node->branch_flags.cycles;
> +			cnode->iter_count += node->nr_loop_iter;
> +			cnode->iter_cycles += node->iter_cycles;
>  		}
> -
> -		return MATCH_EQ;
>  	}
> 
> -	return left > right ? MATCH_GT : MATCH_LT;
> +	return match;
>  }
> 
>  /*


-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-20 16:15 ` [PATCH v7 0/5] generate full callchain cursor entries for inlined frames Arnaldo Carvalho de Melo
@ 2017-10-20 20:21   ` Milian Wolff
  2017-10-23 14:29     ` Arnaldo Carvalho de Melo
  2017-10-23 19:04     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-20 20:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

On Friday, 20 October 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> On Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff wrote:
> > This series of patches completely reworks the way inline frames are
> > handled. Instead of querying for the inline nodes on-demand in the
> > individual tools, we now create proper callchain nodes for inlined
> > frames. The advantages this approach brings are numerous:
> > 
> > - less duplicated code in the individual browser
> > - aggregated cost for inlined frames for the --children top-down list
> > - various bug fixes that arose from querying for a srcline/symbol based on
> >   the IP of a sample, which will always point to the last inlined frame
> >   instead of the corresponding non-inlined frame
> > - overall much better support for visualizing cost for heavily-inlined C++
> >   code, which simply was confusing and unreliable before
> > - srcline honors the global setting as to whether full paths or basenames
> >   should be shown
> > - caches for inlined frames and srcline information, which allow us to
> >   enable inline frame handling by default
> > 
> > For comparison, below lists the output before and after for `perf script`
> > and `perf report`. The example file I used to generate the perf data is:
>
> So, please check my tmp.perf/core branch, it has this patchset + the fix
> I proposed for the match_chain() to always use absolute addresses.

OK, so I've looked at it. I think there are some style issues with the
indentation in match_chain_addresses. Also, the unmap_ip lines are too long
for checkpatch.pl.

Additionally, we can now still run into the CCKEY_ADDRESS code path (when 
match_chain_strings for inlined symbols returns MATCH_ERROR, or when either 
cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly then.

Can we maybe instead use something like this on top of your patch?

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 01fc95fdd1e0..92bca95be202 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const char *left,
 static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
 {
 	if (left_ip == right_ip)
-               return MATCH_EQ;
-       else if (left_ip < right_ip)
-               return MATCH_LT;
-       else
-               return MATCH_GT;
+		return MATCH_EQ;
+	else if (left_ip < right_ip)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
+
+static u64 unmap_ip(struct map *map, u64 ip)
+{
+	return map ? map->unmap_ip(map, ip) : ip;
 }
 
 static enum match_result match_chain(struct callchain_cursor_node *node,
@@ -702,9 +707,10 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 				if (match != MATCH_ERROR)
 					break;
 			} else {
-				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
-				    right = node->map->unmap_ip(node->map, node->sym->start);
-
+				u64 left = unmap_ip(cnode->ms.map,
+						    cnode->ms.sym->start);
+				u64 right = unmap_ip(node->map,
+						     node->sym->start);
 				match = match_chain_addresses(left, right);
 				break;
 			}
@@ -713,7 +719,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 		__fallthrough;
 	case CCKEY_ADDRESS:
 	default:
-		match = match_chain_addresses(cnode->ip, node->ip);
+		match = match_chain_addresses(unmap_ip(cnode->ms.map,
+						       cnode->ip),
+					      unmap_ip(node->map, node->ip));
 		break;
 	}
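
For reference, the fall-through paths mentioned above can be exercised in
isolation. The following is only a toy model of the control flow: the
`toy_*` types and helpers are hypothetical, the string comparison is
simplified (any missing string counts as an error), and no address
unmapping is modelled. It shows that with a single exit point the branch
bookkeeping runs no matter which key ended up deciding the match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_match { TOY_ERROR = -1, TOY_EQ, TOY_LT, TOY_GT };
enum toy_key { TOY_KEY_FUNCTION, TOY_KEY_ADDRESS, TOY_KEY_SRCLINE };

/* Hypothetical, flattened stand-in for a callchain entry. */
struct toy_frame {
	const char *srcline;	/* NULL if unknown */
	const char *sym_name;	/* NULL if no symbol was resolved */
	int inlined;
	uint64_t sym_start;
	uint64_t ip;
	unsigned int branch_count;
};

static enum toy_match toy_cmp_strings(const char *l, const char *r)
{
	int ret;

	if (!l || !r)
		return TOY_ERROR; /* simplification: missing string is an error */
	ret = strcmp(l, r);
	if (ret == 0)
		return TOY_EQ;
	return ret < 0 ? TOY_LT : TOY_GT;
}

static enum toy_match toy_cmp_addrs(uint64_t l, uint64_t r)
{
	if (l == r)
		return TOY_EQ;
	return l < r ? TOY_LT : TOY_GT;
}

static enum toy_match toy_match_chain(enum toy_key key,
				      struct toy_frame *cnode,
				      const struct toy_frame *node,
				      int is_branch)
{
	enum toy_match match = TOY_ERROR;

	switch (key) {
	case TOY_KEY_SRCLINE:
		match = toy_cmp_strings(cnode->srcline, node->srcline);
		if (match != TOY_ERROR)
			break;
		/* fall through */
	case TOY_KEY_FUNCTION:
		if (cnode->sym_name && node->sym_name) {
			if (cnode->inlined || node->inlined)
				match = toy_cmp_strings(cnode->sym_name,
							node->sym_name);
			else
				match = toy_cmp_addrs(cnode->sym_start,
						      node->sym_start);
			break;
		}
		/* fall through */
	case TOY_KEY_ADDRESS:
	default:
		match = toy_cmp_addrs(cnode->ip, node->ip);
		break;
	}

	/* single exit point: the bookkeeping below runs for every key */
	if (match == TOY_EQ && is_branch)
		cnode->branch_count++;

	return match;
}

/* Small usage check with made-up frames. */
static void toy_match_chain_self_check(void)
{
	struct toy_frame a = { NULL, "main", 0, 0x9f0, 0xa4a, 0 };
	struct toy_frame b = { NULL, "main", 0, 0x9f0, 0xa10, 0 };
	struct toy_frame nosym_a = { NULL, NULL, 0, 0, 0x10, 0 };
	struct toy_frame nosym_b = { NULL, NULL, 0, 0, 0x20, 0 };

	/* srcline missing: the symbol start decides, the IPs are ignored */
	assert(toy_match_chain(TOY_KEY_SRCLINE, &a, &b, 1) == TOY_EQ);
	assert(a.branch_count == 1);
	/* no symbols: falls through to the IP comparison */
	assert(toy_match_chain(TOY_KEY_FUNCTION, &nosym_a, &nosym_b, 1) == TOY_LT);
	assert(nosym_a.branch_count == 0);
}
```

The second check is the case described above: without symbols the
comparison lands in the address path, so whatever that path compares has
to be in a consistent form on both sides.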
 
Cheers
-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v6 1/6] perf report: properly handle branch count in match_chain
  2017-10-20 13:39               ` Arnaldo Carvalho de Melo
@ 2017-10-23  5:19                 ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2017-10-23  5:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Milian Wolff, Andi Kleen, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria, kernel-team

Hi Arnaldo and Milian,

On Fri, Oct 20, 2017 at 10:39:35AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Oct 20, 2017 at 01:38:23PM +0200, Milian Wolff wrote:
> > On Friday, 20 October 2017 12:21:35 CEST Milian Wolff wrote:
> > > On Thursday, 19 October 2017 17:01:08 CEST Namhyung Kim wrote:
> > > > Hi Andi,
> > > > 
> > > > On Thu, Oct 19, 2017 at 06:55:19AM -0700, Andi Kleen wrote:
> > > > > On Thu, Oct 19, 2017 at 12:59:14PM +0200, Milian Wolff wrote:
> > > > > > On Thursday, 19 October 2017 00:41:04 CEST Andi Kleen wrote:
> > > > > > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > > > > > > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > > > > > > > > +					   struct dso *right_dso, u64 right_ip)
> > > > > > > > +{
> > > > > > > > +	if (left_dso == right_dso && left_ip == right_ip)
> > > > > > > > +		return MATCH_EQ;
> > > > > > > > +	else if (left_ip < right_ip)
> > > > > > > > +		return MATCH_LT;
> > > > > > > > +	else
> > > > > > > > +		return MATCH_GT;
> > > > > > > > +}
> > > > > > > 
> > > > > > > So why does only the first case check the dso? Does it not matter
> > > > > > > for the others?
> > > > > > > 
> > > > > > > Either should be checked by none or by all.
> > > > > > 
> > > > > > > I don't see why it should be checked. It is only required to prevent two
> > > > > > > addresses from being considered equal while they are not. So only the one
> > > > > > > check is required, otherwise we return either LT or GT.
> > > > > 
> > > > > When the comparison is always in the same process (which I think
> > > > > is not the case) just checking the addresses is sufficient. If they are
> > > > > not then you always need to check the DSO and only compare inside the
> > > > > same DSO.
> > > > 
> > > > As far as I know, the node->ip is a relative address (inside a DSO).
> > > > So it should compare the dso as well even in the same process.
> > > 
> > > Sorry guys, I seem to be slow at understanding your review comments.
> > > 
> > > match_address_dso should impose a sort order on two relative addresses. The
> > > order should ensure that relative addresses in a different DSO are not
> > > considered equal. But if the DSOs are different, it doesn't matter whether
> > > we return LT or GT - or?
> > > 
> > > Put differently, how would you write this function to take care of the DSO
> > > in the other two branches? I.e. what to return if the DSOs are different -
> > > a MATCH_ERROR?
> > 
> > Thinking a bit more about this. Are you guys maybe hinting at my 
> > implementation breaking the strict ordering rules (is that the right word?). 
> > I.e. a < b && b > a iff a == b ? Potentially my implementation would break 
> > this assumption when the relative IPs are the same, but the DSO is different.
> > 
> > So is this what you want:
> > 
> > +static enum match_result match_address_dso(struct dso *left_dso, u64 left_ip,
> > +					   struct dso *right_dso, u64 right_ip)
> > +{
> > +	if (left_dso == right_dso && left_ip == right_ip)
> > +		return MATCH_EQ;
> > +	else if (left_dso < right_dso || left_ip < right_ip)
> > +		return MATCH_LT;
> > +	else
> > +		return MATCH_GT;
> > +}

How about

	if (left_dso != right_dso)
		return left_dso < right_dso ? MATCH_LT : MATCH_GT;
	else if (left_ip != right_ip)
		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
	else
		return MATCH_EQ;

?
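
To make the difference between the two variants concrete, here is a small
stand-alone sketch. It is not perf code: match_or() and match_lex() are
illustrative names, the enum is a reduced stand-in for perf's enum
match_result, and the pointers are compared through uintptr_t only to keep
the comparison well defined in plain C. With the "||" variant, two entries
that differ in both dso and ip can each compare as "less than" the other,
which would confuse any sorted tree; comparing the dso first and using the
ip only as a tie-breaker avoids that.

```c
#include <stdint.h>

/* Reduced stand-in for perf's enum match_result (assumption). */
enum match_result { MATCH_EQ, MATCH_LT, MATCH_GT };

struct dso;	/* opaque here, only the pointer value is compared */

/* The "||" variant quoted above: LT if either the dso or the ip is smaller. */
static enum match_result match_or(const struct dso *left_dso, uint64_t left_ip,
				  const struct dso *right_dso, uint64_t right_ip)
{
	if (left_dso == right_dso && left_ip == right_ip)
		return MATCH_EQ;
	else if ((uintptr_t)left_dso < (uintptr_t)right_dso || left_ip < right_ip)
		return MATCH_LT;
	else
		return MATCH_GT;
}

/* Lexicographic variant: dso decides first, ip only breaks ties. */
static enum match_result match_lex(const struct dso *left_dso, uint64_t left_ip,
				   const struct dso *right_dso, uint64_t right_ip)
{
	if (left_dso != right_dso)
		return (uintptr_t)left_dso < (uintptr_t)right_dso ? MATCH_LT : MATCH_GT;
	else if (left_ip != right_ip)
		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
	else
		return MATCH_EQ;
}
```

For example, with two dso pointers lo < hi, the entries (lo, ip 20) and
(hi, ip 10) compare as MATCH_LT in both directions with match_or(), but as
MATCH_LT one way and MATCH_GT the other way with match_lex().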


> 
> Why not do all in terms of absolute addresses? Comparing relative
> addresses seems nonsensical anyway.

???

It needs to compare symbols of callchains from different address
spaces (i.e. tasks) too.  We do the same when comparing symbols of
samples - please see sort__sym_cmp().
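
As a rough illustration of why absolute addresses do not work as an
aggregation key across tasks: the same DSO can be mapped at a different
base in each process, so one symbol gets different absolute addresses but
the same DSO-relative one. The sketch below is a simplified model, not
perf's actual struct map; toy_map, toy_map_ip() and toy_unmap_ip() are
made-up names, and the arithmetic only assumes a single mapping described
by a load address and a file offset.

```c
#include <stdint.h>

/* Simplified stand-in for one mmap'ed region of a DSO (assumption). */
struct toy_map {
	uint64_t start;	/* address the region is loaded at in this task */
	uint64_t pgoff;	/* file offset at which the region starts */
};

/* absolute address -> DSO-relative address */
static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/* DSO-relative address -> absolute address in this task */
static uint64_t toy_unmap_ip(const struct toy_map *map, uint64_t rip)
{
	return rip + map->start - map->pgoff;
}
```

With the same library loaded at 0x7f0000000000 in one task and at
0x7f1111110000 in another, the relative address 0x1234 unmaps to two
different absolute addresses, while mapping either of them back yields
0x1234 again. That is why the relative address needs the dso next to it
to form a usable key.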


> Perhaps something like the patch
> below, and note that cnode->ip and node->ip are already absolute
> addresses.

Only if it couldn't find a map?

Thanks,
Namhyung


> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 35a920f09503..1ac3f4a5afab 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -671,8 +671,6 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  {
>  	struct symbol *sym = node->sym;
>  	u64 left, right;
> -	struct dso *left_dso = NULL;
> -	struct dso *right_dso = NULL;
>  
>  	if (callchain_param.key == CCKEY_SRCLINE) {
>  		enum match_result match = match_chain_strings(cnode->srcline,
> @@ -698,16 +696,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  			return match_chain_strings(cnode->ms.sym->name,
>  						   node->sym->name);
>  
> -		left = cnode->ms.sym->start;
> -		right = sym->start;
> -		left_dso = cnode->ms.map->dso;
> -		right_dso = node->map->dso;
> +		left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start);
> +		right = node->map->unmap_ip(node->map, sym->start);
>  	} else {
>  		left = cnode->ip;
>  		right = node->ip;
>  	}
>  
> -	if (left == right && left_dso == right_dso) {
> +	if (left == right) {
>  		if (node->branch) {
>  			cnode->branch_count++;
>  


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-20 20:21   ` Milian Wolff
@ 2017-10-23 14:29     ` Arnaldo Carvalho de Melo
  2017-10-23 19:04       ` Milian Wolff
  2017-10-23 19:04     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-23 14:29 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

Em Fri, Oct 20, 2017 at 10:21:03PM +0200, Milian Wolff escreveu:
> On Freitag, 20. Oktober 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> > Em Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff escreveu:
> > > This series of patches completely reworks the way inline frames are
> > > handled. Instead of querying for the inline nodes on-demand in the
> > > individual tools, we now create proper callchain nodes for inlined
> > > frames. The advantages this approach brings are numerous:
> > > 
> > > - less duplicated code in the individual browser
> > > - aggregated cost for inlined frames for the --children top-down list
> > > - various bug fixes that arose from querying for a srcline/symbol based on
> > >   the IP of a sample, which will always point to the last inlined frame
> > >   instead of the corresponding non-inlined frame
> > > - overall much better support for visualizing cost for heavily-inlined C++
> > >   code, which simply was confusing and unreliable before
> > > - srcline honors the global setting as to whether full paths or basenames
> > >   should be shown
> > > - caches for inlined frames and srcline information, which allow us to
> > >   enable inline frame handling by default
> > > 
> > > For comparison, below lists the output before and after for `perf script`
> > > and `perf report`. The example file I used to generate the perf data is:
> >
> > So, please check my tmp.perf/core branch, it has this patchset + the fix
> > I proposed for the match_chain() to always use absolute addresses.
> 
> OK, so I've looked at it. I think there are some style issues with the 
> indentation in match_chain_addresses. Also, the unmap_ip lines are too long 
> for checkpatch.pl

I don't pay too much attention to that part of checkpatch, will take a
look if in this case we should obey that rule.
 
> Additionally, we can now still run into the CCKEY_ADDRESS code path (when 
> match_chain_strings for inlined symbols returns MATCH_ERROR, or when either 
> cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly then.
> 
> Can we maybe instead use something like this on top of your patch?

I'll of course fix the indentation problems and will analyse your patch
today.

- Arnaldo
 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 01fc95fdd1e0..92bca95be202 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const char *left,
>  static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
>  {
>  	if (left_ip == right_ip)
> -               return MATCH_EQ;
> -       else if (left_ip < right_ip)
> -               return MATCH_LT;
> -       else
> -               return MATCH_GT;
> +		return MATCH_EQ;
> +	else if (left_ip < right_ip)
> +		return MATCH_LT;
> +	else
> +		return MATCH_GT;
> +}
> +
> +static u64 unmap_ip(struct map *map, u64 ip)
> +{
> +	return map ? map->unmap_ip(map, ip) : ip;
>  }
>  
>  static enum match_result match_chain(struct callchain_cursor_node *node,
> @@ -702,9 +707,10 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  				if (match != MATCH_ERROR)
>  					break;
>  			} else {
> -				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
> -				    right = node->map->unmap_ip(node->map, node->sym->start);
> -
> +				u64 left = unmap_ip(cnode->ms.map,
> +						    cnode->ms.sym->start);
> +				u64 right = unmap_ip(node->map,
> +						     node->sym->start);
>  				match = match_chain_addresses(left, right);
>  				break;
>  			}
> @@ -713,7 +719,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  		__fallthrough;
>  	case CCKEY_ADDRESS:
>  	default:
> -		match = match_chain_addresses(cnode->ip, node->ip);
> +		match = match_chain_addresses(unmap_ip(cnode->ms.map,
> +						       cnode->ip),
> +					      unmap_ip(node->map, node->ip));
>  		break;
>  	}
>  
> Cheers
> -- 
> Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> 


* Re: [PATCH v7 1/5] perf report: properly handle branch count in match_chain
  2017-10-19 11:42   ` Milian Wolff
@ 2017-10-23 15:15     ` Andi Kleen
  2017-10-23 18:39       ` Milian Wolff
  0 siblings, 1 reply; 50+ messages in thread
From: Andi Kleen @ 2017-10-23 15:15 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

Milian Wolff <milian.wolff@kdab.com> writes:
>
> perf record -b --call-graph dwarf <some binary>
> perf report --branch-history --no-children --stdio
>
> I see predicted and iter values as before, so I think nothing is breaking. But 
> I'm somewhat unsure. Can someone paste an example source code and the perf 
> commands to get some meaningful avg_cycles? Or does this depend on a newer 
> Intel CPU? I have currently only an Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz 
> available.

Branch cycles requires at least a Skylake or Goldmont CPU, so yes.

For testing on other systems you can fake them however with some variant
of this patch

http://lkml.iu.edu/hypermail//linux/kernel/1505.1/01135.html

-Andi


* Re: [PATCH v7 1/5] perf report: properly handle branch count in match_chain
  2017-10-23 15:15     ` Andi Kleen
@ 2017-10-23 18:39       ` Milian Wolff
  2017-10-23 20:39         ` Andi Kleen
  0 siblings, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-23 18:39 UTC (permalink / raw)
  To: Andi Kleen
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

On Montag, 23. Oktober 2017 17:15:11 CEST Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> > perf record -b --call-graph dwarf <some binary>
> > perf report --branch-history --no-children --stdio
> > 
> > I see predicted and iter values as before, so I think nothing is breaking.
> > But I'm somewhat unsure. Can someone paste an example source code and the
> > perf commands to get some meaningful avg_cycles? Or does this depend on a
> > newer Intel CPU? I have currently only an Intel(R) Core(TM) i7-5600U CPU @
> > 2.60GHz available.
> 
> Branch cycles requires at least a Skylake or Goldmont CPU, so yes.
> 
> For testing on other systems you can fake them however with some variant
> of this patch
> 
> http://lkml.iu.edu/hypermail//linux/kernel/1505.1/01135.html

I've rebased that against master:

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 25d143053ab5..d128e66fe8af 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2407,7 +2407,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 	struct branch_info *bi;
 
 	/* If we have branch cycles always annotate them. */
-	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+	if (bs && bs->nr /* && bs->entries[0].flags.cycles */) {
 		int i;
 
 		bi = sample__resolve_bstack(sample, al);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 94d8f1ccedd9..e54741308e6c 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1824,6 +1824,8 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
 		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
 		bi[i].flags = bs->entries[i].flags;
+		if (bi[i].flags.cycles == 0)
+			bi[i].flags.cycles = 123;
 	}
 	return bi;
 }

And then I ran again the two perf commands quoted above, but still cannot see 
any avg_cycles. Am I missing something else? Or could you or someone else with 
access to the proper hardware maybe test this?

I'd still be interested in seeing source code for an example binary as well as 
the perf commands that should be used.

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-23 14:29     ` Arnaldo Carvalho de Melo
@ 2017-10-23 19:04       ` Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-23 19:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

On Montag, 23. Oktober 2017 16:29:35 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 20, 2017 at 10:21:03PM +0200, Milian Wolff escreveu:
> > On Freitag, 20. Oktober 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff escreveu:
> > > > This series of patches completely reworks the way inline frames are
> > > > handled. Instead of querying for the inline nodes on-demand in the
> > > > individual tools, we now create proper callchain nodes for inlined
> > > > frames. The advantages this approach brings are numerous:
> > > > 
> > > > - less duplicated code in the individual browser
> > > > - aggregated cost for inlined frames for the --children top-down list
> > > > - various bug fixes that arose from querying for a srcline/symbol based on
> > > >   the IP of a sample, which will always point to the last inlined frame
> > > >   instead of the corresponding non-inlined frame
> > > > - overall much better support for visualizing cost for heavily-inlined C++
> > > >   code, which simply was confusing and unreliable before
> > > > - srcline honors the global setting as to whether full paths or basenames
> > > >   should be shown
> > > > - caches for inlined frames and srcline information, which allow us to
> > > >   enable inline frame handling by default
> > > > 
> > > > For comparison, below lists the output before and after for `perf script`
> > > > and `perf report`. The example file I used to generate the perf data is:
> > > 
> > > So, please check my tmp.perf/core branch, it has this patchset + the fix
> > > I proposed for the match_chain() to always use absolute addresses.
> > 
> > OK, so I've looked at it. I think there are some style issues with the
> > indentation in match_chain_addresses. Also, the unmap_ip lines are too long
> > for checkpatch.pl
> 
> I don't pay too much attention to that part of checkpatch, will take a
> look if in this case we should obey that rule.

Ah, that is good to know for me. I often went through great pain to make 
checkpatch happy. What is the maximum line length for the perf code base?

> > Additionally, we can now still run into the CCKEY_ADDRESS code path (when
> > match_chain_strings for inlined symbols returns MATCH_ERROR, or when either
> > cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly then.
> > 
> > Can we maybe instead use something like this on top of your patch?
> 
> I'll of course fix the indentation problems and will analyse your patch
> today.

Thanks

> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 01fc95fdd1e0..92bca95be202 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const char *left,
> >  static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
> >  {
> >  	if (left_ip == right_ip)
> > -               return MATCH_EQ;
> > -       else if (left_ip < right_ip)
> > -               return MATCH_LT;
> > -       else
> > -               return MATCH_GT;
> > +		return MATCH_EQ;
> > +	else if (left_ip < right_ip)
> > +		return MATCH_LT;
> > +	else
> > +		return MATCH_GT;
> > +}
> > +
> > +static u64 unmap_ip(struct map *map, u64 ip)
> > +{
> > +	return map ? map->unmap_ip(map, ip) : ip;
> >  }
> >  
> >  static enum match_result match_chain(struct callchain_cursor_node *node,
> > @@ -702,9 +707,10 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
> >  				if (match != MATCH_ERROR)
> >  					break;
> >  			} else {
> > -				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
> > -				    right = node->map->unmap_ip(node->map, node->sym->start);
> > -
> > +				u64 left = unmap_ip(cnode->ms.map,
> > +						    cnode->ms.sym->start);
> > +				u64 right = unmap_ip(node->map,
> > +						     node->sym->start);
> >  				match = match_chain_addresses(left, right);
> >  				break;
> >  			}
> > @@ -713,7 +719,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
> >  		__fallthrough;
> >  	case CCKEY_ADDRESS:
> >  	default:
> > -		match = match_chain_addresses(cnode->ip, node->ip);
> > +		match = match_chain_addresses(unmap_ip(cnode->ms.map,
> > +						       cnode->ip),
> > +					      unmap_ip(node->map, node->ip));
> >  		break;
> >  	}
> > 
> > Cheers


-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-20 20:21   ` Milian Wolff
  2017-10-23 14:29     ` Arnaldo Carvalho de Melo
@ 2017-10-23 19:04     ` Arnaldo Carvalho de Melo
  2017-10-23 19:39       ` Milian Wolff
  1 sibling, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-23 19:04 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

Em Fri, Oct 20, 2017 at 10:21:03PM +0200, Milian Wolff escreveu:
> On Freitag, 20. Oktober 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> > Em Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff escreveu:
> > > This series of patches completely reworks the way inline frames are
> > > handled. Instead of querying for the inline nodes on-demand in the
> > > individual tools, we now create proper callchain nodes for inlined
> > > frames. The advantages this approach brings are numerous:
> > > 
> > > - less duplicated code in the individual browser
> > > - aggregated cost for inlined frames for the --children top-down list
> > > - various bug fixes that arose from querying for a srcline/symbol based on
> > >   the IP of a sample, which will always point to the last inlined frame
> > >   instead of the corresponding non-inlined frame
> > > - overall much better support for visualizing cost for heavily-inlined C++
> > >   code, which simply was confusing and unreliable before
> > > - srcline honors the global setting as to whether full paths or basenames
> > >   should be shown
> > > - caches for inlined frames and srcline information, which allow us to
> > >   enable inline frame handling by default
> > > 
> > > For comparison, below lists the output before and after for `perf script`
> > > and `perf report`. The example file I used to generate the perf data is:
> >
> > So, please check my tmp.perf/core branch, it has this patchset + the fix
> > I proposed for the match_chain() to always use absolute addresses.
> 
> OK, so I've looked at it. I think there are some style issues with the 
> indentation in match_chain_addresses. Also, the unmap_ip lines are too long 
> for checkpatch.pl
> 
> Additionally, we can now still run into the CCKEY_ADDRESS code path (when 
> match_chain_strings for inlined symbols returns MATCH_ERROR, or when either 
> cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly then.

so you're saying that cnode->ip and node->ip may be relative or
absolute? I thought they were always absolute, but I'll double check.
 
> Can we maybe instead use something like this on top of your patch?
> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 01fc95fdd1e0..92bca95be202 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const char *left,
>  static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
>  {
>  	if (left_ip == right_ip)
> -               return MATCH_EQ;
> -       else if (left_ip < right_ip)
> -               return MATCH_LT;
> -       else
> -               return MATCH_GT;
> +		return MATCH_EQ;
> +	else if (left_ip < right_ip)
> +		return MATCH_LT;
> +	else
> +		return MATCH_GT;
> +}

Applied the space fixes above, but the following I don't think makes
things clearer, it is not "unmap_ip()" it is at its best
try_to_unmap_ip_but_do_not_unmap_if_not_possible() which is confusing
8-)

So we better fix it in the users and continue using the existing
map->unmap_ip(map, rel_ip) idiom.

> +static u64 unmap_ip(struct map *map, u64 ip)
> +{
> +	return map ? map->unmap_ip(map, ip) : ip;
>  }
>  
>  static enum match_result match_chain(struct callchain_cursor_node *node,
> @@ -702,9 +707,10 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  				if (match != MATCH_ERROR)
>  					break;
>  			} else {
> -				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
> -				    right = node->map->unmap_ip(node->map, node->sym->start);
> -
> +				u64 left = unmap_ip(cnode->ms.map,
> +						    cnode->ms.sym->start);
> +				u64 right = unmap_ip(node->map,
> +						     node->sym->start);

So, in the above, you say that cnode->ms.map or node->map may be NULL,
right? But then both are asking for a sym->start (which is a relative
address, it came from a symtab), and furthermore, for cnode->ms.sym to
be not NULL means that cnode->ms.map is not NULL, after all
cnode->ms.sym came from a dso__find_symbol(cnode->ms.map->dso).

Ditto for node->sym/node->map.

>  				match = match_chain_addresses(left, right);
>  				break;
>  			}
> @@ -713,7 +719,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  		__fallthrough;
>  	case CCKEY_ADDRESS:
>  	default:
> -		match = match_chain_addresses(cnode->ip, node->ip);
> +		match = match_chain_addresses(unmap_ip(cnode->ms.map,
> +						       cnode->ip),
> +					      unmap_ip(node->map, node->ip));

Here I need to look further, to see what kind of address cnode->ip is,
my expectation is that it is an absolute address, so no need for
unmapping, will check.

- Arnaldo


* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-23 19:04     ` Arnaldo Carvalho de Melo
@ 2017-10-23 19:39       ` Milian Wolff
  2017-10-23 22:43         ` Arnaldo Carvalho de Melo
  2017-10-24 13:27         ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 50+ messages in thread
From: Milian Wolff @ 2017-10-23 19:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

On Montag, 23. Oktober 2017 21:04:53 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 20, 2017 at 10:21:03PM +0200, Milian Wolff escreveu:
> > On Freitag, 20. Oktober 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff escreveu:
> > > > This series of patches completely reworks the way inline frames are
> > > > handled. Instead of querying for the inline nodes on-demand in the
> > > > individual tools, we now create proper callchain nodes for inlined
> > > > frames. The advantages this approach brings are numerous:
> > > > 
> > > > - less duplicated code in the individual browser
> > > > - aggregated cost for inlined frames for the --children top-down list
> > > > - various bug fixes that arose from querying for a srcline/symbol based on
> > > >   the IP of a sample, which will always point to the last inlined frame
> > > >   instead of the corresponding non-inlined frame
> > > > - overall much better support for visualizing cost for heavily-inlined C++
> > > >   code, which simply was confusing and unreliable before
> > > > - srcline honors the global setting as to whether full paths or basenames
> > > >   should be shown
> > > > - caches for inlined frames and srcline information, which allow us to
> > > >   enable inline frame handling by default
> > > > 
> > > > For comparison, below lists the output before and after for `perf script`
> > > > and `perf report`. The example file I used to generate the perf data is:
> > > 
> > > So, please check my tmp.perf/core branch, it has this patchset + the fix
> > > I proposed for the match_chain() to always use absolute addresses.
> > 
> > OK, so I've looked at it. I think there are some style issues with the
> > indentation in match_chain_addresses. Also, the unmap_ip lines are too long
> > for checkpatch.pl
> > 
> > Additionally, we can now still run into the CCKEY_ADDRESS code path (when
> > match_chain_strings for inlined symbols returns MATCH_ERROR, or when either
> > cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly then.
>
> so you're saying that cnode->ip and node->ip may be relative or
> absolute? I thought they were always absolute, but I'll double check.

See below.
 
> > Can we maybe instead use something like this on top of your patch?
> > 
> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 01fc95fdd1e0..92bca95be202 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const char *left,
> >  static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
> >  {
> >  	if (left_ip == right_ip)
> > -               return MATCH_EQ;
> > -       else if (left_ip < right_ip)
> > -               return MATCH_LT;
> > -       else
> > -               return MATCH_GT;
> > +		return MATCH_EQ;
> > +	else if (left_ip < right_ip)
> > +		return MATCH_LT;
> > +	else
> > +		return MATCH_GT;
> > +}
> 
> Applied the space fixes above, but the following I don't think makes
> things clearer, it is not "unmap_ip()" it is at its best
> try_to_unmap_ip_but_do_not_unmap_if_not_possible() which is confusing
> 8-)
> 
> So we better fix it in the users and continue using the existing
> map->unmap_ip(map, rel_ip) idiom.
> 
> > +static u64 unmap_ip(struct map *map, u64 ip)
> > +{
> > +	return map ? map->unmap_ip(map, ip) : ip;
> >  }
> >  
> >  static enum match_result match_chain(struct callchain_cursor_node *node,
> > @@ -702,9 +707,10 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
> >  				if (match != MATCH_ERROR)
> >  					break;
> >  			} else {
> > -				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode->ms.sym->start),
> > -				    right = node->map->unmap_ip(node->map, node->sym->start);
> > -
> > +				u64 left = unmap_ip(cnode->ms.map,
> > +						    cnode->ms.sym->start);
> > +				u64 right = unmap_ip(node->map,
> > +						     node->sym->start);
> 
> So, in the above, you say that cnode->ms.map or node->map may be NULL,
> right? But then both are asking for a sym->start (which is a relative
> address, it came from a symtab), and furthermore, for cnode->ms.sym to
> be not NULL means that cnode->ms.map is not NULL, after all
> cnode->ms.sym came from a dso__find_symbol(cnode->ms.map->dso).

Ugh sorry yes, now I see where my confusion comes from... I clearly did not 
understand Ravi's patch in its entirety - sorry for that.

So trying to bring back some clarity, let's summarize:

- sym->start is always relative
- *->ip is absolute if no map could be found
- *->ip is relative otherwise if there is a map
- we need to always use relative addresses as we want to aggregate from 
different address spaces (see also Namhyung's latest mail in the thread on v6 
of this patch series)
- we need to prevent merging of equal relative addresses from different DSOs

So to fix this all, I guess the suggested approach by Namhyung would be best, 
i.e. fixup my initial match_addresses to take the map, and then if the map is 
valid also take the dso into account when comparing the addresses:

        if (left_dso != right_dso)
                return left_dso < right_dso ? MATCH_LT : MATCH_GT;
        else if (left_ip != right_ip)
                return left_ip < right_ip ? MATCH_LT : MATCH_GT;
        else
                return MATCH_EQ;
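
The points listed above could be sketched roughly as follows. This is only
an illustration under the assumptions stated in this mail (ip is
DSO-relative when a map was found, absolute otherwise); toy_dso, toy_map,
toy_node and match_nodes() are hypothetical names, not the structures used
in callchain.c, and a missing map is simply treated as a NULL dso.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for perf's enum match_result (assumption). */
enum match_result { MATCH_EQ, MATCH_LT, MATCH_GT };

struct toy_dso { const char *name; };
struct toy_map { struct toy_dso *dso; };

/* Hypothetical node: ip is DSO-relative if map != NULL, absolute otherwise. */
struct toy_node {
	struct toy_map *map;
	uint64_t ip;
};

static enum match_result match_nodes(const struct toy_node *left,
				     const struct toy_node *right)
{
	/* Nodes without a map share the NULL dso and compare by absolute ip. */
	const struct toy_dso *left_dso = left->map ? left->map->dso : NULL;
	const struct toy_dso *right_dso = right->map ? right->map->dso : NULL;

	/* uintptr_t casts keep the ordering of unrelated pointers well defined. */
	if (left_dso != right_dso)
		return (uintptr_t)left_dso < (uintptr_t)right_dso ? MATCH_LT : MATCH_GT;
	else if (left->ip != right->ip)
		return left->ip < right->ip ? MATCH_LT : MATCH_GT;
	else
		return MATCH_EQ;
}
```

Two nodes from different tasks that point at the same relative address in
the same dso then match, while the same relative address in two different
DSOs does not.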

> Ditto for node->sym/node->map.
> 
> >  				match = match_chain_addresses(left, right);
> >  				break;
> >  			}
> > @@ -713,7 +719,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
> >  		__fallthrough;
> >  	case CCKEY_ADDRESS:
> >  	default:
> > -		match = match_chain_addresses(cnode->ip, node->ip);
> > +		match = match_chain_addresses(unmap_ip(cnode->ms.map,
> > +						       cnode->ip),
> > +					      unmap_ip(node->map, node->ip));
> 
> Here I need to look further, to see what kind of address cnode->ip is,
> my expectation is that it is an absolute address, so no need for
> unmapping, will check.

Please double check this and also the other points in my list above. It is all 
a bit confusing... 

Do you want me to supply another patch, or will you take care of this?

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: [PATCH v7 1/5] perf report: properly handle branch count in match_chain
  2017-10-23 18:39       ` Milian Wolff
@ 2017-10-23 20:39         ` Andi Kleen
  0 siblings, 0 replies; 50+ messages in thread
From: Andi Kleen @ 2017-10-23 20:39 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, namhyung, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	Ravi Bangoria

Milian Wolff <milian.wolff@kdab.com> writes:
>  		bi = sample__resolve_bstack(sample, al);
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 94d8f1ccedd9..e54741308e6c 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1824,6 +1824,8 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>  		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
>  		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
>  		bi[i].flags = bs->entries[i].flags;
> +		if (bi[i].flags.cycles == 0)
> +			bi[i].flags.cycles = 123;
>  	}
>  	return bi;
>  }
>
> And then I ran again the two perf commands quoted above, but still cannot see 
> any avg_cycles. Am I missing something else? Or could you or someone else with 
> access to the proper hardware maybe test this?

The patch above was for annotate. For the call graphs you need to add
the fake cycles in the call graph path.

> I'd still be interested in seeing source code for an example binary as well as 
> the perf commands that should be used.

When supported, it works with any binary with -b
(see http://halobates.de/applicative-mental-models.pdf)

% cat tcall.c
volatile int a = 10000, b = 100000, c;

__attribute__((noinline)) void f2(void)
{
        c = a / b;
}

__attribute__((noinline)) void f1(void)
{
        f2();
        f2();
}

int main(void)
{
        int i;
        for (i = 0; i < 500000000; i++)
            f1();
}


% perf record -b  ./tcall

% perf report --branch-history --stdio
   78.68%  tcall.c:6              [.] f2                  tcall            
            |          
            |--39.56%--f1 tcall.c:12
            |          f2 tcall.c:7 (cycles:7)
            |          f2 tcall.c:6
            |          f1 tcall.c:12 (cycles:1)
            |          main tcall.c:17
            |          f2 tcall.c:7 (cycles:7)
            |          main tcall.c:18
            |          main tcall.c:17 (cycles:1)
            |          f1 tcall.c:11
            |          main tcall.c:18 (cycles:1)
            |          f2 tcall.c:6
            |          f1 tcall.c:11 (cycles:1)
            |          f1 tcall.c:12
            |          f2 tcall.c:7 (cycles:7)
            |          f2 tcall.c:6
            |          f1 tcall.c:12 (cycles:1)
            |          main tcall.c:17
            |          f2 tcall.c:7 (cycles:7)
            |          main tcall.c:18
            |          main tcall.c:17 (cycles:1)


-Andi

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-23 19:39       ` Milian Wolff
@ 2017-10-23 22:43         ` Arnaldo Carvalho de Melo
  2017-10-24 13:27         ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-23 22:43 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

Em Mon, Oct 23, 2017 at 09:39:57PM +0200, Milian Wolff escreveu:
> On Montag, 23. Oktober 2017 21:04:53 CEST Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 20, 2017 at 10:21:03PM +0200, Milian Wolff escreveu:
> > > On Freitag, 20. Oktober 2017 18:15:40 CEST Arnaldo Carvalho de Melo wrote:
> > > > Em Thu, Oct 19, 2017 at 01:38:31PM +0200, Milian Wolff escreveu:
> > > > > This series of patches completely reworks the way inline frames are
> > > > > handled. Instead of querying for the inline nodes on-demand in the
> > > > > individual tools, we now create proper callchain nodes for inlined
> > > > > frames. The advantages this approach brings are numerous:
> > > > > 
> > > > > - less duplicated code in the individual browser
> > > > > - aggregated cost for inlined frames for the --children top-down list
> > > > > - various bug fixes that arose from querying for a srcline/symbol based on
> > > > >   the IP of a sample, which will always point to the last inlined frame
> > > > >   instead of the corresponding non-inlined frame
> > > > > - overall much better support for visualizing cost for heavily-inlined C++
> > > > >   code, which simply was confusing and unreliable before
> > > > > - srcline honors the global setting as to whether full paths or basenames
> > > > >   should be shown
> > > > > - caches for inlined frames and srcline information, which allow us to
> > > > >   enable inline frame handling by default
> > > > > 
> > > > > For comparison, below lists the output before and after for `perf script`
> > > > > and `perf report`. The example file I used to generate the perf data is:
> > > > 
> > > > So, please check my tmp.perf/core branch, it has this patchset + the fix
> > > > I proposed for the match_chain() to always use absolute addresses.
> > > 
> > > OK, so I've looked at it. I think there are some style issues with the
> > > indentation in match_chain_addresses. Also, the unmap_ip lines are too
> > > long
> > > for checkpatch.pl
> > > 
> > > Additionally, we can now still run into the CCKEY_ADDRESS code path (when
> > > match_chain_strings for inlined symbols returns MATCH_ERROR, or when
> > > either
> > > cnode->ms.sym or node->sym is invalid), but won't unmap the IP properly
> > > then.
> >
> > so you're saying that cnode->ip and node->ip may be relative or
> > absolute? I thought they were always absolute, but I'll double check.
> 
> See below.
>  
> > > Can we maybe instead use something like this on top of your patch?
> > > 
> > > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > > index 01fc95fdd1e0..92bca95be202 100644
> > > --- a/tools/perf/util/callchain.c
> > > +++ b/tools/perf/util/callchain.c
> > > @@ -669,11 +669,16 @@ static enum match_result match_chain_strings(const
> > > char *left,
> > > 
> > >  static enum match_result match_chain_addresses(u64 left_ip, u64 right_ip)
> > >  {
> > >  
> > >  	if (left_ip == right_ip)
> > > 
> > > -               return MATCH_EQ;
> > > -       else if (left_ip < right_ip)
> > > -               return MATCH_LT;
> > > -       else
> > > -               return MATCH_GT;
> > > +		return MATCH_EQ;
> > > +	else if (left_ip < right_ip)
> > > +		return MATCH_LT;
> > > +	else
> > > +		return MATCH_GT;
> > > +}
> > 
> > Applied the space fixes above, but the following I don't think makes
> > things clearer, it is not "unmap_ip()" it is at its best
> > try_to_unmap_ip_but_do_not_unmap_if_not_possible() which is confusing
> > 8-)
> > 
> > So we better fix it in the users and continue using the existing
> > map->unmap_ip(map, rel_ip) idiom.
> > 
> > > +static u64 unmap_ip(struct map *map, u64 ip)
> > > +{
> > > +	return map ? map->unmap_ip(map, ip) : ip;
> > > 
> > >  }
> > >  
> > >  static enum match_result match_chain(struct callchain_cursor_node *node,
> > > 
> > > @@ -702,9 +707,10 @@ static enum match_result match_chain(struct
> > > callchain_cursor_node *node,
> > > 
> > >  				if (match != MATCH_ERROR)
> > >  				
> > >  					break;
> > >  			
> > >  			} else {
> > > 
> > > -				u64 left = cnode->ms.map->unmap_ip(cnode->ms.map, cnode-
> > > 
> > > >ms.sym->start),
> > > 
> > > -				    right = node->map->unmap_ip(node->map, node->sym-
> >start);
> > > -
> > > +				u64 left = unmap_ip(cnode->ms.map,
> > > +						    cnode->ms.sym->start);
> > > +				u64 right = unmap_ip(node->map,
> > > +						     node->sym->start);
> > 
> > So, in the above, you say that cnode->ms.map or node->map may be NULL,
> > right? But then both are asking for a sym->start (which is a relative
> > address, it came from a symtab), and furthermore, for cnode->ms.sym to
> > be not NULL means that cnode->ms.map is not NULL, after all
> > cnode->ms.sym came from a dso__find_symbol(cnode->ms.map->dso).
> 
> Ugh sorry yes, now I see where my confusion comes from... I clearly did not 
> understand Ravi's patch in its entirety - sorry for that.
> 
> So trying to bring back some clarity, let's summarize:
> 
> - sym->start is always relative
> - *->ip is absolute if no map could be found
> - *->ip is relative otherwise if there is a map
> - we need to always use relative addresses as we want to aggregate from 
> different address spaces (see also Namhyung's latest mail in the thread on v6 
> of this patch series)

We're aggregating on the same hist entry paths to a function in a DSO, I
see, so indeed we need to always use relative, i.e. the sym->start.

And sometimes we resolve the map but not the symbol, ok. But perhaps
keeping ->ip always as absolute helps in reading the code, i.e. when we
resolve the symbol we have sym->start, relative, when we don't, then we
do a cnode->ms.map->map_ip(cnode->ms.map, cnode->ip) to get the relative
address.

> - we need to prevent merging of equal relative addresses from different DSOs
> 
> So to fix this all, I guess the suggested approach by Namhyung would be best, 
> i.e. fixup my initial match_addresses to take the map, and then if the map is 
> valid also take the dso into account when comparing the addresses:
> 
>         if (left_dso != right_dso)
>                 return left_dso < right_dso ? MATCH_LT : MATCH_GT;

Humm, but then what would be a cmp function? strcmp(left->dso->name,
right->dso->name)?  Like _sort__dso_cmp() does? Probably yes, for
consistency?
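
To make the question concrete, here is a minimal sketch of what a
name-based variant could look like. Everything in it is hypothetical
(toy_dso, toy_match and toy_match_dso_addresses are made-up names, not
the perf structs), and treating an unresolved DSO as the empty string is
just an assumption for the sketch:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes, standing in for enum match_result. */
enum toy_match { TOY_MATCH_LT = -1, TOY_MATCH_EQ = 0, TOY_MATCH_GT = 1 };

/* Hypothetical stand-in for a DSO: only the name matters here. */
struct toy_dso {
	const char *name;
};

/*
 * Order first by DSO name, then by relative address.  A missing DSO is
 * treated as the empty name, so it sorts before any named DSO.
 */
static inline enum toy_match
toy_match_dso_addresses(const struct toy_dso *left_dso, uint64_t left_ip,
			const struct toy_dso *right_dso, uint64_t right_ip)
{
	const char *l = left_dso ? left_dso->name : "";
	const char *r = right_dso ? right_dso->name : "";
	int cmp = strcmp(l, r);

	if (cmp != 0)
		return cmp < 0 ? TOY_MATCH_LT : TOY_MATCH_GT;
	if (left_ip != right_ip)
		return left_ip < right_ip ? TOY_MATCH_LT : TOY_MATCH_GT;
	return TOY_MATCH_EQ;
}
```

Comparing the pointers instead is cheaper and still keeps different DSOs
apart, but the resulting order is only stable within one run; the name
comparison gives the same order every time at the cost of a strcmp().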

>         else if (left_ip != right_ip)
>                 return left_ip < right_ip ? MATCH_LT : MATCH_GT;
>         else
>                 return MATCH_EQ;
> 
> > Ditto for node->sym/node->map.
> > 
> > >  				match = match_chain_addresses(left, right);
> > >  				break;
> > >  			
> > >  			}
> > > 
> > > @@ -713,7 +719,9 @@ static enum match_result match_chain(struct
> > > callchain_cursor_node *node,
> > > 
> > >  		__fallthrough;
> > >  	
> > >  	case CCKEY_ADDRESS:
> > > 
> > >  	default:
> > > -		match = match_chain_addresses(cnode->ip, node->ip);
> > > +		match = match_chain_addresses(unmap_ip(cnode->ms.map,
> > > +						       cnode->ip),
> > > +					      unmap_ip(node->map, node->ip));
> > 
> > Here I need to look further, to see what kind of address cnode->ip is,
> > my expectation is that it is a absolute address, so no need for
> > unmapping, will check.
 
> Please double check this and also the other points in my list above. It is all 
> a bit confusing... 

Right, but from Namhyung's comments (and yours), the confusion comes
from this "hey, it's relative if resolved, absolute if not", which I
think we should resolve by stating that it is the original, "raw"
address that came in the callchain record from the kernel, and when we
need to transform it into some relative address, if we found at least
the map, then we can, using map->map_ip(map, cnode->ip).
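
As a rough illustration of that rule, a minimal sketch with a made-up
toy_map type (not the real struct map; the arithmetic is what I recall
the generic map_ip/unmap_ip helpers doing, so take it as an assumption):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a mapping of a DSO. */
struct toy_map {
	uint64_t start;	/* first absolute address covered by the mapping */
	uint64_t end;	/* one past the last absolute address */
	uint64_t pgoff;	/* file offset the mapping starts at */
};

/* raw (absolute) callchain address -> DSO-relative address */
static inline uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	assert(ip >= map->start && ip < map->end);
	return ip - map->start + map->pgoff;
}

/* DSO-relative address -> absolute address, the inverse of the above */
static inline uint64_t toy_unmap_ip(const struct toy_map *map, uint64_t rip)
{
	return rip + map->start - map->pgoff;
}

/* Keep the raw address around; derive the relative one only on demand. */
static inline uint64_t toy_relative_ip(const struct toy_map *map,
				       uint64_t raw_ip)
{
	return map ? toy_map_ip(map, raw_ip) : raw_ip;
}
```

The point of the last helper is that the stored ->ip never changes
meaning: callers that want a relative address ask for it explicitly, and
the "no map found" case is handled in one place.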
 
> Do you want me to supply another patch, or will you take care of this?

Well, having what you think is right in a branch helps in testing, even
if as we discuss we might not use it.

Good, we're making progress, I guess :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-20  5:15     ` Namhyung Kim
@ 2017-10-24  8:51       ` Milian Wolff
  2017-10-25  1:46         ` Namhyung Kim
  2017-11-03 14:21       ` [tip:perf/core] perf callchain: Fix double mapping al->addr for children without self period tip-bot for Namhyung Kim
  1 sibling, 1 reply; 50+ messages in thread
From: Milian Wolff @ 2017-10-24  8:51 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa, kernel-team

On Freitag, 20. Oktober 2017 07:15:33 CEST Namhyung Kim wrote:
> Hi Milian,
> 
> On Thu, Oct 19, 2017 at 12:54:18PM +0200, Milian Wolff wrote:
> > On Mittwoch, 18. Oktober 2017 20:53:50 CEST Milian Wolff wrote:
> > > When inline frame resolution is disabled, a bogus srcline is obtained
> > > for hist entries:
> > > 
> > > ~~~~~
> > > $ perf report -s sym,srcline --no-inline --stdio -g none
> > > 
> > >     95.21%     0.00%  [.] __libc_start_main
> > > 
> > > __libc_start_main+18446603358170398953 95.21%     0.00%  [.] _start
> > > 
> > >                          _start+18446650082411225129 46.67%     0.00% 
> > >                          [.]
> > > 
> > > main
> > > 
> > >                                         main+18446650082411225208 38.75%
> > >  
> > >  0.00%  [.] hypot
> > > 
> > > hypot+18446603358164312084 23.75%     0.00%  [.] main
> > > 
> > >              main+18446650082411225151 20.83%    20.83%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  random.h:143 18.12%     0.00%  [.] main
> > >   
> > >   main+18446650082411225165 13.12%    13.12%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  random.tcc:3330 4.17%     4.17%  [.] __hypot_finite
> > > >  
> > >     __hypot_finite+163 4.17%     4.17%  [.]
> > >     std::generate_canonical<double,
> > > 
> > > 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul> >  random.tcc:3333 4.17%     0.00%  [.] __hypot_finite
> > > 
> > >                   __hypot_finite+18446603358164312227 4.17%     0.00% 
> > >                   [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  std::generate_canonical<double, 53ul, std::line 2.92%     0.00%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  std::generate_canonical<double, 53ul, std::line 2.50%     2.50%  [.]
> > > 
> > > __hypot_finite
> > > 
> > >                                         __hypot_finite+11 2.50%    
> > >                                         2.50%
> > > 
> > > [.] __hypot_finite
> > > 
> > >                                             __hypot_finite+24 2.50%
> > > 
> > > 0.00%  [.] __hypot_finite
> > > 
> > > __hypot_finite+18446603358164312075 2.50%     0.00%  [.] __hypot_finite
> > > 
> > >                      __hypot_finite+18446603358164312088 ~~~~~
> > > 
> > > Note how we get very large offsets to main and cannot see any srcline
> > > from one of the complex or random headers, even though the instruction
> > > pointers actually lie in code inlined from there.
> > > 
> > > > This patch fixes the mapping to use map__objdump_2mem instead of
> > > > map__rip_2objdump in hist_entry__get_srcline. This fixes the srcline
> > > values for me when inline resolution is disabled:
> > > 
> > > ~~~~~
> > > $ perf report -s sym,srcline --no-inline --stdio -g none
> > > 
> > >     95.21%     0.00%  [.] __libc_start_main
> > > 
> > > __libc_start_main+233 95.21%     0.00%  [.] _start
> > > 
> > >         _start+41 46.88%     0.00%  [.] main
> > >     
> > >     complex:589 43.96%     0.00%  [.] main
> > >   
> > >   random.h:185 38.75%     0.00%  [.] hypot
> > >  
> > >  hypot+20 20.83%     0.00%  [.] std::generate_canonical<double, 53ul,
> > > 
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  random.h:143 13.12%     0.00%  [.] std::generate_canonical<double,
> > > >  53ul,
> > > 
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  random.tcc:3330 4.17%     4.17%  [.] __hypot_finite
> > > >  
> > >     __hypot_finite+140715545239715 4.17%     4.17%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  std::generate_canonical<double, 53ul, std::line 4.17%     0.00%  [.]
> > > 
> > > __hypot_finite
> > > 
> > >                                         __hypot_finite+163 4.17%    
> > >                                         0.00%
> > > 
> > > [.] std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  random.tcc:3333 2.92%     2.92%  [.] std::generate_canonical<double,
> > > 
> > > 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul> >  std::generate_canonical<double, 53ul, std::line 2.50%
> > > 2.50%  [.] __hypot_finite
> > > 
> > > __hypot_finite+140715545239563 2.50%     2.50%  [.] __hypot_finite
> > > 
> > >                 __hypot_finite+140715545239576 2.50%     2.50%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  std::generate_canonical<double, 53ul, std::line 2.50%     2.50%  [.]
> > > 
> > > std::generate_canonical<double, 53ul,
> > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > > 2147483647ul>
> > > 
> > > >  std::generate_canonical<double, 53ul, std::line 2.50%     0.00%  [.]
> > > 
> > > __hypot_finite
> > > 
> > >                                         __hypot_finite+11 ~~~~~
> > > 
> > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > Cc: Yao Jin <yao.jin@linux.intel.com>
> > > Cc: Jiri Olsa <jolsa@redhat.com>
> > > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > > 
> > > Note how most of the large offset values are now gone. Most notably,
> > > we get proper srcline resolution for the random.h and complex headers.
> > > ---
> > > 
> > >  tools/perf/util/sort.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > > index 006d10a0dc96..6f3d109078a3 100644
> > > --- a/tools/perf/util/sort.c
> > > +++ b/tools/perf/util/sort.c
> > > @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
> > > 
> > >  	if (!map)
> > >  	
> > >  		return SRCLINE_UNKNOWN;
> > > 
> > > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> > > 
> > >  			   he->ms.sym, true, true);
> > >  
> > >  }
> > 
> > Sorry, this patch was declined by Namhyung before, please discard it - I
> > forgot to do that before resending v6.
> 
> I looked into it and found a bug handling cumulative (children)
> entries.  For children entries that have no self period, the al->addr
> (so he->ip) ends up having a doubly-mapped address.
> 
> It seems to be there from the beginning but only affects entries that
> have no srclines - finding srcline itself is done using a different
> address but it will show the invalid address if no srcline was found.
> I think we should fix the commit c7405d85d7a3 ("perf tools: Update
> cpumode for each cumulative entry").
> 
> Could you please test the following patch works for you?

Sorry for the delay, nearly forgot about this mail. The patch below does help 
in my situation, thanks! Can you commit it please?

> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 35a920f09503..d18cdcc8d132 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -1074,10 +1074,7 @@ int fill_callchain_info(struct addr_location *al,
> struct callchain_cursor_node * {
>         al->map = node->map;
>         al->sym = node->sym;
> -       if (node->map)
> -               al->addr = node->map->map_ip(node->map, node->ip);
> -       else
> -               al->addr = node->ip;
> +       al->addr = node->ip;
> 
>         if (al->sym == NULL) {
>                 if (hide_unresolved)



-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-23 19:39       ` Milian Wolff
  2017-10-23 22:43         ` Arnaldo Carvalho de Melo
@ 2017-10-24 13:27         ` Arnaldo Carvalho de Melo
  2017-10-25  2:09           ` Namhyung Kim
  1 sibling, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-24 13:27 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, namhyung, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria

Em Mon, Oct 23, 2017 at 09:39:57PM +0200, Milian Wolff escreveu:
> So to fix this all, I guess the suggested approach by Namhyung would be best, 
> i.e. fixup my initial match_addresses to take the map, and then if the map is 
> valid also take the dso into account when comparing the addresses:
> 
>         if (left_dso != right_dso)
>                 return left_dso < right_dso ? MATCH_LT : MATCH_GT;
>         else if (left_ip != right_ip)
>                 return left_ip < right_ip ? MATCH_LT : MATCH_GT;
>         else
>                 return MATCH_EQ;

So, can you check that the patch below is the one we should commit to?
Namhyung? I'm looking at your latest patch kit, v7, to see if the branch
parts, further below, are as you submitted or if I have any issues with
it.

I've updated my perf/core branch with all this.
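
Stripped of the perf specifics, the shape of match_chain() in the commit
below is roughly the following. This is only a sketch with made-up sk_*
types and a single counter, not the real code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result and key enums for the sketch. */
enum sk_match { SK_ERROR = -2, SK_LT = -1, SK_EQ = 0, SK_GT = 1 };
enum sk_key { SK_KEY_FUNCTION, SK_KEY_ADDRESS };

struct sk_node {		/* incoming callchain entry */
	const char *sym;	/* NULL if the symbol was not resolved */
	uint64_t ip;
	int branch;		/* non-zero if it came from branch data */
};

struct sk_entry {		/* entry already stored in the tree */
	const char *sym;
	uint64_t ip;
	unsigned int branch_count;
};

static inline enum sk_match sk_cmp_u64(uint64_t left, uint64_t right)
{
	if (left == right)
		return SK_EQ;
	return left < right ? SK_LT : SK_GT;
}

static inline enum sk_match sk_match_chain(enum sk_key key,
					   const struct sk_node *node,
					   struct sk_entry *entry)
{
	enum sk_match match = SK_ERROR;
	int cmp;

	switch (key) {
	case SK_KEY_FUNCTION:
		if (node->sym && entry->sym) {
			cmp = strcmp(entry->sym, node->sym);
			match = cmp == 0 ? SK_EQ : (cmp < 0 ? SK_LT : SK_GT);
			break;
		}
		/* fall through to the address comparison */
	case SK_KEY_ADDRESS:
	default:
		match = sk_cmp_u64(entry->ip, node->ip);
		break;
	}

	/* Single exit point: the accounting runs for every path above. */
	if (match == SK_EQ && node->branch)
		entry->branch_count++;

	return match;
}
```

The earlier version returned straight out of the string comparisons, so
the branch accounting was skipped on those paths; with one exit that
cannot happen any more.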

commit 275049196c64cc1233837c9f066b4b87e32cd1df
Author: Milian Wolff <milian.wolff@kdab.com>
Date:   Fri Oct 20 12:14:47 2017 -0300

    perf report: Properly handle branch count in match_chain()
    
    Some of the code paths I introduced before returned too early without
    running the code to handle a node's branch count.  By refactoring
    match_chain to only have one exit point, this can be remedied.
    
    Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
    Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
    Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..19bfcadcf891 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,83 +666,99 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
-static enum match_result match_chain(struct callchain_cursor_node *node,
-				     struct callchain_list *cnode)
+/*
+ * We need to always use relative addresses because we're aggregating
+ * callchains from multiple threads, i.e. different address spaces, so
+ * comparing absolute addresses make no sense as a symbol in a DSO may end up
+ * in a different address when used in a different binary or even the same
+ * binary but with some sort of address randomization technique, thus we need
+ * to compare just relative addresses. -acme
+ */
+static enum match_result match_chain_dso_addresses(struct map *left_map, u64 left_ip,
+						   struct map *right_map, u64 right_ip)
 {
-	struct symbol *sym = node->sym;
-	u64 left, right;
-	struct dso *left_dso = NULL;
-	struct dso *right_dso = NULL;
+	struct dso *left_dso = left_map ? left_map->dso : NULL;
+	struct dso *right_dso = right_map ? right_map->dso : NULL;
 
-	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
+	if (left_dso != right_dso)
+		return left_dso < right_dso ? MATCH_LT : MATCH_GT;
 
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
+	if (left_ip != right_ip)
+ 		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
 
-		if (match != MATCH_ERROR)
-			return match;
+	return MATCH_EQ;
+}
 
-		/* otherwise fall-back to IP-based comparison below */
-	}
+static enum match_result match_chain(struct callchain_cursor_node *node,
+				     struct callchain_list *cnode)
+{
+	enum match_result match = MATCH_ERROR;
 
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/*
-		 * Compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.sym->start;
-		right = sym->start;
-		left_dso = cnode->ms.map->dso;
-		right_dso = node->map->dso;
-	} else {
-		left = cnode->ip;
-		right = node->ip;
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
+		match = match_chain_strings(cnode->srcline, node->srcline);
+		if (match != MATCH_ERROR)
+			break;
+		/* otherwise fall-back to symbol-based comparison below */
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/*
+			 * Compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. Otherwise do a faster comparison based
+			 * on the symbol start address.
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined) {
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+				if (match != MATCH_ERROR)
+					break;
+			} else {
+				match = match_chain_dso_addresses(cnode->ms.map, cnode->ms.sym->start,
+								  node->map, node->sym->start);
+				break;
+			}
+		}
+		/* otherwise fall-back to IP-based comparison below */
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_chain_dso_addresses(cnode->ms.map, cnode->ip, node->map, node->ip);
+		break;
 	}
 
-	if (left == right && left_dso == right_dso) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-24  8:51       ` Milian Wolff
@ 2017-10-25  1:46         ` Namhyung Kim
  2017-10-30 20:03           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 50+ messages in thread
From: Namhyung Kim @ 2017-10-25  1:46 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa, kernel-team

Hi Milian,

On Tue, Oct 24, 2017 at 10:51:43AM +0200, Milian Wolff wrote:
> On Freitag, 20. Oktober 2017 07:15:33 CEST Namhyung Kim wrote:
> > > I looked into it and found a bug handling cumulative (children)
> > > entries.  For children entries that have no self period, the al->addr
> > > (so he->ip) ends up having a doubly-mapped address.
> > 
> > It seems to be there from the beginning but only affects entries that
> > have no srclines - finding srcline itself is done using a different
> > address but it will show the invalid address if no srcline was found.
> > I think we should fix the commit c7405d85d7a3 ("perf tools: Update
> > cpumode for each cumulative entry").
> > 
> > Could you please test the following patch works for you?
> 
> Sorry for the delay, nearly forgot about this mail. The patch below does help 
> in my situation, thanks! Can you commit it please?

Sure, I'll add your Tested-by then.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v7 0/5] generate full callchain cursor entries for inlined frames
  2017-10-24 13:27         ` Arnaldo Carvalho de Melo
@ 2017-10-25  2:09           ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2017-10-25  2:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Milian Wolff, jolsa, linux-kernel, linux-perf-users, Andi Kleen,
	Ravi Bangoria, kernel-team

Hi Arnaldo,

On Tue, Oct 24, 2017 at 10:27:42AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 23, 2017 at 09:39:57PM +0200, Milian Wolff escreveu:
> > So to fix this all, I guess the suggested approach by Namhyung would be best, 
> > i.e. fixup my initial match_addresses to take the map, and then if the map is 
> > valid also take the dso into account when comparing the addresses:
> > 
> >         if (left_dso != right_dso)
> >                 return left_dso < right_dso ? MATCH_LT : MATCH_GT;
> >         else if (left_ip != right_ip)
> >                 return left_ip < right_ip ? MATCH_LT : MATCH_GT;
> >         else
> >                 return MATCH_EQ;
> 
> So, can you check that the patch below is the one we should commit to?
> Namhyung? I'm looking at your latest patch kit, v7, to see if the branch
> parts, further below, are as you submitted or if I have any issues with
> it.
> 
> I've updated my perf/core branch with all this.
> 
> commit 275049196c64cc1233837c9f066b4b87e32cd1df
> Author: Milian Wolff <milian.wolff@kdab.com>
> Date:   Fri Oct 20 12:14:47 2017 -0300
> 
>     perf report: Properly handle branch count in match_chain()
>     
>     Some of the code paths I introduced before returned too early without
>     running the code to handle a node's branch count.  By refactoring
>     match_chain to only have one exit point, this can be remedied.
>     
>     Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
>     Cc: David Ahern <dsahern@gmail.com>
>     Cc: Jin Yao <yao.jin@linux.intel.com>
>     Cc: Namhyung Kim <namhyung@kernel.org>
>     Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>     Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
>     Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 35a920f09503..19bfcadcf891 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -666,83 +666,99 @@ static enum match_result match_chain_strings(const char *left,
>  	return ret;
>  }
>  
> -static enum match_result match_chain(struct callchain_cursor_node *node,
> -				     struct callchain_list *cnode)
> +/*
> + * We need to always use relative addresses because we're aggregating
> + * callchains from multiple threads, i.e. different address spaces, so
> + * comparing absolute addresses make no sense as a symbol in a DSO may end up
> + * in a different address when used in a different binary or even the same
> + * binary but with some sort of address randomization technique, thus we need
> + * to compare just relative addresses. -acme
> + */
> +static enum match_result match_chain_dso_addresses(struct map *left_map, u64 left_ip,
> +						   struct map *right_map, u64 right_ip)
>  {
> -	struct symbol *sym = node->sym;
> -	u64 left, right;
> -	struct dso *left_dso = NULL;
> -	struct dso *right_dso = NULL;
> +	struct dso *left_dso = left_map ? left_map->dso : NULL;
> +	struct dso *right_dso = right_map ? right_map->dso : NULL;
>  
> -	if (callchain_param.key == CCKEY_SRCLINE) {
> -		enum match_result match = match_chain_strings(cnode->srcline,
> -							      node->srcline);
> +	if (left_dso != right_dso)
> +		return left_dso < right_dso ? MATCH_LT : MATCH_GT;
>  
> -		/* if no srcline is available, fallback to symbol name */
> -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> -			match = match_chain_strings(cnode->ms.sym->name,
> -						    node->sym->name);
> +	if (left_ip != right_ip)
> + 		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
>  
> -		if (match != MATCH_ERROR)
> -			return match;
> +	return MATCH_EQ;
> +}
>  
> -		/* otherwise fall-back to IP-based comparison below */
> -	}
> +static enum match_result match_chain(struct callchain_cursor_node *node,
> +				     struct callchain_list *cnode)
> +{
> +	enum match_result match = MATCH_ERROR;
>  
> -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> -		/*
> -		 * Compare inlined frames based on their symbol name because
> -		 * different inlined frames will have the same symbol start
> -		 */
> -		if (cnode->ms.sym->inlined || node->sym->inlined)
> -			return match_chain_strings(cnode->ms.sym->name,
> -						   node->sym->name);
> -
> -		left = cnode->ms.sym->start;
> -		right = sym->start;
> -		left_dso = cnode->ms.map->dso;
> -		right_dso = node->map->dso;
> -	} else {
> -		left = cnode->ip;
> -		right = node->ip;
> +	switch (callchain_param.key) {
> +	case CCKEY_SRCLINE:
> +		match = match_chain_strings(cnode->srcline, node->srcline);
> +		if (match != MATCH_ERROR)
> +			break;
> +		/* otherwise fall-back to symbol-based comparison below */
> +		__fallthrough;
> +	case CCKEY_FUNCTION:
> +		if (node->sym && cnode->ms.sym) {
> +			/*
> +			 * Compare inlined frames based on their symbol name
> +			 * because different inlined frames will have the same
> +			 * symbol start. Otherwise do a faster comparison based
> +			 * on the symbol start address.
> +			 */
> +			if (cnode->ms.sym->inlined || node->sym->inlined) {
> +				match = match_chain_strings(cnode->ms.sym->name,
> +							    node->sym->name);
> +				if (match != MATCH_ERROR)
> +					break;
> +			} else {
> +				match = match_chain_dso_addresses(cnode->ms.map, cnode->ms.sym->start,
> +								  node->map, node->sym->start);
> +				break;
> +			}
> +		}
> +		/* otherwise fall-back to IP-based comparison below */
> +		__fallthrough;
> +	case CCKEY_ADDRESS:
> +	default:
> +		match = match_chain_dso_addresses(cnode->ms.map, cnode->ip, node->map, node->ip);
> +		break;
>  	}
>  
> -	if (left == right && left_dso == right_dso) {
> -		if (node->branch) {
> -			cnode->branch_count++;
> +	if (match == MATCH_EQ && node->branch) {
> +		cnode->branch_count++;
>  
> -			if (node->branch_from) {
> -				/*
> -				 * It's "to" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = true;
> +		if (node->branch_from) {
> +			/*
> +			 * It's "to" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = true;
>  
> -				if (node->branch_flags.predicted)
> -					cnode->predicted_count++;
> +			if (node->branch_flags.predicted)
> +				cnode->predicted_count++;
>  
> -				if (node->branch_flags.abort)
> -					cnode->abort_count++;
> +			if (node->branch_flags.abort)
> +				cnode->abort_count++;
>  
> -				branch_type_count(&cnode->brtype_stat,
> -						  &node->branch_flags,
> -						  node->branch_from,
> -						  node->ip);
> -			} else {
> -				/*
> -				 * It's "from" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = false;
> -				cnode->cycles_count +=
> -					node->branch_flags.cycles;
> -				cnode->iter_count += node->nr_loop_iter;
> -				cnode->iter_cycles += node->iter_cycles;
> -			}
> +			branch_type_count(&cnode->brtype_stat,
> +					  &node->branch_flags,
> +					  node->branch_from,
> +					  node->ip);
> +		} else {
> +			/*
> +			 * It's "from" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = false;
> +			cnode->cycles_count += node->branch_flags.cycles;
> +			cnode->iter_count += node->nr_loop_iter;
> +			cnode->iter_cycles += node->iter_cycles;
>  		}
> -
> -		return MATCH_EQ;
>  	}
>  
> -	return left > right ? MATCH_GT : MATCH_LT;
> +	return match;
>  }
>  
>  /*


* [tip:perf/core] perf report: Properly handle branch count in match_chain()
  2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
  2017-10-18 22:41   ` Andi Kleen
  2017-10-20 15:22   ` Arnaldo Carvalho de Melo
@ 2017-10-25 17:20   ` tip-bot for Milian Wolff
  2 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dsahern, a.p.zijlstra, acme, linux-kernel, hpa, milian.wolff,
	yao.jin, namhyung, tglx, mingo, ravi.bangoria

Commit-ID:  bf36eb5c4b3ef0ebfb19b1a67a5fa5821e6c9fa7
Gitweb:     https://git.kernel.org/tip/bf36eb5c4b3ef0ebfb19b1a67a5fa5821e6c9fa7
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Fri, 20 Oct 2017 12:14:47 -0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:37 -0300

perf report: Properly handle branch count in match_chain()

Some of the code paths I introduced before returned too early without
running the code to handle a node's branch count.  By refactoring
match_chain to only have one exit point, this can be remedied.
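
As a rough illustration of the single-exit-point idea (not the actual
perf code), the sketch below uses simplified, hypothetical stand-ins
for the callchain types. Every comparison strategy stores its result in
one variable, so the branch bookkeeping runs regardless of which
strategy produced the match:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the types used by match_chain(). */
enum match_result { MATCH_ERROR = -1, MATCH_EQ, MATCH_LT, MATCH_GT };

struct entry {
	const char *name;		/* symbol name, may be NULL */
	uint64_t ip;
	bool branch;			/* sample carries branch information */
	unsigned int branch_count;
};

/* Name-based comparison; MATCH_ERROR means "cannot decide, try something else". */
static enum match_result compare_names(const char *left, const char *right)
{
	int ret;

	if (!left || !right)
		return MATCH_ERROR;

	ret = strcmp(left, right);
	if (ret == 0)
		return MATCH_EQ;
	return ret < 0 ? MATCH_LT : MATCH_GT;
}

static enum match_result compare_ips(uint64_t left, uint64_t right)
{
	if (left == right)
		return MATCH_EQ;
	return left < right ? MATCH_LT : MATCH_GT;
}

static enum match_result match_entry(const struct entry *node, struct entry *cnode)
{
	/* Each strategy only assigns to `match`; none of them returns early. */
	enum match_result match = compare_names(cnode->name, node->name);

	if (match == MATCH_ERROR)
		match = compare_ips(cnode->ip, node->ip);

	/* Reached on every path, including the name-based one. */
	if (match == MATCH_EQ && node->branch)
		cnode->branch_count++;

	return match;
}
```

With an early return in the name-based path, the branch_count update
would be skipped for entries that match by name, which is the kind of
problem described above.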

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 140 ++++++++++++++++++++++++--------------------
 1 file changed, 78 insertions(+), 62 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f..19bfcad 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,83 +666,99 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
-static enum match_result match_chain(struct callchain_cursor_node *node,
-				     struct callchain_list *cnode)
+/*
+ * We need to always use relative addresses because we're aggregating
+ * callchains from multiple threads, i.e. different address spaces, so
+ * comparing absolute addresses make no sense as a symbol in a DSO may end up
+ * in a different address when used in a different binary or even the same
+ * binary but with some sort of address randomization technique, thus we need
+ * to compare just relative addresses. -acme
+ */
+static enum match_result match_chain_dso_addresses(struct map *left_map, u64 left_ip,
+						   struct map *right_map, u64 right_ip)
 {
-	struct symbol *sym = node->sym;
-	u64 left, right;
-	struct dso *left_dso = NULL;
-	struct dso *right_dso = NULL;
+	struct dso *left_dso = left_map ? left_map->dso : NULL;
+	struct dso *right_dso = right_map ? right_map->dso : NULL;
 
-	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
+	if (left_dso != right_dso)
+		return left_dso < right_dso ? MATCH_LT : MATCH_GT;
 
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
+	if (left_ip != right_ip)
+ 		return left_ip < right_ip ? MATCH_LT : MATCH_GT;
 
-		if (match != MATCH_ERROR)
-			return match;
+	return MATCH_EQ;
+}
 
-		/* otherwise fall-back to IP-based comparison below */
-	}
+static enum match_result match_chain(struct callchain_cursor_node *node,
+				     struct callchain_list *cnode)
+{
+	enum match_result match = MATCH_ERROR;
 
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/*
-		 * Compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.sym->start;
-		right = sym->start;
-		left_dso = cnode->ms.map->dso;
-		right_dso = node->map->dso;
-	} else {
-		left = cnode->ip;
-		right = node->ip;
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
+		match = match_chain_strings(cnode->srcline, node->srcline);
+		if (match != MATCH_ERROR)
+			break;
+		/* otherwise fall-back to symbol-based comparison below */
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/*
+			 * Compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. Otherwise do a faster comparison based
+			 * on the symbol start address.
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined) {
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+				if (match != MATCH_ERROR)
+					break;
+			} else {
+				match = match_chain_dso_addresses(cnode->ms.map, cnode->ms.sym->start,
+								  node->map, node->sym->start);
+				break;
+			}
+		}
+		/* otherwise fall-back to IP-based comparison below */
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_chain_dso_addresses(cnode->ms.map, cnode->ip, node->map, node->ip);
+		break;
 	}
 
-	if (left == right && left_dso == right_dso) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*


* [tip:perf/core] perf report: Cache failed lookups of inlined frames
  2017-10-19 11:38 ` [PATCH v7 2/5] perf report: cache failed lookups of inlined frames Milian Wolff
@ 2017-10-25 17:20   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung, acme, peterz, ak, jolsa, linux-kernel, yao.jin,
	dsahern, mingo, hpa, milian.wolff, tglx

Commit-ID:  b38775cf7678d7715b35dded3dcfab66e244baae
Gitweb:     https://git.kernel.org/tip/b38775cf7678d7715b35dded3dcfab66e244baae
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:33 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:45 -0300

perf report: Cache failed lookups of inlined frames

When no inlined frames could be found for a given address, we did not
store this information anywhere. That means we potentially do the costly
inliner lookup repeatedly for cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a valid
inline_node. It will be empty when no inliners are found. This enables
us to cache the empty list in the DSO, thereby improving the performance
when many addresses fail to find the inliners.
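
The idea of caching the empty result can be sketched as follows. This
is only an illustration under simplifying assumptions: all names are
hypothetical, a linked list stands in for the rb-tree kept in the DSO,
and expensive_lookup() is a placeholder for the real inliner lookup:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct inline_entry {
	uint64_t addr;
	size_t nr_frames;	/* 0 means "looked up, nothing inlined here" */
	struct inline_entry *next;
};

struct inline_cache {
	struct inline_entry *head;
	unsigned long nr_lookups;	/* how often the costly lookup ran */
};

/* Placeholder for the costly lookup: only even addresses have inlined frames. */
static size_t expensive_lookup(struct inline_cache *cache, uint64_t addr)
{
	cache->nr_lookups++;
	return (addr % 2 == 0) ? 2 : 0;
}

static struct inline_entry *cache_find(struct inline_cache *cache, uint64_t addr)
{
	struct inline_entry *e;

	for (e = cache->head; e; e = e->next)
		if (e->addr == addr)
			return e;
	return NULL;
}

/*
 * Always returns an entry (NULL only on allocation failure). An entry
 * with nr_frames == 0 is cached like any other, so a failed lookup is
 * never repeated for the same address.
 */
static struct inline_entry *inlines_for(struct inline_cache *cache, uint64_t addr)
{
	struct inline_entry *e = cache_find(cache, addr);

	if (e)
		return e;

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->addr = addr;
	e->nr_frames = expensive_lookup(cache, addr);
	e->next = cache->head;
	cache->head = e;
	return e;
}
```

Asking twice for an address without inlined frames runs
expensive_lookup() only once; the second call finds the empty entry.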

Even for my trivial example, the performance impact is significant,
with task-clock dropping from about 595 msec to about 113 msec:

Before:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
             5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
     2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
     4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
       939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
        11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )

       0.596246531 seconds time elapsed                                          ( +-  0.07% )
~~~~~

After:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                 0      cpu-migrations            #    0.000 K/sec
             5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
       432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
       670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
       141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
         2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )

       0.114222393 seconds time elapsed                                          ( +-  1.19% )
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 15 +++++++--------
 tools/perf/util/srcline.c | 16 +---------------
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3d049cb..177c1d4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2115,9 +2115,10 @@ static int append_inlines(struct callchain_cursor *cursor,
 	struct inline_node *inline_node;
 	struct inline_list *ilist;
 	u64 addr;
+	int ret = 1;
 
 	if (!symbol_conf.inline_name || !map || !sym)
-		return 1;
+		return ret;
 
 	addr = map__rip_2objdump(map, ip);
 
@@ -2125,22 +2126,20 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!inline_node) {
 		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
 		if (!inline_node)
-			return 1;
-
+			return ret;
 		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
 	}
 
 	list_for_each_entry(ilist, &inline_node->val, list) {
-		int ret = callchain_cursor_append(cursor, ip, map,
-						  ilist->symbol, false,
-						  NULL, 0, 0, 0,
-						  ilist->srcline);
+		ret = callchain_cursor_append(cursor, ip, map,
+					      ilist->symbol, false,
+					      NULL, 0, 0, 0, ilist->srcline);
 
 		if (ret != 0)
 			return ret;
 	}
 
-	return 0;
+	return ret;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 8bea662..fc38886 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -353,17 +353,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
-		goto out_free_inline_node;
-
-	if (list_empty(&node->val))
-		goto out_free_inline_node;
-
+	addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
 	return node;
-
-out_free_inline_node:
-	inline_node__delete(node);
-	return NULL;
 }
 
 #else /* HAVE_LIBBFD_SUPPORT */
@@ -480,11 +471,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 out:
 	pclose(fp);
 
-	if (list_empty(&node->val)) {
-		inline_node__delete(node);
-		return NULL;
-	}
-
 	return node;
 }
 


* [tip:perf/core] perf report: Cache srclines for callchain nodes
  2017-10-19 11:38 ` [PATCH v7 3/5] perf report: cache srclines for callchain nodes Milian Wolff
@ 2017-10-25 17:20   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: milian.wolff, peterz, linux-kernel, acme, jolsa, hpa, ak,
	dsahern, yao.jin, mingo, namhyung, tglx

Commit-ID:  21ac9d547fdde79c1e8692587d9044fde549214b
Gitweb:     https://git.kernel.org/tip/21ac9d547fdde79c1e8692587d9044fde549214b
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:34 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:46 -0300

perf report: Cache srclines for callchain nodes

On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files, for example, task-clock drops from about 52.5
to about 22.6 seconds:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsed

Processed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. When it is passed, we use the inliner cache instead and thus do
not run this code path as often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.
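
To make the alternative a bit more concrete, here is a hypothetical
sketch of formatting on demand. It does not mirror the fallback logic
of __get_srcline; format_srcline() and its parameters are made up for
illustration. The point is only that the cache would hold the
unformatted "file:line" string (or nothing), and the sym/addr fallback
would be chosen at output time:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter. `srcline` is the cached, unformatted
 * "file:line" string, or NULL when the lookup found nothing. Because
 * the fallback is picked here, one cache entry per address would serve
 * every combination of show_sym and show_addr.
 */
static int format_srcline(char *buf, size_t size, const char *srcline,
			  const char *sym_name, uint64_t sym_offset,
			  uint64_t addr, bool show_sym, bool show_addr)
{
	if (srcline)
		return snprintf(buf, size, "%s", srcline);

	if (show_sym && sym_name)
		return snprintf(buf, size, "%s+%" PRIu64, sym_name, sym_offset);

	if (show_addr)
		return snprintf(buf, size, "0x%" PRIx64, addr);

	/* Same placeholder string as SRCLINE_UNKNOWN. */
	return snprintf(buf, size, "??:0");
}
```

A caller would pass the cached string together with whatever flags the
output site needs, instead of baking the flags into the cached value.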

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.c     |  2 ++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 17 +++++++++---
 tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  7 +++++
 5 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250..3192b60 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT;
+		dso->srclines = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
 
 	/* free inlines first, as they reference symbols */
 	inlines__tree_delete(&dso->inlined_nodes);
+	srcline__tree_delete(&dso->srclines);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0..821b16c 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	struct rb_root	 inlined_nodes;
+	struct rb_root	 srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4..94d8f1c 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 
 static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
 {
+	char *srcline = NULL;
+
 	if (!map || callchain_param.key == CCKEY_FUNCTION)
-		return NULL;
+		return srcline;
+
+	srcline = srcline__tree_find(&map->dso->srclines, ip);
+	if (!srcline) {
+		bool show_sym = false;
+		bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+		srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+				      sym, show_sym, show_addr);
+		srcline__tree_insert(&map->dso->srclines, ip, srcline);
+	}
 
-	return get_srcline(map->dso, map__rip_2objdump(map, ip),
-			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+	return srcline;
 }
 
 struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc38886..c143c3b 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
+struct srcline_node {
+	u64			addr;
+	char			*srcline;
+	struct rb_node		rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	struct srcline_node *i, *node;
+
+	node = zalloc(sizeof(struct srcline_node));
+	if (!node) {
+		perror("not enough memory for the srcline node");
+		return;
+	}
+
+	node->addr = addr;
+	node->srcline = srcline;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct srcline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&node->rb_node, parent, p);
+	rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct srcline_node *i = rb_entry(n, struct srcline_node,
+						  rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i->srcline;
+	}
+
+	return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+	struct srcline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct srcline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		free_srcline(pos->srcline);
+		zfree(&pos);
+	}
+}
+
 struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 					    struct symbol *sym)
 {
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd..1c4d621 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		  bool show_sym, bool show_addr, bool unwind_inlines);
 void free_srcline(char *srcline);
 
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {


* [tip:perf/core] perf report: Use srcline from callchain for hist entries
  2017-10-19 11:38 ` [PATCH v7 4/5] perf report: use srcline from callchain for hist entries Milian Wolff
@ 2017-10-25 17:21   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, linux-kernel, dsahern, namhyung, acme, peterz, tglx,
	milian.wolff, yao.jin, hpa, mingo, jolsa

Commit-ID:  1fb7d06a509e82893e59e0f0b223e7d5d6d0ef8c
Gitweb:     https://git.kernel.org/tip/1fb7d06a509e82893e59e0f0b223e7d5d6d0ef8c
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:35 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:46 -0300

perf report: Use srcline from callchain for hist entries

This patch also removes the symbol name from the srcline column; more
on this below.

This ensures we use the correct srcline, which could originate from a
potentially inlined function. The hist entries used to query for the
srcline based purely on the IP, which leads to wrong results for inlined
entries.
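
A small sketch of why the IP alone is not enough, using hypothetical
types and a made-up IP value: all frames inlined at one call site share
the same IP, so an IP-keyed lookup can only ever return one srcline for
them, while each callchain entry carries its own:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callchain entry: inlined frames share the IP of the sample. */
struct frame {
	uint64_t ip;
	const char *srcline;
};

/* IP-only lookup: returns the first hit, i.e. the innermost inlined frame. */
static const char *srcline_by_ip(const struct frame *frames, size_t nr, uint64_t ip)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (frames[i].ip == ip)
			return frames[i].srcline;
	return NULL;
}

/* Per-frame lookup: every entry reports the srcline stored in the callchain. */
static const char *srcline_of_frame(const struct frame *frame)
{
	return frame->srcline;
}
```

For a chain such as complex:589, complex:597, complex:654, complex:664
and main.cpp:39 at one IP, srcline_by_ip() yields complex:589 for all
five entries, whereas srcline_of_frame() on the last entry yields
main.cpp:39.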

Before:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ..................................................................................................................................
  #
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      44.58%     0.00%  main+100
      44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  std::__complex_abs+100
      44.58%     0.00%  std::abs<double>+100
      44.58%     0.00%  std::norm<double>+100
      36.01%     0.00%  hypot+18446603487892193300
      25.81%     0.00%  main+41
      25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.75%    25.75%  random.h:143
      18.39%     0.00%  main+57
      18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  random.tcc:3330
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ...........................................
  #
      94.30%     1.19%  main.cpp:39
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      48.44%     1.70%  random.h:1823
      48.44%     0.00%  random.h:1814
      46.74%     2.53%  random.h:185
      44.68%     0.10%  complex:589
      44.68%     0.00%  complex:597
      44.68%     0.00%  complex:654
      44.68%     0.00%  complex:664
      40.61%    13.80%  random.tcc:3330
      36.01%     0.00%  hypot+18446603487892193300
      26.81%     0.00%  random.h:151
      26.81%     0.00%  random.h:332
      25.75%    25.75%  random.h:143
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line hist
column. Users who want this information should query for it
explicitly, i.e. run this command instead:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ...........................................
  #
      94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
      46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
      44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
      44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
      44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
      44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
      39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
      26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
      25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
      25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Previously, the symbol name was printed in the srcline column even
when the sym column was explicitly requested, i.e. the output was:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ..................................................................................................................................
  #
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      44.58%     0.00%  [.] main                                                                                                                             main+100
      44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
      44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
      44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      25.81%     0.00%  [.] main                                                                                                                             main+41
      25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
      18.39%     0.00%  [.] main                                                                                                                             main+57
      18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~
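
As a rough illustration of the difference between the two listings, the
Source:Line column can be thought of as the result of a fallback like the
one sketched below: a resolved file:line wins, and symbol+offset is used
only when no source line is available. The helper format_srcline() and its
parameters are hypothetical and are not perf's actual API; the "??:0"
placeholder is likewise an assumption made for this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of perf): format the Source:Line column
 * for one entry. Prefer "file:line" when the address was resolved to a
 * source line, otherwise fall back to "symbol+offset".
 */
static int format_srcline(char *buf, size_t len, const char *file,
			  unsigned int line, const char *sym, uint64_t offset)
{
	assert(buf != NULL && len > 0);

	if (file != NULL)
		return snprintf(buf, len, "%s:%u", file, line);
	if (sym != NULL)
		return snprintf(buf, len, "%s+%" PRIu64, sym, offset);

	/* Assumed placeholder for "nothing known about this address". */
	return snprintf(buf, len, "??:0");
}
```

Under these assumptions, an inlined frame whose source line is known would
be rendered along the lines of "complex:664", while an entry without a
source line would still be rendered as something like "main+100".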

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 1 +
 tools/perf/util/event.c     | 1 +
 tools/perf/util/hist.c      | 2 ++
 tools/perf/util/symbol.h    | 1 +
 4 files changed, 5 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 19bfcad..3a39169 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1090,6 +1090,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 {
 	al->map = node->map;
 	al->sym = node->sym;
+	al->srcline = node->srcline;
 	if (node->map)
 		al->addr = node->map->map_ip(node->map, node->ip);
 	else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 47eff47..3c411e7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1604,6 +1604,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	al->sym = NULL;
 	al->cpu = sample->cpu;
 	al->socket = -1;
+	al->srcline = NULL;
 
 	if (al->cpu >= 0) {
 		struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b0fa9c2..25d1430 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -596,6 +596,7 @@ __hists__add_entry(struct hists *hists,
 			.map	= al->map,
 			.sym	= al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.socket	 = al->socket,
 		.cpu	 = al->cpu,
 		.cpumode = al->cpumode,
@@ -950,6 +951,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			.map = al->map,
 			.sym = al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.parent = iter->parent,
 		.raw_data = sample->raw_data,
 		.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d880a05..d548ea5 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -209,6 +209,7 @@ struct addr_location {
 	struct thread *thread;
 	struct map    *map;
 	struct symbol *sym;
+	const char    *srcline;
 	u64	      addr;
 	char	      level;
 	u8	      filtered;


* [tip:perf/core] perf util: Enable handling of inlined frames by default
  2017-10-19 11:38 ` [PATCH v7 5/5] perf util: enable handling of inlined frames by default Milian Wolff
@ 2017-10-25 17:21   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dsahern, mingo, tglx, peterz, milian.wolff, hpa, yao.jin, acme,
	linux-kernel, namhyung, ak, jolsa

Commit-ID:  d8a88dd243a170a226aba33e7c53704db2f82aa6
Gitweb:     https://git.kernel.org/tip/d8a88dd243a170a226aba33e7c53704db2f82aa6
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:36 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:47 -0300

perf util: Enable handling of inlined frames by default

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d..ddde2b5 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry is function name or file/line.
+	will be printed. Each entry is function name or file/line. Enabled by
+	default, disable with --no-inline.
 
 include::callchain-overhead-calculation.txt[]
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba3..25e6773 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry has function name and file/line.
+	will be printed. Each entry has function name and file/line. Enabled by
+	default, disable with --no-inline.
 
 SEE ALSO
 --------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38a..ce6993b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
 	.show_hist_headers	= true,
 	.symfs			= "",
 	.event_group		= true,
+	.inline_name		= true,
 };
 
 static enum dso_binary_type binary_type_symtab[] = {


* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-25  1:46         ` Namhyung Kim
@ 2017-10-30 20:03           ` Arnaldo Carvalho de Melo
  2017-10-30 23:35             ` Namhyung Kim
  0 siblings, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-30 20:03 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Milian Wolff, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa, kernel-team

Em Wed, Oct 25, 2017 at 10:46:00AM +0900, Namhyung Kim escreveu:
> Hi Milian,
> 
> On Tue, Oct 24, 2017 at 10:51:43AM +0200, Milian Wolff wrote:
> > On Freitag, 20. Oktober 2017 07:15:33 CEST Namhyung Kim wrote:
> > > I looked into it and found a bug handling cumulative (children)
> > > entries.  For children entries that have no self period, the al->addr
> > > (so he->ip) ends up having a doubly-mapped address.
> > > 
> > > It seems to be there from the beginning but only affects entries that
> > > have no srclines - finding srcline itself is done using a different
> > > address but it will show the invalid address if no srcline was found.
> > > I think we should fix the commit c7405d85d7a3 ("perf tools: Update
> > > cpumode for each cumulative entry").
> > > 
> > > Could you please test the following patch works for you?
> > 
> > Sorry for the delay, nearly forgot about this mail. The patch below does help 
> > in my situation, thanks! Can you commit it please?
> 
> Sure, I'll add your Tested-by then.

Namhyung, I couldn't find a submission from you for this one, so I
tentatively added this to my perf/core branch, please let me know if you
want to reword this somehow.

- Arnaldo

commit 0485954310bf3490cb73936164ca03a0c5916773
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Fri Oct 20 14:15:33 2017 +0900

    perf callchain: Fix double mapping al->addr for children without self period
    
    Milian Wolff found a problem he described in [1] and that for him would
    get fixed:
    
    "Note how most of the large offset values are now gone. Most notably, we
    get proper srcline resolution for the random.h and complex headers."
    
    Then Namhyung found the root cause:
    
    "I looked into it and found a bug handling cumulative (children)
    entries.  For children entries that has no self period, the al->addr (so
    he->ip) ends up having a doubly-mapped address.
    
    It seems to be there from the beginning but only affects entries that
    have no srclines - finding srcline itself is done using a different
    address but it will show the invalid address if no srcline was found.  I
    think we should fix the commit c7405d85d7a3 ("perf tools: Update cpumode
    for each cumulative entry")."
    
    Reported-by: Milian Wolff <milian.wolff@kdab.com>
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Tested-by: Milian Wolff <milian.wolff@kdab.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: kernel-team@lge.com
    Fixes: c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry")
    Link: http://lkml.kernel.org/r/20171020051533.GA2746@sejong
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 3a3916934a92..837012147c7b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1091,10 +1091,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 	al->map = node->map;
 	al->sym = node->sym;
 	al->srcline = node->srcline;
-	if (node->map)
-		al->addr = node->map->map_ip(node->map, node->ip);
-	else
-		al->addr = node->ip;
+	al->addr = node->ip;
 
 	if (al->sym == NULL) {
 		if (hide_unresolved)


* Re: [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-30 20:03           ` Arnaldo Carvalho de Melo
@ 2017-10-30 23:35             ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2017-10-30 23:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Milian Wolff, jolsa, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Yao Jin, Jiri Olsa, kernel-team

Hi Arnaldo,

On Mon, Oct 30, 2017 at 05:03:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 25, 2017 at 10:46:00AM +0900, Namhyung Kim escreveu:
> > Hi Milian,
> > 
> > On Tue, Oct 24, 2017 at 10:51:43AM +0200, Milian Wolff wrote:
> > > On Freitag, 20. Oktober 2017 07:15:33 CEST Namhyung Kim wrote:
> > > > I looked into it and found a bug handling cumulative (children)
> > > > entries.  For children entries that have no self period, the al->addr
> > > > (so he->ip) ends up having a doubly-mapped address.
> > > > 
> > > > It seems to be there from the beginning but only affects entries that
> > > > have no srclines - finding srcline itself is done using a different
> > > > address but it will show the invalid address if no srcline was found.
> > > > I think we should fix the commit c7405d85d7a3 ("perf tools: Update
> > > > cpumode for each cumulative entry").
> > > > 
> > > > Could you please test the following patch works for you?
> > > 
> > > Sorry for the delay, nearly forgot about this mail. The patch below does help 
> > > in my situation, thanks! Can you commit it please?
> > 
> > Sure, I'll add your Tested-by then.
> 
> Namhyung, I couldn't find a submission from you for this one, so I
> tentatively added this to my perf/core branch, please let me know if you
> want to reword this somehow.

I already sent it:

  https://lkml.org/lkml/2017/10/24/1130

But I'm also ok with yours (with small changes below).


> 
> commit 0485954310bf3490cb73936164ca03a0c5916773
> Author: Namhyung Kim <namhyung@kernel.org>
> Date:   Fri Oct 20 14:15:33 2017 +0900
> 
>     perf callchain: Fix double mapping al->addr for children without self period
>     
>     Milian Wolff found a problem he described in [1] and that for him would

Where's the link for [1]?

>     get fixed:
>     
>     "Note how most of the large offset values are now gone. Most notably, we
>     get proper srcline resolution for the random.h and complex headers."
>     
>     Then Namhyung found the root cause:
>     
>     "I looked into it and found a bug handling cumulative (children)
>     entries.  For children entries that has no self period, the al->addr (so

s/has/have/

Thanks,
Namhyung


>     he->ip) ends up having a doubly-mapped address.
>     
>     It seems to be there from the beginning but only affects entries that
>     have no srclines - finding srcline itself is done using a different
>     address but it will show the invalid address if no srcline was found.  I
>     think we should fix the commit c7405d85d7a3 ("perf tools: Update cpumode
>     for each cumulative entry")."
>     
>     Reported-by: Milian Wolff <milian.wolff@kdab.com>
>     Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>     Tested-by: Milian Wolff <milian.wolff@kdab.com>
>     Cc: Jin Yao <yao.jin@linux.intel.com>
>     Cc: Jiri Olsa <jolsa@redhat.com>
>     Cc: kernel-team@lge.com
>     Fixes: c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry")
>     Link: http://lkml.kernel.org/r/20171020051533.GA2746@sejong
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 3a3916934a92..837012147c7b 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -1091,10 +1091,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
>  	al->map = node->map;
>  	al->sym = node->sym;
>  	al->srcline = node->srcline;
> -	if (node->map)
> -		al->addr = node->map->map_ip(node->map, node->ip);
> -	else
> -		al->addr = node->ip;
> +	al->addr = node->ip;
>  
>  	if (al->sym == NULL) {
>  		if (hide_unresolved)


* [tip:perf/core] perf callchain: Fix double mapping al->addr for children without self period
  2017-10-20  5:15     ` Namhyung Kim
  2017-10-24  8:51       ` Milian Wolff
@ 2017-11-03 14:21       ` tip-bot for Namhyung Kim
  1 sibling, 0 replies; 50+ messages in thread
From: tip-bot for Namhyung Kim @ 2017-11-03 14:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, tglx, mingo, yao.jin, hpa, namhyung, jolsa,
	milian.wolff

Commit-ID:  d6332a176b869df1839abb26c8f80026a66d21d6
Gitweb:     https://git.kernel.org/tip/d6332a176b869df1839abb26c8f80026a66d21d6
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Fri, 20 Oct 2017 14:15:33 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 31 Oct 2017 16:14:50 -0300

perf callchain: Fix double mapping al->addr for children without self period

Milian Wolff found a problem he described in [1] and that for him would
get fixed:

"Note how most of the large offset values are now gone. Most notably, we
get proper srcline resolution for the random.h and complex headers."

Then Namhyung found the root cause:

"I looked into it and found a bug handling cumulative (children)
entries.  For children entries that have no self period, the al->addr (so
he->ip) ends up having a doubly-mapped address.

It seems to be there from the beginning but only affects entries that
have no srclines - finding srcline itself is done using a different
address but it will show the invalid address if no srcline was found.  I
think we should fix the commit c7405d85d7a3 ("perf tools: Update cpumode
for each cumulative entry")."

[1] https://lkml.kernel.org/r/20171018185350.14893-7-milian.wolff@kdab.com

Reported-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: kernel-team@lge.com
Fixes: c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry")
Link: https://lkml.kernel.org/r/20171020051533.GA2746@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 3a39169..8370121 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1091,10 +1091,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 	al->map = node->map;
 	al->sym = node->sym;
 	al->srcline = node->srcline;
-	if (node->map)
-		al->addr = node->map->map_ip(node->map, node->ip);
-	else
-		al->addr = node->ip;
+	al->addr = node->ip;
 
 	if (al->sym == NULL) {
 		if (hide_unresolved)


end of thread, other threads:[~2017-11-03 14:23 UTC | newest]

Thread overview: 50+ messages
2017-10-18 18:53 [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Milian Wolff
2017-10-18 18:53 ` [PATCH v6 1/6] perf report: properly handle branch count in match_chain Milian Wolff
2017-10-18 22:41   ` Andi Kleen
2017-10-19 10:59     ` Milian Wolff
2017-10-19 13:55       ` Andi Kleen
2017-10-19 15:01         ` Namhyung Kim
2017-10-20 10:21           ` Milian Wolff
2017-10-20 11:38             ` Milian Wolff
2017-10-20 13:39               ` Arnaldo Carvalho de Melo
2017-10-23  5:19                 ` Namhyung Kim
2017-10-20 15:22   ` Arnaldo Carvalho de Melo
2017-10-20 19:52     ` Milian Wolff
2017-10-25 17:20   ` [tip:perf/core] perf report: Properly handle branch count in match_chain() tip-bot for Milian Wolff
2017-10-18 18:53 ` [PATCH v6 2/6] perf report: cache failed lookups of inlined frames Milian Wolff
2017-10-18 18:53 ` [PATCH v6 3/6] perf report: cache srclines for callchain nodes Milian Wolff
2017-10-18 18:53 ` [PATCH v6 4/6] perf report: use srcline from callchain for hist entries Milian Wolff
2017-10-18 18:53 ` [PATCH v6 5/6] perf util: enable handling of inlined frames by default Milian Wolff
2017-10-18 18:53 ` [PATCH v6 6/6] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
2017-10-19 10:54   ` Milian Wolff
2017-10-20  5:15     ` Namhyung Kim
2017-10-24  8:51       ` Milian Wolff
2017-10-25  1:46         ` Namhyung Kim
2017-10-30 20:03           ` Arnaldo Carvalho de Melo
2017-10-30 23:35             ` Namhyung Kim
2017-11-03 14:21       ` [tip:perf/core] perf callchain: Fix double mapping al->addr for children without self period tip-bot for Namhyung Kim
2017-10-18 22:43 ` [PATCH v6 0/6] generate full callchain cursor entries for inlined frames Andi Kleen
2017-10-20 15:43   ` Arnaldo Carvalho de Melo
2017-10-19 11:38 [PATCH v7 0/5] " Milian Wolff
2017-10-19 11:38 ` [PATCH v7 1/5] perf report: properly handle branch count in match_chain Milian Wolff
2017-10-19 11:42   ` Milian Wolff
2017-10-23 15:15     ` Andi Kleen
2017-10-23 18:39       ` Milian Wolff
2017-10-23 20:39         ` Andi Kleen
2017-10-19 11:38 ` [PATCH v7 2/5] perf report: cache failed lookups of inlined frames Milian Wolff
2017-10-25 17:20   ` [tip:perf/core] perf report: Cache " tip-bot for Milian Wolff
2017-10-19 11:38 ` [PATCH v7 3/5] perf report: cache srclines for callchain nodes Milian Wolff
2017-10-25 17:20   ` [tip:perf/core] perf report: Cache " tip-bot for Milian Wolff
2017-10-19 11:38 ` [PATCH v7 4/5] perf report: use srcline from callchain for hist entries Milian Wolff
2017-10-25 17:21   ` [tip:perf/core] perf report: Use " tip-bot for Milian Wolff
2017-10-19 11:38 ` [PATCH v7 5/5] perf util: enable handling of inlined frames by default Milian Wolff
2017-10-25 17:21   ` [tip:perf/core] perf util: Enable " tip-bot for Milian Wolff
2017-10-20 16:15 ` [PATCH v7 0/5] generate full callchain cursor entries for inlined frames Arnaldo Carvalho de Melo
2017-10-20 20:21   ` Milian Wolff
2017-10-23 14:29     ` Arnaldo Carvalho de Melo
2017-10-23 19:04       ` Milian Wolff
2017-10-23 19:04     ` Arnaldo Carvalho de Melo
2017-10-23 19:39       ` Milian Wolff
2017-10-23 22:43         ` Arnaldo Carvalho de Melo
2017-10-24 13:27         ` Arnaldo Carvalho de Melo
2017-10-25  2:09           ` Namhyung Kim
