linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/16] generate full callchain cursor entries for inlined frames
@ 2017-10-09 20:32 Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers Milian Wolff
                   ` (15 more replies)
  0 siblings, 16 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao; +Cc: Linux-kernel, linux-perf-users, Milian Wolff

This series of patches completely reworks the way inline frames are handled.
Instead of querying for the inline nodes on-demand in the individual tools,
we now create proper callchain nodes for inlined frames. The advantages this
approach brings are numerous:

- less duplicated code in the individual browser
- aggregated cost for inlined frames for the --children top-down list
- various bug fixes that arose from querying for a srcline/symbol based on
  the IP of a sample, which will always point to the last inlined frame
  instead of the corresponding non-inlined frame
- overall much better support for visualizing cost for heavily-inlined C++
  code, which simply was confusing and unreliably before
- srcline honors the global setting as to whether full paths or basenames
  should be shown
- caches for inlined frames and srcline information, which allow us to
  enable inline frame handling by default

For comparison, below lists the output before and after for `perf script`
and `perf report`. The example file I used to generate the perf data is:

~~~~~
$ cat inlining.cpp
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
$ g++ -O2 -g -o inlining inlining.cpp
$ perf record --call-graph dwarf ./inlining
~~~~~

Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
of my comments that outline the various issues I'm trying to solve by this
patch series.

~~~~~
$ perf script --inline
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                         std::__complex_abs
                         std::abs<double>
                         std::_Norm_helper<true>::_S_do_it<double>
                         std::norm<double>
                         main
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
# NOTE: the above inlined stack is confusing: the a4a is an address into main,
#       which is the non-inlined symbol. the entry with the address should be
#       at the end of the stack, where it's actually duplicated once more but
#       there it's missing the address
...
$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
...
             --38.86%--_start
                       __libc_start_main
                       |
                       |--15.68%--main random.tcc:3326
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
                       |
                       |--10.36%--main random.h:143
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:143 (inline)
                       |
                       |--5.66%--main random.tcc:3332
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
...
# NOTE: the grouping is totally off because the first and last frame of the
        inline nodes is completely bogus, since the IP is used to find the sym/srcline
        which is different from the actual inlined sym/srcline.
        also, the code currently displays either the inlined function name or
        the corresponding filename (but in full length, instead of just the basename).

$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--main random.tcc:3326
            |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
            |          __libc_start_main
            |          _start
...
# NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
        i.e. what is displayed in the line below (see also perf script issue above)
~~~~~

Afterwards, all of the above issues are resolved (and inlined frames are
displayed by default):

~~~~~
$ perf script
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
...
# NOTE: only one main entry, at the correct position.
        we do display the (repeated) instruction pointer as that ensures
        interoperability with e.g. the stackcollapse-perf.pl script

$ perf report -s sym -g srcline -i perf.inlining.data --stdio
...
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664 (inlined)
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
            |          std::abs<double> complex:597 (inlined)
            |          std::__complex_abs complex:589 (inlined)
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
            |
             --38.86%--_start
                       __libc_start_main
                       |
                       |--38.19%--main inlining.cpp:14
                       |          |
                       |          |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
                       |          |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
                       |          |          |
                       |          |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
                       |          |                     |
                       |          |                     |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
                       |          |                     |          |
                       |          |                     |           --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
                       |          |                     |                     std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
                       |          |                     |                     |
                       |          |                     |                     |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
                       |          |                     |                     |
                       |          |                     |                      --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
                       |          |                     |
                       |          |                     |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
                       |          |                     |
                       |          |                      --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
                       |          |
                       |           --1.99%--std::norm<double> complex:664 (inlined)
                       |                     std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
                       |                     std::abs<double> complex:597 (inlined)
                       |                     std::__complex_abs complex:589 (inlined)
                       |
                        --0.67%--main inlining.cpp:13
...

# NOTE: still somewhat confusing due to the _start and __libc_start_main frames
        that actually are *above* the main frame. But at least the stuff below
        properly splits up and shows that mutiple functions got inlined into
        inlining.cpp:14, not just one as before.

$ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
            |          main inlining.cpp:14
            |          __libc_start_main
            |          _start
...
# NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
        both function and srcline is shown, as well as the (inlined) suffix
        only the basename of the srcline is shown

v5 attends to Namhyung's code review. Most notably, it fixes a use-after-free
   crash. Additionally, srcline resolution for hist entries now also works
   correctly, when inline frame resolution is disabled.

v4 splits the patch to create full callchain nodes for inline frames up further
   as suggested by Jiri. It also removes C99 comments and initializes the
   rb_root properly.

v3 splits the initial patch up into two to simplify reviewing. It also adds a
   comment to clarify the lifetime handling of fake symbols and aliased non-fake
   symbols, based on the feedback by Namhyung.

v2 fixes some issues reported by Namhyung or found by me in further
testing, adds caching and enables inline frames by default.

Milian Wolff (16):
  perf report: remove code to handle inline frames from browsers
  perf util: store srcline in callchain_cursor_node
  perf util: refactor inline_list to operate on symbols
  perf util: refactor inline_list to store srcline string directly
  perf report: create real callchain entries for inlined frames
  perf report: fall-back to function name comparison for -g srcline
  perf report: mark inlined frames in output by " (inlined)" suffix
  perf script: mark inlined frames and do not print DSO for them
  perf report: compare symbol name for inlined frames when matching
  perf report: compare symbol name for inlined frames when sorting
  perf report: properly handle branch count in match_chain
  perf report: cache failed lookups of inlined frames
  perf report: cache srclines for callchain nodes
  perf report: use srcline from callchain for hist entries
  perf util: enable handling of inlined frames by default
  perf util: use correct IP mapping to find srcline for hist entry

 tools/perf/Documentation/perf-report.txt |   3 +-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/ui/browsers/hists.c           | 180 ++-------------------
 tools/perf/ui/stdio/hist.c               |  77 +--------
 tools/perf/util/callchain.c              | 152 +++++++++---------
 tools/perf/util/callchain.h              |   6 +-
 tools/perf/util/dso.c                    |   7 +
 tools/perf/util/dso.h                    |   2 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/evsel_fprintf.c          |  37 +----
 tools/perf/util/hist.c                   |   7 +-
 tools/perf/util/machine.c                |  65 +++++++-
 tools/perf/util/sort.c                   |   8 +-
 tools/perf/util/sort.h                   |   1 -
 tools/perf/util/srcline.c                | 268 +++++++++++++++++++++++++------
 tools/perf/util/srcline.h                |  26 ++-
 tools/perf/util/symbol.c                 |   1 +
 tools/perf/util/symbol.h                 |   2 +
 18 files changed, 426 insertions(+), 420 deletions(-)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
@ 2017-10-09 20:32 ` Milian Wolff
  2017-10-25 17:15   ` [tip:perf/core] perf report: Remove " tip-bot for Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 02/16] perf util: store srcline in callchain_cursor_node Milian Wolff
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

The follow-up commits will make inline frames first-class citizens
in the callchain, thereby obsoleting all of this special code.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/ui/browsers/hists.c  | 180 +++-------------------------------------
 tools/perf/ui/stdio/hist.c      |  77 +----------------
 tools/perf/util/evsel_fprintf.c |  32 -------
 tools/perf/util/hist.c          |   5 --
 tools/perf/util/sort.h          |   1 -
 5 files changed, 13 insertions(+), 282 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 13dfb0a0bdeb..3a433f370e7f 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -154,57 +154,9 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
 	cl->unfolded = unfold ? cl->has_children : false;
 }
 
-static struct inline_node *inline_node__create(struct map *map, u64 ip)
-{
-	struct dso *dso;
-	struct inline_node *node;
-
-	if (map == NULL)
-		return NULL;
-
-	dso = map->dso;
-	if (dso == NULL)
-		return NULL;
-
-	node = dso__parse_addr_inlines(dso,
-				       map__rip_2objdump(map, ip));
-
-	return node;
-}
-
-static int inline__count_rows(struct inline_node *node)
-{
-	struct inline_list *ilist;
-	int i = 0;
-
-	if (node == NULL)
-		return 0;
-
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL))
-			i++;
-	}
-
-	return i;
-}
-
-static int callchain_list__inline_rows(struct callchain_list *chain)
-{
-	struct inline_node *node;
-	int rows;
-
-	node = inline_node__create(chain->ms.map, chain->ip);
-	if (node == NULL)
-		return 0;
-
-	rows = inline__count_rows(node);
-	inline_node__delete(node);
-	return rows;
-}
-
 static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 {
-	int n = 0, inline_rows;
+	int n = 0;
 	struct rb_node *nd;
 
 	for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
@@ -215,12 +167,6 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 		list_for_each_entry(chain, &child->val, list) {
 			++n;
 
-			if (symbol_conf.inline_name) {
-				inline_rows =
-					callchain_list__inline_rows(chain);
-				n += inline_rows;
-			}
-
 			/* We need this because we may not have children */
 			folded_sign = callchain_list__folded(chain);
 			if (folded_sign == '+')
@@ -272,7 +218,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
 {
 	struct callchain_list *chain;
 	bool unfolded = false;
-	int n = 0, inline_rows;
+	int n = 0;
 
 	if (callchain_param.mode == CHAIN_FLAT)
 		return callchain_node__count_flat_rows(node);
@@ -281,10 +227,6 @@ static int callchain_node__count_rows(struct callchain_node *node)
 
 	list_for_each_entry(chain, &node->val, list) {
 		++n;
-		if (symbol_conf.inline_name) {
-			inline_rows = callchain_list__inline_rows(chain);
-			n += inline_rows;
-		}
 
 		unfolded = chain->unfolded;
 	}
@@ -432,19 +374,6 @@ static void hist_entry__init_have_children(struct hist_entry *he)
 	he->init_have_children = true;
 }
 
-static void hist_entry_init_inline_node(struct hist_entry *he)
-{
-	if (he->inline_node)
-		return;
-
-	he->inline_node = inline_node__create(he->ms.map, he->ip);
-
-	if (he->inline_node == NULL)
-		return;
-
-	he->has_children = true;
-}
-
 static bool hist_browser__toggle_fold(struct hist_browser *browser)
 {
 	struct hist_entry *he = browser->he_selection;
@@ -476,12 +405,8 @@ static bool hist_browser__toggle_fold(struct hist_browser *browser)
 
 		if (he->unfolded) {
 			if (he->leaf)
-				if (he->inline_node)
-					he->nr_rows = inline__count_rows(
-							he->inline_node);
-				else
-					he->nr_rows = callchain__count_rows(
-							&he->sorted_chain);
+				he->nr_rows = callchain__count_rows(
+						&he->sorted_chain);
 			else
 				he->nr_rows = hierarchy_count_rows(browser, he, false);
 
@@ -841,71 +766,6 @@ static bool hist_browser__check_dump_full(struct hist_browser *browser __maybe_u
 
 #define LEVEL_OFFSET_STEP 3
 
-static int hist_browser__show_inline(struct hist_browser *browser,
-				     struct inline_node *node,
-				     unsigned short row,
-				     int offset)
-{
-	struct inline_list *ilist;
-	char buf[1024];
-	int color, width, first_row;
-
-	first_row = row;
-	width = browser->b.width - (LEVEL_OFFSET_STEP + 2);
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
-			color = HE_COLORSET_NORMAL;
-			if (ui_browser__is_current_entry(&browser->b, row))
-				color = HE_COLORSET_SELECTED;
-
-			if (callchain_param.key == CCKEY_ADDRESS ||
-			    callchain_param.key == CCKEY_SRCLINE) {
-				if (ilist->filename != NULL)
-					scnprintf(buf, sizeof(buf),
-						  "%s:%d (inline)",
-						  ilist->filename,
-						  ilist->line_nr);
-				else
-					scnprintf(buf, sizeof(buf), "??");
-			} else if (ilist->funcname != NULL)
-				scnprintf(buf, sizeof(buf), "%s (inline)",
-					  ilist->funcname);
-			else if (ilist->filename != NULL)
-				scnprintf(buf, sizeof(buf),
-					  "%s:%d (inline)",
-					  ilist->filename,
-					  ilist->line_nr);
-			else
-				scnprintf(buf, sizeof(buf), "??");
-
-			ui_browser__set_color(&browser->b, color);
-			hist_browser__gotorc(browser, row, 0);
-			ui_browser__write_nstring(&browser->b, " ",
-				LEVEL_OFFSET_STEP + offset);
-			ui_browser__write_nstring(&browser->b, buf, width);
-			row++;
-		}
-	}
-
-	return row - first_row;
-}
-
-static size_t show_inline_list(struct hist_browser *browser, struct map *map,
-			       u64 ip, int row, int offset)
-{
-	struct inline_node *node;
-	int ret;
-
-	node = inline_node__create(map, ip);
-	if (node == NULL)
-		return 0;
-
-	ret = hist_browser__show_inline(browser, node, row, offset);
-
-	inline_node__delete(node);
-	return ret;
-}
-
 static int hist_browser__show_callchain_list(struct hist_browser *browser,
 					     struct callchain_node *node,
 					     struct callchain_list *chain,
@@ -917,7 +777,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	char bf[1024], *alloc_str;
 	char buf[64], *alloc_str2;
 	const char *str;
-	int inline_rows = 0, ret = 1;
+	int ret = 1;
 
 	if (arg->row_offset != 0) {
 		arg->row_offset--;
@@ -954,12 +814,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	free(alloc_str);
 	free(alloc_str2);
 
-	if (symbol_conf.inline_name) {
-		inline_rows = show_inline_list(browser, chain->ms.map,
-					       chain->ip, row + 1, offset);
-	}
-
-	return ret + inline_rows;
+	return ret;
 }
 
 static bool check_percent_display(struct rb_node *node, u64 parent_total)
@@ -1383,12 +1238,6 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 		folded_sign = hist_entry__folded(entry);
 	}
 
-	if (symbol_conf.inline_name &&
-	    (!entry->has_children)) {
-		hist_entry_init_inline_node(entry);
-		folded_sign = hist_entry__folded(entry);
-	}
-
 	if (row_offset == 0) {
 		struct hpp_arg arg = {
 			.b		= &browser->b,
@@ -1420,8 +1269,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 			}
 
 			if (first) {
-				if (symbol_conf.use_callchain ||
-					symbol_conf.inline_name) {
+				if (symbol_conf.use_callchain) {
 					ui_browser__printf(&browser->b, "%c ", folded_sign);
 					width -= 2;
 				}
@@ -1463,15 +1311,11 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 			.is_current_entry = current_entry,
 		};
 
-		if (entry->inline_node)
-			printed += hist_browser__show_inline(browser,
-					entry->inline_node, row, 0);
-		else
-			printed += hist_browser__show_callchain(browser,
-					entry, 1, row,
-					hist_browser__show_callchain_entry,
-					&arg,
-					hist_browser__check_output_full);
+		printed += hist_browser__show_callchain(browser,
+				entry, 1, row,
+				hist_browser__show_callchain_entry,
+				&arg,
+				hist_browser__check_output_full);
 	}
 
 	return printed;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 8bdb7a500181..b6b9baac0e3b 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -21,64 +21,6 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
 	return ret;
 }
 
-static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
-			      int depth, int depth_mask, FILE *fp)
-{
-	struct dso *dso;
-	struct inline_node *node;
-	struct inline_list *ilist;
-	int ret = 0, i;
-
-	if (map == NULL)
-		return 0;
-
-	dso = map->dso;
-	if (dso == NULL)
-		return 0;
-
-	node = dso__parse_addr_inlines(dso,
-				       map__rip_2objdump(map, ip));
-	if (node == NULL)
-		return 0;
-
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
-			ret += callchain__fprintf_left_margin(fp, left_margin);
-
-			for (i = 0; i < depth; i++) {
-				if (depth_mask & (1 << i))
-					ret += fprintf(fp, "|");
-				else
-					ret += fprintf(fp, " ");
-				ret += fprintf(fp, "          ");
-			}
-
-			if (callchain_param.key == CCKEY_ADDRESS ||
-			    callchain_param.key == CCKEY_SRCLINE) {
-				if (ilist->filename != NULL)
-					ret += fprintf(fp, "%s:%d (inline)",
-						       ilist->filename,
-						       ilist->line_nr);
-				else
-					ret += fprintf(fp, "??");
-			} else if (ilist->funcname != NULL)
-				ret += fprintf(fp, "%s (inline)",
-					       ilist->funcname);
-			else if (ilist->filename != NULL)
-				ret += fprintf(fp, "%s:%d (inline)",
-					       ilist->filename,
-					       ilist->line_nr);
-			else
-				ret += fprintf(fp, "??");
-
-			ret += fprintf(fp, "\n");
-		}
-	}
-
-	inline_node__delete(node);
-	return ret;
-}
-
 static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
 					  int left_margin)
 {
@@ -137,9 +79,6 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 	fputc('\n', fp);
 	free(alloc_str);
 
-	if (symbol_conf.inline_name)
-		ret += inline__fprintf(chain->ms.map, chain->ip,
-				       left_margin, depth, depth_mask, fp);
 	return ret;
 }
 
@@ -314,13 +253,6 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
-
-			if (symbol_conf.inline_name)
-				ret += inline__fprintf(chain->ms.map,
-						       chain->ip,
-						       left_margin,
-						       0, 0,
-						       fp);
 		}
 		root = &cnode->rb_root;
 	}
@@ -600,7 +532,6 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 {
 	int ret;
 	int callchain_ret = 0;
-	int inline_ret = 0;
 	struct perf_hpp hpp = {
 		.buf		= bf,
 		.size		= size,
@@ -622,13 +553,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 		callchain_ret = hist_entry_callchain__fprintf(he, total_period,
 							      0, fp);
 
-	if (callchain_ret == 0 && symbol_conf.inline_name) {
-		inline_ret = inline__fprintf(he->ms.map, he->ip, 0, 0, 0, fp);
-		ret += inline_ret;
-		if (inline_ret > 0)
-			ret += fprintf(fp, "\n");
-	} else
-		ret += callchain_ret;
+	ret += callchain_ret;
 
 	return ret;
 }
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 583f3a602506..f2c6c5ee11e8 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -169,38 +169,6 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			if (!print_oneline)
 				printed += fprintf(fp, "\n");
 
-			if (symbol_conf.inline_name && node->map) {
-				struct inline_node *inode;
-
-				addr = map__rip_2objdump(node->map, node->ip),
-				inode = dso__parse_addr_inlines(node->map->dso, addr);
-
-				if (inode) {
-					struct inline_list *ilist;
-
-					list_for_each_entry(ilist, &inode->val, list) {
-						if (print_arrow)
-							printed += fprintf(fp, " <-");
-
-						/* IP is same, just skip it */
-						if (print_ip)
-							printed += fprintf(fp, "%c%16s",
-									   s, "");
-						if (print_sym)
-							printed += fprintf(fp, " %s",
-									   ilist->funcname);
-						if (print_srcline)
-							printed += fprintf(fp, "\n  %s:%d",
-									   ilist->filename,
-									   ilist->line_nr);
-						if (!print_oneline)
-							printed += fprintf(fp, "\n");
-					}
-
-					inline_node__delete(inode);
-				}
-			}
-
 			if (symbol_conf.bt_stop_list &&
 			    node->sym &&
 			    strlist__has_entry(symbol_conf.bt_stop_list,
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e60d8d8ea4c2..b0fa9c217e1c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1141,11 +1141,6 @@ void hist_entry__delete(struct hist_entry *he)
 		zfree(&he->mem_info);
 	}
 
-	if (he->inline_node) {
-		inline_node__delete(he->inline_node);
-		he->inline_node = NULL;
-	}
-
 	zfree(&he->stat_acc);
 	free_srcline(he->srcline);
 	if (he->srcfile && he->srcfile[0])
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f36dc4980a6c..507d096aee4e 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -129,7 +129,6 @@ struct hist_entry {
 	};
 	char			*srcline;
 	char			*srcfile;
-	struct inline_node	*inline_node;
 	struct symbol		*parent;
 	struct branch_info	*branch_info;
 	struct hists		*hists;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 02/16] perf util: store srcline in callchain_cursor_node
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers Milian Wolff
@ 2017-10-09 20:32 ` Milian Wolff
  2017-10-25 17:16   ` [tip:perf/core] perf callchain: Store " tip-bot for Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 03/16] perf util: refactor inline_list to operate on symbols Milian Wolff
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

This is mostly a preparation to enable the creation of
full callchain nodes for inline frames. Such frames will
reference the IP of the non-inlined frame, but hold the
symbol and srcline for an inlined location. As such, we
won't be able to query the srcline on-demand based on the
IP alone. Instead, we will leverage the functionality provided
by this patch here, and store the srcline for the inlined nodes
in the new srcline member of callchain_cursor_node.

Note that this patch on its own leaks the srcline, as there is
no free_callchain_cursor_node or similar. A future patch will
add caching of the srcline and handle deletion properly.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 31 +++++++++----------------------
 tools/perf/util/callchain.h |  6 ++++--
 tools/perf/util/machine.c   | 18 ++++++++++++++++--
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index be09d77cade0..c208cd5a5e83 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -566,6 +566,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 		call->ip = cursor_node->ip;
 		call->ms.sym = cursor_node->sym;
 		call->ms.map = map__get(cursor_node->map);
+		call->srcline = cursor_node->srcline;
 
 		if (cursor_node->branch) {
 			call->branch_count = 1;
@@ -647,20 +648,11 @@ enum match_result {
 static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 					     struct callchain_list *cnode)
 {
-	char *left = NULL;
-	char *right = NULL;
+	const char *left = cnode->srcline;
+	const char *right = node->srcline;
 	enum match_result ret = MATCH_EQ;
 	int cmp;
 
-	if (cnode->ms.map)
-		left = get_srcline(cnode->ms.map->dso,
-				 map__rip_2objdump(cnode->ms.map, cnode->ip),
-				 cnode->ms.sym, true, false);
-	if (node->map)
-		right = get_srcline(node->map->dso,
-				  map__rip_2objdump(node->map, node->ip),
-				  node->sym, true, false);
-
 	if (left && right)
 		cmp = strcmp(left, right);
 	else if (!left && right)
@@ -675,8 +667,6 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 	if (cmp != 0)
 		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
 
-	free_srcline(left);
-	free_srcline(right);
 	return ret;
 }
 
@@ -965,7 +955,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
 	list_for_each_entry_safe(list, next_list, &src->val, list) {
 		callchain_cursor_append(cursor, list->ip,
 					list->ms.map, list->ms.sym,
-					false, NULL, 0, 0, 0);
+					false, NULL, 0, 0, 0, list->srcline);
 		list_del(&list->list);
 		map__zput(list->ms.map);
 		free(list);
@@ -1005,7 +995,8 @@ int callchain_merge(struct callchain_cursor *cursor,
 int callchain_cursor_append(struct callchain_cursor *cursor,
 			    u64 ip, struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, u64 iter_cycles, u64 branch_from)
+			    int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+			    const char *srcline)
 {
 	struct callchain_cursor_node *node = *cursor->last;
 
@@ -1024,6 +1015,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 	node->branch = branch;
 	node->nr_loop_iter = nr_loop_iter;
 	node->iter_cycles = iter_cycles;
+	node->srcline = srcline;
 
 	if (flags)
 		memcpy(&node->branch_flags, flags,
@@ -1111,12 +1103,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
-		if (show_srcline && cl->ms.map && !cl->srcline)
-			cl->srcline = get_srcline(cl->ms.map->dso,
-						  map__rip_2objdump(cl->ms.map,
-								    cl->ip),
-						  cl->ms.sym, false, show_addr);
-		if (cl->srcline)
+		if (show_srcline && cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
 		else
@@ -1528,7 +1515,7 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
 					     node->branch, &node->branch_flags,
 					     node->nr_loop_iter,
 					     node->iter_cycles,
-					     node->branch_from);
+					     node->branch_from, node->srcline);
 		if (rc)
 			break;
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 1ed6fc61d0a5..8f67b681cde9 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -121,7 +121,7 @@ struct callchain_list {
 	u64			iter_count;
 	u64			iter_cycles;
 	struct branch_type_stat brtype_stat;
-	char		       *srcline;
+	const char		*srcline;
 	struct list_head	list;
 };
 
@@ -135,6 +135,7 @@ struct callchain_cursor_node {
 	u64				ip;
 	struct map			*map;
 	struct symbol			*sym;
+	const char			*srcline;
 	bool				branch;
 	struct branch_flags		branch_flags;
 	u64				branch_from;
@@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
 int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
 			    struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, u64 iter_cycles, u64 branch_from);
+			    int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+			    const char *srcline);
 
 /* Close a cursor writing session. Initialize for the reader */
 static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7c3aa479201a..a37e1c056415 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1709,6 +1709,15 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	return mi;
 }
 
+static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
+{
+	if (!map || callchain_param.key == CCKEY_FUNCTION)
+		return NULL;
+
+	return get_srcline(map->dso, map__rip_2objdump(map, ip),
+			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+}
+
 struct iterations {
 	int nr_loop_iter;
 	u64 cycles;
@@ -1728,6 +1737,7 @@ static int add_callchain_ip(struct thread *thread,
 	struct addr_location al;
 	int nr_loop_iter = 0;
 	u64 iter_cycles = 0;
+	const char *srcline = NULL;
 
 	al.filtered = 0;
 	al.sym = NULL;
@@ -1783,9 +1793,10 @@ static int add_callchain_ip(struct thread *thread,
 		iter_cycles = iter->cycles;
 	}
 
+	srcline = callchain_srcline(al.map, al.sym, al.addr);
 	return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
 				       branch, flags, nr_loop_iter,
-				       iter_cycles, branch_from);
+				       iter_cycles, branch_from, srcline);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -2101,12 +2112,15 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
+	const char *srcline = NULL;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
+
+	srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-				       false, NULL, 0, 0, 0);
+				       false, NULL, 0, 0, 0, srcline);
 }
 
 static int thread__resolve_callchain_unwind(struct thread *thread,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 03/16] perf util: refactor inline_list to operate on symbols
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 02/16] perf util: store srcline in callchain_cursor_node Milian Wolff
@ 2017-10-09 20:32 ` Milian Wolff
  2017-10-25 17:16   ` [tip:perf/core] perf callchain: Refactor " tip-bot for Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 04/16] perf util: refactor inline_list to store srcline string directly Milian Wolff
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff, Jiri Olsa,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

This is a requirement to create real callchain entries for inlined
frames.

Since the list of inlines usually contains the target symbol too,
i.e. the location where the frames get inlined to, we alias that
symbol and reuse it as-is is. This ensures that other dependent
functionality keeps working, most notably annotation of the
target frames.

For all other entries in the inline_list, a fake symbol is created.
These are marked by new 'inlined' member which is set to true. Only
those symbols are managed by the inline_list and get freed when
the inline_list is deleted from within inline_node__delete.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/srcline.c | 93 ++++++++++++++++++++++++++++++++---------------
 tools/perf/util/srcline.h |  7 +++-
 tools/perf/util/symbol.h  |  1 +
 3 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ed8e8d2de942..c0af61b355ed 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,29 +33,19 @@ static const char *dso__name(struct dso *dso)
 	return dso_name;
 }
 
-static int inline_list__append(char *filename, char *funcname, int line_nr,
-			       struct inline_node *node, struct dso *dso)
+static int inline_list__append(struct symbol *symbol, char *filename,
+			       int line_nr, struct inline_node *node)
 {
 	struct inline_list *ilist;
-	char *demangled;
 
 	ilist = zalloc(sizeof(*ilist));
 	if (ilist == NULL)
 		return -1;
 
+	ilist->symbol = symbol;
 	ilist->filename = filename;
 	ilist->line_nr = line_nr;
 
-	if (dso != NULL) {
-		demangled = dso__demangle_sym(dso, 0, funcname);
-		if (demangled == NULL) {
-			ilist->funcname = funcname;
-		} else {
-			ilist->funcname = demangled;
-			free(funcname);
-		}
-	}
-
 	if (callchain_param.order == ORDER_CALLEE)
 		list_add_tail(&ilist->list, &node->val);
 	else
@@ -206,19 +196,56 @@ static void addr2line_cleanup(struct a2l_data *a2l)
 
 #define MAX_INLINE_NEST 1024
 
+static struct symbol *new_inline_sym(struct dso *dso,
+				     struct symbol *base_sym,
+				     const char *funcname)
+{
+	struct symbol *inline_sym;
+	char *demangled = NULL;
+
+	if (dso) {
+		demangled = dso__demangle_sym(dso, 0, funcname);
+		if (demangled)
+			funcname = demangled;
+	}
+
+	if (base_sym && strcmp(funcname, base_sym->name) == 0) {
+		/* reuse the real, existing symbol */
+		inline_sym = base_sym;
+		/* ensure that we don't alias an inlined symbol, which could
+		 * lead to double frees in inline_node__delete
+		 */
+		assert(!base_sym->inlined);
+	} else {
+		/* create a fake symbol for the inline frame */
+		inline_sym = symbol__new(base_sym ? base_sym->start : 0,
+					 base_sym ? base_sym->end : 0,
+					 base_sym ? base_sym->binding : 0,
+					 funcname);
+		if (inline_sym)
+			inline_sym->inlined = 1;
+	}
+
+	free(demangled);
+
+	return inline_sym;
+}
+
 static int inline_list__append_dso_a2l(struct dso *dso,
-				       struct inline_node *node)
+				       struct inline_node *node,
+				       struct symbol *sym)
 {
 	struct a2l_data *a2l = dso->a2l;
-	char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
-	char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
+	struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
 
-	return inline_list__append(filename, funcname, a2l->line, node, dso);
+	return inline_list__append(inline_sym, strdup(a2l->filename),
+				   a2l->line, node);
 }
 
 static int addr2line(const char *dso_name, u64 addr,
 		     char **file, unsigned int *line, struct dso *dso,
-		     bool unwind_inlines, struct inline_node *node)
+		     bool unwind_inlines, struct inline_node *node,
+		     struct symbol *sym)
 {
 	int ret = 0;
 	struct a2l_data *a2l = dso->a2l;
@@ -244,7 +271,7 @@ static int addr2line(const char *dso_name, u64 addr,
 	if (unwind_inlines) {
 		int cnt = 0;
 
-		if (node && inline_list__append_dso_a2l(dso, node))
+		if (node && inline_list__append_dso_a2l(dso, node, sym))
 			return 0;
 
 		while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
@@ -255,7 +282,7 @@ static int addr2line(const char *dso_name, u64 addr,
 				a2l->filename = NULL;
 
 			if (node != NULL) {
-				if (inline_list__append_dso_a2l(dso, node))
+				if (inline_list__append_dso_a2l(dso, node, sym))
 					return 0;
 				// found at least one inline frame
 				ret = 1;
@@ -287,7 +314,7 @@ void dso__free_a2l(struct dso *dso)
 }
 
 static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
-	struct dso *dso)
+					struct dso *dso, struct symbol *sym)
 {
 	struct inline_node *node;
 
@@ -300,7 +327,7 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node))
+	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
 		goto out_free_inline_node;
 
 	if (list_empty(&node->val))
@@ -340,7 +367,8 @@ static int addr2line(const char *dso_name, u64 addr,
 		     char **file, unsigned int *line_nr,
 		     struct dso *dso __maybe_unused,
 		     bool unwind_inlines __maybe_unused,
-		     struct inline_node *node __maybe_unused)
+		     struct inline_node *node __maybe_unused,
+		     struct symbol *sym __maybe_unused)
 {
 	FILE *fp;
 	char cmd[PATH_MAX];
@@ -380,7 +408,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
 }
 
 static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
-	struct dso *dso __maybe_unused)
+					struct dso *dso __maybe_unused,
+					struct symbol *sym)
 {
 	FILE *fp;
 	char cmd[PATH_MAX];
@@ -408,13 +437,13 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	node->addr = addr;
 
 	while (getline(&filename, &len, fp) != -1) {
+
 		if (filename_split(filename, &line_nr) != 1) {
 			free(filename);
 			goto out;
 		}
 
-		if (inline_list__append(filename, NULL, line_nr, node,
-					NULL) != 0)
+		if (inline_list__append(sym, filename, line_nr, node) != 0)
 			goto out;
 
 		filename = NULL;
@@ -454,7 +483,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	if (dso_name == NULL)
 		goto out;
 
-	if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines, NULL))
+	if (!addr2line(dso_name, addr, &file, &line, dso,
+		       unwind_inlines, NULL, sym))
 		goto out;
 
 	if (asprintf(&srcline, "%s:%u",
@@ -500,7 +530,8 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+					    struct symbol *sym)
 {
 	const char *dso_name;
 
@@ -508,7 +539,7 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
 	if (dso_name == NULL)
 		return NULL;
 
-	return addr2inlines(dso_name, addr, dso);
+	return addr2inlines(dso_name, addr, dso, sym);
 }
 
 void inline_node__delete(struct inline_node *node)
@@ -518,7 +549,9 @@ void inline_node__delete(struct inline_node *node)
 	list_for_each_entry_safe(ilist, tmp, &node->val, list) {
 		list_del_init(&ilist->list);
 		zfree(&ilist->filename);
-		zfree(&ilist->funcname);
+		/* only the inlined symbols are owned by the list */
+		if (ilist->symbol && ilist->symbol->inlined)
+			symbol__delete(ilist->symbol);
 		free(ilist);
 	}
 
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 7b52ba88676e..730bfd6926d7 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -17,8 +17,8 @@ void free_srcline(char *srcline);
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {
+	struct symbol		*symbol;
 	char			*filename;
-	char			*funcname;
 	unsigned int		line_nr;
 	struct list_head	list;
 };
@@ -28,7 +28,10 @@ struct inline_node {
 	struct list_head	val;
 };
 
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr);
+/* parse inlined frames for the given address */
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+					    struct symbol *sym);
+/* free resources associated to the inline node list */
 void inline_node__delete(struct inline_node *node);
 
 #endif /* PERF_SRCLINE_H */
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index aad99e7e179b..d880a059babb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -59,6 +59,7 @@ struct symbol {
 	u8		binding;
 	u8		idle:1;
 	u8		ignore:1;
+	u8		inlined:1;
 	u8		arch_sym;
 	char		name[0];
 };
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 04/16] perf util: refactor inline_list to store srcline string directly
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (2 preceding siblings ...)
  2017-10-09 20:32 ` [PATCH v5 03/16] perf util: refactor inline_list to operate on symbols Milian Wolff
@ 2017-10-09 20:32 ` Milian Wolff
  2017-10-25 17:17   ` [tip:perf/core] perf callchain: Refactor " tip-bot for Milian Wolff
  2017-10-09 20:32 ` [PATCH v5 05/16] perf report: create real callchain entries for inlined frames Milian Wolff
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff, Jiri Olsa,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

This is a preparation for the creation of real callchain entries
for inlined frames. The rest of the perf code uses the srcline
string. As such, using that also for the srcline API allows us
to simplify some of the upcoming code. Most notably, it will
allow us to cache the srcline for a given inline node and reuse
it for different callchain entries.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/srcline.c | 54 +++++++++++++++++++++++++++++++++++------------
 tools/perf/util/srcline.h |  3 +--
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c0af61b355ed..f202fc7827df 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,8 +33,8 @@ static const char *dso__name(struct dso *dso)
 	return dso_name;
 }
 
-static int inline_list__append(struct symbol *symbol, char *filename,
-			       int line_nr, struct inline_node *node)
+static int inline_list__append(struct symbol *symbol, char *srcline,
+			       struct inline_node *node)
 {
 	struct inline_list *ilist;
 
@@ -43,8 +43,7 @@ static int inline_list__append(struct symbol *symbol, char *filename,
 		return -1;
 
 	ilist->symbol = symbol;
-	ilist->filename = filename;
-	ilist->line_nr = line_nr;
+	ilist->srcline = srcline;
 
 	if (callchain_param.order == ORDER_CALLEE)
 		list_add_tail(&ilist->list, &node->val);
@@ -54,6 +53,30 @@ static int inline_list__append(struct symbol *symbol, char *filename,
 	return 0;
 }
 
+/* basename version that takes a const input string */
+static const char *gnu_basename(const char *path)
+{
+	const char *base = strrchr(path, '/');
+
+	return base ? base + 1 : path;
+}
+
+static char *srcline_from_fileline(const char *file, unsigned int line)
+{
+	char *srcline;
+
+	if (!file)
+		return NULL;
+
+	if (!srcline_full_filename)
+		file = gnu_basename(file);
+
+	if (asprintf(&srcline, "%s:%u", file, line) < 0)
+		return NULL;
+
+	return srcline;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -237,9 +260,12 @@ static int inline_list__append_dso_a2l(struct dso *dso,
 {
 	struct a2l_data *a2l = dso->a2l;
 	struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
+	char *srcline = NULL;
 
-	return inline_list__append(inline_sym, strdup(a2l->filename),
-				   a2l->line, node);
+	if (a2l->filename)
+		srcline = srcline_from_fileline(a2l->filename, a2l->line);
+
+	return inline_list__append(inline_sym, srcline, node);
 }
 
 static int addr2line(const char *dso_name, u64 addr,
@@ -437,13 +463,15 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	node->addr = addr;
 
 	while (getline(&filename, &len, fp) != -1) {
+		char *srcline;
 
 		if (filename_split(filename, &line_nr) != 1) {
 			free(filename);
 			goto out;
 		}
 
-		if (inline_list__append(sym, filename, line_nr, node) != 0)
+		srcline = srcline_from_fileline(filename, line_nr);
+		if (inline_list__append(sym, srcline, node) != 0)
 			goto out;
 
 		filename = NULL;
@@ -487,16 +515,14 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		       unwind_inlines, NULL, sym))
 		goto out;
 
-	if (asprintf(&srcline, "%s:%u",
-				srcline_full_filename ? file : basename(file),
-				line) < 0) {
-		free(file);
+	srcline = srcline_from_fileline(file, line);
+	free(file);
+
+	if (!srcline)
 		goto out;
-	}
 
 	dso->a2l_fails = 0;
 
-	free(file);
 	return srcline;
 
 out:
@@ -548,7 +574,7 @@ void inline_node__delete(struct inline_node *node)
 
 	list_for_each_entry_safe(ilist, tmp, &node->val, list) {
 		list_del_init(&ilist->list);
-		zfree(&ilist->filename);
+		free_srcline(ilist->srcline);
 		/* only the inlined symbols are owned by the list */
 		if (ilist->symbol && ilist->symbol->inlined)
 			symbol__delete(ilist->symbol);
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 730bfd6926d7..0201ed2c0b9c 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -18,8 +18,7 @@ void free_srcline(char *srcline);
 
 struct inline_list {
 	struct symbol		*symbol;
-	char			*filename;
-	unsigned int		line_nr;
+	char			*srcline;
 	struct list_head	list;
 };
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 05/16] perf report: create real callchain entries for inlined frames
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (3 preceding siblings ...)
  2017-10-09 20:32 ` [PATCH v5 04/16] perf util: refactor inline_list to store srcline string directly Milian Wolff
@ 2017-10-09 20:32 ` Milian Wolff
  2017-10-25 17:17   ` [tip:perf/core] perf callchain: Create " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 06/16] perf report: fall-back to function name comparison for -g srcline Milian Wolff
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:32 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff, Jiri Olsa,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

The inline_node structs are maintained by the new dso->inlines
tree. This in turn keeps ownership of the fake symbols and
srcline string representing an inline frame.

This tree is sorted by address to allow quick lookups. All other
entries of the symbol beside the function name are unused for inline
frames. The advantage of this approach is that all existing users
of the callchain API can now transparently display inlined frames
without having to patch their code.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/dso.c     |  5 +++++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 37 ++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  9 +++++++++
 5 files changed, 103 insertions(+)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 339e52971380..75c8250b3b8a 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -10,6 +10,7 @@
 #include "compress.h"
 #include "path.h"
 #include "symbol.h"
+#include "srcline.h"
 #include "dso.h"
 #include "machine.h"
 #include "auxtrace.h"
@@ -1201,6 +1202,7 @@ struct dso *dso__new(const char *name)
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
+		dso->inlined_nodes = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1232,6 +1234,9 @@ void dso__delete(struct dso *dso)
 	if (!RB_EMPTY_NODE(&dso->rb_node))
 		pr_err("DSO %s is still in rbtree when being deleted!\n",
 		       dso->long_name);
+
+	/* free inlines first, as they reference symbols */
+	inlines__tree_delete(&dso->inlined_nodes);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index a2bbb21f301c..122eca0d242d 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -141,6 +141,7 @@ struct dso {
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
+	struct rb_root	 inlined_nodes;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a37e1c056415..3d049cb313ac 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2109,6 +2109,40 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	return 0;
 }
 
+static int append_inlines(struct callchain_cursor *cursor,
+			  struct map *map, struct symbol *sym, u64 ip)
+{
+	struct inline_node *inline_node;
+	struct inline_list *ilist;
+	u64 addr;
+
+	if (!symbol_conf.inline_name || !map || !sym)
+		return 1;
+
+	addr = map__rip_2objdump(map, ip);
+
+	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+	if (!inline_node) {
+		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
+		if (!inline_node)
+			return 1;
+
+		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
+	}
+
+	list_for_each_entry(ilist, &inline_node->val, list) {
+		int ret = callchain_cursor_append(cursor, ip, map,
+						  ilist->symbol, false,
+						  NULL, 0, 0, 0,
+						  ilist->srcline);
+
+		if (ret != 0)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
@@ -2117,6 +2151,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
+	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+		return 0;
+
 	srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f202fc7827df..8bea6621d657 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -583,3 +583,54 @@ void inline_node__delete(struct inline_node *node)
 
 	free(node);
 }
+
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	const u64 addr = inlines->addr;
+	struct inline_node *i;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct inline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&inlines->rb_node, parent, p);
+	rb_insert_color(&inlines->rb_node, tree);
+}
+
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct inline_node *i = rb_entry(n, struct inline_node,
+						 rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i;
+	}
+
+	return NULL;
+}
+
+void inlines__tree_delete(struct rb_root *tree)
+{
+	struct inline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct inline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		inline_node__delete(pos);
+	}
+}
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 0201ed2c0b9c..ebe38cd22294 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -2,6 +2,7 @@
 #define PERF_SRCLINE_H
 
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/types.h>
 
 struct dso;
@@ -25,6 +26,7 @@ struct inline_list {
 struct inline_node {
 	u64			addr;
 	struct list_head	val;
+	struct rb_node		rb_node;
 };
 
 /* parse inlined frames for the given address */
@@ -33,4 +35,11 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 /* free resources associated to the inline node list */
 void inline_node__delete(struct inline_node *node);
 
+/* insert the inline node list into the DSO, which will take ownership */
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+/* find previously inserted inline node list */
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+/* delete all nodes within the tree of inline_node s */
+void inlines__tree_delete(struct rb_root *tree);
+
 #endif /* PERF_SRCLINE_H */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 06/16] perf report: fall-back to function name comparison for -g srcline
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (4 preceding siblings ...)
  2017-10-09 20:32 ` [PATCH v5 05/16] perf report: create real callchain entries for inlined frames Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-25 17:18   ` [tip:perf/core] perf report: Fall-back " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 07/16] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

When a callchain entry has no srcline available, we ended up
comparing the instruction pointer. I consider this to be not
too useful. Rather, I think we should group the entries by
function name, which this patch adds. For people who want to
split the data on the IP boundary, using `-g address` is the
correct choice.

Before:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--56.03%--hypot
            |          |          |
            |          |          |--8.45%--__hypot_finite
            |          |          |
            |          |          |--7.62%--__hypot_finite
            |          |          |
            |          |          |--2.29%--__hypot_finite
            |          |          |
            |          |          |--2.24%--__hypot_finite
            |          |          |
            |          |          |--2.06%--__hypot_finite
            |          |          |
            |          |          |--1.81%--__hypot_finite
...
~~~~~

After:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index c208cd5a5e83..007e8364ea01 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -645,11 +645,9 @@ enum match_result {
 	MATCH_GT,
 };
 
-static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
-					     struct callchain_list *cnode)
+static enum match_result match_chain_strings(const char *left,
+					     const char *right)
 {
-	const char *left = cnode->srcline;
-	const char *right = node->srcline;
 	enum match_result ret = MATCH_EQ;
 	int cmp;
 
@@ -659,10 +657,8 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 		cmp = 1;
 	else if (left && !right)
 		cmp = -1;
-	else if (cnode->ip == node->ip)
-		cmp = 0;
 	else
-		cmp = (cnode->ip < node->ip) ? -1 : 1;
+		return MATCH_ERROR;
 
 	if (cmp != 0)
 		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
@@ -677,10 +673,18 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 	u64 left, right;
 
 	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_srcline(node, cnode);
+		enum match_result match = match_chain_strings(cnode->srcline,
+							      node->srcline);
+
+		/* if no srcline is available, fallback to symbol name */
+		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
+			match = match_chain_strings(cnode->ms.sym->name,
+						    node->sym->name);
 
 		if (match != MATCH_ERROR)
 			return match;
+
+		/* otherwise fall-back to IP-based comparison below */
 	}
 
 	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 07/16] perf report: mark inlined frames in output by " (inlined)" suffix
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (5 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 06/16] perf report: fall-back to function name comparison for -g srcline Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-25 17:18   ` [tip:perf/core] perf callchain: Mark " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 08/16] perf script: mark inlined frames and do not print DSO for them Milian Wolff
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

The original patch that introduced inline frame output in the
various browsers used this suffix already. The new centralized
approach that uses fake symbols for inlined frames was missing
this approach so far.

Instead of changing the symbol name itself, we only print the
suffix where needed. This allows us to efficiently lookup
the symbol for a given name without first having to append the
suffix before the lookup.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 10 +++++++---
 tools/perf/util/sort.c      |  3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 007e8364ea01..b0ff56332134 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1107,11 +1107,15 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
+		const char *inlined = cl->ms.sym->inlined ? " (inlined)" : "";
+
 		if (show_srcline && cl->srcline)
-			printed = scnprintf(bf, bfsize, "%s %s",
-					cl->ms.sym->name, cl->srcline);
+			printed = scnprintf(bf, bfsize, "%s %s%s",
+					    cl->ms.sym->name, cl->srcline,
+					    inlined);
 		else
-			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+			printed = scnprintf(bf, bfsize, "%s%s",
+					    cl->ms.sym->name, inlined);
 	} else
 		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index eb3ab902a1c0..acb9210fd18a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -283,6 +283,9 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
 			ret += repsep_snprintf(bf + ret, size - ret, "%.*s",
 					       width - ret,
 					       sym->name);
+			if (sym->inlined)
+				ret += repsep_snprintf(bf + ret, size - ret,
+						       " (inlined)");
 		}
 	} else {
 		size_t len = BITS_PER_LONG / 4;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 08/16] perf script: mark inlined frames and do not print DSO for them
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (6 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 07/16] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-25 17:18   ` [tip:perf/core] perf script: Mark " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching Milian Wolff
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

Instead of showing the (repeated) DSO name of the non-inlined
frame, we now show the "(inlined)" suffix instead.

Before:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

After:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/evsel_fprintf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index f2c6c5ee11e8..5b9e89257aa7 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -157,7 +157,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 				}
 			}
 
-			if (print_dso) {
+			if (print_dso && (!node->sym || !node->sym->inlined)) {
 				printed += fprintf(fp, " (");
 				printed += map__fprintf_dsoname(node->map, fp);
 				printed += fprintf(fp, ")");
@@ -166,6 +166,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			if (print_srcline)
 				printed += map__fprintf_srcline(node->map, addr, "\n  ", fp);
 
+			if (node->sym && node->sym->inlined)
+				printed += fprintf(fp, " (inlined)");
+
 			if (!print_oneline)
 				printed += fprintf(fp, "\n");
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (7 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 08/16] perf script: mark inlined frames and do not print DSO for them Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-13 13:28   ` Arnaldo Carvalho de Melo
  2017-10-25 17:19   ` [tip:perf/core] perf callchain: Compare " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 10/16] perf report: compare symbol name for inlined frames when sorting Milian Wolff
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

The fake symbols we create for inlined frames will represent
different functions but can use the symbol start address. This leads
to issues when different inline branches all lead to the same
function.

Before:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                        --37.57%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  |
                                   --36.36%--std::abs<double> (inlined)
                                             std::__complex_abs (inlined)
                                             |
                                              --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                        std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                        std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
~~~~~

Note that this backtrace representation is completely bogus.
Complex abs does not call the linear congruential engine! It
is just a side-effect of a longer inlined stack being appended
to a shorter, different inlined stack, both of which originate
in the same function (main).

This patch fixes the issue:

~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                       |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          |
                       |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                       |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |                     |
                       |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                       |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                       |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                       |
                        --1.99%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  std::abs<double> (inlined)
                                  std::__complex_abs (inlined)
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b0ff56332134..3f1431bf71bd 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -670,11 +670,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 				     struct callchain_list *cnode)
 {
 	struct symbol *sym = node->sym;
+	enum match_result match;
 	u64 left, right;
 
 	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_strings(cnode->srcline,
-							      node->srcline);
+		match = match_chain_strings(cnode->srcline, node->srcline);
 
 		/* if no srcline is available, fallback to symbol name */
 		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
@@ -688,6 +688,13 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 	}
 
 	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
+		/* compare inlined frames based on their symbol name because
+		 * different inlined frames will have the same symbol start
+		 */
+		if (cnode->ms.sym->inlined || node->sym->inlined)
+			return match_chain_strings(cnode->ms.sym->name,
+						   node->sym->name);
+
 		left = cnode->ms.sym->start;
 		right = sym->start;
 	} else {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 10/16] perf report: compare symbol name for inlined frames when sorting
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (8 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-25 17:19   ` [tip:perf/core] perf report: Compare " tip-bot for Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 11/16] perf report: properly handle branch count in match_chain Milian Wolff
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

Similar to the callstack frame matching, we also have to compare
the symbol name when sorting hist entries. The reason is twofold:
On one hand, multiple inlined functions will use the same symbol
start/end values of the parent, non-inlined symbol. As such, all
of these symbols often end up missing from top-level report, as
they get merged with the non-inlined frame. On the other hand,
multiple different functions may end up inlining the same function,
and we need to aggregate these values properly.

Before:

~~~~~
perf report --stdio --inline -g none
# Children      Self  Command       Shared Object     Symbol
# ........  ........  ............  ................  ...................................
#
   100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
   100.00%     0.00%  cpp-inlining  cpp-inlining      [.] _start
   100.00%     0.00%  cpp-inlining  libc-2.25.so      [.] __libc_start_main
    97.03%     0.00%  cpp-inlining  cpp-inlining      [.] std::norm<double> (inlined)
    59.53%     4.26%  cpp-inlining  libm-2.25.so      [.] hypot
    55.21%    55.08%  cpp-inlining  libm-2.25.so      [.] __hypot_finite
     0.52%     0.52%  cpp-inlining  libm-2.25.so      [.] cabs
~~~~~

After:

~~~~~
perf report --stdio --inline -g none
# Children      Self  Command       Shared Object     Symbol
# ........  ........  ............  ................  ...................................................................................................................................
#
   100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
   100.00%     0.00%  cpp-inlining  cpp-inlining      [.] _start
   100.00%     0.00%  cpp-inlining  libc-2.25.so      [.] __libc_start_main
    62.57%     0.00%  cpp-inlining  cpp-inlining      [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
    62.57%     0.00%  cpp-inlining  cpp-inlining      [.] std::__complex_abs (inlined)
    62.57%     0.00%  cpp-inlining  cpp-inlining      [.] std::abs<double> (inlined)
    62.57%     0.00%  cpp-inlining  cpp-inlining      [.] std::norm<double> (inlined)
    59.53%     4.26%  cpp-inlining  libm-2.25.so      [.] hypot
    55.21%    55.08%  cpp-inlining  libm-2.25.so      [.] __hypot_finite
    34.46%     0.00%  cpp-inlining  cpp-inlining      [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
    32.39%     0.00%  cpp-inlining  cpp-inlining      [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
    32.39%     0.00%  cpp-inlining  cpp-inlining      [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
    12.29%     0.00%  cpp-inlining  cpp-inlining      [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
    12.29%     0.00%  cpp-inlining  cpp-inlining      [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
    12.29%     0.00%  cpp-inlining  cpp-inlining      [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
     0.52%     0.52%  cpp-inlining  libm-2.25.so      [.] cabs
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/sort.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index acb9210fd18a..006d10a0dc96 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -225,6 +225,9 @@ static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 	if (sym_l == sym_r)
 		return 0;
 
+	if (sym_l->inlined || sym_r->inlined)
+		return strcmp(sym_l->name, sym_r->name);
+
 	if (sym_l->start != sym_r->start)
 		return (int64_t)(sym_r->start - sym_l->start);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (9 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 10/16] perf report: compare symbol name for inlined frames when sorting Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-13 13:39   ` Arnaldo Carvalho de Melo
  2017-10-09 20:33 ` [PATCH v5 12/16] perf report: cache failed lookups of inlined frames Milian Wolff
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

Some of the code paths I introduced before returned too early
without running the code to handle a node's branch count.
By refactoring match_chain to only have one exit point, this
can be remedied.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 117 +++++++++++++++++++++++---------------------
 1 file changed, 60 insertions(+), 57 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 3f1431bf71bd..782de047c902 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,78 +666,81 @@ static enum match_result match_chain_strings(const char *left,
 	return ret;
 }
 
+static enum match_result match_address(u64 left, u64 right)
+{
+	if (left == right)
+		return MATCH_EQ;
+	else if (left < right)
+		return MATCH_LT;
+	else
+		return MATCH_GT;
+}
+
 static enum match_result match_chain(struct callchain_cursor_node *node,
 				     struct callchain_list *cnode)
 {
-	struct symbol *sym = node->sym;
-	enum match_result match;
-	u64 left, right;
+	enum match_result match = MATCH_ERROR;
 
-	if (callchain_param.key == CCKEY_SRCLINE) {
+	switch (callchain_param.key) {
+	case CCKEY_SRCLINE:
 		match = match_chain_strings(cnode->srcline, node->srcline);
-
-		/* if no srcline is available, fallback to symbol name */
-		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
-			match = match_chain_strings(cnode->ms.sym->name,
-						    node->sym->name);
-
 		if (match != MATCH_ERROR)
-			return match;
-
-		/* otherwise fall-back to IP-based comparison below */
-	}
-
-	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
-		/* compare inlined frames based on their symbol name because
-		 * different inlined frames will have the same symbol start
-		 */
-		if (cnode->ms.sym->inlined || node->sym->inlined)
-			return match_chain_strings(cnode->ms.sym->name,
-						   node->sym->name);
-
-		left = cnode->ms.sym->start;
-		right = sym->start;
-	} else {
-		left = cnode->ip;
-		right = node->ip;
+			break;
+		__fallthrough;
+	case CCKEY_FUNCTION:
+		if (node->sym && cnode->ms.sym) {
+			/* compare inlined frames based on their symbol name
+			 * because different inlined frames will have the same
+			 * symbol start. otherwise do a faster comparison based
+			 * on the symbol start address
+			 */
+			if (cnode->ms.sym->inlined || node->sym->inlined)
+				match = match_chain_strings(cnode->ms.sym->name,
+							    node->sym->name);
+			else
+				match = match_address(cnode->ms.sym->start,
+						      node->sym->start);
+			if (match != MATCH_ERROR)
+				break;
+		}
+		__fallthrough;
+	case CCKEY_ADDRESS:
+	default:
+		match = match_address(cnode->ip, node->ip);
+		break;
 	}
 
-	if (left == right) {
-		if (node->branch) {
-			cnode->branch_count++;
+	if (match == MATCH_EQ && node->branch) {
+		cnode->branch_count++;
 
-			if (node->branch_from) {
-				/*
-				 * It's "to" of a branch
-				 */
-				cnode->brtype_stat.branch_to = true;
+		if (node->branch_from) {
+			/*
+			 * It's "to" of a branch
+			 */
+			cnode->brtype_stat.branch_to = true;
 
-				if (node->branch_flags.predicted)
-					cnode->predicted_count++;
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
 
-				if (node->branch_flags.abort)
-					cnode->abort_count++;
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
 
-				branch_type_count(&cnode->brtype_stat,
-						  &node->branch_flags,
-						  node->branch_from,
-						  node->ip);
-			} else {
-				/*
-				 * It's "from" of a branch
-				 */
-				cnode->brtype_stat.branch_to = false;
-				cnode->cycles_count +=
-					node->branch_flags.cycles;
-				cnode->iter_count += node->nr_loop_iter;
-				cnode->iter_cycles += node->iter_cycles;
-			}
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
+		} else {
+			/*
+			 * It's "from" of a branch
+			 */
+			cnode->brtype_stat.branch_to = false;
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->iter_cycles += node->iter_cycles;
 		}
-
-		return MATCH_EQ;
 	}
 
-	return left > right ? MATCH_GT : MATCH_LT;
+	return match;
 }
 
 /*
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 12/16] perf report: cache failed lookups of inlined frames
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (10 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 11/16] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 13/16] perf report: cache srclines for callchain nodes Milian Wolff
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

When no inlined frames could be found for a given address,
we did not store this information anywhere. That means we
potentially do the costly inliner lookup repeatedly for
cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a
valid inline_node. It will be empty when no inliners are
found. This enables us to cache the empty list in the DSO,
thereby improving the performance when many addresses
fail to find the inliners.

For my trivial example, the performance impact is already
quite significant:

Before:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
             5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
     2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
     4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
       939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
        11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )

       0.596246531 seconds time elapsed                                          ( +-  0.07% )
~~~~~

After:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                 0      cpu-migrations            #    0.000 K/sec
             5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
       432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
       670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
       141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
         2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )

       0.114222393 seconds time elapsed                                          ( +-  1.19% )
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/machine.c | 15 +++++++--------
 tools/perf/util/srcline.c | 16 +---------------
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3d049cb313ac..177c1d4088f8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2115,9 +2115,10 @@ static int append_inlines(struct callchain_cursor *cursor,
 	struct inline_node *inline_node;
 	struct inline_list *ilist;
 	u64 addr;
+	int ret = 1;
 
 	if (!symbol_conf.inline_name || !map || !sym)
-		return 1;
+		return ret;
 
 	addr = map__rip_2objdump(map, ip);
 
@@ -2125,22 +2126,20 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!inline_node) {
 		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
 		if (!inline_node)
-			return 1;
-
+			return ret;
 		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
 	}
 
 	list_for_each_entry(ilist, &inline_node->val, list) {
-		int ret = callchain_cursor_append(cursor, ip, map,
-						  ilist->symbol, false,
-						  NULL, 0, 0, 0,
-						  ilist->srcline);
+		ret = callchain_cursor_append(cursor, ip, map,
+					      ilist->symbol, false,
+					      NULL, 0, 0, 0, ilist->srcline);
 
 		if (ret != 0)
 			return ret;
 	}
 
-	return 0;
+	return ret;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 8bea6621d657..fc3888664b20 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -353,17 +353,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
-		goto out_free_inline_node;
-
-	if (list_empty(&node->val))
-		goto out_free_inline_node;
-
+	addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
 	return node;
-
-out_free_inline_node:
-	inline_node__delete(node);
-	return NULL;
 }
 
 #else /* HAVE_LIBBFD_SUPPORT */
@@ -480,11 +471,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 out:
 	pclose(fp);
 
-	if (list_empty(&node->val)) {
-		inline_node__delete(node);
-		return NULL;
-	}
-
 	return node;
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 13/16] perf report: cache srclines for callchain nodes
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (11 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 12/16] perf report: cache failed lookups of inlined frames Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 14/16] perf report: use srcline from callchain for hist entries Milian Wolff
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

On one hand this ensures that the memory is properly freed when
the DSO gets freed. On the other hand this significantly speeds up
the processing of the callchain nodes when lots of srclines are
requested. For one of my data files e.g.:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and
thus do not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons
of srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative
would be to remove the formatting options and handle that on a
different level - i.e. print the sym/addr on demand wherever we
actually output something. And the unwind_inlines could be moved into
a separate function that does not return the srcline.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/dso.c     |  2 ++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 17 +++++++++---
 tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  7 +++++
 5 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250b3b8a..3192b608e91b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT;
+		dso->srclines = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
 
 	/* free inlines first, as they reference symbols */
 	inlines__tree_delete(&dso->inlined_nodes);
+	srcline__tree_delete(&dso->srclines);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0d242d..821b16c67030 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	struct rb_root	 inlined_nodes;
+	struct rb_root	 srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4088f8..94d8f1ccedd9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 
 static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
 {
+	char *srcline = NULL;
+
 	if (!map || callchain_param.key == CCKEY_FUNCTION)
-		return NULL;
+		return srcline;
+
+	srcline = srcline__tree_find(&map->dso->srclines, ip);
+	if (!srcline) {
+		bool show_sym = false;
+		bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+		srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+				      sym, show_sym, show_addr);
+		srcline__tree_insert(&map->dso->srclines, ip, srcline);
+	}
 
-	return get_srcline(map->dso, map__rip_2objdump(map, ip),
-			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+	return srcline;
 }
 
 struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc3888664b20..c143c3bc1ef8 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
+struct srcline_node {
+	u64			addr;
+	char			*srcline;
+	struct rb_node		rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	struct srcline_node *i, *node;
+
+	node = zalloc(sizeof(struct srcline_node));
+	if (!node) {
+		perror("not enough memory for the srcline node");
+		return;
+	}
+
+	node->addr = addr;
+	node->srcline = srcline;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct srcline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&node->rb_node, parent, p);
+	rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct srcline_node *i = rb_entry(n, struct srcline_node,
+						  rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i->srcline;
+	}
+
+	return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+	struct srcline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct srcline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		free_srcline(pos->srcline);
+		zfree(&pos);
+	}
+}
+
 struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 					    struct symbol *sym)
 {
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd22294..1c4d6210860b 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		  bool show_sym, bool show_addr, bool unwind_inlines);
 void free_srcline(char *srcline);
 
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 14/16] perf report: use srcline from callchain for hist entries
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (12 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 13/16] perf report: cache srclines for callchain nodes Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 15/16] perf util: enable handling of inlined frames by default Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
  15 siblings, 0 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

This also removes the symbol name from the srcline column,
more on this below.

This ensures we use the correct srcline, which could originate
from a potentially inlined function. The hist entries used to
query for the srcline based purely on the IP, which leads to
wrong results for inlined entries.

Before:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ..................................................................................................................................
#
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    44.58%     0.00%  main+100
    44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  std::__complex_abs+100
    44.58%     0.00%  std::abs<double>+100
    44.58%     0.00%  std::norm<double>+100
    36.01%     0.00%  hypot+18446603487892193300
    25.81%     0.00%  main+41
    25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.75%    25.75%  random.h:143
    18.39%     0.00%  main+57
    18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  random.tcc:3330
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children      Self  Source:Line
# ........  ........  ...........................................
#
    94.30%     1.19%  main.cpp:39
    94.23%     0.00%  __libc_start_main+18446603487898210537
    94.23%     0.00%  _start+41
    48.44%     1.70%  random.h:1823
    48.44%     0.00%  random.h:1814
    46.74%     2.53%  random.h:185
    44.68%     0.10%  complex:589
    44.68%     0.00%  complex:597
    44.68%     0.00%  complex:654
    44.68%     0.00%  complex:664
    40.61%    13.80%  random.tcc:3330
    36.01%     0.00%  hypot+18446603487892193300
    26.81%     0.00%  random.h:151
    26.81%     0.00%  random.h:332
    25.75%    25.75%  random.h:143
     5.64%     0.00%  ??:0
     4.13%     4.13%  __hypot_finite+163
     4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line
hist column. If this information is desired, users should
explicitly query for it if needed. I.e. run this command
instead:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ...........................................
#
    94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
    48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
    46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
    44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
    44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
    44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
    44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
    39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
    26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
    25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
    25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Before we used to print the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children      Self  Symbol                                                                                                                               Source:Line
# ........  ........  ...................................................................................................................................  ..................................................................................................................................
#
    94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
    94.23%     0.00%  [.] _start                                                                                                                           _start+41
    44.58%     0.00%  [.] main                                                                                                                             main+100
    44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
    44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
    44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
    44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
    36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
    25.81%     0.00%  [.] main                                                                                                                             main+41
    25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
    25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
    25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
    18.39%     0.00%  [.] main                                                                                                                             main+57
    18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
    18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
    13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
     4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
     4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/util/callchain.c | 1 +
 tools/perf/util/event.c     | 1 +
 tools/perf/util/hist.c      | 2 ++
 tools/perf/util/symbol.h    | 1 +
 4 files changed, 5 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 782de047c902..e0c25cf62d72 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1072,6 +1072,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 {
 	al->map = node->map;
 	al->sym = node->sym;
+	al->srcline = node->srcline;
 	if (node->map)
 		al->addr = node->map->map_ip(node->map, node->ip);
 	else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 47eff4767edb..3c411e7e36aa 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1604,6 +1604,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	al->sym = NULL;
 	al->cpu = sample->cpu;
 	al->socket = -1;
+	al->srcline = NULL;
 
 	if (al->cpu >= 0) {
 		struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b0fa9c217e1c..25d143053ab5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -596,6 +596,7 @@ __hists__add_entry(struct hists *hists,
 			.map	= al->map,
 			.sym	= al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.socket	 = al->socket,
 		.cpu	 = al->cpu,
 		.cpumode = al->cpumode,
@@ -950,6 +951,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			.map = al->map,
 			.sym = al->sym,
 		},
+		.srcline = al->srcline ? strdup(al->srcline) : NULL,
 		.parent = iter->parent,
 		.raw_data = sample->raw_data,
 		.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d880a059babb..d548ea5cb418 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -209,6 +209,7 @@ struct addr_location {
 	struct thread *thread;
 	struct map    *map;
 	struct symbol *sym;
+	const char    *srcline;
 	u64	      addr;
 	char	      level;
 	u8	      filtered;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 15/16] perf util: enable handling of inlined frames by default
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (13 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 14/16] perf report: use srcline from callchain for hist entries Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-09 20:33 ` [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
  15 siblings, 0 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra, Ingo Molnar

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable
this useful option by default.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d992ed..ddde2b54af57 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry is function name or file/line.
+	will be printed. Each entry is function name or file/line. Enabled by
+	default, disable with --no-inline.
 
 include::callchain-overhead-calculation.txt[]
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba35a2d8..25e677344728 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
 
 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry has function name and file/line.
+	will be printed. Each entry has function name and file/line. Enabled by
+	default, disable with --no-inline.
 
 SEE ALSO
 --------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38aa4063..ce6993bebf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
 	.show_hist_headers	= true,
 	.symfs			= "",
 	.event_group		= true,
+	.inline_name		= true,
 };
 
 static enum dso_binary_type binary_type_symtab[] = {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
                   ` (14 preceding siblings ...)
  2017-10-09 20:33 ` [PATCH v5 15/16] perf util: enable handling of inlined frames by default Milian Wolff
@ 2017-10-09 20:33 ` Milian Wolff
  2017-10-10  4:49   ` Namhyung Kim
  15 siblings, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-09 20:33 UTC (permalink / raw)
  To: acme, jolsa, Jin Yao
  Cc: Linux-kernel, linux-perf-users, Milian Wolff,
	Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa

When inline frame resolution is disabled, a bogus srcline is obtained
for hist entries:

~~~~~
$ perf report -s sym,srcline --no-inline --stdio -g none
    95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+18446603358170398953
    95.21%     0.00%  [.] _start                                                                                                              _start+18446650082411225129
    46.67%     0.00%  [.] main                                                                                                                main+18446650082411225208
    38.75%     0.00%  [.] hypot                                                                                                               hypot+18446603358164312084
    23.75%     0.00%  [.] main                                                                                                                main+18446650082411225151
    20.83%    20.83%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
    18.12%     0.00%  [.] main                                                                                                                main+18446650082411225165
    13.12%    13.12%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
     4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+163
     4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
     4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312227
     4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.92%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+11
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+24
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312075
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312088
~~~~~

Note how we get very large offsets to main and cannot see any srcline
from one of the complex or random headers, even though the instruction
pointers actually lie in code inlined from there.

This patch fixes the mapping to use map__objdump_2mem instead of
map__objdump_2mem in hist_entry__get_srcline. This fixes the srcline
values for me when inline resolution is disabled:

~~~~~
$ perf report -s sym,srcline --no-inline --stdio -g none
    95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+233
    95.21%     0.00%  [.] _start                                                                                                              _start+41
    46.88%     0.00%  [.] main                                                                                                                complex:589
    43.96%     0.00%  [.] main                                                                                                                random.h:185
    38.75%     0.00%  [.] hypot                                                                                                               hypot+20
    20.83%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
    13.12%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
     4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239715
     4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+163
     4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
     2.92%     2.92%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239563
     2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239576
     2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
     2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+11
~~~~~

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>

Note how most of the large offset values are now gone. Most notably,
we get proper srcline resolution for the random.h and complex headers.
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 006d10a0dc96..6f3d109078a3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
 	if (!map)
 		return SRCLINE_UNKNOWN;
 
-	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
+	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
 			   he->ms.sym, true, true);
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-09 20:33 ` [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
@ 2017-10-10  4:49   ` Namhyung Kim
  2017-10-12 18:22     ` Milian Wolff
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2017-10-10  4:49 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Jiri Olsa, kernel-team

Hi Milian,

On Mon, Oct 09, 2017 at 10:33:10PM +0200, Milian Wolff wrote:
> When inline frame resolution is disabled, a bogus srcline is obtained
> for hist entries:
> 
> ~~~~~
> $ perf report -s sym,srcline --no-inline --stdio -g none
>     95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+18446603358170398953
>     95.21%     0.00%  [.] _start                                                                                                              _start+18446650082411225129
>     46.67%     0.00%  [.] main                                                                                                                main+18446650082411225208
>     38.75%     0.00%  [.] hypot                                                                                                               hypot+18446603358164312084
>     23.75%     0.00%  [.] main                                                                                                                main+18446650082411225151
>     20.83%    20.83%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
>     18.12%     0.00%  [.] main                                                                                                                main+18446650082411225165
>     13.12%    13.12%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
>      4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+163
>      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
>      4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312227
>      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.92%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+11
>      2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+24
>      2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312075
>      2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+18446603358164312088
> ~~~~~
> 
> Note how we get very large offsets to main and cannot see any srcline
> from one of the complex or random headers, even though the instruction
> pointers actually lie in code inlined from there.
> 
> This patch fixes the mapping to use map__objdump_2mem instead of
> map__objdump_2mem in hist_entry__get_srcline. This fixes the srcline
> values for me when inline resolution is disabled:
> 
> ~~~~~
> $ perf report -s sym,srcline --no-inline --stdio -g none
>     95.21%     0.00%  [.] __libc_start_main                                                                                                   __libc_start_main+233
>     95.21%     0.00%  [.] _start                                                                                                              _start+41
>     46.88%     0.00%  [.] main                                                                                                                complex:589
>     43.96%     0.00%  [.] main                                                                                                                random.h:185
>     38.75%     0.00%  [.] hypot                                                                                                               hypot+20
>     20.83%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.h:143
>     13.12%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3330
>      4.17%     4.17%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239715
>      4.17%     4.17%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      4.17%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+163
>      4.17%     0.00%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  random.tcc:3333
>      2.92%     2.92%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239563
>      2.50%     2.50%  [.] __hypot_finite                                                                                                      __hypot_finite+140715545239576
>      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     2.50%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >  std::generate_canonical<double, 53ul, std::line
>      2.50%     0.00%  [.] __hypot_finite                                                                                                      __hypot_finite+11
> ~~~~~
> 
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Yao Jin <yao.jin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> 
> Note how most of the large offset values are now gone. Most notably,
> we get proper srcline resolution for the random.h and complex headers.
> ---
>  tools/perf/util/sort.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 006d10a0dc96..6f3d109078a3 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
>  	if (!map)
>  		return SRCLINE_UNKNOWN;
>  
> -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
>  			   he->ms.sym, true, true);

I'm not sure this is right.  IIRC hist_entry->ip is a relative IP so
it needs to be changed for objdump use.  Maybe there's some bug on
translating the address but it seems that the original code is
correct.  And this change breaks srcline for my test program (which is
a PIE).

Thanks,
Namhyung


>  }
>  
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-10  4:49   ` Namhyung Kim
@ 2017-10-12 18:22     ` Milian Wolff
  2017-10-12 18:52       ` Jiri Olsa
  2017-10-13  1:19       ` Namhyung Kim
  0 siblings, 2 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-12 18:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, jolsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Jiri Olsa, kernel-team

On Dienstag, 10. Oktober 2017 06:49:54 CEST Namhyung Kim wrote:
> Hi Milian,
> 
> On Mon, Oct 09, 2017 at 10:33:10PM +0200, Milian Wolff wrote:
> > When inline frame resolution is disabled, a bogus srcline is obtained
> > for hist entries:
> > 
> > ~~~~~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> > 
> >     95.21%     0.00%  [.] __libc_start_main                               
> >                                                                       
> >     __libc_start_main+18446603358170398953 95.21%     0.00%  [.] _start  
> >                                                                          
> >                                          _start+18446650082411225129
> >     46.67%     0.00%  [.] main                                           
> >                                                                        
> >     main+18446650082411225208 38.75%     0.00%  [.] hypot                
> >                                                                          
> >                             hypot+18446603358164312084 23.75%     0.00% 
> >     [.] main                                                             
> >                                                      
> >     main+18446650082411225151 20.83%    20.83%  [.]
> >     std::generate_canonical<double, 53ul,
> >     std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >     2147483647ul> >  random.h:143 18.12%     0.00%  [.] main             
> >                                                                          
> >                                 main+18446650082411225165 13.12%   
> >     13.12%  [.] std::generate_canonical<double, 53ul,
> >     std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >     2147483647ul> >  random.tcc:3330>     
> >      4.17%     4.17%  [.] __hypot_finite                                  
> >                                                                        
> >      __hypot_finite+163 4.17%     4.17%  [.]
> >      std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  random.tcc:3333 4.17%     0.00%  [.] __hypot_finite
> >                                                                          
> >                                      __hypot_finite+18446603358164312227
> >      4.17%     0.00%  [.] std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.92%     0.00%  [.] std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] __hypot_finite                                 
> >                                                                         
> >      __hypot_finite+11 2.50%     2.50%  [.] __hypot_finite               
> >                                                                          
> >                       __hypot_finite+24 2.50%     0.00%  [.]
> >      __hypot_finite                                                      
> >                                                    
> >      __hypot_finite+18446603358164312075 2.50%     0.00%  [.]
> >      __hypot_finite                                                      
> >                                                    
> >      __hypot_finite+18446603358164312088> 
> > ~~~~~
> > 
> > Note how we get very large offsets to main and cannot see any srcline
> > from one of the complex or random headers, even though the instruction
> > pointers actually lie in code inlined from there.
> > 
> > This patch fixes the mapping to use map__objdump_2mem instead of
> > map__objdump_2mem in hist_entry__get_srcline. This fixes the srcline
> > values for me when inline resolution is disabled:
> > 
> > ~~~~~
> > $ perf report -s sym,srcline --no-inline --stdio -g none
> > 
> >     95.21%     0.00%  [.] __libc_start_main                               
> >                                                                       
> >     __libc_start_main+233 95.21%     0.00%  [.] _start                   
> >                                                                          
> >                         _start+41 46.88%     0.00%  [.] main             
> >                                                                          
> >                                 complex:589 43.96%     0.00%  [.] main   
> >                                                                          
> >                                           random.h:185 38.75%     0.00% 
> >     [.] hypot                                                            
> >                                                       hypot+20 20.83%    
> >     0.00%  [.] std::generate_canonical<double, 53ul,
> >     std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >     2147483647ul> >  random.h:143 13.12%     0.00%  [.]
> >     std::generate_canonical<double, 53ul,
> >     std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >     2147483647ul> >  random.tcc:3330>     
> >      4.17%     4.17%  [.] __hypot_finite                                  
> >                                                                        
> >      __hypot_finite+140715545239715 4.17%     4.17%  [.]
> >      std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      4.17%     0.00%  [.] __hypot_finite                                 
> >                                                                         
> >      __hypot_finite+163 4.17%     0.00%  [.]
> >      std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  random.tcc:3333 2.92%     2.92%  [.]
> >      std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] __hypot_finite                                 
> >                                                                         
> >      __hypot_finite+140715545239563 2.50%     2.50%  [.] __hypot_finite  
> >                                                                          
> >                                    __hypot_finite+140715545239576 2.50%  
> >        2.50%  [.] std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     2.50%  [.] std::generate_canonical<double, 53ul,
> >      std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >      2147483647ul> >  std::generate_canonical<double, 53ul, std::line
> >      2.50%     0.00%  [.] __hypot_finite                                 
> >                                                                         
> >      __hypot_finite+11> 
> > ~~~~~
> > 
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Yao Jin <yao.jin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > 
> > Note how most of the large offset values are now gone. Most notably,
> > we get proper srcline resolution for the random.h and complex headers.
> > ---
> > 
> >  tools/perf/util/sort.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index 006d10a0dc96..6f3d109078a3 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
> > 
> >  	if (!map)
> >  	
> >  		return SRCLINE_UNKNOWN;
> > 
> > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> > 
> >  			   he->ms.sym, true, true);
> 
> I'm not sure this is right.  IIRC hist_entry->ip is a relative IP so
> it needs to be changed for objdump use.  Maybe there's some bug on
> translating the address but it seems that the original code is
> correct.  And this change breaks srcline for my test program (which is
> a PIE).

Odd, I'll have to look at that. But this is imo unrelated to the rest of the 
patch series. So maybe we skip this one and apply the others. Assuming those 
other patches are OK by now?

Jiri, you also reviewed it before, is there anything else missing?

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-12 18:22     ` Milian Wolff
@ 2017-10-12 18:52       ` Jiri Olsa
  2017-10-13 11:03         ` Jiri Olsa
  2017-10-13  1:19       ` Namhyung Kim
  1 sibling, 1 reply; 40+ messages in thread
From: Jiri Olsa @ 2017-10-12 18:52 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Namhyung Kim, acme, jolsa, Jin Yao, Linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, kernel-team

On Thu, Oct 12, 2017 at 08:22:58PM +0200, Milian Wolff wrote:

SNIP

> > >  		return SRCLINE_UNKNOWN;
> > > 
> > > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> > > 
> > >  			   he->ms.sym, true, true);
> > 
> > I'm not sure this is right.  IIRC hist_entry->ip is a relative IP so
> > it needs to be changed for objdump use.  Maybe there's some bug on
> > translating the address but it seems that the original code is
> > correct.  And this change breaks srcline for my test program (which is
> > a PIE).
> 
> Odd, I'll have to look at that. But this is imo unrelated to the rest of the 
> patch series. So maybe we skip this one and apply the others. Assuming those 
> other patches are OK by now?
> 
> Jiri, you also reviewed it before, is there anything else missing?

I plan to check on it this week.. sry for delay

jirka

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-12 18:22     ` Milian Wolff
  2017-10-12 18:52       ` Jiri Olsa
@ 2017-10-13  1:19       ` Namhyung Kim
  1 sibling, 0 replies; 40+ messages in thread
From: Namhyung Kim @ 2017-10-13  1:19 UTC (permalink / raw)
  To: Milian Wolff
  Cc: acme, jolsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Jiri Olsa, kernel-team

On Thu, Oct 12, 2017 at 08:22:58PM +0200, Milian Wolff wrote:
> On Dienstag, 10. Oktober 2017 06:49:54 CEST Namhyung Kim wrote:
> > Hi Milian,
> > 
> > On Mon, Oct 09, 2017 at 10:33:10PM +0200, Milian Wolff wrote:
> > > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > > index 006d10a0dc96..6f3d109078a3 100644
> > > --- a/tools/perf/util/sort.c
> > > +++ b/tools/perf/util/sort.c
> > > @@ -334,7 +334,7 @@ char *hist_entry__get_srcline(struct hist_entry *he)
> > > 
> > >  	if (!map)
> > >  	
> > >  		return SRCLINE_UNKNOWN;
> > > 
> > > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> > > 
> > >  			   he->ms.sym, true, true);
> > 
> > I'm not sure this is right.  IIRC hist_entry->ip is a relative IP so
> > it needs to be changed for objdump use.  Maybe there's some bug on
> > translating the address but it seems that the original code is
> > correct.  And this change breaks srcline for my test program (which is
> > a PIE).
> 
> Odd, I'll have to look at that. But this is imo unrelated to the rest of the 
> patch series. So maybe we skip this one and apply the others. Assuming those 
> other patches are OK by now?

Yep, for the patch 1 - 15:

  Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry
  2017-10-12 18:52       ` Jiri Olsa
@ 2017-10-13 11:03         ` Jiri Olsa
  0 siblings, 0 replies; 40+ messages in thread
From: Jiri Olsa @ 2017-10-13 11:03 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Namhyung Kim, acme, jolsa, Jin Yao, Linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, kernel-team

On Thu, Oct 12, 2017 at 08:52:18PM +0200, Jiri Olsa wrote:
> On Thu, Oct 12, 2017 at 08:22:58PM +0200, Milian Wolff wrote:
> 
> SNIP
> 
> > > >  		return SRCLINE_UNKNOWN;
> > > > 
> > > > -	return get_srcline(map->dso, map__rip_2objdump(map, he->ip),
> > > > +	return get_srcline(map->dso, map__objdump_2mem(map, he->ip),
> > > > 
> > > >  			   he->ms.sym, true, true);
> > > 
> > > I'm not sure this is right.  IIRC hist_entry->ip is a relative IP so
> > > it needs to be changed for objdump use.  Maybe there's some bug on
> > > translating the address but it seems that the original code is
> > > correct.  And this change breaks srcline for my test program (which is
> > > a PIE).
> > 
> > Odd, I'll have to look at that. But this is imo unrelated to the rest of the 
> > patch series. So maybe we skip this one and apply the others. Assuming those 
> > other patches are OK by now?
> > 
> > Jiri, you also reviewed it before, is there anything else missing?
> 
> I plan to check on it this week.. sry for delay

I think Namhyung is right with his comment, the rest look ok,
so for patches 1 - 15:

Reviewed-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching
  2017-10-09 20:33 ` [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching Milian Wolff
@ 2017-10-13 13:28   ` Arnaldo Carvalho de Melo
  2017-10-25 17:19   ` [tip:perf/core] perf callchain: Compare " tip-bot for Milian Wolff
  1 sibling, 0 replies; 40+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-13 13:28 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

Em Mon, Oct 09, 2017 at 10:33:03PM +0200, Milian Wolff escreveu:
> +++ b/tools/perf/util/callchain.c
> @@ -670,11 +670,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  				     struct callchain_list *cnode)
>  {
>  	struct symbol *sym = node->sym;
> +	enum match_result match;
>  	u64 left, right;
>  
>  	if (callchain_param.key == CCKEY_SRCLINE) {
> -		enum match_result match = match_chain_strings(cnode->srcline,
> -							      node->srcline);
> +		match = match_chain_strings(cnode->srcline, node->srcline);
>  
>  		/* if no srcline is available, fallback to symbol name */
>  		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)

the above one is unnecessary, as match_result is, at this time, only
used inide that CCKEY_SRCLINE if branch, so I left it out.

> @@ -688,6 +688,13 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  	}
>  
>  	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> +		/* compare inlined frames based on their symbol name because
> +		 * different inlined frames will have the same symbol start
> +		 */
> +		if (cnode->ms.sym->inlined || node->sym->inlined)
> +			return match_chain_strings(cnode->ms.sym->name,
> +						   node->sym->name);
> +

And this clashed with a change by Ravi Bangoria, which I fixed up:

[acme@jouet linux]$ git log -1 --oneline c1fbc0cf81f1c464f5fda322c1104d4bb1da6711
c1fbc0cf81f1 (tag: perf-urgent-for-mingo-4.14-20171005) perf callchain: Compare dsos (as well) for CCKEY_FUNCTION
[acme@jouet linux]$ 

Continuing...

>  		left = cnode->ms.sym->start;
>  		right = sym->start;
>  	} else {
> -- 
> 2.14.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-09 20:33 ` [PATCH v5 11/16] perf report: properly handle branch count in match_chain Milian Wolff
@ 2017-10-13 13:39   ` Arnaldo Carvalho de Melo
  2017-10-13 14:08     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 40+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-13 13:39 UTC (permalink / raw)
  To: Milian Wolff
  Cc: jolsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra

Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> Some of the code paths I introduced before returned too early
> without running the code to handle a node's branch count.
> By refactoring match_chain to only have one exit point, this
> can be remedied.

Fixing up this one now.
 
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Yao Jin <yao.jin@linux.intel.com>
> Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> ---
>  tools/perf/util/callchain.c | 117 +++++++++++++++++++++++---------------------
>  1 file changed, 60 insertions(+), 57 deletions(-)
> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 3f1431bf71bd..782de047c902 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -666,78 +666,81 @@ static enum match_result match_chain_strings(const char *left,
>  	return ret;
>  }
>  
> +static enum match_result match_address(u64 left, u64 right)
> +{
> +	if (left == right)
> +		return MATCH_EQ;
> +	else if (left < right)
> +		return MATCH_LT;
> +	else
> +		return MATCH_GT;
> +}
> +
>  static enum match_result match_chain(struct callchain_cursor_node *node,
>  				     struct callchain_list *cnode)
>  {
> -	struct symbol *sym = node->sym;
> -	enum match_result match;
> -	u64 left, right;
> +	enum match_result match = MATCH_ERROR;
>  
> -	if (callchain_param.key == CCKEY_SRCLINE) {
> +	switch (callchain_param.key) {
> +	case CCKEY_SRCLINE:
>  		match = match_chain_strings(cnode->srcline, node->srcline);
> -
> -		/* if no srcline is available, fallback to symbol name */
> -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> -			match = match_chain_strings(cnode->ms.sym->name,
> -						    node->sym->name);
> -
>  		if (match != MATCH_ERROR)
> -			return match;
> -
> -		/* otherwise fall-back to IP-based comparison below */
> -	}
> -
> -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> -		/* compare inlined frames based on their symbol name because
> -		 * different inlined frames will have the same symbol start
> -		 */
> -		if (cnode->ms.sym->inlined || node->sym->inlined)
> -			return match_chain_strings(cnode->ms.sym->name,
> -						   node->sym->name);
> -
> -		left = cnode->ms.sym->start;
> -		right = sym->start;
> -	} else {
> -		left = cnode->ip;
> -		right = node->ip;
> +			break;
> +		__fallthrough;
> +	case CCKEY_FUNCTION:
> +		if (node->sym && cnode->ms.sym) {
> +			/* compare inlined frames based on their symbol name
> +			 * because different inlined frames will have the same
> +			 * symbol start. otherwise do a faster comparison based
> +			 * on the symbol start address
> +			 */
> +			if (cnode->ms.sym->inlined || node->sym->inlined)
> +				match = match_chain_strings(cnode->ms.sym->name,
> +							    node->sym->name);
> +			else
> +				match = match_address(cnode->ms.sym->start,
> +						      node->sym->start);
> +			if (match != MATCH_ERROR)
> +				break;
> +		}
> +		__fallthrough;
> +	case CCKEY_ADDRESS:
> +	default:
> +		match = match_address(cnode->ip, node->ip);
> +		break;
>  	}
>  
> -	if (left == right) {
> -		if (node->branch) {
> -			cnode->branch_count++;
> +	if (match == MATCH_EQ && node->branch) {
> +		cnode->branch_count++;
>  
> -			if (node->branch_from) {
> -				/*
> -				 * It's "to" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = true;
> +		if (node->branch_from) {
> +			/*
> +			 * It's "to" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = true;
>  
> -				if (node->branch_flags.predicted)
> -					cnode->predicted_count++;
> +			if (node->branch_flags.predicted)
> +				cnode->predicted_count++;
>  
> -				if (node->branch_flags.abort)
> -					cnode->abort_count++;
> +			if (node->branch_flags.abort)
> +				cnode->abort_count++;
>  
> -				branch_type_count(&cnode->brtype_stat,
> -						  &node->branch_flags,
> -						  node->branch_from,
> -						  node->ip);
> -			} else {
> -				/*
> -				 * It's "from" of a branch
> -				 */
> -				cnode->brtype_stat.branch_to = false;
> -				cnode->cycles_count +=
> -					node->branch_flags.cycles;
> -				cnode->iter_count += node->nr_loop_iter;
> -				cnode->iter_cycles += node->iter_cycles;
> -			}
> +			branch_type_count(&cnode->brtype_stat,
> +					  &node->branch_flags,
> +					  node->branch_from,
> +					  node->ip);
> +		} else {
> +			/*
> +			 * It's "from" of a branch
> +			 */
> +			cnode->brtype_stat.branch_to = false;
> +			cnode->cycles_count += node->branch_flags.cycles;
> +			cnode->iter_count += node->nr_loop_iter;
> +			cnode->iter_cycles += node->iter_cycles;
>  		}
> -
> -		return MATCH_EQ;
>  	}
>  
> -	return left > right ? MATCH_GT : MATCH_LT;
> +	return match;
>  }
>  
>  /*
> -- 
> 2.14.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-13 13:39   ` Arnaldo Carvalho de Melo
@ 2017-10-13 14:08     ` Arnaldo Carvalho de Melo
  2017-10-14 19:30       ` Milian Wolff
  2017-10-16  4:18       ` ravi
  0 siblings, 2 replies; 40+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-13 14:08 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Jiri Olsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra, Ravi Bangoria

Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> > Some of the code paths I introduced before returned too early
> > without running the code to handle a node's branch count.
> > By refactoring match_chain to only have one exit point, this
> > can be remedied.
> 
> Fixing up this one now.

Millian, this is all fresher in your mind, can you please take a look at
my perf/core branch and check if the change i made to ]PATCH v5 09/16]
"perf report: compare symbol name for inlined frames when matching" is
ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?

Ravi, please take a look at this as well, to see if with these changes
your fix remains valid, ok?

Thanks,

- Arnaldo
  
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Yao Jin <yao.jin@linux.intel.com>
> > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > ---
> >  tools/perf/util/callchain.c | 117 +++++++++++++++++++++++---------------------
> >  1 file changed, 60 insertions(+), 57 deletions(-)
> > 
> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 3f1431bf71bd..782de047c902 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -666,78 +666,81 @@ static enum match_result match_chain_strings(const char *left,
> >  	return ret;
> >  }
> >  
> > +static enum match_result match_address(u64 left, u64 right)
> > +{
> > +	if (left == right)
> > +		return MATCH_EQ;
> > +	else if (left < right)
> > +		return MATCH_LT;
> > +	else
> > +		return MATCH_GT;
> > +}
> > +
> >  static enum match_result match_chain(struct callchain_cursor_node *node,
> >  				     struct callchain_list *cnode)
> >  {
> > -	struct symbol *sym = node->sym;
> > -	enum match_result match;
> > -	u64 left, right;
> > +	enum match_result match = MATCH_ERROR;
> >  
> > -	if (callchain_param.key == CCKEY_SRCLINE) {
> > +	switch (callchain_param.key) {
> > +	case CCKEY_SRCLINE:
> >  		match = match_chain_strings(cnode->srcline, node->srcline);
> > -
> > -		/* if no srcline is available, fallback to symbol name */
> > -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> > -			match = match_chain_strings(cnode->ms.sym->name,
> > -						    node->sym->name);
> > -
> >  		if (match != MATCH_ERROR)
> > -			return match;
> > -
> > -		/* otherwise fall-back to IP-based comparison below */
> > -	}
> > -
> > -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> > -		/* compare inlined frames based on their symbol name because
> > -		 * different inlined frames will have the same symbol start
> > -		 */
> > -		if (cnode->ms.sym->inlined || node->sym->inlined)
> > -			return match_chain_strings(cnode->ms.sym->name,
> > -						   node->sym->name);
> > -
> > -		left = cnode->ms.sym->start;
> > -		right = sym->start;
> > -	} else {
> > -		left = cnode->ip;
> > -		right = node->ip;
> > +			break;
> > +		__fallthrough;
> > +	case CCKEY_FUNCTION:
> > +		if (node->sym && cnode->ms.sym) {
> > +			/* compare inlined frames based on their symbol name
> > +			 * because different inlined frames will have the same
> > +			 * symbol start. otherwise do a faster comparison based
> > +			 * on the symbol start address
> > +			 */
> > +			if (cnode->ms.sym->inlined || node->sym->inlined)
> > +				match = match_chain_strings(cnode->ms.sym->name,
> > +							    node->sym->name);
> > +			else
> > +				match = match_address(cnode->ms.sym->start,
> > +						      node->sym->start);
> > +			if (match != MATCH_ERROR)
> > +				break;
> > +		}
> > +		__fallthrough;
> > +	case CCKEY_ADDRESS:
> > +	default:
> > +		match = match_address(cnode->ip, node->ip);
> > +		break;
> >  	}
> >  
> > -	if (left == right) {
> > -		if (node->branch) {
> > -			cnode->branch_count++;
> > +	if (match == MATCH_EQ && node->branch) {
> > +		cnode->branch_count++;
> >  
> > -			if (node->branch_from) {
> > -				/*
> > -				 * It's "to" of a branch
> > -				 */
> > -				cnode->brtype_stat.branch_to = true;
> > +		if (node->branch_from) {
> > +			/*
> > +			 * It's "to" of a branch
> > +			 */
> > +			cnode->brtype_stat.branch_to = true;
> >  
> > -				if (node->branch_flags.predicted)
> > -					cnode->predicted_count++;
> > +			if (node->branch_flags.predicted)
> > +				cnode->predicted_count++;
> >  
> > -				if (node->branch_flags.abort)
> > -					cnode->abort_count++;
> > +			if (node->branch_flags.abort)
> > +				cnode->abort_count++;
> >  
> > -				branch_type_count(&cnode->brtype_stat,
> > -						  &node->branch_flags,
> > -						  node->branch_from,
> > -						  node->ip);
> > -			} else {
> > -				/*
> > -				 * It's "from" of a branch
> > -				 */
> > -				cnode->brtype_stat.branch_to = false;
> > -				cnode->cycles_count +=
> > -					node->branch_flags.cycles;
> > -				cnode->iter_count += node->nr_loop_iter;
> > -				cnode->iter_cycles += node->iter_cycles;
> > -			}
> > +			branch_type_count(&cnode->brtype_stat,
> > +					  &node->branch_flags,
> > +					  node->branch_from,
> > +					  node->ip);
> > +		} else {
> > +			/*
> > +			 * It's "from" of a branch
> > +			 */
> > +			cnode->brtype_stat.branch_to = false;
> > +			cnode->cycles_count += node->branch_flags.cycles;
> > +			cnode->iter_count += node->nr_loop_iter;
> > +			cnode->iter_cycles += node->iter_cycles;
> >  		}
> > -
> > -		return MATCH_EQ;
> >  	}
> >  
> > -	return left > right ? MATCH_GT : MATCH_LT;
> > +	return match;
> >  }
> >  
> >  /*
> > -- 
> > 2.14.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-13 14:08     ` Arnaldo Carvalho de Melo
@ 2017-10-14 19:30       ` Milian Wolff
  2017-10-16 14:17         ` Arnaldo Carvalho de Melo
  2017-10-16  4:18       ` ravi
  1 sibling, 1 reply; 40+ messages in thread
From: Milian Wolff @ 2017-10-14 19:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra, Ravi Bangoria

On Freitag, 13. Oktober 2017 16:08:34 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> > > Some of the code paths I introduced before returned too early
> > > without running the code to handle a node's branch count.
> > > By refactoring match_chain to only have one exit point, this
> > > can be remedied.
> > 
> > Fixing up this one now.
> 
> Millian, this is all fresher in your mind, can you please take a look at
> my perf/core branch and check if the change i made to ]PATCH v5 09/16]
> "perf report: compare symbol name for inlined frames when matching" is
> ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?

Regarding the 09/16 patch, I think your change is fine. With your rebase 
request, do you mean I should rebase the rest of v5 (starting from 11/16, you 
seem to have applied 10/16 already) and resent that as v6? I can do that, when 
I get the time.

> Ravi, please take a look at this as well, to see if with these changes
> your fix remains valid, ok?
> 
> Thanks,

Thanks for the review and rebase.

> > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Cc: David Ahern <dsahern@gmail.com>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Cc: Yao Jin <yao.jin@linux.intel.com>
> > > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > > ---
> > > 
> > >  tools/perf/util/callchain.c | 117
> > >  +++++++++++++++++++++++--------------------- 1 file changed, 60
> > >  insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > > index 3f1431bf71bd..782de047c902 100644
> > > --- a/tools/perf/util/callchain.c
> > > +++ b/tools/perf/util/callchain.c
> > > @@ -666,78 +666,81 @@ static enum match_result match_chain_strings(const
> > > char *left,> > 
> > >  	return ret;
> > >  
> > >  }
> > > 
> > > +static enum match_result match_address(u64 left, u64 right)
> > > +{
> > > +	if (left == right)
> > > +		return MATCH_EQ;
> > > +	else if (left < right)
> > > +		return MATCH_LT;
> > > +	else
> > > +		return MATCH_GT;
> > > +}
> > > +
> > > 
> > >  static enum match_result match_chain(struct callchain_cursor_node
> > >  *node,
> > >  
> > >  				     struct callchain_list *cnode)
> > >  
> > >  {
> > > 
> > > -	struct symbol *sym = node->sym;
> > > -	enum match_result match;
> > > -	u64 left, right;
> > > +	enum match_result match = MATCH_ERROR;
> > > 
> > > -	if (callchain_param.key == CCKEY_SRCLINE) {
> > > +	switch (callchain_param.key) {
> > > 
> > > +	case CCKEY_SRCLINE:
> > >  		match = match_chain_strings(cnode->srcline, node->srcline);
> > > 
> > > -
> > > -		/* if no srcline is available, fallback to symbol name */
> > > -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> > > -			match = match_chain_strings(cnode->ms.sym->name,
> > > -						    node->sym->name);
> > > -
> > > 
> > >  		if (match != MATCH_ERROR)
> > > 
> > > -			return match;
> > > -
> > > -		/* otherwise fall-back to IP-based comparison below */
> > > -	}
> > > -
> > > -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> > > -		/* compare inlined frames based on their symbol name because
> > > -		 * different inlined frames will have the same symbol start
> > > -		 */
> > > -		if (cnode->ms.sym->inlined || node->sym->inlined)
> > > -			return match_chain_strings(cnode->ms.sym->name,
> > > -						   node->sym->name);
> > > -
> > > -		left = cnode->ms.sym->start;
> > > -		right = sym->start;
> > > -	} else {
> > > -		left = cnode->ip;
> > > -		right = node->ip;
> > > +			break;
> > > +		__fallthrough;
> > > +	case CCKEY_FUNCTION:
> > > +		if (node->sym && cnode->ms.sym) {
> > > +			/* compare inlined frames based on their symbol name
> > > +			 * because different inlined frames will have the same
> > > +			 * symbol start. otherwise do a faster comparison based
> > > +			 * on the symbol start address
> > > +			 */
> > > +			if (cnode->ms.sym->inlined || node->sym->inlined)
> > > +				match = match_chain_strings(cnode->ms.sym->name,
> > > +							    node->sym->name);
> > > +			else
> > > +				match = match_address(cnode->ms.sym->start,
> > > +						      node->sym->start);
> > > +			if (match != MATCH_ERROR)
> > > +				break;
> > > +		}
> > > +		__fallthrough;
> > > +	case CCKEY_ADDRESS:
> > > +	default:
> > > +		match = match_address(cnode->ip, node->ip);
> > > +		break;
> > > 
> > >  	}
> > > 
> > > -	if (left == right) {
> > > -		if (node->branch) {
> > > -			cnode->branch_count++;
> > > +	if (match == MATCH_EQ && node->branch) {
> > > +		cnode->branch_count++;
> > > 
> > > -			if (node->branch_from) {
> > > -				/*
> > > -				 * It's "to" of a branch
> > > -				 */
> > > -				cnode->brtype_stat.branch_to = true;
> > > +		if (node->branch_from) {
> > > +			/*
> > > +			 * It's "to" of a branch
> > > +			 */
> > > +			cnode->brtype_stat.branch_to = true;
> > > 
> > > -				if (node->branch_flags.predicted)
> > > -					cnode->predicted_count++;
> > > +			if (node->branch_flags.predicted)
> > > +				cnode->predicted_count++;
> > > 
> > > -				if (node->branch_flags.abort)
> > > -					cnode->abort_count++;
> > > +			if (node->branch_flags.abort)
> > > +				cnode->abort_count++;
> > > 
> > > -				branch_type_count(&cnode->brtype_stat,
> > > -						  &node->branch_flags,
> > > -						  node->branch_from,
> > > -						  node->ip);
> > > -			} else {
> > > -				/*
> > > -				 * It's "from" of a branch
> > > -				 */
> > > -				cnode->brtype_stat.branch_to = false;
> > > -				cnode->cycles_count +=
> > > -					node->branch_flags.cycles;
> > > -				cnode->iter_count += node->nr_loop_iter;
> > > -				cnode->iter_cycles += node->iter_cycles;
> > > -			}
> > > +			branch_type_count(&cnode->brtype_stat,
> > > +					  &node->branch_flags,
> > > +					  node->branch_from,
> > > +					  node->ip);
> > > +		} else {
> > > +			/*
> > > +			 * It's "from" of a branch
> > > +			 */
> > > +			cnode->brtype_stat.branch_to = false;
> > > +			cnode->cycles_count += node->branch_flags.cycles;
> > > +			cnode->iter_count += node->nr_loop_iter;
> > > +			cnode->iter_cycles += node->iter_cycles;
> > > 
> > >  		}
> > > 
> > > -
> > > -		return MATCH_EQ;
> > > 
> > >  	}
> > > 
> > > -	return left > right ? MATCH_GT : MATCH_LT;
> > > +	return match;
> > > 
> > >  }
> > >  
> > >  /*


-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-13 14:08     ` Arnaldo Carvalho de Melo
  2017-10-14 19:30       ` Milian Wolff
@ 2017-10-16  4:18       ` ravi
  2017-10-16  8:27         ` Milian Wolff
  2017-10-16 14:19         ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 40+ messages in thread
From: ravi @ 2017-10-16  4:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Milian Wolff
  Cc: Jiri Olsa, Jin Yao, Linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Namhyung Kim,
	Peter Zijlstra, ravi.bangoria


On Friday 13 October 2017 07:38 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
>>> Some of the code paths I introduced before returned too early
>>> without running the code to handle a node's branch count.
>>> By refactoring match_chain to only have one exit point, this
>>> can be remedied.
>> Fixing up this one now.
> Millian, this is all fresher in your mind, can you please take a look at
> my perf/core branch and check if the change i made to ]PATCH v5 09/16]
> "perf report: compare symbol name for inlined frames when matching" is
> ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?
>
> Ravi, please take a look at this as well, to see if with these changes
> your fix remains valid, ok?

Yes Arnaldo, my changes are still valid.

Milian, Can you please change this patch such that it incorporates dso
comparison for CCKEY_FUNCTION.

( Also, will that be good to change macro to CCKEY_FUNCTION_DOS ?)

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-16  4:18       ` ravi
@ 2017-10-16  8:27         ` Milian Wolff
  2017-10-16 14:19         ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 40+ messages in thread
From: Milian Wolff @ 2017-10-16  8:27 UTC (permalink / raw)
  To: ravi
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Jin Yao, Linux-kernel,
	linux-perf-users, Arnaldo Carvalho de Melo, David Ahern,
	Namhyung Kim, Peter Zijlstra

On Montag, 16. Oktober 2017 06:18:17 CEST ravi wrote:
> On Friday 13 October 2017 07:38 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo 
escreveu:
> >> Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> >>> Some of the code paths I introduced before returned too early
> >>> without running the code to handle a node's branch count.
> >>> By refactoring match_chain to only have one exit point, this
> >>> can be remedied.
> >> 
> >> Fixing up this one now.
> > 
> > Millian, this is all fresher in your mind, can you please take a look at
> > my perf/core branch and check if the change i made to ]PATCH v5 09/16]
> > "perf report: compare symbol name for inlined frames when matching" is
> > ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?
> > 
> > Ravi, please take a look at this as well, to see if with these changes
> > your fix remains valid, ok?
> 
> Yes Arnaldo, my changes are still valid.
> 
> Milian, Can you please change this patch such that it incorporates dso
> comparison for CCKEY_FUNCTION.

Arnaldo has already done that.

> ( Also, will that be good to change macro to CCKEY_FUNCTION_DOS ?)

Personally, I don't think so. The DSO compare is, imo, just an implementation 
detail.

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-14 19:30       ` Milian Wolff
@ 2017-10-16 14:17         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 40+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-16 14:17 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Jin Yao, Linux-kernel,
	linux-perf-users, David Ahern, Namhyung Kim, Peter Zijlstra,
	Ravi Bangoria

Em Sat, Oct 14, 2017 at 09:30:54PM +0200, Milian Wolff escreveu:
> On Freitag, 13. Oktober 2017 16:08:34 CEST Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> > > > Some of the code paths I introduced before returned too early
> > > > without running the code to handle a node's branch count.
> > > > By refactoring match_chain to only have one exit point, this
> > > > can be remedied.
> > > 
> > > Fixing up this one now.
> > 
> > Millian, this is all fresher in your mind, can you please take a look at
> > my perf/core branch and check if the change i made to ]PATCH v5 09/16]
> > "perf report: compare symbol name for inlined frames when matching" is
> > ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?
> 
> Regarding the 09/16 patch, I think your change is fine. With your rebase 
> request, do you mean I should rebase the rest of v5 (starting from 11/16, you 
> seem to have applied 10/16 already) and resent that as v6? I can do that, when 
> I get the time.

Yes, can you please do that? As soon as you have the time, if I think it
takes long I'll just move it to a separate branch and continue
processing other patches, just take your time.

Right now I'm processing/testing some perf/urgent bits.

- Arnaldo
 
> > Ravi, please take a look at this as well, to see if with these changes
> > your fix remains valid, ok?
> > 
> > Thanks,
> 
> Thanks for the review and rebase.
> 
> > > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > Cc: David Ahern <dsahern@gmail.com>
> > > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > Cc: Yao Jin <yao.jin@linux.intel.com>
> > > > Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> > > > ---
> > > > 
> > > >  tools/perf/util/callchain.c | 117
> > > >  +++++++++++++++++++++++--------------------- 1 file changed, 60
> > > >  insertions(+), 57 deletions(-)
> > > > 
> > > > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > > > index 3f1431bf71bd..782de047c902 100644
> > > > --- a/tools/perf/util/callchain.c
> > > > +++ b/tools/perf/util/callchain.c
> > > > @@ -666,78 +666,81 @@ static enum match_result match_chain_strings(const
> > > > char *left,> > 
> > > >  	return ret;
> > > >  
> > > >  }
> > > > 
> > > > +static enum match_result match_address(u64 left, u64 right)
> > > > +{
> > > > +	if (left == right)
> > > > +		return MATCH_EQ;
> > > > +	else if (left < right)
> > > > +		return MATCH_LT;
> > > > +	else
> > > > +		return MATCH_GT;
> > > > +}
> > > > +
> > > > 
> > > >  static enum match_result match_chain(struct callchain_cursor_node
> > > >  *node,
> > > >  
> > > >  				     struct callchain_list *cnode)
> > > >  
> > > >  {
> > > > 
> > > > -	struct symbol *sym = node->sym;
> > > > -	enum match_result match;
> > > > -	u64 left, right;
> > > > +	enum match_result match = MATCH_ERROR;
> > > > 
> > > > -	if (callchain_param.key == CCKEY_SRCLINE) {
> > > > +	switch (callchain_param.key) {
> > > > 
> > > > +	case CCKEY_SRCLINE:
> > > >  		match = match_chain_strings(cnode->srcline, node->srcline);
> > > > 
> > > > -
> > > > -		/* if no srcline is available, fallback to symbol name */
> > > > -		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
> > > > -			match = match_chain_strings(cnode->ms.sym->name,
> > > > -						    node->sym->name);
> > > > -
> > > > 
> > > >  		if (match != MATCH_ERROR)
> > > > 
> > > > -			return match;
> > > > -
> > > > -		/* otherwise fall-back to IP-based comparison below */
> > > > -	}
> > > > -
> > > > -	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
> > > > -		/* compare inlined frames based on their symbol name because
> > > > -		 * different inlined frames will have the same symbol start
> > > > -		 */
> > > > -		if (cnode->ms.sym->inlined || node->sym->inlined)
> > > > -			return match_chain_strings(cnode->ms.sym->name,
> > > > -						   node->sym->name);
> > > > -
> > > > -		left = cnode->ms.sym->start;
> > > > -		right = sym->start;
> > > > -	} else {
> > > > -		left = cnode->ip;
> > > > -		right = node->ip;
> > > > +			break;
> > > > +		__fallthrough;
> > > > +	case CCKEY_FUNCTION:
> > > > +		if (node->sym && cnode->ms.sym) {
> > > > +			/* compare inlined frames based on their symbol name
> > > > +			 * because different inlined frames will have the same
> > > > +			 * symbol start. otherwise do a faster comparison based
> > > > +			 * on the symbol start address
> > > > +			 */
> > > > +			if (cnode->ms.sym->inlined || node->sym->inlined)
> > > > +				match = match_chain_strings(cnode->ms.sym->name,
> > > > +							    node->sym->name);
> > > > +			else
> > > > +				match = match_address(cnode->ms.sym->start,
> > > > +						      node->sym->start);
> > > > +			if (match != MATCH_ERROR)
> > > > +				break;
> > > > +		}
> > > > +		__fallthrough;
> > > > +	case CCKEY_ADDRESS:
> > > > +	default:
> > > > +		match = match_address(cnode->ip, node->ip);
> > > > +		break;
> > > > 
> > > >  	}
> > > > 
> > > > -	if (left == right) {
> > > > -		if (node->branch) {
> > > > -			cnode->branch_count++;
> > > > +	if (match == MATCH_EQ && node->branch) {
> > > > +		cnode->branch_count++;
> > > > 
> > > > -			if (node->branch_from) {
> > > > -				/*
> > > > -				 * It's "to" of a branch
> > > > -				 */
> > > > -				cnode->brtype_stat.branch_to = true;
> > > > +		if (node->branch_from) {
> > > > +			/*
> > > > +			 * It's "to" of a branch
> > > > +			 */
> > > > +			cnode->brtype_stat.branch_to = true;
> > > > 
> > > > -				if (node->branch_flags.predicted)
> > > > -					cnode->predicted_count++;
> > > > +			if (node->branch_flags.predicted)
> > > > +				cnode->predicted_count++;
> > > > 
> > > > -				if (node->branch_flags.abort)
> > > > -					cnode->abort_count++;
> > > > +			if (node->branch_flags.abort)
> > > > +				cnode->abort_count++;
> > > > 
> > > > -				branch_type_count(&cnode->brtype_stat,
> > > > -						  &node->branch_flags,
> > > > -						  node->branch_from,
> > > > -						  node->ip);
> > > > -			} else {
> > > > -				/*
> > > > -				 * It's "from" of a branch
> > > > -				 */
> > > > -				cnode->brtype_stat.branch_to = false;
> > > > -				cnode->cycles_count +=
> > > > -					node->branch_flags.cycles;
> > > > -				cnode->iter_count += node->nr_loop_iter;
> > > > -				cnode->iter_cycles += node->iter_cycles;
> > > > -			}
> > > > +			branch_type_count(&cnode->brtype_stat,
> > > > +					  &node->branch_flags,
> > > > +					  node->branch_from,
> > > > +					  node->ip);
> > > > +		} else {
> > > > +			/*
> > > > +			 * It's "from" of a branch
> > > > +			 */
> > > > +			cnode->brtype_stat.branch_to = false;
> > > > +			cnode->cycles_count += node->branch_flags.cycles;
> > > > +			cnode->iter_count += node->nr_loop_iter;
> > > > +			cnode->iter_cycles += node->iter_cycles;
> > > > 
> > > >  		}
> > > > 
> > > > -
> > > > -		return MATCH_EQ;
> > > > 
> > > >  	}
> > > > 
> > > > -	return left > right ? MATCH_GT : MATCH_LT;
> > > > +	return match;
> > > > 
> > > >  }
> > > >  
> > > >  /*
> 
> 
> -- 
> Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/16] perf report: properly handle branch count in match_chain
  2017-10-16  4:18       ` ravi
  2017-10-16  8:27         ` Milian Wolff
@ 2017-10-16 14:19         ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 40+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-16 14:19 UTC (permalink / raw)
  To: ravi
  Cc: Arnaldo Carvalho de Melo, Milian Wolff, Jiri Olsa, Jin Yao,
	Linux-kernel, linux-perf-users, David Ahern, Namhyung Kim,
	Peter Zijlstra

Em Mon, Oct 16, 2017 at 09:48:17AM +0530, ravi escreveu:
> 
> On Friday 13 October 2017 07:38 PM, Arnaldo Carvalho de Melo wrote:
> >Em Fri, Oct 13, 2017 at 10:39:03AM -0300, Arnaldo Carvalho de Melo escreveu:
> >>Em Mon, Oct 09, 2017 at 10:33:05PM +0200, Milian Wolff escreveu:
> >>>Some of the code paths I introduced before returned too early
> >>>without running the code to handle a node's branch count.
> >>>By refactoring match_chain to only have one exit point, this
> >>>can be remedied.
> >>Fixing up this one now.
> >Millian, this is all fresher in your mind, can you please take a look at
> >my perf/core branch and check if the change i made to ]PATCH v5 09/16]
> >"perf report: compare symbol name for inlined frames when matching" is
> >ok wrt Ravi's fix and then, please, rebase v5 on top of what is there?
> >
> >Ravi, please take a look at this as well, to see if with these changes
> >your fix remains valid, ok?
> 
> Yes Arnaldo, my changes are still valid.

I knot they are valid, probably my wording was unclear, I was asking if,
with Milian changes, and my fixing up to cope with your patch, that was
just in perf/urgent, while Milian work was done on perf/core, everything
worked for your use case.

- Arnaldo
 
> Milian, Can you please change this patch such that it incorporates dso
> comparison for CCKEY_FUNCTION.
> 
> ( Also, will that be good to change macro to CCKEY_FUNCTION_DOS ?)
> 
> Thanks,
> Ravi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf report: Remove code to handle inline frames from browsers
  2017-10-09 20:32 ` [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers Milian Wolff
@ 2017-10-25 17:15   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, yao.jin, peterz, tglx, linux-kernel, milian.wolff,
	dsahern, mingo, namhyung, acme, hpa

Commit-ID:  2a704fc8db7b0080a67d9f4f4cb2a7bcaf79949d
Gitweb:     https://git.kernel.org/tip/2a704fc8db7b0080a67d9f4f4cb2a7bcaf79949d
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:32:55 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf report: Remove code to handle inline frames from browsers

The follow-up commits will make inline frames first-class citizens in
the callchain, thereby obsoleting all of this special code.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/hists.c  | 180 +++-------------------------------------
 tools/perf/ui/stdio/hist.c      |  77 +----------------
 tools/perf/util/evsel_fprintf.c |  32 -------
 tools/perf/util/hist.c          |   5 --
 tools/perf/util/sort.h          |   1 -
 5 files changed, 13 insertions(+), 282 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 13dfb0a..3a433f3 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -154,57 +154,9 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
 	cl->unfolded = unfold ? cl->has_children : false;
 }
 
-static struct inline_node *inline_node__create(struct map *map, u64 ip)
-{
-	struct dso *dso;
-	struct inline_node *node;
-
-	if (map == NULL)
-		return NULL;
-
-	dso = map->dso;
-	if (dso == NULL)
-		return NULL;
-
-	node = dso__parse_addr_inlines(dso,
-				       map__rip_2objdump(map, ip));
-
-	return node;
-}
-
-static int inline__count_rows(struct inline_node *node)
-{
-	struct inline_list *ilist;
-	int i = 0;
-
-	if (node == NULL)
-		return 0;
-
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL))
-			i++;
-	}
-
-	return i;
-}
-
-static int callchain_list__inline_rows(struct callchain_list *chain)
-{
-	struct inline_node *node;
-	int rows;
-
-	node = inline_node__create(chain->ms.map, chain->ip);
-	if (node == NULL)
-		return 0;
-
-	rows = inline__count_rows(node);
-	inline_node__delete(node);
-	return rows;
-}
-
 static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 {
-	int n = 0, inline_rows;
+	int n = 0;
 	struct rb_node *nd;
 
 	for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
@@ -215,12 +167,6 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 		list_for_each_entry(chain, &child->val, list) {
 			++n;
 
-			if (symbol_conf.inline_name) {
-				inline_rows =
-					callchain_list__inline_rows(chain);
-				n += inline_rows;
-			}
-
 			/* We need this because we may not have children */
 			folded_sign = callchain_list__folded(chain);
 			if (folded_sign == '+')
@@ -272,7 +218,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
 {
 	struct callchain_list *chain;
 	bool unfolded = false;
-	int n = 0, inline_rows;
+	int n = 0;
 
 	if (callchain_param.mode == CHAIN_FLAT)
 		return callchain_node__count_flat_rows(node);
@@ -281,10 +227,6 @@ static int callchain_node__count_rows(struct callchain_node *node)
 
 	list_for_each_entry(chain, &node->val, list) {
 		++n;
-		if (symbol_conf.inline_name) {
-			inline_rows = callchain_list__inline_rows(chain);
-			n += inline_rows;
-		}
 
 		unfolded = chain->unfolded;
 	}
@@ -432,19 +374,6 @@ static void hist_entry__init_have_children(struct hist_entry *he)
 	he->init_have_children = true;
 }
 
-static void hist_entry_init_inline_node(struct hist_entry *he)
-{
-	if (he->inline_node)
-		return;
-
-	he->inline_node = inline_node__create(he->ms.map, he->ip);
-
-	if (he->inline_node == NULL)
-		return;
-
-	he->has_children = true;
-}
-
 static bool hist_browser__toggle_fold(struct hist_browser *browser)
 {
 	struct hist_entry *he = browser->he_selection;
@@ -476,12 +405,8 @@ static bool hist_browser__toggle_fold(struct hist_browser *browser)
 
 		if (he->unfolded) {
 			if (he->leaf)
-				if (he->inline_node)
-					he->nr_rows = inline__count_rows(
-							he->inline_node);
-				else
-					he->nr_rows = callchain__count_rows(
-							&he->sorted_chain);
+				he->nr_rows = callchain__count_rows(
+						&he->sorted_chain);
 			else
 				he->nr_rows = hierarchy_count_rows(browser, he, false);
 
@@ -841,71 +766,6 @@ static bool hist_browser__check_dump_full(struct hist_browser *browser __maybe_u
 
 #define LEVEL_OFFSET_STEP 3
 
-static int hist_browser__show_inline(struct hist_browser *browser,
-				     struct inline_node *node,
-				     unsigned short row,
-				     int offset)
-{
-	struct inline_list *ilist;
-	char buf[1024];
-	int color, width, first_row;
-
-	first_row = row;
-	width = browser->b.width - (LEVEL_OFFSET_STEP + 2);
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
-			color = HE_COLORSET_NORMAL;
-			if (ui_browser__is_current_entry(&browser->b, row))
-				color = HE_COLORSET_SELECTED;
-
-			if (callchain_param.key == CCKEY_ADDRESS ||
-			    callchain_param.key == CCKEY_SRCLINE) {
-				if (ilist->filename != NULL)
-					scnprintf(buf, sizeof(buf),
-						  "%s:%d (inline)",
-						  ilist->filename,
-						  ilist->line_nr);
-				else
-					scnprintf(buf, sizeof(buf), "??");
-			} else if (ilist->funcname != NULL)
-				scnprintf(buf, sizeof(buf), "%s (inline)",
-					  ilist->funcname);
-			else if (ilist->filename != NULL)
-				scnprintf(buf, sizeof(buf),
-					  "%s:%d (inline)",
-					  ilist->filename,
-					  ilist->line_nr);
-			else
-				scnprintf(buf, sizeof(buf), "??");
-
-			ui_browser__set_color(&browser->b, color);
-			hist_browser__gotorc(browser, row, 0);
-			ui_browser__write_nstring(&browser->b, " ",
-				LEVEL_OFFSET_STEP + offset);
-			ui_browser__write_nstring(&browser->b, buf, width);
-			row++;
-		}
-	}
-
-	return row - first_row;
-}
-
-static size_t show_inline_list(struct hist_browser *browser, struct map *map,
-			       u64 ip, int row, int offset)
-{
-	struct inline_node *node;
-	int ret;
-
-	node = inline_node__create(map, ip);
-	if (node == NULL)
-		return 0;
-
-	ret = hist_browser__show_inline(browser, node, row, offset);
-
-	inline_node__delete(node);
-	return ret;
-}
-
 static int hist_browser__show_callchain_list(struct hist_browser *browser,
 					     struct callchain_node *node,
 					     struct callchain_list *chain,
@@ -917,7 +777,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	char bf[1024], *alloc_str;
 	char buf[64], *alloc_str2;
 	const char *str;
-	int inline_rows = 0, ret = 1;
+	int ret = 1;
 
 	if (arg->row_offset != 0) {
 		arg->row_offset--;
@@ -954,12 +814,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	free(alloc_str);
 	free(alloc_str2);
 
-	if (symbol_conf.inline_name) {
-		inline_rows = show_inline_list(browser, chain->ms.map,
-					       chain->ip, row + 1, offset);
-	}
-
-	return ret + inline_rows;
+	return ret;
 }
 
 static bool check_percent_display(struct rb_node *node, u64 parent_total)
@@ -1383,12 +1238,6 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 		folded_sign = hist_entry__folded(entry);
 	}
 
-	if (symbol_conf.inline_name &&
-	    (!entry->has_children)) {
-		hist_entry_init_inline_node(entry);
-		folded_sign = hist_entry__folded(entry);
-	}
-
 	if (row_offset == 0) {
 		struct hpp_arg arg = {
 			.b		= &browser->b,
@@ -1420,8 +1269,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 			}
 
 			if (first) {
-				if (symbol_conf.use_callchain ||
-					symbol_conf.inline_name) {
+				if (symbol_conf.use_callchain) {
 					ui_browser__printf(&browser->b, "%c ", folded_sign);
 					width -= 2;
 				}
@@ -1463,15 +1311,11 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 			.is_current_entry = current_entry,
 		};
 
-		if (entry->inline_node)
-			printed += hist_browser__show_inline(browser,
-					entry->inline_node, row, 0);
-		else
-			printed += hist_browser__show_callchain(browser,
-					entry, 1, row,
-					hist_browser__show_callchain_entry,
-					&arg,
-					hist_browser__check_output_full);
+		printed += hist_browser__show_callchain(browser,
+				entry, 1, row,
+				hist_browser__show_callchain_entry,
+				&arg,
+				hist_browser__check_output_full);
 	}
 
 	return printed;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 8bdb7a5..b6b9baa 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -21,64 +21,6 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
 	return ret;
 }
 
-static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
-			      int depth, int depth_mask, FILE *fp)
-{
-	struct dso *dso;
-	struct inline_node *node;
-	struct inline_list *ilist;
-	int ret = 0, i;
-
-	if (map == NULL)
-		return 0;
-
-	dso = map->dso;
-	if (dso == NULL)
-		return 0;
-
-	node = dso__parse_addr_inlines(dso,
-				       map__rip_2objdump(map, ip));
-	if (node == NULL)
-		return 0;
-
-	list_for_each_entry(ilist, &node->val, list) {
-		if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
-			ret += callchain__fprintf_left_margin(fp, left_margin);
-
-			for (i = 0; i < depth; i++) {
-				if (depth_mask & (1 << i))
-					ret += fprintf(fp, "|");
-				else
-					ret += fprintf(fp, " ");
-				ret += fprintf(fp, "          ");
-			}
-
-			if (callchain_param.key == CCKEY_ADDRESS ||
-			    callchain_param.key == CCKEY_SRCLINE) {
-				if (ilist->filename != NULL)
-					ret += fprintf(fp, "%s:%d (inline)",
-						       ilist->filename,
-						       ilist->line_nr);
-				else
-					ret += fprintf(fp, "??");
-			} else if (ilist->funcname != NULL)
-				ret += fprintf(fp, "%s (inline)",
-					       ilist->funcname);
-			else if (ilist->filename != NULL)
-				ret += fprintf(fp, "%s:%d (inline)",
-					       ilist->filename,
-					       ilist->line_nr);
-			else
-				ret += fprintf(fp, "??");
-
-			ret += fprintf(fp, "\n");
-		}
-	}
-
-	inline_node__delete(node);
-	return ret;
-}
-
 static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
 					  int left_margin)
 {
@@ -137,9 +79,6 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 	fputc('\n', fp);
 	free(alloc_str);
 
-	if (symbol_conf.inline_name)
-		ret += inline__fprintf(chain->ms.map, chain->ip,
-				       left_margin, depth, depth_mask, fp);
 	return ret;
 }
 
@@ -314,13 +253,6 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
-
-			if (symbol_conf.inline_name)
-				ret += inline__fprintf(chain->ms.map,
-						       chain->ip,
-						       left_margin,
-						       0, 0,
-						       fp);
 		}
 		root = &cnode->rb_root;
 	}
@@ -600,7 +532,6 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 {
 	int ret;
 	int callchain_ret = 0;
-	int inline_ret = 0;
 	struct perf_hpp hpp = {
 		.buf		= bf,
 		.size		= size,
@@ -622,13 +553,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 		callchain_ret = hist_entry_callchain__fprintf(he, total_period,
 							      0, fp);
 
-	if (callchain_ret == 0 && symbol_conf.inline_name) {
-		inline_ret = inline__fprintf(he->ms.map, he->ip, 0, 0, 0, fp);
-		ret += inline_ret;
-		if (inline_ret > 0)
-			ret += fprintf(fp, "\n");
-	} else
-		ret += callchain_ret;
+	ret += callchain_ret;
 
 	return ret;
 }
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 583f3a6..f2c6c5e 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -169,38 +169,6 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			if (!print_oneline)
 				printed += fprintf(fp, "\n");
 
-			if (symbol_conf.inline_name && node->map) {
-				struct inline_node *inode;
-
-				addr = map__rip_2objdump(node->map, node->ip),
-				inode = dso__parse_addr_inlines(node->map->dso, addr);
-
-				if (inode) {
-					struct inline_list *ilist;
-
-					list_for_each_entry(ilist, &inode->val, list) {
-						if (print_arrow)
-							printed += fprintf(fp, " <-");
-
-						/* IP is same, just skip it */
-						if (print_ip)
-							printed += fprintf(fp, "%c%16s",
-									   s, "");
-						if (print_sym)
-							printed += fprintf(fp, " %s",
-									   ilist->funcname);
-						if (print_srcline)
-							printed += fprintf(fp, "\n  %s:%d",
-									   ilist->filename,
-									   ilist->line_nr);
-						if (!print_oneline)
-							printed += fprintf(fp, "\n");
-					}
-
-					inline_node__delete(inode);
-				}
-			}
-
 			if (symbol_conf.bt_stop_list &&
 			    node->sym &&
 			    strlist__has_entry(symbol_conf.bt_stop_list,
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e60d8d8..b0fa9c2 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1141,11 +1141,6 @@ void hist_entry__delete(struct hist_entry *he)
 		zfree(&he->mem_info);
 	}
 
-	if (he->inline_node) {
-		inline_node__delete(he->inline_node);
-		he->inline_node = NULL;
-	}
-
 	zfree(&he->stat_acc);
 	free_srcline(he->srcline);
 	if (he->srcfile && he->srcfile[0])
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f36dc49..507d096 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -129,7 +129,6 @@ struct hist_entry {
 	};
 	char			*srcline;
 	char			*srcfile;
-	struct inline_node	*inline_node;
 	struct symbol		*parent;
 	struct branch_info	*branch_info;
 	struct hists		*hists;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Store srcline in callchain_cursor_node
  2017-10-09 20:32 ` [PATCH v5 02/16] perf util: store srcline in callchain_cursor_node Milian Wolff
@ 2017-10-25 17:16   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, namhyung, tglx, dsahern, jolsa, mingo, hpa, milian.wolff,
	linux-kernel, yao.jin, peterz

Commit-ID:  40a342cda2cd9bc8f7bf81c5ce1a141584760757
Gitweb:     https://git.kernel.org/tip/40a342cda2cd9bc8f7bf81c5ce1a141584760757
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:32:56 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf callchain: Store srcline in callchain_cursor_node

This is mostly a preparation to enable the creation of full callchain
nodes for inline frames. Such frames will reference the IP of the
non-inlined frame, but hold the symbol and srcline for an inlined
location. As such, we won't be able to query the srcline on-demand based
on the IP alone. Instead, we will leverage the functionality provided by
this patch here, and store the srcline for the inlined nodes in the new
srcline member of callchain_cursor_node.

Note that this patch on its own leaks the srcline, as there is no
free_callchain_cursor_node or similar. A future patch will add caching
of the srcline and handle deletion properly.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 31 +++++++++----------------------
 tools/perf/util/callchain.h |  6 ++++--
 tools/perf/util/machine.c   | 18 ++++++++++++++++--
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index a971caf..e7ee794 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -566,6 +566,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 		call->ip = cursor_node->ip;
 		call->ms.sym = cursor_node->sym;
 		call->ms.map = map__get(cursor_node->map);
+		call->srcline = cursor_node->srcline;
 
 		if (cursor_node->branch) {
 			call->branch_count = 1;
@@ -647,20 +648,11 @@ enum match_result {
 static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 					     struct callchain_list *cnode)
 {
-	char *left = NULL;
-	char *right = NULL;
+	const char *left = cnode->srcline;
+	const char *right = node->srcline;
 	enum match_result ret = MATCH_EQ;
 	int cmp;
 
-	if (cnode->ms.map)
-		left = get_srcline(cnode->ms.map->dso,
-				 map__rip_2objdump(cnode->ms.map, cnode->ip),
-				 cnode->ms.sym, true, false);
-	if (node->map)
-		right = get_srcline(node->map->dso,
-				  map__rip_2objdump(node->map, node->ip),
-				  node->sym, true, false);
-
 	if (left && right)
 		cmp = strcmp(left, right);
 	else if (!left && right)
@@ -675,8 +667,6 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 	if (cmp != 0)
 		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
 
-	free_srcline(left);
-	free_srcline(right);
 	return ret;
 }
 
@@ -969,7 +959,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
 	list_for_each_entry_safe(list, next_list, &src->val, list) {
 		callchain_cursor_append(cursor, list->ip,
 					list->ms.map, list->ms.sym,
-					false, NULL, 0, 0, 0);
+					false, NULL, 0, 0, 0, list->srcline);
 		list_del(&list->list);
 		map__zput(list->ms.map);
 		free(list);
@@ -1009,7 +999,8 @@ int callchain_merge(struct callchain_cursor *cursor,
 int callchain_cursor_append(struct callchain_cursor *cursor,
 			    u64 ip, struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, u64 iter_cycles, u64 branch_from)
+			    int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+			    const char *srcline)
 {
 	struct callchain_cursor_node *node = *cursor->last;
 
@@ -1028,6 +1019,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 	node->branch = branch;
 	node->nr_loop_iter = nr_loop_iter;
 	node->iter_cycles = iter_cycles;
+	node->srcline = srcline;
 
 	if (flags)
 		memcpy(&node->branch_flags, flags,
@@ -1115,12 +1107,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
-		if (show_srcline && cl->ms.map && !cl->srcline)
-			cl->srcline = get_srcline(cl->ms.map->dso,
-						  map__rip_2objdump(cl->ms.map,
-								    cl->ip),
-						  cl->ms.sym, false, show_addr);
-		if (cl->srcline)
+		if (show_srcline && cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
 		else
@@ -1532,7 +1519,7 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
 					     node->branch, &node->branch_flags,
 					     node->nr_loop_iter,
 					     node->iter_cycles,
-					     node->branch_from);
+					     node->branch_from, node->srcline);
 		if (rc)
 			break;
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 1ed6fc6..8f67b68 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -121,7 +121,7 @@ struct callchain_list {
 	u64			iter_count;
 	u64			iter_cycles;
 	struct branch_type_stat brtype_stat;
-	char		       *srcline;
+	const char		*srcline;
 	struct list_head	list;
 };
 
@@ -135,6 +135,7 @@ struct callchain_cursor_node {
 	u64				ip;
 	struct map			*map;
 	struct symbol			*sym;
+	const char			*srcline;
 	bool				branch;
 	struct branch_flags		branch_flags;
 	u64				branch_from;
@@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
 int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
 			    struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, u64 iter_cycles, u64 branch_from);
+			    int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+			    const char *srcline);
 
 /* Close a cursor writing session. Initialize for the reader */
 static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7c3aa47..a37e1c0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1709,6 +1709,15 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	return mi;
 }
 
+static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
+{
+	if (!map || callchain_param.key == CCKEY_FUNCTION)
+		return NULL;
+
+	return get_srcline(map->dso, map__rip_2objdump(map, ip),
+			   sym, false, callchain_param.key == CCKEY_ADDRESS);
+}
+
 struct iterations {
 	int nr_loop_iter;
 	u64 cycles;
@@ -1728,6 +1737,7 @@ static int add_callchain_ip(struct thread *thread,
 	struct addr_location al;
 	int nr_loop_iter = 0;
 	u64 iter_cycles = 0;
+	const char *srcline = NULL;
 
 	al.filtered = 0;
 	al.sym = NULL;
@@ -1783,9 +1793,10 @@ static int add_callchain_ip(struct thread *thread,
 		iter_cycles = iter->cycles;
 	}
 
+	srcline = callchain_srcline(al.map, al.sym, al.addr);
 	return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
 				       branch, flags, nr_loop_iter,
-				       iter_cycles, branch_from);
+				       iter_cycles, branch_from, srcline);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -2101,12 +2112,15 @@ check_calls:
 static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
+	const char *srcline = NULL;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
+
+	srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-				       false, NULL, 0, 0, 0);
+				       false, NULL, 0, 0, 0, srcline);
 }
 
 static int thread__resolve_callchain_unwind(struct thread *thread,

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Refactor inline_list to operate on symbols
  2017-10-09 20:32 ` [PATCH v5 03/16] perf util: refactor inline_list to operate on symbols Milian Wolff
@ 2017-10-25 17:16   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, peterz, jolsa, dsahern, hpa, linux-kernel, yao.jin, tglx,
	namhyung, acme, milian.wolff

Commit-ID:  fea0cf842c7aa08950063264ab1cfbce4ba38c1b
Gitweb:     https://git.kernel.org/tip/fea0cf842c7aa08950063264ab1cfbce4ba38c1b
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:32:57 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf callchain: Refactor inline_list to operate on symbols

This is a requirement to create real callchain entries for inlined
frames.

Since the list of inlines usually contains the target symbol too, i.e.
the location where the frames get inlined to, we alias that symbol and
reuse it as-is is. This ensures that other dependent functionality keeps
working, most notably annotation of the target frames.

For all other entries in the inline_list, a fake symbol is created.
These are marked by new 'inlined' member which is set to true. Only
those symbols are managed by the inline_list and get freed when the
inline_list is deleted from within inline_node__delete.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/srcline.c | 93 ++++++++++++++++++++++++++++++++---------------
 tools/perf/util/srcline.h |  7 +++-
 tools/perf/util/symbol.h  |  1 +
 3 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ed8e8d2..c0af61b 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,29 +33,19 @@ static const char *dso__name(struct dso *dso)
 	return dso_name;
 }
 
-static int inline_list__append(char *filename, char *funcname, int line_nr,
-			       struct inline_node *node, struct dso *dso)
+static int inline_list__append(struct symbol *symbol, char *filename,
+			       int line_nr, struct inline_node *node)
 {
 	struct inline_list *ilist;
-	char *demangled;
 
 	ilist = zalloc(sizeof(*ilist));
 	if (ilist == NULL)
 		return -1;
 
+	ilist->symbol = symbol;
 	ilist->filename = filename;
 	ilist->line_nr = line_nr;
 
-	if (dso != NULL) {
-		demangled = dso__demangle_sym(dso, 0, funcname);
-		if (demangled == NULL) {
-			ilist->funcname = funcname;
-		} else {
-			ilist->funcname = demangled;
-			free(funcname);
-		}
-	}
-
 	if (callchain_param.order == ORDER_CALLEE)
 		list_add_tail(&ilist->list, &node->val);
 	else
@@ -206,19 +196,56 @@ static void addr2line_cleanup(struct a2l_data *a2l)
 
 #define MAX_INLINE_NEST 1024
 
+static struct symbol *new_inline_sym(struct dso *dso,
+				     struct symbol *base_sym,
+				     const char *funcname)
+{
+	struct symbol *inline_sym;
+	char *demangled = NULL;
+
+	if (dso) {
+		demangled = dso__demangle_sym(dso, 0, funcname);
+		if (demangled)
+			funcname = demangled;
+	}
+
+	if (base_sym && strcmp(funcname, base_sym->name) == 0) {
+		/* reuse the real, existing symbol */
+		inline_sym = base_sym;
+		/* ensure that we don't alias an inlined symbol, which could
+		 * lead to double frees in inline_node__delete
+		 */
+		assert(!base_sym->inlined);
+	} else {
+		/* create a fake symbol for the inline frame */
+		inline_sym = symbol__new(base_sym ? base_sym->start : 0,
+					 base_sym ? base_sym->end : 0,
+					 base_sym ? base_sym->binding : 0,
+					 funcname);
+		if (inline_sym)
+			inline_sym->inlined = 1;
+	}
+
+	free(demangled);
+
+	return inline_sym;
+}
+
 static int inline_list__append_dso_a2l(struct dso *dso,
-				       struct inline_node *node)
+				       struct inline_node *node,
+				       struct symbol *sym)
 {
 	struct a2l_data *a2l = dso->a2l;
-	char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
-	char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
+	struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
 
-	return inline_list__append(filename, funcname, a2l->line, node, dso);
+	return inline_list__append(inline_sym, strdup(a2l->filename),
+				   a2l->line, node);
 }
 
 static int addr2line(const char *dso_name, u64 addr,
 		     char **file, unsigned int *line, struct dso *dso,
-		     bool unwind_inlines, struct inline_node *node)
+		     bool unwind_inlines, struct inline_node *node,
+		     struct symbol *sym)
 {
 	int ret = 0;
 	struct a2l_data *a2l = dso->a2l;
@@ -244,7 +271,7 @@ static int addr2line(const char *dso_name, u64 addr,
 	if (unwind_inlines) {
 		int cnt = 0;
 
-		if (node && inline_list__append_dso_a2l(dso, node))
+		if (node && inline_list__append_dso_a2l(dso, node, sym))
 			return 0;
 
 		while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
@@ -255,7 +282,7 @@ static int addr2line(const char *dso_name, u64 addr,
 				a2l->filename = NULL;
 
 			if (node != NULL) {
-				if (inline_list__append_dso_a2l(dso, node))
+				if (inline_list__append_dso_a2l(dso, node, sym))
 					return 0;
 				// found at least one inline frame
 				ret = 1;
@@ -287,7 +314,7 @@ void dso__free_a2l(struct dso *dso)
 }
 
 static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
-	struct dso *dso)
+					struct dso *dso, struct symbol *sym)
 {
 	struct inline_node *node;
 
@@ -300,7 +327,7 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	INIT_LIST_HEAD(&node->val);
 	node->addr = addr;
 
-	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node))
+	if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
 		goto out_free_inline_node;
 
 	if (list_empty(&node->val))
@@ -340,7 +367,8 @@ static int addr2line(const char *dso_name, u64 addr,
 		     char **file, unsigned int *line_nr,
 		     struct dso *dso __maybe_unused,
 		     bool unwind_inlines __maybe_unused,
-		     struct inline_node *node __maybe_unused)
+		     struct inline_node *node __maybe_unused,
+		     struct symbol *sym __maybe_unused)
 {
 	FILE *fp;
 	char cmd[PATH_MAX];
@@ -380,7 +408,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
 }
 
 static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
-	struct dso *dso __maybe_unused)
+					struct dso *dso __maybe_unused,
+					struct symbol *sym)
 {
 	FILE *fp;
 	char cmd[PATH_MAX];
@@ -408,13 +437,13 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	node->addr = addr;
 
 	while (getline(&filename, &len, fp) != -1) {
+
 		if (filename_split(filename, &line_nr) != 1) {
 			free(filename);
 			goto out;
 		}
 
-		if (inline_list__append(filename, NULL, line_nr, node,
-					NULL) != 0)
+		if (inline_list__append(sym, filename, line_nr, node) != 0)
 			goto out;
 
 		filename = NULL;
@@ -454,7 +483,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	if (dso_name == NULL)
 		goto out;
 
-	if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines, NULL))
+	if (!addr2line(dso_name, addr, &file, &line, dso,
+		       unwind_inlines, NULL, sym))
 		goto out;
 
 	if (asprintf(&srcline, "%s:%u",
@@ -500,7 +530,8 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
 }
 
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+					    struct symbol *sym)
 {
 	const char *dso_name;
 
@@ -508,7 +539,7 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
 	if (dso_name == NULL)
 		return NULL;
 
-	return addr2inlines(dso_name, addr, dso);
+	return addr2inlines(dso_name, addr, dso, sym);
 }
 
 void inline_node__delete(struct inline_node *node)
@@ -518,7 +549,9 @@ void inline_node__delete(struct inline_node *node)
 	list_for_each_entry_safe(ilist, tmp, &node->val, list) {
 		list_del_init(&ilist->list);
 		zfree(&ilist->filename);
-		zfree(&ilist->funcname);
+		/* only the inlined symbols are owned by the list */
+		if (ilist->symbol && ilist->symbol->inlined)
+			symbol__delete(ilist->symbol);
 		free(ilist);
 	}
 
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 7b52ba8..730bfd6 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -17,8 +17,8 @@ void free_srcline(char *srcline);
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct inline_list {
+	struct symbol		*symbol;
 	char			*filename;
-	char			*funcname;
 	unsigned int		line_nr;
 	struct list_head	list;
 };
@@ -28,7 +28,10 @@ struct inline_node {
 	struct list_head	val;
 };
 
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr);
+/* parse inlined frames for the given address */
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+					    struct symbol *sym);
+/* free resources associated to the inline node list */
 void inline_node__delete(struct inline_node *node);
 
 #endif /* PERF_SRCLINE_H */
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index aad99e7..d880a05 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -59,6 +59,7 @@ struct symbol {
 	u8		binding;
 	u8		idle:1;
 	u8		ignore:1;
+	u8		inlined:1;
 	u8		arch_sym;
 	char		name[0];
 };

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Refactor inline_list to store srcline string directly
  2017-10-09 20:32 ` [PATCH v5 04/16] perf util: refactor inline_list to store srcline string directly Milian Wolff
@ 2017-10-25 17:17   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: milian.wolff, yao.jin, linux-kernel, namhyung, tglx, acme, hpa,
	peterz, dsahern, jolsa, mingo

Commit-ID:  2be8832f3c51cf9e36a3e80ff57f4137505c2ba4
Gitweb:     https://git.kernel.org/tip/2be8832f3c51cf9e36a3e80ff57f4137505c2ba4
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:32:58 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf callchain: Refactor inline_list to store srcline string directly

This is a preparation for the creation of real callchain entries for
inlined frames. The rest of the perf code uses the srcline string. As
such, using that also for the srcline API allows us to simplify some of
the upcoming code. Most notably, it will allow us to cache the srcline
for a given inline node and reuse it for different callchain entries.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/srcline.c | 54 +++++++++++++++++++++++++++++++++++------------
 tools/perf/util/srcline.h |  3 +--
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c0af61b..f202fc7 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,8 +33,8 @@ static const char *dso__name(struct dso *dso)
 	return dso_name;
 }
 
-static int inline_list__append(struct symbol *symbol, char *filename,
-			       int line_nr, struct inline_node *node)
+static int inline_list__append(struct symbol *symbol, char *srcline,
+			       struct inline_node *node)
 {
 	struct inline_list *ilist;
 
@@ -43,8 +43,7 @@ static int inline_list__append(struct symbol *symbol, char *filename,
 		return -1;
 
 	ilist->symbol = symbol;
-	ilist->filename = filename;
-	ilist->line_nr = line_nr;
+	ilist->srcline = srcline;
 
 	if (callchain_param.order == ORDER_CALLEE)
 		list_add_tail(&ilist->list, &node->val);
@@ -54,6 +53,30 @@ static int inline_list__append(struct symbol *symbol, char *filename,
 	return 0;
 }
 
+/* basename version that takes a const input string */
+static const char *gnu_basename(const char *path)
+{
+	const char *base = strrchr(path, '/');
+
+	return base ? base + 1 : path;
+}
+
+static char *srcline_from_fileline(const char *file, unsigned int line)
+{
+	char *srcline;
+
+	if (!file)
+		return NULL;
+
+	if (!srcline_full_filename)
+		file = gnu_basename(file);
+
+	if (asprintf(&srcline, "%s:%u", file, line) < 0)
+		return NULL;
+
+	return srcline;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -237,9 +260,12 @@ static int inline_list__append_dso_a2l(struct dso *dso,
 {
 	struct a2l_data *a2l = dso->a2l;
 	struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
+	char *srcline = NULL;
 
-	return inline_list__append(inline_sym, strdup(a2l->filename),
-				   a2l->line, node);
+	if (a2l->filename)
+		srcline = srcline_from_fileline(a2l->filename, a2l->line);
+
+	return inline_list__append(inline_sym, srcline, node);
 }
 
 static int addr2line(const char *dso_name, u64 addr,
@@ -437,13 +463,15 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
 	node->addr = addr;
 
 	while (getline(&filename, &len, fp) != -1) {
+		char *srcline;
 
 		if (filename_split(filename, &line_nr) != 1) {
 			free(filename);
 			goto out;
 		}
 
-		if (inline_list__append(sym, filename, line_nr, node) != 0)
+		srcline = srcline_from_fileline(filename, line_nr);
+		if (inline_list__append(sym, srcline, node) != 0)
 			goto out;
 
 		filename = NULL;
@@ -487,16 +515,14 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		       unwind_inlines, NULL, sym))
 		goto out;
 
-	if (asprintf(&srcline, "%s:%u",
-				srcline_full_filename ? file : basename(file),
-				line) < 0) {
-		free(file);
+	srcline = srcline_from_fileline(file, line);
+	free(file);
+
+	if (!srcline)
 		goto out;
-	}
 
 	dso->a2l_fails = 0;
 
-	free(file);
 	return srcline;
 
 out:
@@ -548,7 +574,7 @@ void inline_node__delete(struct inline_node *node)
 
 	list_for_each_entry_safe(ilist, tmp, &node->val, list) {
 		list_del_init(&ilist->list);
-		zfree(&ilist->filename);
+		free_srcline(ilist->srcline);
 		/* only the inlined symbols are owned by the list */
 		if (ilist->symbol && ilist->symbol->inlined)
 			symbol__delete(ilist->symbol);
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 730bfd6..0201ed2 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -18,8 +18,7 @@ void free_srcline(char *srcline);
 
 struct inline_list {
 	struct symbol		*symbol;
-	char			*filename;
-	unsigned int		line_nr;
+	char			*srcline;
 	struct list_head	list;
 };
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Create real callchain entries for inlined frames
  2017-10-09 20:32 ` [PATCH v5 05/16] perf report: create real callchain entries for inlined frames Milian Wolff
@ 2017-10-25 17:17   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dsahern, tglx, jolsa, peterz, linux-kernel, hpa, yao.jin,
	milian.wolff, namhyung, acme, mingo

Commit-ID:  11ea2515f32e783b9a7984c148e742c377383915
Gitweb:     https://git.kernel.org/tip/11ea2515f32e783b9a7984c148e742c377383915
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:32:59 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf callchain: Create real callchain entries for inlined frames

The inline_node structs are maintained by the new dso->inlines tree.
This in turn keeps ownership of the fake symbols and srcline string
representing an inline frame.

This tree is sorted by address to allow quick lookups. All other entries
of the symbol beside the function name are unused for inline frames. The
advantage of this approach is that all existing users of the callchain
API can now transparently display inlined frames without having to patch
their code.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.c     |  5 +++++
 tools/perf/util/dso.h     |  1 +
 tools/perf/util/machine.c | 37 ++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/srcline.h |  9 +++++++++
 5 files changed, 103 insertions(+)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 339e529..75c8250 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -10,6 +10,7 @@
 #include "compress.h"
 #include "path.h"
 #include "symbol.h"
+#include "srcline.h"
 #include "dso.h"
 #include "machine.h"
 #include "auxtrace.h"
@@ -1201,6 +1202,7 @@ struct dso *dso__new(const char *name)
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->data.cache = RB_ROOT;
+		dso->inlined_nodes = RB_ROOT;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1232,6 +1234,9 @@ void dso__delete(struct dso *dso)
 	if (!RB_EMPTY_NODE(&dso->rb_node))
 		pr_err("DSO %s is still in rbtree when being deleted!\n",
 		       dso->long_name);
+
+	/* free inlines first, as they reference symbols */
+	inlines__tree_delete(&dso->inlined_nodes);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index a2bbb21..122eca0 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -141,6 +141,7 @@ struct dso {
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
+	struct rb_root	 inlined_nodes;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a37e1c0..3d049cb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2109,6 +2109,40 @@ check_calls:
 	return 0;
 }
 
+static int append_inlines(struct callchain_cursor *cursor,
+			  struct map *map, struct symbol *sym, u64 ip)
+{
+	struct inline_node *inline_node;
+	struct inline_list *ilist;
+	u64 addr;
+
+	if (!symbol_conf.inline_name || !map || !sym)
+		return 1;
+
+	addr = map__rip_2objdump(map, ip);
+
+	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+	if (!inline_node) {
+		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
+		if (!inline_node)
+			return 1;
+
+		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
+	}
+
+	list_for_each_entry(ilist, &inline_node->val, list) {
+		int ret = callchain_cursor_append(cursor, ip, map,
+						  ilist->symbol, false,
+						  NULL, 0, 0, 0,
+						  ilist->srcline);
+
+		if (ret != 0)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
@@ -2117,6 +2151,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
+	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+		return 0;
+
 	srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f202fc7..8bea662 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -583,3 +583,54 @@ void inline_node__delete(struct inline_node *node)
 
 	free(node);
 }
+
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+{
+	struct rb_node **p = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	const u64 addr = inlines->addr;
+	struct inline_node *i;
+
+	while (*p != NULL) {
+		parent = *p;
+		i = rb_entry(parent, struct inline_node, rb_node);
+		if (addr < i->addr)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&inlines->rb_node, parent, p);
+	rb_insert_color(&inlines->rb_node, tree);
+}
+
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+{
+	struct rb_node *n = tree->rb_node;
+
+	while (n) {
+		struct inline_node *i = rb_entry(n, struct inline_node,
+						 rb_node);
+
+		if (addr < i->addr)
+			n = n->rb_left;
+		else if (addr > i->addr)
+			n = n->rb_right;
+		else
+			return i;
+	}
+
+	return NULL;
+}
+
+void inlines__tree_delete(struct rb_root *tree)
+{
+	struct inline_node *pos;
+	struct rb_node *next = rb_first(tree);
+
+	while (next) {
+		pos = rb_entry(next, struct inline_node, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, tree);
+		inline_node__delete(pos);
+	}
+}
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 0201ed2..ebe38cd 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -2,6 +2,7 @@
 #define PERF_SRCLINE_H
 
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/types.h>
 
 struct dso;
@@ -25,6 +26,7 @@ struct inline_list {
 struct inline_node {
 	u64			addr;
 	struct list_head	val;
+	struct rb_node		rb_node;
 };
 
 /* parse inlined frames for the given address */
@@ -33,4 +35,11 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 /* free resources associated to the inline node list */
 void inline_node__delete(struct inline_node *node);
 
+/* insert the inline node list into the DSO, which will take ownership */
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+/* find previously inserted inline node list */
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+/* delete all nodes within the tree of inline_node s */
+void inlines__tree_delete(struct rb_root *tree);
+
 #endif /* PERF_SRCLINE_H */

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf report: Fall-back to function name comparison for -g srcline
  2017-10-09 20:33 ` [PATCH v5 06/16] perf report: fall-back to function name comparison for -g srcline Milian Wolff
@ 2017-10-25 17:18   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, peterz, dsahern, mingo, jolsa, linux-kernel, tglx, yao.jin,
	namhyung, milian.wolff, hpa

Commit-ID:  cbe50f61727f538f05e63186c2e0022182a3a28f
Gitweb:     https://git.kernel.org/tip/cbe50f61727f538f05e63186c2e0022182a3a28f
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:33:00 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:55 -0300

perf report: Fall-back to function name comparison for -g srcline

When a callchain entry has no srcline available, we ended up comparing
the instruction pointer. I consider this to be not too useful. Rather, I
think we should group the entries by function name, which this patch
adds. For people who want to split the data on the IP boundary, using
`-g address` is the correct choice.

Before:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--56.03%--hypot
            |          |          |
            |          |          |--8.45%--__hypot_finite
            |          |          |
            |          |          |--7.62%--__hypot_finite
            |          |          |
            |          |          |--2.29%--__hypot_finite
            |          |          |
            |          |          |--2.24%--__hypot_finite
            |          |          |
            |          |          |--2.06%--__hypot_finite
            |          |          |
            |          |          |--1.81%--__hypot_finite
...
~~~~~

After:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e7ee794..0f2ba49 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -645,11 +645,9 @@ enum match_result {
 	MATCH_GT,
 };
 
-static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
-					     struct callchain_list *cnode)
+static enum match_result match_chain_strings(const char *left,
+					     const char *right)
 {
-	const char *left = cnode->srcline;
-	const char *right = node->srcline;
 	enum match_result ret = MATCH_EQ;
 	int cmp;
 
@@ -659,10 +657,8 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 		cmp = 1;
 	else if (left && !right)
 		cmp = -1;
-	else if (cnode->ip == node->ip)
-		cmp = 0;
 	else
-		cmp = (cnode->ip < node->ip) ? -1 : 1;
+		return MATCH_ERROR;
 
 	if (cmp != 0)
 		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
@@ -679,10 +675,18 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 	struct dso *right_dso = NULL;
 
 	if (callchain_param.key == CCKEY_SRCLINE) {
-		enum match_result match = match_chain_srcline(node, cnode);
+		enum match_result match = match_chain_strings(cnode->srcline,
+							      node->srcline);
+
+		/* if no srcline is available, fallback to symbol name */
+		if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
+			match = match_chain_strings(cnode->ms.sym->name,
+						    node->sym->name);
 
 		if (match != MATCH_ERROR)
 			return match;
+
+		/* otherwise fall-back to IP-based comparison below */
 	}
 
 	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Mark inlined frames in output by " (inlined)" suffix
  2017-10-09 20:33 ` [PATCH v5 07/16] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
@ 2017-10-25 17:18   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: yao.jin, hpa, milian.wolff, mingo, peterz, acme, linux-kernel,
	tglx, jolsa, dsahern, namhyung

Commit-ID:  8932f8071cae8a12dfd5f492247777ee176b0da4
Gitweb:     https://git.kernel.org/tip/8932f8071cae8a12dfd5f492247777ee176b0da4
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:33:01 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:56 -0300

perf callchain: Mark inlined frames in output by " (inlined)" suffix

The original patch that introduced inline frame output in the various
browsers used this suffix already. The new centralized approach that
uses fake symbols for inlined frames was missing this approach so far.

Instead of changing the symbol name itself, we only print the suffix
where needed. This allows us to efficiently lookup the symbol for a
given name without first having to append the suffix before the lookup.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 10 +++++++---
 tools/perf/util/sort.c      |  3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0f2ba49..77031ef 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1111,11 +1111,15 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
+		const char *inlined = cl->ms.sym->inlined ? " (inlined)" : "";
+
 		if (show_srcline && cl->srcline)
-			printed = scnprintf(bf, bfsize, "%s %s",
-					cl->ms.sym->name, cl->srcline);
+			printed = scnprintf(bf, bfsize, "%s %s%s",
+					    cl->ms.sym->name, cl->srcline,
+					    inlined);
 		else
-			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+			printed = scnprintf(bf, bfsize, "%s%s",
+					    cl->ms.sym->name, inlined);
 	} else
 		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index eb3ab90..acb9210 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -283,6 +283,9 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
 			ret += repsep_snprintf(bf + ret, size - ret, "%.*s",
 					       width - ret,
 					       sym->name);
+			if (sym->inlined)
+				ret += repsep_snprintf(bf + ret, size - ret,
+						       " (inlined)");
 		}
 	} else {
 		size_t len = BITS_PER_LONG / 4;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf script: Mark inlined frames and do not print DSO for them
  2017-10-09 20:33 ` [PATCH v5 08/16] perf script: mark inlined frames and do not print DSO for them Milian Wolff
@ 2017-10-25 17:18   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, milian.wolff, yao.jin, acme, mingo, jolsa, linux-kernel,
	peterz, tglx, namhyung, dsahern

Commit-ID:  9628b56dc1240ce0faa3bd9b7a3390fa4451c59f
Gitweb:     https://git.kernel.org/tip/9628b56dc1240ce0faa3bd9b7a3390fa4451c59f
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:33:02 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:56 -0300

perf script: Mark inlined frames and do not print DSO for them

Instead of showing the (repeated) DSO name of the non-inlined frame, we
now show the "(inlined)" suffix instead.

Before:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

After:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel_fprintf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index f2c6c5e..5b9e892 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -157,7 +157,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 				}
 			}
 
-			if (print_dso) {
+			if (print_dso && (!node->sym || !node->sym->inlined)) {
 				printed += fprintf(fp, " (");
 				printed += map__fprintf_dsoname(node->map, fp);
 				printed += fprintf(fp, ")");
@@ -166,6 +166,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			if (print_srcline)
 				printed += map__fprintf_srcline(node->map, addr, "\n  ", fp);
 
+			if (node->sym && node->sym->inlined)
+				printed += fprintf(fp, " (inlined)");
+
 			if (!print_oneline)
 				printed += fprintf(fp, "\n");
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf callchain: Compare symbol name for inlined frames when matching
  2017-10-09 20:33 ` [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching Milian Wolff
  2017-10-13 13:28   ` Arnaldo Carvalho de Melo
@ 2017-10-25 17:19   ` tip-bot for Milian Wolff
  1 sibling, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, acme, dsahern, namhyung, tglx, peterz, jolsa,
	milian.wolff, ravi.bangoria, mingo, yao.jin

Commit-ID:  9856240ad3269f2fdab0b2fa4400ef8aab792061
Gitweb:     https://git.kernel.org/tip/9856240ad3269f2fdab0b2fa4400ef8aab792061
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:33:03 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:56 -0300

perf callchain: Compare symbol name for inlined frames when matching

The fake symbols we create for inlined frames will represent different
functions but can use the symbol start address. This leads to issues
when different inline branches all lead to the same function.

Before:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                        --37.57%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  |
                                   --36.36%--std::abs<double> (inlined)
                                             std::__complex_abs (inlined)
                                             |
                                              --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                        std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                        std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
~~~~~

Note that this backtrace representation is completely bogus.
Complex abs does not call the linear congruential engine! It
is just a side-effect of a longer inlined stack being appended
to a shorter, different inlined stack, both of which originate
in the same function (main).

This patch fixes the issue:

~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                       |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          |
                       |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                       |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |                     |
                       |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                       |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                       |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                       |
                        --1.99%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  std::abs<double> (inlined)
                                  std::__complex_abs (inlined)
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
[ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 77031ef..35a920f 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -690,6 +690,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 	}
 
 	if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
+		/*
+		 * Compare inlined frames based on their symbol name because
+		 * different inlined frames will have the same symbol start
+		 */
+		if (cnode->ms.sym->inlined || node->sym->inlined)
+			return match_chain_strings(cnode->ms.sym->name,
+						   node->sym->name);
+
 		left = cnode->ms.sym->start;
 		right = sym->start;
 		left_dso = cnode->ms.map->dso;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:perf/core] perf report: Compare symbol name for inlined frames when sorting
  2017-10-09 20:33 ` [PATCH v5 10/16] perf report: compare symbol name for inlined frames when sorting Milian Wolff
@ 2017-10-25 17:19   ` tip-bot for Milian Wolff
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Milian Wolff @ 2017-10-25 17:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, peterz, mingo, jolsa, yao.jin, linux-kernel, dsahern,
	namhyung, acme, milian.wolff, tglx

Commit-ID:  aa441895f7b4ff5394d4964a8e6749f3866e44d0
Gitweb:     https://git.kernel.org/tip/aa441895f7b4ff5394d4964a8e6749f3866e44d0
Author:     Milian Wolff <milian.wolff@kdab.com>
AuthorDate: Mon, 9 Oct 2017 22:33:04 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 24 Oct 2017 09:59:56 -0300

perf report: Compare symbol name for inlined frames when sorting

Similar to the callstack frame matching, we also have to compare the
symbol name when sorting hist entries. The reason is twofold: On one
hand, multiple inlined functions will use the same symbol start/end
values of the parent, non-inlined symbol.

As such, all of these symbols often end up missing from top-level
report, as they get merged with the non-inlined frame. On the other
hand, multiple different functions may end up inlining the same
function, and we need to aggregate these values properly.

Before:

~~~~~
  perf report --stdio --inline -g none
  # Children     Self  Command       Shared Object Symbol
  # ........ ........  ............  ............. ...................................
  #
     100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
     100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
     100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
      97.03%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
      59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
      55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
       0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
~~~~~

After:

~~~~~
  perf report --stdio --inline -g none
  # Children     Self  Command       Shared Object Symbol
  # ........ ........  ............  ............. ...................................................................................................................................
  #
     100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
     100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
     100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::__complex_abs (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::abs<double> (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
      59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
      55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
      34.46%    0.00%  cpp-inlining  cpp-inlining  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
      32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
      32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
       0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/sort.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index acb9210..006d10a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -225,6 +225,9 @@ static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 	if (sym_l == sym_r)
 		return 0;
 
+	if (sym_l->inlined || sym_r->inlined)
+		return strcmp(sym_l->name, sym_r->name);
+
 	if (sym_l->start != sym_r->start)
 		return (int64_t)(sym_r->start - sym_l->start);
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-10-25 17:24 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09 20:32 [PATCH v5 00/16] generate full callchain cursor entries for inlined frames Milian Wolff
2017-10-09 20:32 ` [PATCH v5 01/16] perf report: remove code to handle inline frames from browsers Milian Wolff
2017-10-25 17:15   ` [tip:perf/core] perf report: Remove " tip-bot for Milian Wolff
2017-10-09 20:32 ` [PATCH v5 02/16] perf util: store srcline in callchain_cursor_node Milian Wolff
2017-10-25 17:16   ` [tip:perf/core] perf callchain: Store " tip-bot for Milian Wolff
2017-10-09 20:32 ` [PATCH v5 03/16] perf util: refactor inline_list to operate on symbols Milian Wolff
2017-10-25 17:16   ` [tip:perf/core] perf callchain: Refactor " tip-bot for Milian Wolff
2017-10-09 20:32 ` [PATCH v5 04/16] perf util: refactor inline_list to store srcline string directly Milian Wolff
2017-10-25 17:17   ` [tip:perf/core] perf callchain: Refactor " tip-bot for Milian Wolff
2017-10-09 20:32 ` [PATCH v5 05/16] perf report: create real callchain entries for inlined frames Milian Wolff
2017-10-25 17:17   ` [tip:perf/core] perf callchain: Create " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 06/16] perf report: fall-back to function name comparison for -g srcline Milian Wolff
2017-10-25 17:18   ` [tip:perf/core] perf report: Fall-back " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 07/16] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
2017-10-25 17:18   ` [tip:perf/core] perf callchain: Mark " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 08/16] perf script: mark inlined frames and do not print DSO for them Milian Wolff
2017-10-25 17:18   ` [tip:perf/core] perf script: Mark " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 09/16] perf report: compare symbol name for inlined frames when matching Milian Wolff
2017-10-13 13:28   ` Arnaldo Carvalho de Melo
2017-10-25 17:19   ` [tip:perf/core] perf callchain: Compare " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 10/16] perf report: compare symbol name for inlined frames when sorting Milian Wolff
2017-10-25 17:19   ` [tip:perf/core] perf report: Compare " tip-bot for Milian Wolff
2017-10-09 20:33 ` [PATCH v5 11/16] perf report: properly handle branch count in match_chain Milian Wolff
2017-10-13 13:39   ` Arnaldo Carvalho de Melo
2017-10-13 14:08     ` Arnaldo Carvalho de Melo
2017-10-14 19:30       ` Milian Wolff
2017-10-16 14:17         ` Arnaldo Carvalho de Melo
2017-10-16  4:18       ` ravi
2017-10-16  8:27         ` Milian Wolff
2017-10-16 14:19         ` Arnaldo Carvalho de Melo
2017-10-09 20:33 ` [PATCH v5 12/16] perf report: cache failed lookups of inlined frames Milian Wolff
2017-10-09 20:33 ` [PATCH v5 13/16] perf report: cache srclines for callchain nodes Milian Wolff
2017-10-09 20:33 ` [PATCH v5 14/16] perf report: use srcline from callchain for hist entries Milian Wolff
2017-10-09 20:33 ` [PATCH v5 15/16] perf util: enable handling of inlined frames by default Milian Wolff
2017-10-09 20:33 ` [PATCH v5 16/16] perf util: use correct IP mapping to find srcline for hist entry Milian Wolff
2017-10-10  4:49   ` Namhyung Kim
2017-10-12 18:22     ` Milian Wolff
2017-10-12 18:52       ` Jiri Olsa
2017-10-13 11:03         ` Jiri Olsa
2017-10-13  1:19       ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).