linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] perf report: distinguish between inliners in the same function
@ 2017-05-03 21:35 Milian Wolff
  2017-05-08  8:45 ` Milian Wolff
  2017-05-10  5:53 ` Namhyung Kim
  0 siblings, 2 replies; 15+ messages in thread
From: Milian Wolff @ 2017-05-03 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-perf-users, Milian Wolff, Arnaldo Carvalho de Melo,
	David Ahern, Namhyung Kim, Peter Zijlstra, Yao Jin

When different functions get inlined into the same function, we
want to show them individually in the reports. But when we group by
function, we would aggregate all IPs and would only keep the first
one in that function. E.g. for C++ code like the following:

~~~~~
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += uniform(engine);
    }
    cout << "random sum: " << s << '\n';
    return 0;
}
~~~~~

Building it with `g++ -O2 -g` and recording some samples with
`perf record --call-graph dwarf` yields for me:

~~~~~
$ perf report --stdio --inline
# Overhead  Command    Shared Object      Symbol
# ........  .........  .................  ................................
#
    99.40%    99.11%  a.out    a.out                [.] main
            |
             --99.11%--_start
                       __libc_start_main
                       main
...
~~~~~

Note how no inlined frames are actually shown, because the first
sample in main points to an IP that does not correspond to any
inlined frames.

With this patch applied, we instead get the following, much more
meaningful, reports.

~~~~~
$ perf report --stdio --inline --no-children
# Overhead  Command    Shared Object      Symbol
# ........  .........  .................  ................................
#
    99.11%  a.out    a.out             [.] main
            |
            |--48.15%--main
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
            |          main (inline)
            |          __libc_start_main
            |          _start
            |
            |--47.61%--main
            |          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inline)
            |          std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inline)
            |          std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
            |          main (inline)
            |          __libc_start_main
            |          _start
            |
             --3.35%--main
                       std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       main (inline)
                       __libc_start_main
                       _start
...

$ perf report --stdio --inline
# Children      Self  Command  Shared Object        Symbol
# ........  ........  .......  ...................  ........................................................
#
    99.40%    99.11%  a.out    a.out                [.] main
            |
             --99.11%--_start
                       __libc_start_main
                       |
                       |--70.51%--main
                       |          main (inline)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
                       |
                       |--25.25%--main
                       |          main (inline)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
                       |          std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
                       |          std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inline)
                       |          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inline)
                       |
                        --3.35%--main
                                  main (inline)
                                  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
~~~~~

Note that the latter top-down call graphs require us to traverse the
full inliner call chain when comparing for equality. Without it,
we would potentially combine the three distinct branches again, since
they share an equal first inline frame "main (inline)".

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
---
v2:
- compare the full stack of inlined frames to distinguish different branches
  in top-down call-stack views which start with the same inlined frame(s)

 tools/perf/util/callchain.c | 87 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 75 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 81fc29ac798f..9ab68682c6d0 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -618,15 +618,9 @@ enum match_result {
 	MATCH_GT,
 };
 
-static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
-					     struct callchain_list *cnode)
+static enum match_result match_chain_strings(const char *left,
+					     const char *right)
 {
-	char *left = get_srcline(cnode->ms.map->dso,
-				 map__rip_2objdump(cnode->ms.map, cnode->ip),
-				 cnode->ms.sym, true, false);
-	char *right = get_srcline(node->map->dso,
-				  map__rip_2objdump(node->map, node->ip),
-				  node->sym, true, false);
 	enum match_result ret = MATCH_EQ;
 	int cmp;
 
@@ -636,19 +630,82 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
 		cmp = 1;
 	else if (left && !right)
 		cmp = -1;
-	else if (cnode->ip == node->ip)
-		cmp = 0;
 	else
-		cmp = (cnode->ip < node->ip) ? -1 : 1;
+		return MATCH_ERROR;
 
 	if (cmp != 0)
 		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
 
+	return ret;
+}
+
+static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
+					     struct callchain_list *cnode)
+{
+	char *left = get_srcline(cnode->ms.map->dso,
+				 map__rip_2objdump(cnode->ms.map, cnode->ip),
+				 cnode->ms.sym, true, false);
+	char *right = get_srcline(node->map->dso,
+				  map__rip_2objdump(node->map, node->ip),
+				  node->sym, true, false);
+	enum match_result ret = match_chain_strings(left, right);
+
 	free_srcline(left);
 	free_srcline(right);
 	return ret;
 }
 
+static enum match_result match_chain_inliner(struct callchain_cursor_node *node,
+					     struct callchain_list *cnode)
+{
+	u64 left_ip = map__rip_2objdump(cnode->ms.map, cnode->ip);
+	u64 right_ip = map__rip_2objdump(node->map, node->ip);
+	struct inline_node *left_node = NULL;
+	struct inline_node *right_node = NULL;
+	struct inline_list *left_entry = NULL;
+	struct inline_list *right_entry = NULL;
+	struct inline_list *left_last_entry = NULL;
+	struct inline_list *right_last_entry = NULL;
+	enum match_result ret = MATCH_EQ;
+
+	left_node = dso__parse_addr_inlines(cnode->ms.map->dso, left_ip);
+	if (!left_node)
+		return MATCH_ERROR;
+
+	right_node = dso__parse_addr_inlines(node->map->dso, right_ip);
+	if (!right_node) {
+		inline_node__delete(left_node);
+		return MATCH_ERROR;
+	}
+
+	left_entry = list_first_entry(&left_node->val,
+				      struct inline_list, list);
+	left_last_entry = list_last_entry(&left_node->val,
+					  struct inline_list, list);
+	right_entry = list_first_entry(&right_node->val,
+				       struct inline_list, list);
+	right_last_entry = list_last_entry(&right_node->val,
+					  struct inline_list, list);
+	while (ret == MATCH_EQ && (left_entry || right_entry)) {
+		ret = match_chain_strings(left_entry ? left_entry->funcname : NULL,
+					  right_entry ? right_entry->funcname : NULL);
+
+		if (left_entry && left_entry != left_last_entry)
+			left_entry = list_next_entry(left_entry, list);
+		else
+			left_entry = NULL;
+
+		if (right_entry && right_entry != right_last_entry)
+			right_entry = list_next_entry(right_entry, list);
+		else
+			right_entry = NULL;
+	}
+
+	inline_node__delete(left_node);
+	inline_node__delete(right_node);
+	return ret;
+}
+
 static enum match_result match_chain(struct callchain_cursor_node *node,
 				     struct callchain_list *cnode)
 {
@@ -671,7 +728,13 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 	}
 
 	if (left == right) {
-		if (node->branch) {
+		if (symbol_conf.inline_name && cnode->ip != node->ip) {
+			enum match_result match = match_chain_inliner(node,
+								      cnode);
+
+			if (match != MATCH_ERROR)
+				return match;
+		} else if (node->branch) {
 			cnode->branch_count++;
 
 			if (node->branch_flags.predicted)
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-03 21:35 [PATCH v2] perf report: distinguish between inliners in the same function Milian Wolff
@ 2017-05-08  8:45 ` Milian Wolff
  2017-05-08 16:17   ` Arnaldo Carvalho de Melo
  2017-05-10  5:53 ` Namhyung Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Milian Wolff @ 2017-05-08  8:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linux-perf-users, David Ahern, Namhyung Kim,
	Peter Zijlstra, Yao Jin

[-- Attachment #1: Type: text/plain, Size: 7764 bytes --]

On Mittwoch, 3. Mai 2017 23:35:36 CEST Milian Wolff wrote:
> When different functions get inlined into the same function, we
> want to show them individually in the reports. But when we group by
> function, we would aggregate all IPs and would only keep the first
> one in that function. E.g. for C++ code like the following:
> 
> ~~~~~
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += uniform(engine);
>     }
>     cout << "random sum: " << s << '\n';
>     return 0;
> }
> ~~~~~
> 
> Building it with `g++ -O2 -g` and recording some samples with
> `perf record --call-graph dwarf` yields for me:
> 
> ~~~~~
> $ perf report --stdio --inline
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     99.40%    99.11%  a.out    a.out                [.] main
> 
>              --99.11%--_start
>                        __libc_start_main
>                        main
> ...
> ~~~~~
> 
> Note how no inlined frames are actually shown, because the first
> sample in main points to an IP that does not correspond to any
> inlined frames.
> 
> With this patch applied, we instead get the following, much more
> meaningful, reports.
> 
> ~~~~~
> $ perf report --stdio --inline --no-children
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     99.11%  a.out    a.out             [.] main
> 
>             |--48.15%--main
>             |
>             |          std::__detail::_Adaptor<std::linear_congruential_engi
>             |          ne<unsigned long, 16807ul, 0ul, 2147483647ul>,
>             |          double>::operator() (inline)
>             |          std::uniform_real_distribution<double>::operator()<s
>             |          td::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::uniform_real_distribution<double>::operator()<s
>             |          td::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul> > (inline) main (inline)
>             |          __libc_start_main
>             |          _start
>             |
>             |--47.61%--main
>             |
>             |          std::__detail::__mod<unsigned long, 2147483647ul,
>             |          16807ul, 0ul> (inline)
>             |          std::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul>::operator() (inline)
>             |          std::generate_canonical<double, 53ul,
>             |          std::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::__detail::_Adaptor<std::linear_congruential_eng
>             |          ine<unsigned long, 16807ul, 0ul, 2147483647ul>,
>             |          double>::operator() (inline)
>             |          std::uniform_real_distribution<double>::operator()<s
>             |          td::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::uniform_real_distribution<double>::operator()<s
>             |          td::linear_congruential_engine<unsigned long,
>             |          16807ul, 0ul, 2147483647ul> > (inline) main (inline)
>             |          __libc_start_main
>             |          _start
> 
>              --3.35%--main
>                       
> std::uniform_real_distribution<double>::operator()<std::linear_congruential
> _engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline) main (inline)
>                        __libc_start_main
>                        _start
> ...
> 
> $ perf report --stdio --inline
> # Children      Self  Command  Shared Object        Symbol
> # ........  ........  .......  ................... 
> ........................................................ #
>     99.40%    99.11%  a.out    a.out                [.] main
> 
>              --99.11%--_start
>                        __libc_start_main
> 
>                        |--70.51%--main
>                        |
>                        |          main (inline)
>                        |          std::uniform_real_distribution<double>::op
>                        |          erator()<std::linear_congruential_engine<u
>                        |          nsigned long, 16807ul, 0ul, 2147483647ul>
>                        |          > (inline)
>                        |          std::uniform_real_distribution<double>::o
>                        |          perator()<std::linear_congruential_engine<
>                        |          unsigned long, 16807ul, 0ul, 2147483647ul>
>                        |          > (inline)
>                        |          std::__detail::_Adaptor<std::linear_congr
>                        |          uential_engine<unsigned long, 16807ul,
>                        |          0ul, 2147483647ul>, double>::operator()
>                        |          (inline)                       |
>                        |--25.25%--main
>                        |
>                        |          main (inline)
>                        |          std::uniform_real_distribution<double>::op
>                        |          erator()<std::linear_congruential_engine<u
>                        |          nsigned long, 16807ul, 0ul, 2147483647ul>
>                        |          > (inline)
>                        |          std::uniform_real_distribution<double>::o
>                        |          perator()<std::linear_congruential_engine<
>                        |          unsigned long, 16807ul, 0ul, 2147483647ul>
>                        |          > (inline)
>                        |          std::__detail::_Adaptor<std::linear_congr
>                        |          uential_engine<unsigned long, 16807ul,
>                        |          0ul, 2147483647ul>, double>::operator()
>                        |          (inline) std::generate_canonical<double,
>                        |          53ul,
>                        |          std::linear_congruential_engine<unsigned
>                        |          long, 16807ul, 0ul, 2147483647ul> >
>                        |          (inline)
>                        |          std::linear_congruential_engine<unsigned
>                        |          long, 16807ul, 0ul,
>                        |          2147483647ul>::operator() (inline)
>                        |          std::__detail::__mod<unsigned long,
>                        |          2147483647ul, 16807ul, 0ul> (inline)
>                         --3.35%--main
>                                   main (inline)
>                                  
> std::uniform_real_distribution<double>::operator()<std::linear_congruential
> _engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline) ~~~~~
> 
> Note that the latter top-down call graphs require us to traverse the
> full inliner call chain when comparing for equality. Without it,
> we would potentially combine the three distinct branches again, since
> they share an equal first inline frame "main (inline)".

Ping? Any chance that I could get a review on this one please? It works really 
well for me and greatly improves perf's usability for C++ code bases.

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-08  8:45 ` Milian Wolff
@ 2017-05-08 16:17   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-05-08 16:17 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, David Ahern, Namhyung Kim,
	Peter Zijlstra, Yao Jin

Em Mon, May 08, 2017 at 10:45:18AM +0200, Milian Wolff escreveu:
> On Mittwoch, 3. Mai 2017 23:35:36 CEST Milian Wolff wrote:
> > When different functions get inlined into the same function, we
> > want to show them individually in the reports. But when we group by
> > function, we would aggregate all IPs and would only keep the first
> > one in that function. E.g. for C++ code like the following:

<SNIP>

> Ping? Any chance that I could get a review on this one please? It works really 
> well for me and greatly improves perf's usability for C++ code bases.

yeah, indeed that would be great to have someone reviewing this,

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-03 21:35 [PATCH v2] perf report: distinguish between inliners in the same function Milian Wolff
  2017-05-08  8:45 ` Milian Wolff
@ 2017-05-10  5:53 ` Namhyung Kim
  2017-05-12 10:37   ` Milian Wolff
  1 sibling, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2017-05-10  5:53 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

Hi,

On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> When different functions get inlined into the same function, we
> want to show them individually in the reports. But when we group by
> function, we would aggregate all IPs and would only keep the first
> one in that function. E.g. for C++ code like the following:
> 
> ~~~~~
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += uniform(engine);
>     }
>     cout << "random sum: " << s << '\n';
>     return 0;
> }
> ~~~~~
> 
> Building it with `g++ -O2 -g` and recording some samples with
> `perf record --call-graph dwarf` yields for me:
> 
> ~~~~~
> $ perf report --stdio --inline
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     99.40%    99.11%  a.out    a.out                [.] main
>             |
>              --99.11%--_start
>                        __libc_start_main
>                        main
> ...
> ~~~~~
> 
> Note how no inlined frames are actually shown, because the first
> sample in main points to an IP that does not correspond to any
> inlined frames.
> 
> With this patch applied, we instead get the following, much more
> meaningful, reports.
> 
> ~~~~~
> $ perf report --stdio --inline --no-children
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     99.11%  a.out    a.out             [.] main
>             |
>             |--48.15%--main
>             |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>             |          main (inline)
>             |          __libc_start_main
>             |          _start
>             |
>             |--47.61%--main
>             |          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inline)
>             |          std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inline)
>             |          std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>             |          main (inline)
>             |          __libc_start_main
>             |          _start
>             |
>              --3.35%--main
>                        std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        main (inline)
>                        __libc_start_main
>                        _start
> ...
> 
> $ perf report --stdio --inline
> # Children      Self  Command  Shared Object        Symbol
> # ........  ........  .......  ...................  ........................................................
> #
>     99.40%    99.11%  a.out    a.out                [.] main
>             |
>              --99.11%--_start
>                        __libc_start_main
>                        |
>                        |--70.51%--main
>                        |          main (inline)
>                        |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
>                        |
>                        |--25.25%--main
>                        |          main (inline)
>                        |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inline)
>                        |          std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
>                        |          std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inline)
>                        |          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inline)
>                        |
>                         --3.35%--main
>                                   main (inline)
>                                   std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inline)
> ~~~~~
> 
> Note that the latter top-down call graphs require us to traverse the
> full inliner call chain when comparing for equality. Without it,
> we would potentially combine the three distinct branches again, since
> they share an equal first inline frame "main (inline)".
> 
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Yao Jin <yao.jin@linux.intel.com>
> Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> ---
> v2:
> - compare the full stack of inlined frames to distinguish different branches
>   in top-down call-stack views which start with the same inlined frame(s)
> 
>  tools/perf/util/callchain.c | 87 ++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 75 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 81fc29ac798f..9ab68682c6d0 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -618,15 +618,9 @@ enum match_result {
>  	MATCH_GT,
>  };
>  
> -static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
> -					     struct callchain_list *cnode)
> +static enum match_result match_chain_strings(const char *left,
> +					     const char *right)
>  {
> -	char *left = get_srcline(cnode->ms.map->dso,
> -				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> -				 cnode->ms.sym, true, false);
> -	char *right = get_srcline(node->map->dso,
> -				  map__rip_2objdump(node->map, node->ip),
> -				  node->sym, true, false);
>  	enum match_result ret = MATCH_EQ;
>  	int cmp;
>  
> @@ -636,19 +630,82 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
>  		cmp = 1;
>  	else if (left && !right)
>  		cmp = -1;
> -	else if (cnode->ip == node->ip)
> -		cmp = 0;
>  	else
> -		cmp = (cnode->ip < node->ip) ? -1 : 1;
> +		return MATCH_ERROR;
>  
>  	if (cmp != 0)
>  		ret = cmp < 0 ? MATCH_LT : MATCH_GT;
>  
> +	return ret;
> +}
> +
> +static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
> +					     struct callchain_list *cnode)
> +{
> +	char *left = get_srcline(cnode->ms.map->dso,
> +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> +				 cnode->ms.sym, true, false);
> +	char *right = get_srcline(node->map->dso,
> +				  map__rip_2objdump(node->map, node->ip),
> +				  node->sym, true, false);
> +	enum match_result ret = match_chain_strings(left, right);

I think we need to check inlined srcline as well.  There might be a
case that two samples have different addresses (and from different
callchains) but happens to be mapped to a same srcline IMHO.


> +
>  	free_srcline(left);
>  	free_srcline(right);
>  	return ret;
>  }
>  
> +static enum match_result match_chain_inliner(struct callchain_cursor_node *node,
> +					     struct callchain_list *cnode)
> +{
> +	u64 left_ip = map__rip_2objdump(cnode->ms.map, cnode->ip);
> +	u64 right_ip = map__rip_2objdump(node->map, node->ip);
> +	struct inline_node *left_node = NULL;
> +	struct inline_node *right_node = NULL;
> +	struct inline_list *left_entry = NULL;
> +	struct inline_list *right_entry = NULL;
> +	struct inline_list *left_last_entry = NULL;
> +	struct inline_list *right_last_entry = NULL;
> +	enum match_result ret = MATCH_EQ;
> +
> +	left_node = dso__parse_addr_inlines(cnode->ms.map->dso, left_ip);
> +	if (!left_node)
> +		return MATCH_ERROR;
> +
> +	right_node = dso__parse_addr_inlines(node->map->dso, right_ip);
> +	if (!right_node) {
> +		inline_node__delete(left_node);
> +		return MATCH_ERROR;
> +	}
> +
> +	left_entry = list_first_entry(&left_node->val,
> +				      struct inline_list, list);
> +	left_last_entry = list_last_entry(&left_node->val,
> +					  struct inline_list, list);
> +	right_entry = list_first_entry(&right_node->val,
> +				       struct inline_list, list);
> +	right_last_entry = list_last_entry(&right_node->val,
> +					  struct inline_list, list);

What about keeping number of entries in a inline_node so that we can
check the numbers for faster comparison?


> +	while (ret == MATCH_EQ && (left_entry || right_entry)) {
> +		ret = match_chain_strings(left_entry ? left_entry->funcname : NULL,
> +					  right_entry ? right_entry->funcname : NULL);
> +
> +		if (left_entry && left_entry != left_last_entry)
> +			left_entry = list_next_entry(left_entry, list);
> +		else
> +			left_entry = NULL;
> +
> +		if (right_entry && right_entry != right_last_entry)
> +			right_entry = list_next_entry(right_entry, list);
> +		else
> +			right_entry = NULL;
> +	}
> +
> +	inline_node__delete(left_node);
> +	inline_node__delete(right_node);
> +	return ret;
> +}
> +
>  static enum match_result match_chain(struct callchain_cursor_node *node,
>  				     struct callchain_list *cnode)
>  {
> @@ -671,7 +728,13 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
>  	}
>  
>  	if (left == right) {
> -		if (node->branch) {
> +		if (symbol_conf.inline_name && cnode->ip != node->ip) {
> +			enum match_result match = match_chain_inliner(node,
> +								      cnode);
> +
> +			if (match != MATCH_ERROR)
> +				return match;

I guess it'd be better just returning the match result.  Otherwise
MATCH_ERROR will be converted to MATCH_EQ..

Thanks,
Namhyung


> +		} else if (node->branch) {
>  			cnode->branch_count++;
>  
>  			if (node->branch_flags.predicted)
> -- 
> 2.12.2
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-10  5:53 ` Namhyung Kim
@ 2017-05-12 10:37   ` Milian Wolff
  2017-05-12 13:01     ` Namhyung Kim
  2017-05-12 14:55     ` Andi Kleen
  0 siblings, 2 replies; 15+ messages in thread
From: Milian Wolff @ 2017-05-12 10:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

[-- Attachment #1: Type: text/plain, Size: 5857 bytes --]

On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> Hi,
> 
> On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:

<snip>

> > +static enum match_result match_chain_srcline(struct callchain_cursor_node
> > *node, +					     struct callchain_list *cnode)
> > +{
> > +	char *left = get_srcline(cnode->ms.map->dso,
> > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > +				 cnode->ms.sym, true, false);
> > +	char *right = get_srcline(node->map->dso,
> > +				  map__rip_2objdump(node->map, node->ip),
> > +				  node->sym, true, false);
> > +	enum match_result ret = match_chain_strings(left, right);
> 
> I think we need to check inlined srcline as well.  There might be a
> case that two samples have different addresses (and from different
> callchains) but happens to be mapped to a same srcline IMHO.

I think I'm missing something, but isn't this what this function provides? The 
function above is now being used by the match_chain_inliner function below. 

Ah, or do you mean for code such as this:

~~~~~
inline_func_1(); inline_func_2();
~~~~~

Here, both branches could be inlined into the same line and the same issue 
would occur, i.e. different branches get collapsed into the first match for 
the given srcline?

Hm yes, this should be fixed too.

But, quite frankly, I think it just shows more and more that the current 
inliner support is really fragile and leads to lots of issues throughout the 
code base as the inlined frames are different from non-inlined frames, but 
should most of the same be handled just like them.

So, maybe it's time to once more think about going back to my initial 
approach: Make inlined frames code-wise equal to non-inlined frames, i.e. 
instead of requesting the inlined frames within match_chain, do it outside and 
create callchain_node/callchain_cursor instances (not sure which one right 
now) for the inlined frames too.

This way, we should be able to centrally add support for inlined frames and 
all areas will benefit from it:

- aggregation by srcline/function will magically work
- all browsers will automatically display them, i.e. no longer need to 
duplicate the code for inliner support in perf script, perf report tui/
stdio/...
- we can easily support --inline in other tools, like `perf trace --call-
graph`

So before I invest more time trying to massage match_chain to behave well for 
inline nodes, can I get some feedback on the above?

Back then when Jin and me discussed this, noone from the core perf 
contributors ever bothered to give us any insight in what they think is the 
better approach.

> > +
> > 
> >  	free_srcline(left);
> >  	free_srcline(right);
> >  	return ret;
> >  
> >  }
> > 
> > +static enum match_result match_chain_inliner(struct callchain_cursor_node
> > *node, +					     struct callchain_list *cnode)
> > +{
> > +	u64 left_ip = map__rip_2objdump(cnode->ms.map, cnode->ip);
> > +	u64 right_ip = map__rip_2objdump(node->map, node->ip);
> > +	struct inline_node *left_node = NULL;
> > +	struct inline_node *right_node = NULL;
> > +	struct inline_list *left_entry = NULL;
> > +	struct inline_list *right_entry = NULL;
> > +	struct inline_list *left_last_entry = NULL;
> > +	struct inline_list *right_last_entry = NULL;
> > +	enum match_result ret = MATCH_EQ;
> > +
> > +	left_node = dso__parse_addr_inlines(cnode->ms.map->dso, left_ip);
> > +	if (!left_node)
> > +		return MATCH_ERROR;
> > +
> > +	right_node = dso__parse_addr_inlines(node->map->dso, right_ip);
> > +	if (!right_node) {
> > +		inline_node__delete(left_node);
> > +		return MATCH_ERROR;
> > +	}
> > +
> > +	left_entry = list_first_entry(&left_node->val,
> > +				      struct inline_list, list);
> > +	left_last_entry = list_last_entry(&left_node->val,
> > +					  struct inline_list, list);
> > +	right_entry = list_first_entry(&right_node->val,
> > +				       struct inline_list, list);
> > +	right_last_entry = list_last_entry(&right_node->val,
> > +					  struct inline_list, list);
> 
> What about keeping number of entries in a inline_node so that we can
> check the numbers for faster comparison?

What benefit would that have? The performance cost is dominated by finding the 
inlined nodes, not by doing the comparison on the callstack.

> > +	while (ret == MATCH_EQ && (left_entry || right_entry)) {
> > +		ret = match_chain_strings(left_entry ? left_entry->funcname : 
NULL,
> > +					  right_entry ? right_entry->funcname : NULL);
> > +
> > +		if (left_entry && left_entry != left_last_entry)
> > +			left_entry = list_next_entry(left_entry, list);
> > +		else
> > +			left_entry = NULL;
> > +
> > +		if (right_entry && right_entry != right_last_entry)
> > +			right_entry = list_next_entry(right_entry, list);
> > +		else
> > +			right_entry = NULL;
> > +	}
> > +
> > +	inline_node__delete(left_node);
> > +	inline_node__delete(right_node);
> > +	return ret;
> > +}
> > +
> > 
> >  static enum match_result match_chain(struct callchain_cursor_node *node,
> >  
> >  				     struct callchain_list *cnode)
> >  
> >  {
> > 
> > @@ -671,7 +728,13 @@ static enum match_result match_chain(struct
> > callchain_cursor_node *node,> 
> >  	}
> >  	
> >  	if (left == right) {
> > 
> > -		if (node->branch) {
> > +		if (symbol_conf.inline_name && cnode->ip != node->ip) {
> > +			enum match_result match = match_chain_inliner(node,
> > +								      cnode);
> > +
> > +			if (match != MATCH_ERROR)
> > +				return match;
> 
> I guess it'd be better just returning the match result.  Otherwise
> MATCH_ERROR will be converted to MATCH_EQ..

This is done on purpose to fall-back to the IP-based comparison. That way, 
entries without inlined nodes will be sorted the same way as before this 
patch.

Cheers
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-12 10:37   ` Milian Wolff
@ 2017-05-12 13:01     ` Namhyung Kim
  2017-05-14 18:10       ` Milian Wolff
  2017-05-12 14:55     ` Andi Kleen
  1 sibling, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2017-05-12 13:01 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > Hi,
> > 
> > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> 
> <snip>
> 
> > > +static enum match_result match_chain_srcline(struct callchain_cursor_node
> > > *node, +					     struct callchain_list *cnode)
> > > +{
> > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > +				 cnode->ms.sym, true, false);
> > > +	char *right = get_srcline(node->map->dso,
> > > +				  map__rip_2objdump(node->map, node->ip),
> > > +				  node->sym, true, false);
> > > +	enum match_result ret = match_chain_strings(left, right);
> > 
> > I think we need to check inlined srcline as well.  There might be a
> > case that two samples have different addresses (and from different
> > callchains) but happens to be mapped to a same srcline IMHO.
> 
> I think I'm missing something, but isn't this what this function provides? The 
> function above is now being used by the match_chain_inliner function below. 
> 
> Ah, or do you mean for code such as this:
> 
> ~~~~~
> inline_func_1(); inline_func_2();
> ~~~~~
> 
> Here, both branches could be inlined into the same line and the same issue 
> would occur, i.e. different branches get collapsed into the first match for 
> the given srcline?
> 
> Hm yes, this should be fixed too.

OK.

> 
> But, quite frankly, I think it just shows more and more that the current 
> inliner support is really fragile and leads to lots of issues throughout the 
> code base as the inlined frames are different from non-inlined frames, but 
> should most of the same be handled just like them.
> 
> So, maybe it's time to once more think about going back to my initial 
> approach: Make inlined frames code-wise equal to non-inlined frames, i.e. 
> instead of requesting the inlined frames within match_chain, do it outside and 
> create callchain_node/callchain_cursor instances (not sure which one right 
> now) for the inlined frames too.
> 
> This way, we should be able to centrally add support for inlined frames and 
> all areas will benefit from it:
> 
> - aggregation by srcline/function will magically work
> - all browsers will automatically display them, i.e. no longer need to 
> duplicate the code for inliner support in perf script, perf report tui/
> stdio/...
> - we can easily support --inline in other tools, like `perf trace --call-
> graph`
> 
> So before I invest more time trying to massage match_chain to behave well for 
> inline nodes, can I get some feedback on the above?

Fair enough.  I agree that it'd be better adding them as separate
callchain nodes when resolving callchains.

> 
> Back then when Jin and me discussed this, noone from the core perf 
> contributors ever bothered to give us any insight in what they think is the 
> better approach.

That's unfortunate, sorry about that.

> 
> > > +
> > > 
> > >  	free_srcline(left);
> > >  	free_srcline(right);
> > >  	return ret;
> > >  
> > >  }
> > > 
> > > +static enum match_result match_chain_inliner(struct callchain_cursor_node
> > > *node, +					     struct callchain_list *cnode)
> > > +{
> > > +	u64 left_ip = map__rip_2objdump(cnode->ms.map, cnode->ip);
> > > +	u64 right_ip = map__rip_2objdump(node->map, node->ip);
> > > +	struct inline_node *left_node = NULL;
> > > +	struct inline_node *right_node = NULL;
> > > +	struct inline_list *left_entry = NULL;
> > > +	struct inline_list *right_entry = NULL;
> > > +	struct inline_list *left_last_entry = NULL;
> > > +	struct inline_list *right_last_entry = NULL;
> > > +	enum match_result ret = MATCH_EQ;
> > > +
> > > +	left_node = dso__parse_addr_inlines(cnode->ms.map->dso, left_ip);
> > > +	if (!left_node)
> > > +		return MATCH_ERROR;
> > > +
> > > +	right_node = dso__parse_addr_inlines(node->map->dso, right_ip);
> > > +	if (!right_node) {
> > > +		inline_node__delete(left_node);
> > > +		return MATCH_ERROR;
> > > +	}
> > > +
> > > +	left_entry = list_first_entry(&left_node->val,
> > > +				      struct inline_list, list);
> > > +	left_last_entry = list_last_entry(&left_node->val,
> > > +					  struct inline_list, list);
> > > +	right_entry = list_first_entry(&right_node->val,
> > > +				       struct inline_list, list);
> > > +	right_last_entry = list_last_entry(&right_node->val,
> > > +					  struct inline_list, list);
> > 
> > What about keeping number of entries in a inline_node so that we can
> > check the numbers for faster comparison?
> 
> What benefit would that have? The performance cost is dominated by finding the 
> inlined nodes, not by doing the comparison on the callstack.

Well, I didn't measure the performance cost but your example contains
long symbols and they share some parts.  So I guess it would hurt
performance as they'll be checked frequently.

> 
> > > +	while (ret == MATCH_EQ && (left_entry || right_entry)) {
> > > +		ret = match_chain_strings(left_entry ? left_entry->funcname : 
> NULL,
> > > +					  right_entry ? right_entry->funcname : NULL);
> > > +
> > > +		if (left_entry && left_entry != left_last_entry)
> > > +			left_entry = list_next_entry(left_entry, list);
> > > +		else
> > > +			left_entry = NULL;
> > > +
> > > +		if (right_entry && right_entry != right_last_entry)
> > > +			right_entry = list_next_entry(right_entry, list);
> > > +		else
> > > +			right_entry = NULL;
> > > +	}
> > > +
> > > +	inline_node__delete(left_node);
> > > +	inline_node__delete(right_node);
> > > +	return ret;
> > > +}
> > > +
> > > 
> > >  static enum match_result match_chain(struct callchain_cursor_node *node,
> > >  
> > >  				     struct callchain_list *cnode)
> > >  
> > >  {
> > > 
> > > @@ -671,7 +728,13 @@ static enum match_result match_chain(struct
> > > callchain_cursor_node *node,> 
> > >  	}
> > >  	
> > >  	if (left == right) {
> > > 
> > > -		if (node->branch) {
> > > +		if (symbol_conf.inline_name && cnode->ip != node->ip) {
> > > +			enum match_result match = match_chain_inliner(node,
> > > +								      cnode);
> > > +
> > > +			if (match != MATCH_ERROR)
> > > +				return match;
> > 
> > I guess it'd be better just returning the match result.  Otherwise
> > MATCH_ERROR will be converted to MATCH_EQ..
> 
> This is done on purpose to fall-back to the IP-based comparison. That way, 
> entries without inlined nodes will be sorted the same way as before this 
> patch.

Hmm.. OK, but as I said in another thread, if one node has inlines and
the other don't, they should be separated.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-12 10:37   ` Milian Wolff
  2017-05-12 13:01     ` Namhyung Kim
@ 2017-05-12 14:55     ` Andi Kleen
  2017-05-15  0:44       ` Namhyung Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2017-05-12 14:55 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Namhyung Kim, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	kernel-team

Milian Wolff <milian.wolff@kdab.com> writes:
>
> I think I'm missing something, but isn't this what this function provides? The 
> function above is now being used by the match_chain_inliner function below. 
>
> Ah, or do you mean for code such as this:
>
> ~~~~~
> inline_func_1(); inline_func_2();

This could be handled by looking at columns or discriminators too (which
some compilers generate in dwarf). srcline.c would need to be changed
to also call bfd_get_nearest_discriminator() and pass that extra
information everywhere.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-12 13:01     ` Namhyung Kim
@ 2017-05-14 18:10       ` Milian Wolff
  2017-05-15  1:21         ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Milian Wolff @ 2017-05-14 18:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

[-- Attachment #1: Type: text/plain, Size: 4237 bytes --]

On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > Hi,
> > 
> > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > <snip>
> > 
> > > > +static enum match_result match_chain_srcline(struct
> > > > callchain_cursor_node
> > > > *node, +					     struct callchain_list *cnode)
> > > > +{
> > > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > +				 cnode->ms.sym, true, false);
> > > > +	char *right = get_srcline(node->map->dso,
> > > > +				  map__rip_2objdump(node->map, node->ip),
> > > > +				  node->sym, true, false);
> > > > +	enum match_result ret = match_chain_strings(left, right);
> > > 
> > > I think we need to check inlined srcline as well.  There might be a
> > > case that two samples have different addresses (and from different
> > > callchains) but happens to be mapped to a same srcline IMHO.
> > 
> > I think I'm missing something, but isn't this what this function provides?
> > The function above is now being used by the match_chain_inliner function
> > below.
> > 
> > Ah, or do you mean for code such as this:
> > 
> > ~~~~~
> > inline_func_1(); inline_func_2();
> > ~~~~~
> > 
> > Here, both branches could be inlined into the same line and the same issue
> > would occur, i.e. different branches get collapsed into the first match
> > for
> > the given srcline?
> > 
> > Hm yes, this should be fixed too.
> 
> OK.
> 
> > But, quite frankly, I think it just shows more and more that the current
> > inliner support is really fragile and leads to lots of issues throughout
> > the code base as the inlined frames are different from non-inlined
> > frames, but should most of the same be handled just like them.
> > 
> > So, maybe it's time to once more think about going back to my initial
> > approach: Make inlined frames code-wise equal to non-inlined frames, i.e.
> > instead of requesting the inlined frames within match_chain, do it outside
> > and create callchain_node/callchain_cursor instances (not sure which one
> > right now) for the inlined frames too.
> > 
> > This way, we should be able to centrally add support for inlined frames
> > and
> > all areas will benefit from it:
> > 
> > - aggregation by srcline/function will magically work
> > - all browsers will automatically display them, i.e. no longer need to
> > duplicate the code for inliner support in perf script, perf report tui/
> > stdio/...
> > - we can easily support --inline in other tools, like `perf trace --call-
> > graph`
> > 
> > So before I invest more time trying to massage match_chain to behave well
> > for inline nodes, can I get some feedback on the above?
> 
> Fair enough.  I agree that it'd be better adding them as separate
> callchain nodes when resolving callchains.

Can you, or anyone else more involved with the current callchain code, guide 
me a bit?

My previous attempt at doing this can be seen here:
https://github.com/milianw/linux/commit/
71d031c9d679bfb4a4044226e8903dd80ea601b3

There are some issues with that. Most of it boils down to the question of how 
to feed an inlined frame into a callchain_cursor_node. The latter contains a 
symbol* e.g. and users of that class currently rely on using the IP to find 
e.g. the corresponding srcline.

>From what I can see, we either have to hack inline nodes in there, cf. my 
original approach in the github repo above. Or, better, we would have to 
(heavily?) refactor the callchain_cursor_node struct and the code depending on 
it. What I have in mind would be something like adding two members to this 
struct:

const char* funcname;
const char* srcline;

For non-inlined frames, the funcname aliases the `symbol->name` value, and the 
srcline is queried as-needed. For inlined frames the values from the inlined 
node struct are used. The inlined frames for a given code location would all 
share the same symbol and ip.

Would that be OK as a path forward?

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-12 14:55     ` Andi Kleen
@ 2017-05-15  0:44       ` Namhyung Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2017-05-15  0:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Milian Wolff, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Yao Jin,
	kernel-team

Hi Andi,

On Fri, May 12, 2017 at 07:55:13AM -0700, Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> >
> > I think I'm missing something, but isn't this what this function provides? The 
> > function above is now being used by the match_chain_inliner function below. 
> >
> > Ah, or do you mean for code such as this:
> >
> > ~~~~~
> > inline_func_1(); inline_func_2();
> 
> This could be handled by looking at columns or discriminators too (which
> some compilers generate in dwarf). srcline.c would need to be changed
> to also call bfd_get_nearest_discriminator() and pass that extra
> information everywhere.

You're right.  The discriminators should be carried too.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-14 18:10       ` Milian Wolff
@ 2017-05-15  1:21         ` Namhyung Kim
  2017-05-15 10:01           ` Milian Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2017-05-15  1:21 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

Hi Milian,

On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > > Hi,
> > > 
> > > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > > <snip>
> > > 
> > > > > +static enum match_result match_chain_srcline(struct
> > > > > callchain_cursor_node
> > > > > *node, +					     struct callchain_list *cnode)
> > > > > +{
> > > > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > > +				 cnode->ms.sym, true, false);
> > > > > +	char *right = get_srcline(node->map->dso,
> > > > > +				  map__rip_2objdump(node->map, node->ip),
> > > > > +				  node->sym, true, false);
> > > > > +	enum match_result ret = match_chain_strings(left, right);
> > > > 
> > > > I think we need to check inlined srcline as well.  There might be a
> > > > case that two samples have different addresses (and from different
> > > > callchains) but happens to be mapped to a same srcline IMHO.
> > > 
> > > I think I'm missing something, but isn't this what this function provides?
> > > The function above is now being used by the match_chain_inliner function
> > > below.
> > > 
> > > Ah, or do you mean for code such as this:
> > > 
> > > ~~~~~
> > > inline_func_1(); inline_func_2();
> > > ~~~~~
> > > 
> > > Here, both branches could be inlined into the same line and the same issue
> > > would occur, i.e. different branches get collapsed into the first match
> > > for
> > > the given srcline?
> > > 
> > > Hm yes, this should be fixed too.
> > 
> > OK.
> > 
> > > But, quite frankly, I think it just shows more and more that the current
> > > inliner support is really fragile and leads to lots of issues throughout
> > > the code base as the inlined frames are different from non-inlined
> > > frames, but should most of the same be handled just like them.
> > > 
> > > So, maybe it's time to once more think about going back to my initial
> > > approach: Make inlined frames code-wise equal to non-inlined frames, i.e.
> > > instead of requesting the inlined frames within match_chain, do it outside
> > > and create callchain_node/callchain_cursor instances (not sure which one
> > > right now) for the inlined frames too.
> > > 
> > > This way, we should be able to centrally add support for inlined frames
> > > and
> > > all areas will benefit from it:
> > > 
> > > - aggregation by srcline/function will magically work
> > > - all browsers will automatically display them, i.e. no longer need to
> > > duplicate the code for inliner support in perf script, perf report tui/
> > > stdio/...
> > > - we can easily support --inline in other tools, like `perf trace --call-
> > > graph`
> > > 
> > > So before I invest more time trying to massage match_chain to behave well
> > > for inline nodes, can I get some feedback on the above?
> > 
> > Fair enough.  I agree that it'd be better adding them as separate
> > callchain nodes when resolving callchains.
> 
> Can you, or anyone else more involved with the current callchain code, guide 
> me a bit?
> 
> My previous attempt at doing this can be seen here:
> https://github.com/milianw/linux/commit/
> 71d031c9d679bfb4a4044226e8903dd80ea601b3
> 
> There are some issues with that. Most of it boils down to the question of how 
> to feed an inlined frame into a callchain_cursor_node. The latter contains a 
> symbol* e.g. and users of that class currently rely on using the IP to find 
> e.g. the corresponding srcline.
> 
> From what I can see, we either have to hack inline nodes in there, cf. my 
> original approach in the github repo above. Or, better, we would have to 
> (heavily?) refactor the callchain_cursor_node struct and the code depending on 
> it. What I have in mind would be something like adding two members to this 
> struct:
> 
> const char* funcname;
> const char* srcline;
> 
> For non-inlined frames, the funcname aliases the `symbol->name` value, and the 
> srcline is queried as-needed. For inlined frames the values from the inlined 
> node struct are used. The inlined frames for a given code location would all 
> share the same symbol and ip.
> 
> Would that be OK as a path forward?

I think you'd be better adding (fake) dso and sym to keep the inline
information.  The fake dso can be paired with the original dso and
maintain a tree of (inlined) symbols.  You may need a fake map to
point the fake dso then.  It seems a bit compilcated but that way the
code will be more consistent and easier to handle (e.g. for caching
and/or deletion) IMHO.

Also, for the children mode, callchain nodes should have enough
information to create hist entries (but I'm not sure how to apply
self periods for those inlined entries).

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-15  1:21         ` Namhyung Kim
@ 2017-05-15 10:01           ` Milian Wolff
  2017-05-16  0:53             ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Milian Wolff @ 2017-05-15 10:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

[-- Attachment #1: Type: text/plain, Size: 6627 bytes --]

On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> Hi Milian,
> 
> On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > > > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > > > Hi,
> > > > 
> > > > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > > > <snip>
> > > > 
> > > > > > +static enum match_result match_chain_srcline(struct
> > > > > > callchain_cursor_node
> > > > > > *node, +					     struct callchain_list *cnode)
> > > > > > +{
> > > > > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > > > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > > > +				 cnode->ms.sym, true, false);
> > > > > > +	char *right = get_srcline(node->map->dso,
> > > > > > +				  map__rip_2objdump(node->map, node->ip),
> > > > > > +				  node->sym, true, false);
> > > > > > +	enum match_result ret = match_chain_strings(left, right);
> > > > > 
> > > > > I think we need to check inlined srcline as well.  There might be a
> > > > > case that two samples have different addresses (and from different
> > > > > callchains) but happens to be mapped to a same srcline IMHO.
> > > > 
> > > > I think I'm missing something, but isn't this what this function
> > > > provides?
> > > > The function above is now being used by the match_chain_inliner
> > > > function
> > > > below.
> > > > 
> > > > Ah, or do you mean for code such as this:
> > > > 
> > > > ~~~~~
> > > > inline_func_1(); inline_func_2();
> > > > ~~~~~
> > > > 
> > > > Here, both branches could be inlined into the same line and the same
> > > > issue
> > > > would occur, i.e. different branches get collapsed into the first
> > > > match
> > > > for
> > > > the given srcline?
> > > > 
> > > > Hm yes, this should be fixed too.
> > > 
> > > OK.
> > > 
> > > > But, quite frankly, I think it just shows more and more that the
> > > > current
> > > > inliner support is really fragile and leads to lots of issues
> > > > throughout
> > > > the code base as the inlined frames are different from non-inlined
> > > > frames, but should most of the same be handled just like them.
> > > > 
> > > > So, maybe it's time to once more think about going back to my initial
> > > > approach: Make inlined frames code-wise equal to non-inlined frames,
> > > > i.e.
> > > > instead of requesting the inlined frames within match_chain, do it
> > > > outside
> > > > and create callchain_node/callchain_cursor instances (not sure which
> > > > one
> > > > right now) for the inlined frames too.
> > > > 
> > > > This way, we should be able to centrally add support for inlined
> > > > frames
> > > > and
> > > > all areas will benefit from it:
> > > > 
> > > > - aggregation by srcline/function will magically work
> > > > - all browsers will automatically display them, i.e. no longer need to
> > > > duplicate the code for inliner support in perf script, perf report
> > > > tui/
> > > > stdio/...
> > > > - we can easily support --inline in other tools, like `perf trace
> > > > --call-
> > > > graph`
> > > > 
> > > > So before I invest more time trying to massage match_chain to behave
> > > > well
> > > > for inline nodes, can I get some feedback on the above?
> > > 
> > > Fair enough.  I agree that it'd be better adding them as separate
> > > callchain nodes when resolving callchains.
> > 
> > Can you, or anyone else more involved with the current callchain code,
> > guide me a bit?
> > 
> > My previous attempt at doing this can be seen here:
> > https://github.com/milianw/linux/commit/
> > 71d031c9d679bfb4a4044226e8903dd80ea601b3
> > 
> > There are some issues with that. Most of it boils down to the question of
> > how to feed an inlined frame into a callchain_cursor_node. The latter
> > contains a symbol* e.g. and users of that class currently rely on using
> > the IP to find e.g. the corresponding srcline.
> > 
> > From what I can see, we either have to hack inline nodes in there, cf. my
> > original approach in the github repo above. Or, better, we would have to
> > (heavily?) refactor the callchain_cursor_node struct and the code
> > depending on it. What I have in mind would be something like adding two
> > members to this struct:
> > 
> > const char* funcname;
> > const char* srcline;
> > 
> > For non-inlined frames, the funcname aliases the `symbol->name` value, and
> > the srcline is queried as-needed. For inlined frames the values from the
> > inlined node struct are used. The inlined frames for a given code
> > location would all share the same symbol and ip.
> > 
> > Would that be OK as a path forward?
> 
> I think you'd be better adding (fake) dso and sym to keep the inline
> information.  The fake dso can be paired with the original dso and
> maintain a tree of (inlined) symbols.  You may need a fake map to
> point the fake dso then.  It seems a bit compilcated but that way the
> code will be more consistent and easier to handle (e.g. for caching
> and/or deletion) IMHO.

Can you expand on this please? How would that solve the problem of finding a 
function name or srcline for a given inline frame?

I.e.: the function name is, currently, part of the sym. So the fake dso/map 
would contain an internal, fake, string table which fake symbols could 
leverage for the function name?

Sounds like doable, but also sounds like *a lot* of work. And I don't see how 
that would solve the srcline situation: That one is queried on-demand based on 
the IP of a frame. I would say that inline frames should keep the IP of the 
first non-inlined frame. But that would make it impossible to find the srcline 
for the N'th inlined frame... Am I missing something?

> Also, for the children mode, callchain nodes should have enough
> information to create hist entries (but I'm not sure how to apply
> self periods for those inlined entries).

I'm not sure I'm following here either, but once inlined frames are normal 
callchain nodes, all browsers will simply support them out of the box, no? All 
of the aggregation and browsing features should just work™. We'd only need to 
patch the browsers for special usecases, like when we want to change the 
visuals to make it clear that a given frame was actually inlined.

Bye

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-15 10:01           ` Milian Wolff
@ 2017-05-16  0:53             ` Namhyung Kim
  2017-05-16 13:18               ` Milian Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2017-05-16  0:53 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

On Mon, May 15, 2017 at 12:01:54PM +0200, Milian Wolff wrote:
> On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> > Hi Milian,
> > 
> > On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > > On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > > > > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > > > > Hi,
> > > > > 
> > > > > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > > > > <snip>
> > > > > 
> > > > > > > +static enum match_result match_chain_srcline(struct
> > > > > > > callchain_cursor_node
> > > > > > > *node, +					     struct callchain_list *cnode)
> > > > > > > +{
> > > > > > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > > > > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > > > > +				 cnode->ms.sym, true, false);
> > > > > > > +	char *right = get_srcline(node->map->dso,
> > > > > > > +				  map__rip_2objdump(node->map, node->ip),
> > > > > > > +				  node->sym, true, false);
> > > > > > > +	enum match_result ret = match_chain_strings(left, right);
> > > > > > 
> > > > > > I think we need to check inlined srcline as well.  There might be a
> > > > > > case that two samples have different addresses (and from different
> > > > > > callchains) but happens to be mapped to a same srcline IMHO.
> > > > > 
> > > > > I think I'm missing something, but isn't this what this function
> > > > > provides?
> > > > > The function above is now being used by the match_chain_inliner
> > > > > function
> > > > > below.
> > > > > 
> > > > > Ah, or do you mean for code such as this:
> > > > > 
> > > > > ~~~~~
> > > > > inline_func_1(); inline_func_2();
> > > > > ~~~~~
> > > > > 
> > > > > Here, both branches could be inlined into the same line and the same
> > > > > issue
> > > > > would occur, i.e. different branches get collapsed into the first
> > > > > match
> > > > > for
> > > > > the given srcline?
> > > > > 
> > > > > Hm yes, this should be fixed too.
> > > > 
> > > > OK.
> > > > 
> > > > > But, quite frankly, I think it just shows more and more that the
> > > > > current
> > > > > inliner support is really fragile and leads to lots of issues
> > > > > throughout
> > > > > the code base as the inlined frames are different from non-inlined
> > > > > frames, but should most of the same be handled just like them.
> > > > > 
> > > > > So, maybe it's time to once more think about going back to my initial
> > > > > approach: Make inlined frames code-wise equal to non-inlined frames,
> > > > > i.e.
> > > > > instead of requesting the inlined frames within match_chain, do it
> > > > > outside
> > > > > and create callchain_node/callchain_cursor instances (not sure which
> > > > > one
> > > > > right now) for the inlined frames too.
> > > > > 
> > > > > This way, we should be able to centrally add support for inlined
> > > > > frames
> > > > > and
> > > > > all areas will benefit from it:
> > > > > 
> > > > > - aggregation by srcline/function will magically work
> > > > > - all browsers will automatically display them, i.e. no longer need to
> > > > > duplicate the code for inliner support in perf script, perf report
> > > > > tui/
> > > > > stdio/...
> > > > > - we can easily support --inline in other tools, like `perf trace
> > > > > --call-
> > > > > graph`
> > > > > 
> > > > > So before I invest more time trying to massage match_chain to behave
> > > > > well
> > > > > for inline nodes, can I get some feedback on the above?
> > > > 
> > > > Fair enough.  I agree that it'd be better adding them as separate
> > > > callchain nodes when resolving callchains.
> > > 
> > > Can you, or anyone else more involved with the current callchain code,
> > > guide me a bit?
> > > 
> > > My previous attempt at doing this can be seen here:
> > > https://github.com/milianw/linux/commit/
> > > 71d031c9d679bfb4a4044226e8903dd80ea601b3
> > > 
> > > There are some issues with that. Most of it boils down to the question of
> > > how to feed an inlined frame into a callchain_cursor_node. The latter
> > > contains a symbol* e.g. and users of that class currently rely on using
> > > the IP to find e.g. the corresponding srcline.
> > > 
> > > From what I can see, we either have to hack inline nodes in there, cf. my
> > > original approach in the github repo above. Or, better, we would have to
> > > (heavily?) refactor the callchain_cursor_node struct and the code
> > > depending on it. What I have in mind would be something like adding two
> > > members to this struct:
> > > 
> > > const char* funcname;
> > > const char* srcline;
> > > 
> > > For non-inlined frames, the funcname aliases the `symbol->name` value, and
> > > the srcline is queried as-needed. For inlined frames the values from the
> > > inlined node struct are used. The inlined frames for a given code
> > > location would all share the same symbol and ip.
> > > 
> > > Would that be OK as a path forward?
> > 
> > I think you'd be better adding (fake) dso and sym to keep the inline
> > information.  The fake dso can be paired with the original dso and
> > maintain a tree of (inlined) symbols.  You may need a fake map to
> > point the fake dso then.  It seems a bit compilcated but that way the
> > code will be more consistent and easier to handle (e.g. for caching
> > and/or deletion) IMHO.
> 
> Can you expand on this please? How would that solve the problem of finding a 
> function name or srcline for a given inline frame?
> 
> I.e.: the function name is, currently, part of the sym. So the fake dso/map 
> would contain an internal, fake, string table which fake symbols could 
> leverage for the function name?
> 
> Sounds like doable, but also sounds like *a lot* of work. And I don't see how 
> that would solve the srcline situation: That one is queried on-demand based on 
> the IP of a frame. I would say that inline frames should keep the IP of the 
> first non-inlined frame. But that would make it impossible to find the srcline 
> for the N'th inlined frame... Am I missing something?

I agree that srcline info can be kept in callchain cursor nodes, but I
still think function name should be in (fake) symbols.  Sharing a
symbol for all inlined frames would not work for the children mode
IMHO.


> 
> > Also, for the children mode, callchain nodes should have enough
> > information to create hist entries (but I'm not sure how to apply
> > self periods for those inlined entries).
> 
> I'm not sure I'm following here either, but once inlined frames are normal 
> callchain nodes, all browsers will simply support them out of the box, no? All 
> of the aggregation and browsing features should just work™. We'd only need to 
> patch the browsers for special usecases, like when we want to change the 
> visuals to make it clear that a given frame was actually inlined.

Yes, once inlined frames converted to callchain cursor nodes, it
should work out of the box.  But for children mode, it needs a symbol
to construct a hist entry from a callchain cursor node.  Please see
fill_callchain_info().  If you add the "(inline)" to the (fake) symbol
name, you don't even need to patch the browsers IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-16  0:53             ` Namhyung Kim
@ 2017-05-16 13:18               ` Milian Wolff
  2017-05-17  6:13                 ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Milian Wolff @ 2017-05-16 13:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

[-- Attachment #1: Type: text/plain, Size: 14253 bytes --]

On Dienstag, 16. Mai 2017 02:53:32 CEST Namhyung Kim wrote:
> On Mon, May 15, 2017 at 12:01:54PM +0200, Milian Wolff wrote:
> > On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> > > Hi Milian,
> > > 
> > > On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > > > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > > > On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > > > > > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > > > > > Hi,
> > > > > > 
> > > > > > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > > > > > <snip>
> > > > > > 
> > > > > > > > +static enum match_result match_chain_srcline(struct
> > > > > > > > callchain_cursor_node
> > > > > > > > *node, +					     struct callchain_list *cnode)
> > > > > > > > +{
> > > > > > > > +	char *left = get_srcline(cnode->ms.map->dso,
> > > > > > > > +				 map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > > > > > +				 cnode->ms.sym, true, false);
> > > > > > > > +	char *right = get_srcline(node->map->dso,
> > > > > > > > +				  map__rip_2objdump(node->map, node->ip),
> > > > > > > > +				  node->sym, true, false);
> > > > > > > > +	enum match_result ret = match_chain_strings(left, right);
> > > > > > > 
> > > > > > > I think we need to check inlined srcline as well.  There might
> > > > > > > be a
> > > > > > > case that two samples have different addresses (and from
> > > > > > > different
> > > > > > > callchains) but happens to be mapped to a same srcline IMHO.
> > > > > > 
> > > > > > I think I'm missing something, but isn't this what this function
> > > > > > provides?
> > > > > > The function above is now being used by the match_chain_inliner
> > > > > > function
> > > > > > below.
> > > > > > 
> > > > > > Ah, or do you mean for code such as this:
> > > > > > 
> > > > > > ~~~~~
> > > > > > inline_func_1(); inline_func_2();
> > > > > > ~~~~~
> > > > > > 
> > > > > > Here, both branches could be inlined into the same line and the
> > > > > > same
> > > > > > issue
> > > > > > would occur, i.e. different branches get collapsed into the first
> > > > > > match
> > > > > > for
> > > > > > the given srcline?
> > > > > > 
> > > > > > Hm yes, this should be fixed too.
> > > > > 
> > > > > OK.
> > > > > 
> > > > > > But, quite frankly, I think it just shows more and more that the
> > > > > > current
> > > > > > inliner support is really fragile and leads to lots of issues
> > > > > > throughout
> > > > > > the code base as the inlined frames are different from non-inlined
> > > > > > frames, but should most of the same be handled just like them.
> > > > > > 
> > > > > > So, maybe it's time to once more think about going back to my
> > > > > > initial
> > > > > > approach: Make inlined frames code-wise equal to non-inlined
> > > > > > frames,
> > > > > > i.e.
> > > > > > instead of requesting the inlined frames within match_chain, do it
> > > > > > outside
> > > > > > and create callchain_node/callchain_cursor instances (not sure
> > > > > > which
> > > > > > one
> > > > > > right now) for the inlined frames too.
> > > > > > 
> > > > > > This way, we should be able to centrally add support for inlined
> > > > > > frames
> > > > > > and
> > > > > > all areas will benefit from it:
> > > > > > 
> > > > > > - aggregation by srcline/function will magically work
> > > > > > - all browsers will automatically display them, i.e. no longer
> > > > > > need to
> > > > > > duplicate the code for inliner support in perf script, perf report
> > > > > > tui/
> > > > > > stdio/...
> > > > > > - we can easily support --inline in other tools, like `perf trace
> > > > > > --call-
> > > > > > graph`
> > > > > > 
> > > > > > So before I invest more time trying to massage match_chain to
> > > > > > behave
> > > > > > well
> > > > > > for inline nodes, can I get some feedback on the above?
> > > > > 
> > > > > Fair enough.  I agree that it'd be better adding them as separate
> > > > > callchain nodes when resolving callchains.
> > > > 
> > > > Can you, or anyone else more involved with the current callchain code,
> > > > guide me a bit?
> > > > 
> > > > My previous attempt at doing this can be seen here:
> > > > https://github.com/milianw/linux/commit/
> > > > 71d031c9d679bfb4a4044226e8903dd80ea601b3
> > > > 
> > > > There are some issues with that. Most of it boils down to the question
> > > > of
> > > > how to feed an inlined frame into a callchain_cursor_node. The latter
> > > > contains a symbol* e.g. and users of that class currently rely on
> > > > using
> > > > the IP to find e.g. the corresponding srcline.
> > > > 
> > > > From what I can see, we either have to hack inline nodes in there, cf.
> > > > my
> > > > original approach in the github repo above. Or, better, we would have
> > > > to
> > > > (heavily?) refactor the callchain_cursor_node struct and the code
> > > > depending on it. What I have in mind would be something like adding
> > > > two
> > > > members to this struct:
> > > > 
> > > > const char* funcname;
> > > > const char* srcline;
> > > > 
> > > > For non-inlined frames, the funcname aliases the `symbol->name` value,
> > > > and
> > > > the srcline is queried as-needed. For inlined frames the values from
> > > > the
> > > > inlined node struct are used. The inlined frames for a given code
> > > > location would all share the same symbol and ip.
> > > > 
> > > > Would that be OK as a path forward?
> > > 
> > > I think you'd be better adding (fake) dso and sym to keep the inline
> > > information.  The fake dso can be paired with the original dso and
> > > maintain a tree of (inlined) symbols.  You may need a fake map to
> > > point the fake dso then.  It seems a bit compilcated but that way the
> > > code will be more consistent and easier to handle (e.g. for caching
> > > and/or deletion) IMHO.
> > 
> > Can you expand on this please? How would that solve the problem of finding
> > a function name or srcline for a given inline frame?
> > 
> > I.e.: the function name is, currently, part of the sym. So the fake
> > dso/map
> > would contain an internal, fake, string table which fake symbols could
> > leverage for the function name?
> > 
> > Sounds like doable, but also sounds like *a lot* of work. And I don't see
> > how that would solve the srcline situation: That one is queried on-demand
> > based on the IP of a frame. I would say that inline frames should keep
> > the IP of the first non-inlined frame. But that would make it impossible
> > to find the srcline for the N'th inlined frame... Am I missing something?
> 
> I agree that srcline info can be kept in callchain cursor nodes, but I
> still think function name should be in (fake) symbols.  Sharing a
> symbol for all inlined frames would not work for the children mode
> IMHO.

I'm running into a bit of trouble here. I hoped to be able to store the 
inlined symbol in the DSO to reuse it for other inlined entries that use the 
same function. I also hope that this will then take care of the deletion of 
the fake symbols, once the dso is freed.

To do this, I thought I could use dso__find_symbol_by_name and, if nothing was 
found, I create the new fake symbol by symbol__new and insert it via 
dso__insert_symbol. Apparently, I also need to update the sorted lookup table, 
so I call dso__sort_by_name, but that then leads to crashes:

==22675== Invalid write of size 8
==22675==    at 0x4D89FC: rb_link_node (rbtree.h:82)
==22675==    by 0x4D89FC: symbols__insert_by_name (symbol.c:387)
==22675==    by 0x4D89FC: symbols__sort_by_name (symbol.c:398)
==22675==    by 0x4D89FC: dso__sort_by_name (symbol.c:512)
==22675==    by 0x4EB704: unwind_entry (machine.c:2061)
==22675==    by 0x55BCF7: entry (unwind-libunwind-local.c:600)
==22675==    by 0x55BCF7: get_entries (unwind-libunwind-local.c:723)
==22675==    by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
==22675==    by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
==22675==    by 0x5196F9: hist_entry_iter__add (hist.c:1039)
==22675==    by 0x448044: process_sample_event (builtin-report.c:204)
==22675==    by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
==22675==    by 0x4F61DD: machines__deliver_event (session.c:1248)
==22675==    by 0x4F66D3: perf_session__deliver_event (session.c:1310)
==22675==    by 0x4F66D3: ordered_events__deliver_event (session.c:119)
==22675==    by 0x4F9CD2: __ordered_events__flush (ordered-events.c:210)
==22675==    by 0x4F9CD2: ordered_events__flush.part.3 (ordered-events.c:277)
==22675==    by 0x4F6DEC: perf_session__process_user_event (session.c:1349)
==22675==    by 0x4F6DEC: perf_session__process_event (session.c:1475)
==22675==    by 0x4F8C5C: __perf_session__process_events (session.c:1867)
==22675==    by 0x4F8C5C: perf_session__process_events (session.c:1921)
==22675==  Address 0xd1fae38 is 24 bytes before a block of size 54 alloc'd
==22675==    at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-
amd64-linux.so)
==22675==    by 0x4D80F0: symbol__new (symbol.c:240)
==22675==    by 0x54750E: dso__load_sym (symbol-elf.c:1121)
==22675==    by 0x4D9D3E: dso__load (symbol.c:1535)
==22675==    by 0x4F262B: map__load (map.c:292)
==22675==    by 0x4F262B: map__find_symbol (map.c:335)
==22675==    by 0x4B3B55: thread__find_addr_location (event.c:1458)
==22675==    by 0x55BC70: entry (unwind-libunwind-local.c:588)
==22675==    by 0x55BC70: get_entries (unwind-libunwind-local.c:723)
==22675==    by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
==22675==    by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
==22675==    by 0x5196F9: hist_entry_iter__add (hist.c:1039)
==22675==    by 0x448044: process_sample_event (builtin-report.c:204)
==22675==    by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
==22675==    by 0x4F61DD: machines__deliver_event (session.c:1248)
==22675== 

Here's the patch that I tried to play around with this concept:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a98f55a91cb2..4d1b6e58da00 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2037,6 +2037,38 @@ static int unwind_entry(struct unwind_entry *entry, 
void *arg)
 
        if (symbol_conf.hide_unresolved && entry->sym == NULL)
                return 0;
+
+       if (symbol_conf.inline_name && entry->map) {
+               u64 addr = map__rip_2objdump(entry->map, entry->ip);
+               struct inline_node *inline_node;
+               inline_node = dso__parse_addr_inlines(entry->map->dso, addr);
+               if (inline_node) {
+                       struct inline_list *ilist;
+                       list_for_each_entry(ilist, &inline_node->val, list) {
+                               struct symbol *sym;
+                               sym = dso__find_symbol_by_name(entry->map-
>dso,
+                                                              MAP__FUNCTION,
+                                                              ilist-
>funcname);
+                               if (!sym) {
+                                       fprintf(stderr, "create new entry\n");
+                                       sym = symbol__new(entry->sym->start,
+                                                         entry->sym->end,
+                                                         entry->sym->binding,
+                                                         ilist->funcname);
+                                       dso__insert_symbol(entry->map->dso,
+                                                          MAP__FUNCTION,
+                                                          sym);
+                                       dso__sort_by_name(entry->map->dso,
+                                                         MAP__FUNCTION);
+                                       fprintf(stderr, "%p | %p | %s | %s\n", 
sym,
+                                               
dso__find_symbol_by_name(entry->map->dso, MAP__FUNCTION, ilist->funcname),
+                                               sym->name, ilist->funcname);
+                               }
+                               fprintf(stderr, "want to add inline: %s %p\n", 
ilist->funcname, sym);
+                       }
+                       inline_node__delete(inline_node);
+               }
+       }
        return callchain_cursor_append(cursor, entry->ip,
                                       entry->map, entry->sym,
                                       false, NULL, 0, 0);
~~~~~~~~~~~~~~~

Any idea what I'm doing wrong here? Can I not insert into the sorted list? 
I.e. is it not safe to sort multiple times? Is this the wrong approach in 
general?

Also, any suggestion as to what I'd use for symbol start/end for the inlined 
fake symbols?

Thanks

> > > Also, for the children mode, callchain nodes should have enough
> > > information to create hist entries (but I'm not sure how to apply
> > > self periods for those inlined entries).
> > 
> > I'm not sure I'm following here either, but once inlined frames are normal
> > callchain nodes, all browsers will simply support them out of the box, no?
> > All of the aggregation and browsing features should just work™. We'd only
> > need to patch the browsers for special usecases, like when we want to
> > change the visuals to make it clear that a given frame was actually
> > inlined.
> 
> Yes, once inlined frames converted to callchain cursor nodes, it
> should work out of the box.  But for children mode, it needs a symbol
> to construct a hist entry from a callchain cursor node.  Please see
> fill_callchain_info().  If you add the "(inline)" to the (fake) symbol
> name, you don't even need to patch the browsers IMHO.

Good idea, I'll do that once I get it to work (see above)

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-16 13:18               ` Milian Wolff
@ 2017-05-17  6:13                 ` Namhyung Kim
  2017-05-18 12:20                   ` Milian Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2017-05-17  6:13 UTC (permalink / raw)
  To: Milian Wolff
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

On Tue, May 16, 2017 at 03:18:13PM +0200, Milian Wolff wrote:
> On Dienstag, 16. Mai 2017 02:53:32 CEST Namhyung Kim wrote:
> > On Mon, May 15, 2017 at 12:01:54PM +0200, Milian Wolff wrote:
> > > On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> > > > Hi Milian,
> > > > 
> > > > On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > > > > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > > I think you'd be better adding (fake) dso and sym to keep the inline
> > > > information.  The fake dso can be paired with the original dso and
> > > > maintain a tree of (inlined) symbols.  You may need a fake map to
> > > > point the fake dso then.  It seems a bit compilcated but that way the
> > > > code will be more consistent and easier to handle (e.g. for caching
> > > > and/or deletion) IMHO.
> > > 
> > > Can you expand on this please? How would that solve the problem of finding
> > > a function name or srcline for a given inline frame?
> > > 
> > > I.e.: the function name is, currently, part of the sym. So the fake
> > > dso/map
> > > would contain an internal, fake, string table which fake symbols could
> > > leverage for the function name?
> > > 
> > > Sounds like doable, but also sounds like *a lot* of work. And I don't see
> > > how that would solve the srcline situation: That one is queried on-demand
> > > based on the IP of a frame. I would say that inline frames should keep
> > > the IP of the first non-inlined frame. But that would make it impossible
> > > to find the srcline for the N'th inlined frame... Am I missing something?
> > 
> > I agree that srcline info can be kept in callchain cursor nodes, but I
> > still think function name should be in (fake) symbols.  Sharing a
> > symbol for all inlined frames would not work for the children mode
> > IMHO.
> 
> I'm running into a bit of trouble here. I hoped to be able to store the 
> inlined symbol in the DSO to reuse it for other inlined entries that use the 
> same function. I also hope that this will then take care of the deletion of 
> the fake symbols, once the dso is freed.

I don't want to store inlined functions to an original DSO as it would
confuse symbol lookups in the DSO.  As you said those inlined
functions will have same address so multiple symbols exist for an
address.

I thought they can be kept in a fake DSO which should be linked to the
original DSO, but it doesn't need to be a DSO.  Instead a DSO can have
a tree that maintains lists of (inlined) symbols and srclines sorted
by address.


> 
> To do this, I thought I could use dso__find_symbol_by_name and, if nothing was 
> found, I create the new fake symbol by symbol__new and insert it via 
> dso__insert_symbol. Apparently, I also need to update the sorted lookup table, 
> so I call dso__sort_by_name, but that then leads to crashes:
> 
> ==22675== Invalid write of size 8
> ==22675==    at 0x4D89FC: rb_link_node (rbtree.h:82)
> ==22675==    by 0x4D89FC: symbols__insert_by_name (symbol.c:387)
> ==22675==    by 0x4D89FC: symbols__sort_by_name (symbol.c:398)
> ==22675==    by 0x4D89FC: dso__sort_by_name (symbol.c:512)
> ==22675==    by 0x4EB704: unwind_entry (machine.c:2061)
> ==22675==    by 0x55BCF7: entry (unwind-libunwind-local.c:600)
> ==22675==    by 0x55BCF7: get_entries (unwind-libunwind-local.c:723)
> ==22675==    by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
> ==22675==    by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
> ==22675==    by 0x5196F9: hist_entry_iter__add (hist.c:1039)
> ==22675==    by 0x448044: process_sample_event (builtin-report.c:204)
> ==22675==    by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
> ==22675==    by 0x4F61DD: machines__deliver_event (session.c:1248)
> ==22675==    by 0x4F66D3: perf_session__deliver_event (session.c:1310)
> ==22675==    by 0x4F66D3: ordered_events__deliver_event (session.c:119)
> ==22675==    by 0x4F9CD2: __ordered_events__flush (ordered-events.c:210)
> ==22675==    by 0x4F9CD2: ordered_events__flush.part.3 (ordered-events.c:277)
> ==22675==    by 0x4F6DEC: perf_session__process_user_event (session.c:1349)
> ==22675==    by 0x4F6DEC: perf_session__process_event (session.c:1475)
> ==22675==    by 0x4F8C5C: __perf_session__process_events (session.c:1867)
> ==22675==    by 0x4F8C5C: perf_session__process_events (session.c:1921)
> ==22675==  Address 0xd1fae38 is 24 bytes before a block of size 54 alloc'd
> ==22675==    at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-
> amd64-linux.so)
> ==22675==    by 0x4D80F0: symbol__new (symbol.c:240)
> ==22675==    by 0x54750E: dso__load_sym (symbol-elf.c:1121)
> ==22675==    by 0x4D9D3E: dso__load (symbol.c:1535)
> ==22675==    by 0x4F262B: map__load (map.c:292)
> ==22675==    by 0x4F262B: map__find_symbol (map.c:335)
> ==22675==    by 0x4B3B55: thread__find_addr_location (event.c:1458)
> ==22675==    by 0x55BC70: entry (unwind-libunwind-local.c:588)
> ==22675==    by 0x55BC70: get_entries (unwind-libunwind-local.c:723)
> ==22675==    by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
> ==22675==    by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
> ==22675==    by 0x5196F9: hist_entry_iter__add (hist.c:1039)
> ==22675==    by 0x448044: process_sample_event (builtin-report.c:204)
> ==22675==    by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
> ==22675==    by 0x4F61DD: machines__deliver_event (session.c:1248)
> ==22675== 
> 
> Here's the patch that I tried to play around with this concept:
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index a98f55a91cb2..4d1b6e58da00 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2037,6 +2037,38 @@ static int unwind_entry(struct unwind_entry *entry, 
> void *arg)
>  
>         if (symbol_conf.hide_unresolved && entry->sym == NULL)
>                 return 0;
> +
> +       if (symbol_conf.inline_name && entry->map) {
> +               u64 addr = map__rip_2objdump(entry->map, entry->ip);
> +               struct inline_node *inline_node;
> +               inline_node = dso__parse_addr_inlines(entry->map->dso, addr);
> +               if (inline_node) {
> +                       struct inline_list *ilist;
> +                       list_for_each_entry(ilist, &inline_node->val, list) {
> +                               struct symbol *sym;
> +                               sym = dso__find_symbol_by_name(entry->map-
> >dso,
> +                                                              MAP__FUNCTION,
> +                                                              ilist-
> >funcname);
> +                               if (!sym) {
> +                                       fprintf(stderr, "create new entry\n");
> +                                       sym = symbol__new(entry->sym->start,
> +                                                         entry->sym->end,
> +                                                         entry->sym->binding,
> +                                                         ilist->funcname);
> +                                       dso__insert_symbol(entry->map->dso,
> +                                                          MAP__FUNCTION,
> +                                                          sym);
> +                                       dso__sort_by_name(entry->map->dso,
> +                                                         MAP__FUNCTION);
> +                                       fprintf(stderr, "%p | %p | %s | %s\n", 
> sym,
> +                                               
> dso__find_symbol_by_name(entry->map->dso, MAP__FUNCTION, ilist->funcname),
> +                                               sym->name, ilist->funcname);
> +                               }
> +                               fprintf(stderr, "want to add inline: %s %p\n", 
> ilist->funcname, sym);
> +                       }
> +                       inline_node__delete(inline_node);
> +               }
> +       }
>         return callchain_cursor_append(cursor, entry->ip,
>                                        entry->map, entry->sym,
>                                        false, NULL, 0, 0);
> ~~~~~~~~~~~~~~~
> 
> Any idea what I'm doing wrong here? Can I not insert into the sorted list? 
> I.e. is it not safe to sort multiple times? Is this the wrong approach in 
> general?
> 
> Also, any suggestion as to what I'd use for symbol start/end for the inlined 
> fake symbols?

Please see above.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] perf report: distinguish between inliners in the same function
  2017-05-17  6:13                 ` Namhyung Kim
@ 2017-05-18 12:20                   ` Milian Wolff
  0 siblings, 0 replies; 15+ messages in thread
From: Milian Wolff @ 2017-05-18 12:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	David Ahern, Peter Zijlstra, Yao Jin, kernel-team

[-- Attachment #1: Type: text/plain, Size: 2950 bytes --]

On Mittwoch, 17. Mai 2017 08:13:16 CEST Namhyung Kim wrote:
> On Tue, May 16, 2017 at 03:18:13PM +0200, Milian Wolff wrote:
> > On Dienstag, 16. Mai 2017 02:53:32 CEST Namhyung Kim wrote:
> > > On Mon, May 15, 2017 at 12:01:54PM +0200, Milian Wolff wrote:
> > > > On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> > > > > Hi Milian,
> > > > > 
> > > > > On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > > > > > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > > > I think you'd be better adding (fake) dso and sym to keep the inline
> > > > > information.  The fake dso can be paired with the original dso and
> > > > > maintain a tree of (inlined) symbols.  You may need a fake map to
> > > > > point the fake dso then.  It seems a bit compilcated but that way
> > > > > the
> > > > > code will be more consistent and easier to handle (e.g. for caching
> > > > > and/or deletion) IMHO.
> > > > 
> > > > Can you expand on this please? How would that solve the problem of
> > > > finding
> > > > a function name or srcline for a given inline frame?
> > > > 
> > > > I.e.: the function name is, currently, part of the sym. So the fake
> > > > dso/map
> > > > would contain an internal, fake, string table which fake symbols could
> > > > leverage for the function name?
> > > > 
> > > > Sounds like doable, but also sounds like *a lot* of work. And I don't
> > > > see
> > > > how that would solve the srcline situation: That one is queried
> > > > on-demand
> > > > based on the IP of a frame. I would say that inline frames should keep
> > > > the IP of the first non-inlined frame. But that would make it
> > > > impossible
> > > > to find the srcline for the N'th inlined frame... Am I missing
> > > > something?
> > > 
> > > I agree that srcline info can be kept in callchain cursor nodes, but I
> > > still think function name should be in (fake) symbols.  Sharing a
> > > symbol for all inlined frames would not work for the children mode
> > > IMHO.
> > 
> > I'm running into a bit of trouble here. I hoped to be able to store the
> > inlined symbol in the DSO to reuse it for other inlined entries that use
> > the same function. I also hope that this will then take care of the
> > deletion of the fake symbols, once the dso is freed.
> 
> I don't want to store inlined functions to an original DSO as it would
> confuse symbol lookups in the DSO.  As you said those inlined
> functions will have same address so multiple symbols exist for an
> address.
> 
> I thought they can be kept in a fake DSO which should be linked to the
> original DSO, but it doesn't need to be a DSO.  Instead a DSO can have
> a tree that maintains lists of (inlined) symbols and srclines sorted
> by address.

Thank you, this is what I'll be doing now.

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-05-18 12:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 21:35 [PATCH v2] perf report: distinguish between inliners in the same function Milian Wolff
2017-05-08  8:45 ` Milian Wolff
2017-05-08 16:17   ` Arnaldo Carvalho de Melo
2017-05-10  5:53 ` Namhyung Kim
2017-05-12 10:37   ` Milian Wolff
2017-05-12 13:01     ` Namhyung Kim
2017-05-14 18:10       ` Milian Wolff
2017-05-15  1:21         ` Namhyung Kim
2017-05-15 10:01           ` Milian Wolff
2017-05-16  0:53             ` Namhyung Kim
2017-05-16 13:18               ` Milian Wolff
2017-05-17  6:13                 ` Namhyung Kim
2017-05-18 12:20                   ` Milian Wolff
2017-05-12 14:55     ` Andi Kleen
2017-05-15  0:44       ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).