* [PATCH 0/6] perf thread-stack: x86 retpolines
@ 2019-01-09 9:18 Adrian Hunter
2019-01-09 9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
` (6 more replies)
0 siblings, 7 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
Hi
Here are some patches to improve the exported call graph, primarily to deal
with x86 retpolines.
Adrian Hunter (6):
perf tools: Fix split_kallsyms_for_kcore for trampoline symbols
perf thread-stack: Tidy thread_stack__push_cp() usage
perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
perf thread-stack: Represent jmps to the start of a different symbol
perf thread-stack: Improve thread_stack__no_call_return()
perf thread-stack: Hide x86 retpolines
tools/perf/scripts/python/export-to-postgresql.py | 2 +-
tools/perf/scripts/python/export-to-sqlite.py | 2 +-
tools/perf/util/symbol.c | 2 +
tools/perf/util/thread-stack.c | 235 ++++++++++++++++++----
tools/perf/util/thread-stack.h | 3 +
5 files changed, 208 insertions(+), 36 deletions(-)
Regards
Adrian
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-02-09 12:57 ` [tip:perf/core] perf tools: Fix split_kallsyms_for_kcore() " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage Adrian Hunter
` (5 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
Kallsyms symbols do not have a size, so the size becomes the distance to
the next symbol. Consequently the recently added trampoline symbols end up
with large sizes because the trampolines are some distance from one another
and the main kernel map. However, symbols that end outside their map can
disrupt the symbol tree because, after mapping, it can appear incorrectly
that they overlap other symbols. Add logic to truncate symbol size to the
end of the corresponding map.
Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
tools/perf/util/symbol.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 01f2c7385e38..baf6498d1835 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -709,6 +709,8 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
}
pos->start -= curr_map->start - curr_map->pgoff;
+ if (pos->end > curr_map->end)
+ pos->end = curr_map->end;
if (pos->end)
pos->end -= curr_map->start - curr_map->pgoff;
symbols__insert(&curr_map->dso->symbols, pos);
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
2019-01-09 9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-02-09 12:58 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables Adrian Hunter
` (4 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
If 'cp' is checked in thread_stack__push_cp() a number of error checks can
be removed, reducing code size and improving readability.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
tools/perf/util/thread-stack.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index d52f27f373ce..93ba3ab6a602 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -493,6 +493,9 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
struct thread_stack_entry *tse;
int err;
+ if (!cp)
+ return -ENOMEM;
+
if (ts->cnt == ts->sz) {
err = thread_stack__grow(ts);
if (err)
@@ -576,8 +579,6 @@ static int thread_stack__bottom(struct thread_stack *ts,
cp = call_path__findnew(cpr, &cpr->call_path, sym, ip,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
return thread_stack__push_cp(ts, ip, sample->time, ref, cp,
true, false);
@@ -609,8 +610,6 @@ static int thread_stack__no_call_return(struct thread *thread,
cp = call_path__findnew(cpr, &cpr->call_path,
to_al->sym, sample->addr,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
return thread_stack__push_cp(ts, 0, sample->time, ref,
cp, true, false);
}
@@ -633,8 +632,6 @@ static int thread_stack__no_call_return(struct thread *thread,
/* This 'return' had no 'call', so push and pop top of stack */
cp = call_path__findnew(cpr, parent, from_al->sym, sample->ip,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
err = thread_stack__push_cp(ts, sample->addr, sample->time, ref, cp,
true, false);
@@ -680,8 +677,6 @@ static int thread_stack__trace_end(struct thread_stack *ts,
cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp, NULL, 0,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
ret_addr = sample->ip + sample->insn_len;
@@ -745,8 +740,6 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
to_al->sym, sample->addr,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
err = thread_stack__push_cp(ts, ret_addr, sample->time, ref,
cp, false, trace_end);
} else if (sample->flags & PERF_IP_FLAG_RETURN) {
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
2019-01-09 9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
2019-01-09 9:18 ` [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-02-09 12:58 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
` (3 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
Make thread_stack__no_call_return() more readable by adding more local
variables.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
tools/perf/util/thread-stack.c | 35 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 93ba3ab6a602..7f8eff018c16 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -591,34 +591,36 @@ static int thread_stack__no_call_return(struct thread *thread,
struct addr_location *to_al, u64 ref)
{
struct call_path_root *cpr = ts->crp->cpr;
+ struct call_path *root = &cpr->call_path;
+ struct symbol *fsym = from_al->sym;
+ struct symbol *tsym = to_al->sym;
struct call_path *cp, *parent;
u64 ks = ts->kernel_start;
+ u64 addr = sample->addr;
+ u64 tm = sample->time;
+ u64 ip = sample->ip;
int err;
- if (sample->ip >= ks && sample->addr < ks) {
+ if (ip >= ks && addr < ks) {
/* Return to userspace, so pop all kernel addresses */
while (thread_stack__in_kernel(ts)) {
err = thread_stack__call_return(thread, ts, --ts->cnt,
- sample->time, ref,
- true);
+ tm, ref, true);
if (err)
return err;
}
/* If the stack is empty, push the userspace address */
if (!ts->cnt) {
- cp = call_path__findnew(cpr, &cpr->call_path,
- to_al->sym, sample->addr,
- ts->kernel_start);
- return thread_stack__push_cp(ts, 0, sample->time, ref,
- cp, true, false);
+ cp = call_path__findnew(cpr, root, tsym, addr, ks);
+ return thread_stack__push_cp(ts, 0, tm, ref, cp, true,
+ false);
}
- } else if (thread_stack__in_kernel(ts) && sample->ip < ks) {
+ } else if (thread_stack__in_kernel(ts) && ip < ks) {
/* Return to userspace, so pop all kernel addresses */
while (thread_stack__in_kernel(ts)) {
err = thread_stack__call_return(thread, ts, --ts->cnt,
- sample->time, ref,
- true);
+ tm, ref, true);
if (err)
return err;
}
@@ -627,19 +629,16 @@ static int thread_stack__no_call_return(struct thread *thread,
if (ts->cnt)
parent = ts->stack[ts->cnt - 1].cp;
else
- parent = &cpr->call_path;
+ parent = root;
/* This 'return' had no 'call', so push and pop top of stack */
- cp = call_path__findnew(cpr, parent, from_al->sym, sample->ip,
- ts->kernel_start);
+ cp = call_path__findnew(cpr, parent, fsym, ip, ks);
- err = thread_stack__push_cp(ts, sample->addr, sample->time, ref, cp,
- true, false);
+ err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
if (err)
return err;
- return thread_stack__pop_cp(thread, ts, sample->addr, sample->time, ref,
- to_al->sym);
+ return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
}
static int thread_stack__trace_begin(struct thread *thread,
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
` (2 preceding siblings ...)
2019-01-09 9:18 ` [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-02-06 12:39 ` Arnaldo Carvalho de Melo
2019-02-09 12:59 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
` (2 subsequent siblings)
6 siblings, 2 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by recording
on the stack a branch to the start of a different symbol. Note, that means
when a ret pops the stack, all jmps must be popped off first.
Example:
$ cat jmp-to-fn.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
__attribute__((noinline)) int foo(void)
{
return bar() + 1;
}
int main()
{
return foo();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
$ objdump -d jmp-to-fn
<SNIP>
0000000000001040 <main>:
1040: 31 c0 xor %eax,%eax
1042: e9 09 01 00 00 jmpq 1150 <foo>
<SNIP>
0000000000001140 <bar>:
1140: b8 ff ff ff ff mov $0xffffffff,%eax
1145: c3 retq
<SNIP>
0000000000001150 <foo>:
1150: 31 c0 xor %eax,%eax
1152: e8 e9 ff ff ff callq 1140 <bar>
1157: 83 c0 01 add $0x1,%eax
115a: c3 retq
<SNIP>
$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
2019-01-08 13:24:58.783069 Creating database...
2019-01-08 13:24:58.794650 Writing records...
2019-01-08 13:24:59.008050 Adding indexes
2019-01-08 13:24:59.015802 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
Before:
main
-> bar
After:
main
-> foo
-> bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
.../scripts/python/export-to-postgresql.py | 2 +-
tools/perf/scripts/python/export-to-sqlite.py | 2 +-
tools/perf/util/thread-stack.c | 30 +++++++++++++++++--
tools/perf/util/thread-stack.h | 3 ++
4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 0564dd7377f2..30130213da7e 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -478,7 +478,7 @@ if perf_db_export_calls:
'branch_count,'
'call_id,'
'return_id,'
- 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+ 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
'parent_call_path_id'
' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
diff --git a/tools/perf/scripts/python/export-to-sqlite.py b/tools/perf/scripts/python/export-to-sqlite.py
index 245caf2643ed..ed237f2ed03f 100644
--- a/tools/perf/scripts/python/export-to-sqlite.py
+++ b/tools/perf/scripts/python/export-to-sqlite.py
@@ -320,7 +320,7 @@ if perf_db_export_calls:
'branch_count,'
'call_id,'
'return_id,'
- 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+ 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
'parent_call_path_id'
' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 7f8eff018c16..f52c0f90915d 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -38,6 +38,7 @@
* @cp: call path
* @no_call: a 'call' was not seen
* @trace_end: a 'call' but trace ended
+ * @non_call: a branch but not a 'call' to the start of a different symbol
*/
struct thread_stack_entry {
u64 ret_addr;
@@ -47,6 +48,7 @@ struct thread_stack_entry {
struct call_path *cp;
bool no_call;
bool trace_end;
+ bool non_call;
};
/**
@@ -268,6 +270,8 @@ static int thread_stack__call_return(struct thread *thread,
cr.flags |= CALL_RETURN_NO_CALL;
if (no_return)
cr.flags |= CALL_RETURN_NO_RETURN;
+ if (tse->non_call)
+ cr.flags |= CALL_RETURN_NON_CALL;
return crp->process(&cr, crp->data);
}
@@ -510,6 +514,7 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
tse->cp = cp;
tse->no_call = no_call;
tse->trace_end = trace_end;
+ tse->non_call = false;
return 0;
}
@@ -531,14 +536,16 @@ static int thread_stack__pop_cp(struct thread *thread, struct thread_stack *ts,
timestamp, ref, false);
}
- if (ts->stack[ts->cnt - 1].ret_addr == ret_addr) {
+ if (ts->stack[ts->cnt - 1].ret_addr == ret_addr &&
+ !ts->stack[ts->cnt - 1].non_call) {
return thread_stack__call_return(thread, ts, --ts->cnt,
timestamp, ref, false);
} else {
size_t i = ts->cnt - 1;
while (i--) {
- if (ts->stack[i].ret_addr != ret_addr)
+ if (ts->stack[i].ret_addr != ret_addr ||
+ ts->stack[i].non_call)
continue;
i += 1;
while (ts->cnt > i) {
@@ -757,6 +764,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
err = thread_stack__trace_begin(thread, ts, sample->time, ref);
} else if (sample->flags & PERF_IP_FLAG_TRACE_END) {
err = thread_stack__trace_end(ts, sample, ref);
+ } else if (sample->flags & PERF_IP_FLAG_BRANCH &&
+ from_al->sym != to_al->sym && to_al->sym &&
+ to_al->addr == to_al->sym->start) {
+ struct call_path_root *cpr = ts->crp->cpr;
+ struct call_path *cp;
+
+ /*
+ * The compiler might optimize a call/ret combination by making
+ * it a jmp. Make that visible by recording on the stack a
+ * branch to the start of a different symbol. Note, that means
+ * when a ret pops the stack, all jmps must be popped off first.
+ */
+ cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
+ to_al->sym, sample->addr,
+ ts->kernel_start);
+ err = thread_stack__push_cp(ts, 0, sample->time, ref, cp, false,
+ false);
+ if (!err)
+ ts->stack[ts->cnt - 1].non_call = true;
}
return err;
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index 1f626f4a1c40..b7c04e19ad41 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -35,10 +35,13 @@ struct call_path;
*
* CALL_RETURN_NO_CALL: 'return' but no matching 'call'
* CALL_RETURN_NO_RETURN: 'call' but no matching 'return'
+ * CALL_RETURN_NON_CALL: a branch but not a 'call' to the start of a different
+ * symbol
*/
enum {
CALL_RETURN_NO_CALL = 1 << 0,
CALL_RETURN_NO_RETURN = 1 << 1,
+ CALL_RETURN_NON_CALL = 1 << 2,
};
/**
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return()
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
` (3 preceding siblings ...)
2019-01-09 9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-02-22 9:48 ` Adrian Hunter
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
2019-01-10 9:55 ` [PATCH 0/6] perf thread-stack: " Jiri Olsa
6 siblings, 2 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
Improve thread_stack__no_call_return() to better handle 'returns' that do
not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
int foo(void)
{
return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
int (*volatile fn)(void) = foo;
fn();
return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> bar
After:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
tools/perf/util/thread-stack.c | 49 +++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f52c0f90915d..632c07a125ab 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -638,14 +638,57 @@ static int thread_stack__no_call_return(struct thread *thread,
else
parent = root;
- /* This 'return' had no 'call', so push and pop top of stack */
- cp = call_path__findnew(cpr, parent, fsym, ip, ks);
+ if (parent->sym == from_al->sym) {
+ /*
+ * At the bottom of the stack, assume the missing 'call' was
+ * before the trace started. So, pop the current symbol and push
+ * the 'to' symbol.
+ */
+ if (ts->cnt == 1) {
+ err = thread_stack__call_return(thread, ts, --ts->cnt,
+ tm, ref, false);
+ if (err)
+ return err;
+ }
+
+ if (!ts->cnt) {
+ cp = call_path__findnew(cpr, root, tsym, addr, ks);
+
+ return thread_stack__push_cp(ts, addr, tm, ref, cp,
+ true, false);
+ }
+
+ /*
+ * Otherwise assume the 'return' is being used as a jump (e.g.
+ * retpoline) and just push the 'to' symbol.
+ */
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
+
+ err = thread_stack__push_cp(ts, 0, tm, ref, cp, true, false);
+ if (!err)
+ ts->stack[ts->cnt - 1].non_call = true;
+
+ return err;
+ }
+
+ /*
+ * Assume 'parent' has not yet returned, so push 'to', and then push and
+ * pop 'from'.
+ */
+
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
if (err)
return err;
- return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
+ cp = call_path__findnew(cpr, cp, fsym, ip, ks);
+
+ err = thread_stack__push_cp(ts, ip, tm, ref, cp, true, false);
+ if (err)
+ return err;
+
+ return thread_stack__call_return(thread, ts, --ts->cnt, tm, ref, false);
}
static int thread_stack__trace_begin(struct thread *thread,
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 6/6] perf thread-stack: Hide x86 retpolines
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
` (4 preceding siblings ...)
2019-01-09 9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
@ 2019-01-09 9:18 ` Adrian Hunter
2019-01-09 15:38 ` Jiri Olsa
` (2 more replies)
2019-01-10 9:55 ` [PATCH 0/6] perf thread-stack: " Jiri Olsa
6 siblings, 3 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-01-09 9:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.
This does not handle function return thunks, nor is there any improvement
for the handling of inline thunks or extern thunks.
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
int foo(void)
{
return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
int (*volatile fn)(void) = foo;
fn();
return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
After:
main
-> foo
-> bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
tools/perf/util/thread-stack.c | 112 ++++++++++++++++++++++++++++++++-
1 file changed, 109 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 632c07a125ab..805e30836460 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -20,6 +20,7 @@
#include "thread.h"
#include "event.h"
#include "machine.h"
+#include "env.h"
#include "util.h"
#include "debug.h"
#include "symbol.h"
@@ -29,6 +30,19 @@
#define STACK_GROWTH 2048
+/*
+ * State of retpoline detection.
+ *
+ * RETPOLINE_NONE: no retpoline detection
+ * X86_RETPOLINE_POSSIBLE: x86 retpoline possible
+ * X86_RETPOLINE_DETECTED: x86 retpoline detected
+ */
+enum retpoline_state_t {
+ RETPOLINE_NONE,
+ X86_RETPOLINE_POSSIBLE,
+ X86_RETPOLINE_DETECTED,
+};
+
/**
* struct thread_stack_entry - thread stack entry.
* @ret_addr: return address
@@ -64,6 +78,7 @@ struct thread_stack_entry {
* @crp: call/return processor
* @comm: current comm
* @arr_sz: size of array if this is the first element of an array
+ * @rstate: used to detect retpolines
*/
struct thread_stack {
struct thread_stack_entry *stack;
@@ -76,6 +91,7 @@ struct thread_stack {
struct call_return_processor *crp;
struct comm *comm;
unsigned int arr_sz;
+ enum retpoline_state_t rstate;
};
/*
@@ -115,10 +131,16 @@ static int thread_stack__init(struct thread_stack *ts, struct thread *thread,
if (err)
return err;
- if (thread->mg && thread->mg->machine)
- ts->kernel_start = machine__kernel_start(thread->mg->machine);
- else
+ if (thread->mg && thread->mg->machine) {
+ struct machine *machine = thread->mg->machine;
+ const char *arch = perf_env__arch(machine->env);
+
+ ts->kernel_start = machine__kernel_start(machine);
+ if (!strcmp(arch, "x86"))
+ ts->rstate = X86_RETPOLINE_POSSIBLE;
+ } else {
ts->kernel_start = 1ULL << 63;
+ }
ts->crp = crp;
return 0;
@@ -733,6 +755,70 @@ static int thread_stack__trace_end(struct thread_stack *ts,
false, true);
}
+static bool is_x86_retpoline(const char *name)
+{
+ const char *p = strstr(name, "__x86_indirect_thunk_");
+
+ return p == name || !strcmp(name, "__indirect_thunk_start");
+}
+
+/*
+ * x86 retpoline functions pollute the call graph. This function removes them.
+ * This does not handle function return thunks, nor is there any improvement
+ * for the handling of inline thunks or extern thunks.
+ */
+static int thread_stack__x86_retpoline(struct thread_stack *ts,
+ struct perf_sample *sample,
+ struct addr_location *to_al)
+{
+ struct thread_stack_entry *tse = &ts->stack[ts->cnt - 1];
+ struct call_path_root *cpr = ts->crp->cpr;
+ struct symbol *sym = tse->cp->sym;
+ struct symbol *tsym = to_al->sym;
+ struct call_path *cp;
+
+ if (sym && sym->name && is_x86_retpoline(sym->name)) {
+ /*
+ * This is a x86 retpoline fn. It pollutes the call graph by
+ * showing up everywhere there is an indirect branch, but does
+ * not itself mean anything. Here the top-of-stack is removed,
+ * by decrementing the stack count, and then further down, the
+ * resulting top-of-stack is replaced with the actual target.
+ * The result is that the retpoline functions will no longer
+ * appear in the call graph. Note this only affects the call
+ * graph, since all the original branches are left unchanged.
+ */
+ ts->cnt -= 1;
+ sym = ts->stack[ts->cnt - 2].cp->sym;
+ if (sym && sym == tsym && to_al->addr != tsym->start) {
+ /*
+ * Target is back to the middle of the symbol we came
+ * from so assume it is an indirect jmp and forget it
+ * altogether.
+ */
+ ts->cnt -= 1;
+ return 0;
+ }
+ } else if (sym && sym == tsym) {
+ /*
+ * Target is back to the symbol we came from so assume it is an
+ * indirect jmp and forget it altogether.
+ */
+ ts->cnt -= 1;
+ return 0;
+ }
+
+ cp = call_path__findnew(cpr, ts->stack[ts->cnt - 2].cp, tsym,
+ sample->addr, ts->kernel_start);
+ if (!cp)
+ return -ENOMEM;
+
+ /* Replace the top-of-stack with the actual target */
+ ts->stack[ts->cnt - 1].cp = cp;
+
+ return 0;
+}
+
int thread_stack__process(struct thread *thread, struct comm *comm,
struct perf_sample *sample,
struct addr_location *from_al,
@@ -740,6 +826,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
struct call_return_processor *crp)
{
struct thread_stack *ts = thread__stack(thread, sample->cpu);
+ enum retpoline_state_t rstate;
int err = 0;
if (ts && !ts->crp) {
@@ -755,6 +842,10 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
ts->comm = comm;
}
+ rstate = ts->rstate;
+ if (rstate == X86_RETPOLINE_DETECTED)
+ ts->rstate = X86_RETPOLINE_POSSIBLE;
+
/* Flush stack on exec */
if (ts->comm != comm && thread->pid_ == thread->tid) {
err = __thread_stack__flush(thread, ts);
@@ -791,10 +882,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
ts->kernel_start);
err = thread_stack__push_cp(ts, ret_addr, sample->time, ref,
cp, false, trace_end);
+
+ /*
+ * A call to the same symbol but not the start of the symbol,
+ * may be the start of a x86 retpoline.
+ */
+ if (!err && rstate == X86_RETPOLINE_POSSIBLE && to_al->sym &&
+ from_al->sym == to_al->sym &&
+ to_al->addr != to_al->sym->start)
+ ts->rstate = X86_RETPOLINE_DETECTED;
+
} else if (sample->flags & PERF_IP_FLAG_RETURN) {
if (!sample->ip || !sample->addr)
return 0;
+ /* x86 retpoline 'return' doesn't match the stack */
+ if (rstate == X86_RETPOLINE_DETECTED && ts->cnt > 2 &&
+ ts->stack[ts->cnt - 1].ret_addr != sample->addr)
+ return thread_stack__x86_retpoline(ts, sample, to_al);
+
err = thread_stack__pop_cp(thread, ts, sample->addr,
sample->time, ref, from_al->sym);
if (err) {
--
2.17.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] perf thread-stack: Hide x86 retpolines
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
@ 2019-01-09 15:38 ` Jiri Olsa
2019-01-10 7:52 ` Adrian Hunter
2019-02-22 19:42 ` Arnaldo Carvalho de Melo
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
2 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2019-01-09 15:38 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Arnaldo Carvalho de Melo, linux-kernel
On Wed, Jan 09, 2019 at 11:18:35AM +0200, Adrian Hunter wrote:
> x86 retpoline functions pollute the call graph by showing up everywhere
> there is an indirect branch, but they do not really mean anything. Make
> changes so that the default retpoline functions will no longer appear in
> the call graph. Note this only affects the call graph, since all the
> original branches are left unchanged.
>
> This does not handle function return thunks, nor is there any improvement
> for the handling of inline thunks or extern thunks.
>
> Example:
>
> $ cat simple-retpoline.c
> __attribute__((noinline)) int bar(void)
> {
> return -1;
> }
>
> int foo(void)
> {
> return bar() + 1;
> }
>
> __attribute__((indirect_branch("thunk"))) int main()
> {
> int (*volatile fn)(void) = foo;
>
> fn();
> return fn();
> }
> $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
> $ objdump -d simple-retpoline
> <SNIP>
> 0000000000001040 <main>:
> 1040: 48 83 ec 18 sub $0x18,%rsp
> 1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
> 104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
> 1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> 1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
> 105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> 105f: 48 83 c4 18 add $0x18,%rsp
> 1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
> <SNIP>
> 0000000000001160 <bar>:
> 1160: b8 ff ff ff ff mov $0xffffffff,%eax
> 1165: c3 retq
> <SNIP>
> 0000000000001170 <foo>:
> 1170: e8 eb ff ff ff callq 1160 <bar>
> 1175: 83 c0 01 add $0x1,%eax
> 1178: c3 retq
> 0000000000001179 <__x86_indirect_thunk_rax>:
> 1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
> 117e: f3 90 pause
> 1180: 0f ae e8 lfence
> 1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
> 1185: 48 89 04 24 mov %rax,(%rsp)
> 1189: c3 retq
> <SNIP>
> $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
> $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
> 2019-01-08 14:03:37.851655 Creating database...
> 2019-01-08 14:03:37.863256 Writing records...
> 2019-01-08 14:03:38.069750 Adding indexes
> 2019-01-08 14:03:38.078799 Done
> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
>
> Before:
>
> main
> -> __x86_indirect_thunk_rax
> -> __x86_indirect_thunk_rax
> -> foo
> -> bar
>
> After:
>
> main
> -> foo
> -> bar
>
hi,
I still see them in kernel report.. after a while the output is shifted
too far right out of the screen due to (I guess) this call/ret imbalance
[root@krava perf]# ./perf-with-kcore record pt -e intel_pt/cyc/k -a -- sleep 1
[root@krava perf]# ./perf-with-kcore script pt --call-trace -C 0
Xorg 2115 [000] 5850.032452321: cbr: 36 freq: 3621 MHz (171%)
Xorg 2115 [000] 5850.032452351: ([kernel.kallsyms]) __list_add_valid
Xorg 2115 [000] 5850.032452380: ([kernel.kallsyms]) perf_pmu_enable.part.101
Xorg 2115 [000] 5850.032452380: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452415: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452415: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452442: ([kernel.kallsyms]) __list_add_valid
Xorg 2115 [000] 5850.032452457: ([kernel.kallsyms]) rb_next
Xorg 2115 [000] 5850.032452494: ([kernel.kallsyms]) __x86_indirect_thunk_r12
Xorg 2115 [000] 5850.032452494: ([kernel.kallsyms]) __x86_indirect_thunk_r12
Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) group_sched_in
Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) event_sched_in.isra.113
Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) perf_event_set_state.part.70
Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) perf_event_update_time
Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) perf_log_itrace_start
Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) perf_event_update_userpage
Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) calc_timer_values
Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) sched_clock_cpu
Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) sched_clock
Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) native_sched_clock
Xorg 2115 [000] 5850.032452793: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452793: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) arch_perf_update_userpage
Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) using_native_sched_clock
Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) sched_clock_stable
Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) cyc2ns_read_begin
Xorg 2115 [000] 5850.032453157: ([kernel.kallsyms]) cyc2ns_read_end
Xorg 2115 [000] 5850.032453194: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032453194: ([kernel.kallsyms]) __indirect_thunk_start
Xorg 2115 [000] 5850.032453206: ([kernel.kallsyms]) __list_add_valid
jirka
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] perf thread-stack: Hide x86 retpolines
2019-01-09 15:38 ` Jiri Olsa
@ 2019-01-10 7:52 ` Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-01-10 7:52 UTC (permalink / raw)
To: Jiri Olsa; +Cc: Arnaldo Carvalho de Melo, linux-kernel
On 9/01/19 5:38 PM, Jiri Olsa wrote:
> On Wed, Jan 09, 2019 at 11:18:35AM +0200, Adrian Hunter wrote:
>> x86 retpoline functions pollute the call graph by showing up everywhere
>> there is an indirect branch, but they do not really mean anything. Make
>> changes so that the default retpoline functions will no longer appear in
>> the call graph. Note this only affects the call graph, since all the
>> original branches are left unchanged.
>>
>> This does not handle function return thunks, nor is there any improvement
>> for the handling of inline thunks or extern thunks.
>>
>> Example:
>>
>> $ cat simple-retpoline.c
>> __attribute__((noinline)) int bar(void)
>> {
>> return -1;
>> }
>>
>> int foo(void)
>> {
>> return bar() + 1;
>> }
>>
>> __attribute__((indirect_branch("thunk"))) int main()
>> {
>> int (*volatile fn)(void) = foo;
>>
>> fn();
>> return fn();
>> }
>> $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
>> $ objdump -d simple-retpoline
>> <SNIP>
>> 0000000000001040 <main>:
>> 1040: 48 83 ec 18 sub $0x18,%rsp
>> 1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
>> 104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
>> 1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
>> 1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
>> 105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
>> 105f: 48 83 c4 18 add $0x18,%rsp
>> 1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
>> <SNIP>
>> 0000000000001160 <bar>:
>> 1160: b8 ff ff ff ff mov $0xffffffff,%eax
>> 1165: c3 retq
>> <SNIP>
>> 0000000000001170 <foo>:
>> 1170: e8 eb ff ff ff callq 1160 <bar>
>> 1175: 83 c0 01 add $0x1,%eax
>> 1178: c3 retq
>> 0000000000001179 <__x86_indirect_thunk_rax>:
>> 1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
>> 117e: f3 90 pause
>> 1180: 0f ae e8 lfence
>> 1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
>> 1185: 48 89 04 24 mov %rax,(%rsp)
>> 1189: c3 retq
>> <SNIP>
>> $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
>> $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
>> 2019-01-08 14:03:37.851655 Creating database...
>> 2019-01-08 14:03:37.863256 Writing records...
>> 2019-01-08 14:03:38.069750 Adding indexes
>> 2019-01-08 14:03:38.078799 Done
>> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
>>
>> Before:
>>
>> main
>> -> __x86_indirect_thunk_rax
>> -> __x86_indirect_thunk_rax
>> -> foo
>> -> bar
>>
>> After:
>>
>> main
>> -> foo
>> -> bar
>>
>
> hi,
> I still see them in kernel report.. after a while the output is shifted
> too far right out of the screen due to (I guess) this call/ret imbalance
>
>
> [root@krava perf]# ./perf-with-kcore record pt -e intel_pt/cyc/k -a -- sleep 1
> [root@krava perf]# ./perf-with-kcore script pt --call-trace -C 0
> Xorg 2115 [000] 5850.032452321: cbr: 36 freq: 3621 MHz (171%)
> Xorg 2115 [000] 5850.032452351: ([kernel.kallsyms]) __list_add_valid
> Xorg 2115 [000] 5850.032452380: ([kernel.kallsyms]) perf_pmu_enable.part.101
> Xorg 2115 [000] 5850.032452380: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452415: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452415: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452442: ([kernel.kallsyms]) __list_add_valid
> Xorg 2115 [000] 5850.032452457: ([kernel.kallsyms]) rb_next
> Xorg 2115 [000] 5850.032452494: ([kernel.kallsyms]) __x86_indirect_thunk_r12
> Xorg 2115 [000] 5850.032452494: ([kernel.kallsyms]) __x86_indirect_thunk_r12
> Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) group_sched_in
> Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452535: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) event_sched_in.isra.113
> Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) perf_event_set_state.part.70
> Xorg 2115 [000] 5850.032452548: ([kernel.kallsyms]) perf_event_update_time
> Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) perf_log_itrace_start
> Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452562: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) perf_event_update_userpage
> Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) calc_timer_values
> Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) sched_clock_cpu
> Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) sched_clock
> Xorg 2115 [000] 5850.032452600: ([kernel.kallsyms]) native_sched_clock
> Xorg 2115 [000] 5850.032452793: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452793: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) arch_perf_update_userpage
> Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) using_native_sched_clock
> Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) sched_clock_stable
> Xorg 2115 [000] 5850.032452867: ([kernel.kallsyms]) cyc2ns_read_begin
> Xorg 2115 [000] 5850.032453157: ([kernel.kallsyms]) cyc2ns_read_end
> Xorg 2115 [000] 5850.032453194: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032453194: ([kernel.kallsyms]) __indirect_thunk_start
> Xorg 2115 [000] 5850.032453206: ([kernel.kallsyms]) __list_add_valid
>
>
> jirka
>
These patches are for db export and the context-sensitive call chain report
in the exported-sql-viewer.py python script. The callchains you see above
are done at a lower level, before symbol resolution. If you don't want to
see the thunks you can grep them out. But more work is needed on them too.
I'll add it to the TODO list.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/6] perf thread-stack: x86 retpolines
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
` (5 preceding siblings ...)
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
@ 2019-01-10 9:55 ` Jiri Olsa
2019-02-06 9:10 ` Adrian Hunter
6 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2019-01-10 9:55 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Arnaldo Carvalho de Melo, linux-kernel
On Wed, Jan 09, 2019 at 11:18:29AM +0200, Adrian Hunter wrote:
> Hi
>
> Here are some patches to improve the exported call graph, primarily to deal
> with x86 retpolines.
>
>
> Adrian Hunter (6):
> perf tools: Fix split_kallsyms_for_kcore for trampoline symbols
> perf thread-stack: Tidy thread_stack__push_cp() usage
> perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
> perf thread-stack: Represent jmps to the start of a different symbol
> perf thread-stack: Improve thread_stack__no_call_return()
> perf thread-stack: Hide x86 retpolines
Acked-by: Jiri Olsa <jolsa@kernel.org>
thanks,
jirka
>
> tools/perf/scripts/python/export-to-postgresql.py | 2 +-
> tools/perf/scripts/python/export-to-sqlite.py | 2 +-
> tools/perf/util/symbol.c | 2 +
> tools/perf/util/thread-stack.c | 235 ++++++++++++++++++----
> tools/perf/util/thread-stack.h | 3 +
> 5 files changed, 208 insertions(+), 36 deletions(-)
>
>
> Regards
> Adrian
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/6] perf thread-stack: x86 retpolines
2019-01-10 9:55 ` [PATCH 0/6] perf thread-stack: " Jiri Olsa
@ 2019-02-06 9:10 ` Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-02-06 9:10 UTC (permalink / raw)
To: Jiri Olsa; +Cc: Arnaldo Carvalho de Melo, linux-kernel
On 10/01/19 11:55 AM, Jiri Olsa wrote:
> On Wed, Jan 09, 2019 at 11:18:29AM +0200, Adrian Hunter wrote:
>> Hi
>>
>> Here are some patches to improve the exported call graph, primarily to deal
>> with x86 retpolines.
>>
>>
>> Adrian Hunter (6):
>> perf tools: Fix split_kallsyms_for_kcore for trampoline symbols
>> perf thread-stack: Tidy thread_stack__push_cp() usage
>> perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
>> perf thread-stack: Represent jmps to the start of a different symbol
>> perf thread-stack: Improve thread_stack__no_call_return()
>> perf thread-stack: Hide x86 retpolines
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
Can these be applied?
>
> thanks,
> jirka
>
>>
>> tools/perf/scripts/python/export-to-postgresql.py | 2 +-
>> tools/perf/scripts/python/export-to-sqlite.py | 2 +-
>> tools/perf/util/symbol.c | 2 +
>> tools/perf/util/thread-stack.c | 235 ++++++++++++++++++----
>> tools/perf/util/thread-stack.h | 3 +
>> 5 files changed, 208 insertions(+), 36 deletions(-)
>>
>>
>> Regards
>> Adrian
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol
2019-01-09 9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
@ 2019-02-06 12:39 ` Arnaldo Carvalho de Melo
2019-02-06 13:25 ` Adrian Hunter
2019-02-09 12:59 ` [tip:perf/core] " tip-bot for Adrian Hunter
1 sibling, 1 reply; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-06 12:39 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel
Em Wed, Jan 09, 2019 at 11:18:33AM +0200, Adrian Hunter escreveu:
> The compiler might optimize a call/ret combination by making it a jmp.
> However the thread-stack does not presently cater for that, so that such
> control flow is not visible in the call graph. Make it visible by recording
> on the stack a branch to the start of a different symbol. Note, that means
> when a ret pops the stack, all jmps must be popped off first.
>
> Example:
>
> $ cat jmp-to-fn.c
> __attribute__((noinline)) int bar(void)
> {
> return -1;
> }
>
> __attribute__((noinline)) int foo(void)
> {
> return bar() + 1;
> }
>
> int main()
> {
> return foo();
> }
> $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
> $ objdump -d jmp-to-fn
> <SNIP>
> 0000000000001040 <main>:
> 1040: 31 c0 xor %eax,%eax
> 1042: e9 09 01 00 00 jmpq 1150 <foo>
> <SNIP>
> 0000000000001140 <bar>:
> 1140: b8 ff ff ff ff mov $0xffffffff,%eax
> 1145: c3 retq
> <SNIP>
> 0000000000001150 <foo>:
> 1150: 31 c0 xor %eax,%eax
> 1152: e8 e9 ff ff ff callq 1140 <bar>
> 1157: 83 c0 01 add $0x1,%eax
> 115a: c3 retq
> <SNIP>
> $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
> $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
> 2019-01-08 13:24:58.783069 Creating database...
> 2019-01-08 13:24:58.794650 Writing records...
> 2019-01-08 13:24:59.008050 Adding indexes
> 2019-01-08 13:24:59.015802 Done
> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
So, while testing this I notice that we need to do the python3
conversions for these scripts as well:
[acme@quaco adrian]$ objdump -d jmp-to-fn > /tmp/bla
[acme@quaco adrian]$ vim /tmp/bla
[acme@quaco adrian]$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB jmp-to-fn.perf.data ]
[acme@quaco adrian]$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
File "/home/acme/libexec/perf-core/scripts/python/export-to-sqlite.py", line 103
print datetime.datetime.today(), "Creating database..."
^
SyntaxError: invalid syntax
Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-sqlite.py
[acme@quaco adrian]$
Mostly are simple, i.e. in this case we need something like:
print(str(datetime.datetime.today()) + "Creating database...")
for a really crude conversion, but one that would make this work with
both python2 and python3 (make PYTHON=python3 when building perf).
> Before:
>
> main
> -> bar
>
> After:
>
> main
> -> foo
> -> bar
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> .../scripts/python/export-to-postgresql.py | 2 +-
> tools/perf/scripts/python/export-to-sqlite.py | 2 +-
> tools/perf/util/thread-stack.c | 30 +++++++++++++++++--
> tools/perf/util/thread-stack.h | 3 ++
> 4 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
> index 0564dd7377f2..30130213da7e 100644
> --- a/tools/perf/scripts/python/export-to-postgresql.py
> +++ b/tools/perf/scripts/python/export-to-postgresql.py
> @@ -478,7 +478,7 @@ if perf_db_export_calls:
> 'branch_count,'
> 'call_id,'
> 'return_id,'
> - 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
> + 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
> 'parent_call_path_id'
> ' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
>
> diff --git a/tools/perf/scripts/python/export-to-sqlite.py b/tools/perf/scripts/python/export-to-sqlite.py
> index 245caf2643ed..ed237f2ed03f 100644
> --- a/tools/perf/scripts/python/export-to-sqlite.py
> +++ b/tools/perf/scripts/python/export-to-sqlite.py
> @@ -320,7 +320,7 @@ if perf_db_export_calls:
> 'branch_count,'
> 'call_id,'
> 'return_id,'
> - 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
> + 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
> 'parent_call_path_id'
> ' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
>
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index 7f8eff018c16..f52c0f90915d 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -38,6 +38,7 @@
> * @cp: call path
> * @no_call: a 'call' was not seen
> * @trace_end: a 'call' but trace ended
> + * @non_call: a branch but not a 'call' to the start of a different symbol
> */
> struct thread_stack_entry {
> u64 ret_addr;
> @@ -47,6 +48,7 @@ struct thread_stack_entry {
> struct call_path *cp;
> bool no_call;
> bool trace_end;
> + bool non_call;
> };
>
> /**
> @@ -268,6 +270,8 @@ static int thread_stack__call_return(struct thread *thread,
> cr.flags |= CALL_RETURN_NO_CALL;
> if (no_return)
> cr.flags |= CALL_RETURN_NO_RETURN;
> + if (tse->non_call)
> + cr.flags |= CALL_RETURN_NON_CALL;
>
> return crp->process(&cr, crp->data);
> }
> @@ -510,6 +514,7 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
> tse->cp = cp;
> tse->no_call = no_call;
> tse->trace_end = trace_end;
> + tse->non_call = false;
>
> return 0;
> }
> @@ -531,14 +536,16 @@ static int thread_stack__pop_cp(struct thread *thread, struct thread_stack *ts,
> timestamp, ref, false);
> }
>
> - if (ts->stack[ts->cnt - 1].ret_addr == ret_addr) {
> + if (ts->stack[ts->cnt - 1].ret_addr == ret_addr &&
> + !ts->stack[ts->cnt - 1].non_call) {
> return thread_stack__call_return(thread, ts, --ts->cnt,
> timestamp, ref, false);
> } else {
> size_t i = ts->cnt - 1;
>
> while (i--) {
> - if (ts->stack[i].ret_addr != ret_addr)
> + if (ts->stack[i].ret_addr != ret_addr ||
> + ts->stack[i].non_call)
> continue;
> i += 1;
> while (ts->cnt > i) {
> @@ -757,6 +764,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
> err = thread_stack__trace_begin(thread, ts, sample->time, ref);
> } else if (sample->flags & PERF_IP_FLAG_TRACE_END) {
> err = thread_stack__trace_end(ts, sample, ref);
> + } else if (sample->flags & PERF_IP_FLAG_BRANCH &&
> + from_al->sym != to_al->sym && to_al->sym &&
> + to_al->addr == to_al->sym->start) {
> + struct call_path_root *cpr = ts->crp->cpr;
> + struct call_path *cp;
> +
> + /*
> + * The compiler might optimize a call/ret combination by making
> + * it a jmp. Make that visible by recording on the stack a
> + * branch to the start of a different symbol. Note, that means
> + * when a ret pops the stack, all jmps must be popped off first.
> + */
> + cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
> + to_al->sym, sample->addr,
> + ts->kernel_start);
> + err = thread_stack__push_cp(ts, 0, sample->time, ref, cp, false,
> + false);
> + if (!err)
> + ts->stack[ts->cnt - 1].non_call = true;
> }
>
> return err;
> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> index 1f626f4a1c40..b7c04e19ad41 100644
> --- a/tools/perf/util/thread-stack.h
> +++ b/tools/perf/util/thread-stack.h
> @@ -35,10 +35,13 @@ struct call_path;
> *
> * CALL_RETURN_NO_CALL: 'return' but no matching 'call'
> * CALL_RETURN_NO_RETURN: 'call' but no matching 'return'
> + * CALL_RETURN_NON_CALL: a branch but not a 'call' to the start of a different
> + * symbol
> */
> enum {
> CALL_RETURN_NO_CALL = 1 << 0,
> CALL_RETURN_NO_RETURN = 1 << 1,
> + CALL_RETURN_NON_CALL = 1 << 2,
> };
>
> /**
> --
> 2.17.1
--
- Arnaldo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol
2019-02-06 12:39 ` Arnaldo Carvalho de Melo
@ 2019-02-06 13:25 ` Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: Adrian Hunter @ 2019-02-06 13:25 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
On 6/02/19 2:39 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jan 09, 2019 at 11:18:33AM +0200, Adrian Hunter escreveu:
>> The compiler might optimize a call/ret combination by making it a jmp.
>> However the thread-stack does not presently cater for that, so that such
>> control flow is not visible in the call graph. Make it visible by recording
>> on the stack a branch to the start of a different symbol. Note, that means
>> when a ret pops the stack, all jmps must be popped off first.
>>
>> Example:
>>
>> $ cat jmp-to-fn.c
>> __attribute__((noinline)) int bar(void)
>> {
>> return -1;
>> }
>>
>> __attribute__((noinline)) int foo(void)
>> {
>> return bar() + 1;
>> }
>>
>> int main()
>> {
>> return foo();
>> }
>> $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
>> $ objdump -d jmp-to-fn
>> <SNIP>
>> 0000000000001040 <main>:
>> 1040: 31 c0 xor %eax,%eax
>> 1042: e9 09 01 00 00 jmpq 1150 <foo>
>> <SNIP>
>> 0000000000001140 <bar>:
>> 1140: b8 ff ff ff ff mov $0xffffffff,%eax
>> 1145: c3 retq
>> <SNIP>
>> 0000000000001150 <foo>:
>> 1150: 31 c0 xor %eax,%eax
>> 1152: e8 e9 ff ff ff callq 1140 <bar>
>> 1157: 83 c0 01 add $0x1,%eax
>> 115a: c3 retq
>> <SNIP>
>> $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
>> $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
>> 2019-01-08 13:24:58.783069 Creating database...
>> 2019-01-08 13:24:58.794650 Writing records...
>> 2019-01-08 13:24:59.008050 Adding indexes
>> 2019-01-08 13:24:59.015802 Done
>> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
>
> So, while testing this I notice that we need to do the python3
> conversions for these scripts as well:
I will look at it, but an you take these patches as they are for now?
>
> [acme@quaco adrian]$ objdump -d jmp-to-fn > /tmp/bla
> [acme@quaco adrian]$ vim /tmp/bla
> [acme@quaco adrian]$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
> Couldn't synthesize bpf events.
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB jmp-to-fn.perf.data ]
> [acme@quaco adrian]$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
> File "/home/acme/libexec/perf-core/scripts/python/export-to-sqlite.py", line 103
> print datetime.datetime.today(), "Creating database..."
> ^
> SyntaxError: invalid syntax
> Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-sqlite.py
> [acme@quaco adrian]$
>
> Mostly are simple, i.e. in this case we need something like:
>
> print(str(datetime.datetime.today()) + "Creating database...")
>
>
> for a really crude conversion, but one that would make this work with
> both python2 and python3 (make PYTHON=python3 when building perf).
>
>
>> Before:
>>
>> main
>> -> bar
>>
>> After:
>>
>> main
>> -> foo
>> -> bar
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>> .../scripts/python/export-to-postgresql.py | 2 +-
>> tools/perf/scripts/python/export-to-sqlite.py | 2 +-
>> tools/perf/util/thread-stack.c | 30 +++++++++++++++++--
>> tools/perf/util/thread-stack.h | 3 ++
>> 4 files changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
>> index 0564dd7377f2..30130213da7e 100644
>> --- a/tools/perf/scripts/python/export-to-postgresql.py
>> +++ b/tools/perf/scripts/python/export-to-postgresql.py
>> @@ -478,7 +478,7 @@ if perf_db_export_calls:
>> 'branch_count,'
>> 'call_id,'
>> 'return_id,'
>> - 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
>> + 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
>> 'parent_call_path_id'
>> ' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
>>
>> diff --git a/tools/perf/scripts/python/export-to-sqlite.py b/tools/perf/scripts/python/export-to-sqlite.py
>> index 245caf2643ed..ed237f2ed03f 100644
>> --- a/tools/perf/scripts/python/export-to-sqlite.py
>> +++ b/tools/perf/scripts/python/export-to-sqlite.py
>> @@ -320,7 +320,7 @@ if perf_db_export_calls:
>> 'branch_count,'
>> 'call_id,'
>> 'return_id,'
>> - 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
>> + 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
>> 'parent_call_path_id'
>> ' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
>>
>> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
>> index 7f8eff018c16..f52c0f90915d 100644
>> --- a/tools/perf/util/thread-stack.c
>> +++ b/tools/perf/util/thread-stack.c
>> @@ -38,6 +38,7 @@
>> * @cp: call path
>> * @no_call: a 'call' was not seen
>> * @trace_end: a 'call' but trace ended
>> + * @non_call: a branch but not a 'call' to the start of a different symbol
>> */
>> struct thread_stack_entry {
>> u64 ret_addr;
>> @@ -47,6 +48,7 @@ struct thread_stack_entry {
>> struct call_path *cp;
>> bool no_call;
>> bool trace_end;
>> + bool non_call;
>> };
>>
>> /**
>> @@ -268,6 +270,8 @@ static int thread_stack__call_return(struct thread *thread,
>> cr.flags |= CALL_RETURN_NO_CALL;
>> if (no_return)
>> cr.flags |= CALL_RETURN_NO_RETURN;
>> + if (tse->non_call)
>> + cr.flags |= CALL_RETURN_NON_CALL;
>>
>> return crp->process(&cr, crp->data);
>> }
>> @@ -510,6 +514,7 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
>> tse->cp = cp;
>> tse->no_call = no_call;
>> tse->trace_end = trace_end;
>> + tse->non_call = false;
>>
>> return 0;
>> }
>> @@ -531,14 +536,16 @@ static int thread_stack__pop_cp(struct thread *thread, struct thread_stack *ts,
>> timestamp, ref, false);
>> }
>>
>> - if (ts->stack[ts->cnt - 1].ret_addr == ret_addr) {
>> + if (ts->stack[ts->cnt - 1].ret_addr == ret_addr &&
>> + !ts->stack[ts->cnt - 1].non_call) {
>> return thread_stack__call_return(thread, ts, --ts->cnt,
>> timestamp, ref, false);
>> } else {
>> size_t i = ts->cnt - 1;
>>
>> while (i--) {
>> - if (ts->stack[i].ret_addr != ret_addr)
>> + if (ts->stack[i].ret_addr != ret_addr ||
>> + ts->stack[i].non_call)
>> continue;
>> i += 1;
>> while (ts->cnt > i) {
>> @@ -757,6 +764,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
>> err = thread_stack__trace_begin(thread, ts, sample->time, ref);
>> } else if (sample->flags & PERF_IP_FLAG_TRACE_END) {
>> err = thread_stack__trace_end(ts, sample, ref);
>> + } else if (sample->flags & PERF_IP_FLAG_BRANCH &&
>> + from_al->sym != to_al->sym && to_al->sym &&
>> + to_al->addr == to_al->sym->start) {
>> + struct call_path_root *cpr = ts->crp->cpr;
>> + struct call_path *cp;
>> +
>> + /*
>> + * The compiler might optimize a call/ret combination by making
>> + * it a jmp. Make that visible by recording on the stack a
>> + * branch to the start of a different symbol. Note, that means
>> + * when a ret pops the stack, all jmps must be popped off first.
>> + */
>> + cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
>> + to_al->sym, sample->addr,
>> + ts->kernel_start);
>> + err = thread_stack__push_cp(ts, 0, sample->time, ref, cp, false,
>> + false);
>> + if (!err)
>> + ts->stack[ts->cnt - 1].non_call = true;
>> }
>>
>> return err;
>> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
>> index 1f626f4a1c40..b7c04e19ad41 100644
>> --- a/tools/perf/util/thread-stack.h
>> +++ b/tools/perf/util/thread-stack.h
>> @@ -35,10 +35,13 @@ struct call_path;
>> *
>> * CALL_RETURN_NO_CALL: 'return' but no matching 'call'
>> * CALL_RETURN_NO_RETURN: 'call' but no matching 'return'
>> + * CALL_RETURN_NON_CALL: a branch but not a 'call' to the start of a different
>> + * symbol
>> */
>> enum {
>> CALL_RETURN_NO_CALL = 1 << 0,
>> CALL_RETURN_NO_RETURN = 1 << 1,
>> + CALL_RETURN_NON_CALL = 1 << 2,
>> };
>>
>> /**
>> --
>> 2.17.1
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [tip:perf/core] perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols
2019-01-09 9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
@ 2019-02-09 12:57 ` tip-bot for Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-09 12:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, mingo, adrian.hunter, hpa, jolsa, tglx, acme
Commit-ID: d6d457451eb94fa747dc202765592eb8885a7352
Gitweb: https://git.kernel.org/tip/d6d457451eb94fa747dc202765592eb8885a7352
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:30 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 6 Feb 2019 10:00:40 -0300
perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols
Kallsyms symbols do not have a size, so the size becomes the distance to
the next symbol.
Consequently the recently added trampoline symbols end up with large
sizes because the trampolines are some distance from one another and the
main kernel map.
However, symbols that end outside their map can disrupt the symbol tree
because, after mapping, it can appear incorrectly that they overlap
other symbols.
Add logic to truncate symbol size to the end of the corresponding map.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines")
Link: http://lkml.kernel.org/r/20190109091835.5570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/symbol.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 7b194cf53cc0..758bf5f74e6e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -718,6 +718,8 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
}
pos->start -= curr_map->start - curr_map->pgoff;
+ if (pos->end > curr_map->end)
+ pos->end = curr_map->end;
if (pos->end)
pos->end -= curr_map->start - curr_map->pgoff;
symbols__insert(&curr_map->dso->symbols, pos);
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:perf/core] perf thread-stack: Tidy thread_stack__push_cp() usage
2019-01-09 9:18 ` [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage Adrian Hunter
@ 2019-02-09 12:58 ` tip-bot for Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-09 12:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: jolsa, acme, mingo, tglx, linux-kernel, hpa, adrian.hunter
Commit-ID: e7a3a055f2b88ebd0bdae8b0aade1e7d80c8a81e
Gitweb: https://git.kernel.org/tip/e7a3a055f2b88ebd0bdae8b0aade1e7d80c8a81e
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:31 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 6 Feb 2019 10:00:40 -0300
perf thread-stack: Tidy thread_stack__push_cp() usage
If 'cp' is checked in thread_stack__push_cp() a number of error checks
can be removed, reducing code size and improving readability.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread-stack.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index d52f27f373ce..93ba3ab6a602 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -493,6 +493,9 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
struct thread_stack_entry *tse;
int err;
+ if (!cp)
+ return -ENOMEM;
+
if (ts->cnt == ts->sz) {
err = thread_stack__grow(ts);
if (err)
@@ -576,8 +579,6 @@ static int thread_stack__bottom(struct thread_stack *ts,
cp = call_path__findnew(cpr, &cpr->call_path, sym, ip,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
return thread_stack__push_cp(ts, ip, sample->time, ref, cp,
true, false);
@@ -609,8 +610,6 @@ static int thread_stack__no_call_return(struct thread *thread,
cp = call_path__findnew(cpr, &cpr->call_path,
to_al->sym, sample->addr,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
return thread_stack__push_cp(ts, 0, sample->time, ref,
cp, true, false);
}
@@ -633,8 +632,6 @@ static int thread_stack__no_call_return(struct thread *thread,
/* This 'return' had no 'call', so push and pop top of stack */
cp = call_path__findnew(cpr, parent, from_al->sym, sample->ip,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
err = thread_stack__push_cp(ts, sample->addr, sample->time, ref, cp,
true, false);
@@ -680,8 +677,6 @@ static int thread_stack__trace_end(struct thread_stack *ts,
cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp, NULL, 0,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
ret_addr = sample->ip + sample->insn_len;
@@ -745,8 +740,6 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
to_al->sym, sample->addr,
ts->kernel_start);
- if (!cp)
- return -ENOMEM;
err = thread_stack__push_cp(ts, ret_addr, sample->time, ref,
cp, false, trace_end);
} else if (sample->flags & PERF_IP_FLAG_RETURN) {
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:perf/core] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
2019-01-09 9:18 ` [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables Adrian Hunter
@ 2019-02-09 12:58 ` tip-bot for Adrian Hunter
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-09 12:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, tglx, hpa, adrian.hunter, jolsa, acme, mingo
Commit-ID: 90c2cda7056e3a7555d874a27aae12fd46ca802e
Gitweb: https://git.kernel.org/tip/90c2cda7056e3a7555d874a27aae12fd46ca802e
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:32 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 6 Feb 2019 10:00:40 -0300
perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
Make thread_stack__no_call_return() more readable by adding more local
variables.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread-stack.c | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 93ba3ab6a602..7f8eff018c16 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -591,34 +591,36 @@ static int thread_stack__no_call_return(struct thread *thread,
struct addr_location *to_al, u64 ref)
{
struct call_path_root *cpr = ts->crp->cpr;
+ struct call_path *root = &cpr->call_path;
+ struct symbol *fsym = from_al->sym;
+ struct symbol *tsym = to_al->sym;
struct call_path *cp, *parent;
u64 ks = ts->kernel_start;
+ u64 addr = sample->addr;
+ u64 tm = sample->time;
+ u64 ip = sample->ip;
int err;
- if (sample->ip >= ks && sample->addr < ks) {
+ if (ip >= ks && addr < ks) {
/* Return to userspace, so pop all kernel addresses */
while (thread_stack__in_kernel(ts)) {
err = thread_stack__call_return(thread, ts, --ts->cnt,
- sample->time, ref,
- true);
+ tm, ref, true);
if (err)
return err;
}
/* If the stack is empty, push the userspace address */
if (!ts->cnt) {
- cp = call_path__findnew(cpr, &cpr->call_path,
- to_al->sym, sample->addr,
- ts->kernel_start);
- return thread_stack__push_cp(ts, 0, sample->time, ref,
- cp, true, false);
+ cp = call_path__findnew(cpr, root, tsym, addr, ks);
+ return thread_stack__push_cp(ts, 0, tm, ref, cp, true,
+ false);
}
- } else if (thread_stack__in_kernel(ts) && sample->ip < ks) {
+ } else if (thread_stack__in_kernel(ts) && ip < ks) {
/* Return to userspace, so pop all kernel addresses */
while (thread_stack__in_kernel(ts)) {
err = thread_stack__call_return(thread, ts, --ts->cnt,
- sample->time, ref,
- true);
+ tm, ref, true);
if (err)
return err;
}
@@ -627,19 +629,16 @@ static int thread_stack__no_call_return(struct thread *thread,
if (ts->cnt)
parent = ts->stack[ts->cnt - 1].cp;
else
- parent = &cpr->call_path;
+ parent = root;
/* This 'return' had no 'call', so push and pop top of stack */
- cp = call_path__findnew(cpr, parent, from_al->sym, sample->ip,
- ts->kernel_start);
+ cp = call_path__findnew(cpr, parent, fsym, ip, ks);
- err = thread_stack__push_cp(ts, sample->addr, sample->time, ref, cp,
- true, false);
+ err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
if (err)
return err;
- return thread_stack__pop_cp(thread, ts, sample->addr, sample->time, ref,
- to_al->sym);
+ return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
}
static int thread_stack__trace_begin(struct thread *thread,
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:perf/core] perf thread-stack: Represent jmps to the start of a different symbol
2019-01-09 9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
2019-02-06 12:39 ` Arnaldo Carvalho de Melo
@ 2019-02-09 12:59 ` tip-bot for Adrian Hunter
1 sibling, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-09 12:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, jolsa, linux-kernel, acme, tglx, hpa, adrian.hunter
Commit-ID: f08046cb3082b313e7b08dc35838cf8bd902c36b
Gitweb: https://git.kernel.org/tip/f08046cb3082b313e7b08dc35838cf8bd902c36b
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:33 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 6 Feb 2019 10:00:40 -0300
perf thread-stack: Represent jmps to the start of a different symbol
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by
recording on the stack a branch to the start of a different symbol.
Note, that means when a ret pops the stack, all jmps must be popped off
first.
Example:
$ cat jmp-to-fn.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
__attribute__((noinline)) int foo(void)
{
return bar() + 1;
}
int main()
{
return foo();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
$ objdump -d jmp-to-fn
<SNIP>
0000000000001040 <main>:
1040: 31 c0 xor %eax,%eax
1042: e9 09 01 00 00 jmpq 1150 <foo>
<SNIP>
0000000000001140 <bar>:
1140: b8 ff ff ff ff mov $0xffffffff,%eax
1145: c3 retq
<SNIP>
0000000000001150 <foo>:
1150: 31 c0 xor %eax,%eax
1152: e8 e9 ff ff ff callq 1140 <bar>
1157: 83 c0 01 add $0x1,%eax
115a: c3 retq
<SNIP>
$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
2019-01-08 13:24:58.783069 Creating database...
2019-01-08 13:24:58.794650 Writing records...
2019-01-08 13:24:59.008050 Adding indexes
2019-01-08 13:24:59.015802 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
Before:
main
-> bar
After:
main
-> foo
-> bar
Committer testing:
Install the python2-pyside package, then select these menu options
on the GUI:
"Reports"
"Context sensitive callgraphs"
Then go on expanding the symbols, to get, full picture when doing this
on a fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC):
jmp-to-fn
PID:TID
_start (ld-2.28.so)
__libc_start_main
main
foo
bar
To verify that indeed, this fixes the problem.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/scripts/python/export-to-postgresql.py | 2 +-
tools/perf/scripts/python/export-to-sqlite.py | 2 +-
tools/perf/util/thread-stack.c | 30 +++++++++++++++++++++--
tools/perf/util/thread-stack.h | 3 +++
4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 0564dd7377f2..30130213da7e 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -478,7 +478,7 @@ if perf_db_export_calls:
'branch_count,'
'call_id,'
'return_id,'
- 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+ 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
'parent_call_path_id'
' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
diff --git a/tools/perf/scripts/python/export-to-sqlite.py b/tools/perf/scripts/python/export-to-sqlite.py
index 245caf2643ed..ed237f2ed03f 100644
--- a/tools/perf/scripts/python/export-to-sqlite.py
+++ b/tools/perf/scripts/python/export-to-sqlite.py
@@ -320,7 +320,7 @@ if perf_db_export_calls:
'branch_count,'
'call_id,'
'return_id,'
- 'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+ 'CASE WHEN flags=0 THEN \'\' WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' WHEN flags=6 THEN \'jump\' ELSE flags END AS flags,'
'parent_call_path_id'
' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 7f8eff018c16..f52c0f90915d 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -38,6 +38,7 @@
* @cp: call path
* @no_call: a 'call' was not seen
* @trace_end: a 'call' but trace ended
+ * @non_call: a branch but not a 'call' to the start of a different symbol
*/
struct thread_stack_entry {
u64 ret_addr;
@@ -47,6 +48,7 @@ struct thread_stack_entry {
struct call_path *cp;
bool no_call;
bool trace_end;
+ bool non_call;
};
/**
@@ -268,6 +270,8 @@ static int thread_stack__call_return(struct thread *thread,
cr.flags |= CALL_RETURN_NO_CALL;
if (no_return)
cr.flags |= CALL_RETURN_NO_RETURN;
+ if (tse->non_call)
+ cr.flags |= CALL_RETURN_NON_CALL;
return crp->process(&cr, crp->data);
}
@@ -510,6 +514,7 @@ static int thread_stack__push_cp(struct thread_stack *ts, u64 ret_addr,
tse->cp = cp;
tse->no_call = no_call;
tse->trace_end = trace_end;
+ tse->non_call = false;
return 0;
}
@@ -531,14 +536,16 @@ static int thread_stack__pop_cp(struct thread *thread, struct thread_stack *ts,
timestamp, ref, false);
}
- if (ts->stack[ts->cnt - 1].ret_addr == ret_addr) {
+ if (ts->stack[ts->cnt - 1].ret_addr == ret_addr &&
+ !ts->stack[ts->cnt - 1].non_call) {
return thread_stack__call_return(thread, ts, --ts->cnt,
timestamp, ref, false);
} else {
size_t i = ts->cnt - 1;
while (i--) {
- if (ts->stack[i].ret_addr != ret_addr)
+ if (ts->stack[i].ret_addr != ret_addr ||
+ ts->stack[i].non_call)
continue;
i += 1;
while (ts->cnt > i) {
@@ -757,6 +764,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
err = thread_stack__trace_begin(thread, ts, sample->time, ref);
} else if (sample->flags & PERF_IP_FLAG_TRACE_END) {
err = thread_stack__trace_end(ts, sample, ref);
+ } else if (sample->flags & PERF_IP_FLAG_BRANCH &&
+ from_al->sym != to_al->sym && to_al->sym &&
+ to_al->addr == to_al->sym->start) {
+ struct call_path_root *cpr = ts->crp->cpr;
+ struct call_path *cp;
+
+ /*
+ * The compiler might optimize a call/ret combination by making
+ * it a jmp. Make that visible by recording on the stack a
+ * branch to the start of a different symbol. Note, that means
+ * when a ret pops the stack, all jmps must be popped off first.
+ */
+ cp = call_path__findnew(cpr, ts->stack[ts->cnt - 1].cp,
+ to_al->sym, sample->addr,
+ ts->kernel_start);
+ err = thread_stack__push_cp(ts, 0, sample->time, ref, cp, false,
+ false);
+ if (!err)
+ ts->stack[ts->cnt - 1].non_call = true;
}
return err;
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index 1f626f4a1c40..b7c04e19ad41 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -35,10 +35,13 @@ struct call_path;
*
* CALL_RETURN_NO_CALL: 'return' but no matching 'call'
* CALL_RETURN_NO_RETURN: 'call' but no matching 'return'
+ * CALL_RETURN_NON_CALL: a branch but not a 'call' to the start of a different
+ * symbol
*/
enum {
CALL_RETURN_NO_CALL = 1 << 0,
CALL_RETURN_NO_RETURN = 1 << 1,
+ CALL_RETURN_NON_CALL = 1 << 2,
};
/**
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return()
2019-01-09 9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
@ 2019-02-22 9:48 ` Adrian Hunter
2019-02-22 14:39 ` Arnaldo Carvalho de Melo
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
1 sibling, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2019-02-22 9:48 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel
On 9/01/19 11:18 AM, Adrian Hunter wrote:
> Improve thread_stack__no_call_return() to better handle 'returns' that do
> not match the stack i.e. 'no call'. See code comments for details.
This patch and patch 6 of the series do not seem to have been applied.
Can they be applied?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return()
2019-02-22 9:48 ` Adrian Hunter
@ 2019-02-22 14:39 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-22 14:39 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel
Em Fri, Feb 22, 2019 at 11:48:51AM +0200, Adrian Hunter escreveu:
> On 9/01/19 11:18 AM, Adrian Hunter wrote:
> > Improve thread_stack__no_call_return() to better handle 'returns' that do
> > not match the stack i.e. 'no call'. See code comments for details.
>
> This patch and patch 6 of the series do not seem to have been applied.
> Can they be applied?
I think there was some python3 oddity and I ended up letting this fell
thru the cracks, now I applied and tested it, see my results below, ints
in my perf/core branch now, will go together with your new batch in the
next pull req to Ingo,
Thanks,
- Arnaldo
commit 4ad79244f4899815a904f6e6adb71d24216b9b1d
Author: Adrian Hunter <adrian.hunter@intel.com>
Date: Wed Jan 9 11:18:34 2019 +0200
perf thread-stack: Improve thread_stack__no_call_return()
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
int foo(void)
{
return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
int (*volatile fn)(void) = foo;
fn();
return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> bar
After:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
Committer testing:
Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
expanding:
Before:
simple-retpolin
PID:PID
_start
_start
__libc_start_main
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
bar
After:
Remove the "simple.retpoline.db" file, run again the 'perf script' line
to regenerate the .db file and run the exported-sql-viewer.py again to
get the same all the way to 'main', then, from there, including 'main':
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
foo
bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f52c0f90915d..632c07a125ab 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -638,14 +638,57 @@ static int thread_stack__no_call_return(struct thread *thread,
else
parent = root;
- /* This 'return' had no 'call', so push and pop top of stack */
- cp = call_path__findnew(cpr, parent, fsym, ip, ks);
+ if (parent->sym == from_al->sym) {
+ /*
+ * At the bottom of the stack, assume the missing 'call' was
+ * before the trace started. So, pop the current symbol and push
+ * the 'to' symbol.
+ */
+ if (ts->cnt == 1) {
+ err = thread_stack__call_return(thread, ts, --ts->cnt,
+ tm, ref, false);
+ if (err)
+ return err;
+ }
+
+ if (!ts->cnt) {
+ cp = call_path__findnew(cpr, root, tsym, addr, ks);
+
+ return thread_stack__push_cp(ts, addr, tm, ref, cp,
+ true, false);
+ }
+
+ /*
+ * Otherwise assume the 'return' is being used as a jump (e.g.
+ * retpoline) and just push the 'to' symbol.
+ */
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
+
+ err = thread_stack__push_cp(ts, 0, tm, ref, cp, true, false);
+ if (!err)
+ ts->stack[ts->cnt - 1].non_call = true;
+
+ return err;
+ }
+
+ /*
+ * Assume 'parent' has not yet returned, so push 'to', and then push and
+ * pop 'from'.
+ */
+
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
if (err)
return err;
- return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
+ cp = call_path__findnew(cpr, cp, fsym, ip, ks);
+
+ err = thread_stack__push_cp(ts, ip, tm, ref, cp, true, false);
+ if (err)
+ return err;
+
+ return thread_stack__call_return(thread, ts, --ts->cnt, tm, ref, false);
}
static int thread_stack__trace_begin(struct thread *thread,
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] perf thread-stack: Hide x86 retpolines
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
2019-01-09 15:38 ` Jiri Olsa
@ 2019-02-22 19:42 ` Arnaldo Carvalho de Melo
2019-02-22 20:53 ` Arnaldo Carvalho de Melo
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
2 siblings, 1 reply; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-22 19:42 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel
Em Wed, Jan 09, 2019 at 11:18:35AM +0200, Adrian Hunter escreveu:
> x86 retpoline functions pollute the call graph by showing up everywhere
> there is an indirect branch, but they do not really mean anything. Make
> changes so that the default retpoline functions will no longer appear in
> the call graph. Note this only affects the call graph, since all the
> original branches are left unchanged.
>
> This does not handle function return thunks, nor is there any improvement
> for the handling of inline thunks or extern thunks.
>
> Example:
>
> $ cat simple-retpoline.c
> __attribute__((noinline)) int bar(void)
> {
> return -1;
> }
>
> int foo(void)
> {
> return bar() + 1;
> }
>
> __attribute__((indirect_branch("thunk"))) int main()
> {
> int (*volatile fn)(void) = foo;
>
> fn();
> return fn();
> }
> $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
> $ objdump -d simple-retpoline
> <SNIP>
> 0000000000001040 <main>:
> 1040: 48 83 ec 18 sub $0x18,%rsp
> 1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
> 104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
> 1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> 1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
> 105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> 105f: 48 83 c4 18 add $0x18,%rsp
> 1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
> <SNIP>
> 0000000000001160 <bar>:
> 1160: b8 ff ff ff ff mov $0xffffffff,%eax
> 1165: c3 retq
> <SNIP>
> 0000000000001170 <foo>:
> 1170: e8 eb ff ff ff callq 1160 <bar>
> 1175: 83 c0 01 add $0x1,%eax
> 1178: c3 retq
> 0000000000001179 <__x86_indirect_thunk_rax>:
> 1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
> 117e: f3 90 pause
> 1180: 0f ae e8 lfence
> 1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
> 1185: 48 89 04 24 mov %rax,(%rsp)
> 1189: c3 retq
> <SNIP>
> $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
> $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
> 2019-01-08 14:03:37.851655 Creating database...
> 2019-01-08 14:03:37.863256 Writing records...
> 2019-01-08 14:03:38.069750 Adding indexes
> 2019-01-08 14:03:38.078799 Done
> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
>
> Before:
>
> main
> -> __x86_indirect_thunk_rax
> -> __x86_indirect_thunk_rax
> -> foo
> -> bar
>
> After:
>
> main
> -> foo
> -> bar
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> tools/perf/util/thread-stack.c | 112 ++++++++++++++++++++++++++++++++-
> 1 file changed, 109 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index 632c07a125ab..805e30836460 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -20,6 +20,7 @@
> #include "thread.h"
> #include "event.h"
> #include "machine.h"
> +#include "env.h"
> #include "util.h"
> #include "debug.h"
> #include "symbol.h"
> @@ -29,6 +30,19 @@
>
> #define STACK_GROWTH 2048
>
> +/*
> + * State of retpoline detection.
> + *
> + * RETPOLINE_NONE: no retpoline detection
> + * X86_RETPOLINE_POSSIBLE: x86 retpoline possible
> + * X86_RETPOLINE_DETECTED: x86 retpoline detected
> + */
> +enum retpoline_state_t {
> + RETPOLINE_NONE,
> + X86_RETPOLINE_POSSIBLE,
> + X86_RETPOLINE_DETECTED,
> +};
> +
> /**
> * struct thread_stack_entry - thread stack entry.
> * @ret_addr: return address
> @@ -64,6 +78,7 @@ struct thread_stack_entry {
> * @crp: call/return processor
> * @comm: current comm
> * @arr_sz: size of array if this is the first element of an array
> + * @rstate: used to detect retpolines
> */
> struct thread_stack {
> struct thread_stack_entry *stack;
> @@ -76,6 +91,7 @@ struct thread_stack {
> struct call_return_processor *crp;
> struct comm *comm;
> unsigned int arr_sz;
> + enum retpoline_state_t rstate;
> };
>
> /*
> @@ -115,10 +131,16 @@ static int thread_stack__init(struct thread_stack *ts, struct thread *thread,
> if (err)
> return err;
>
> - if (thread->mg && thread->mg->machine)
> - ts->kernel_start = machine__kernel_start(thread->mg->machine);
> - else
> + if (thread->mg && thread->mg->machine) {
> + struct machine *machine = thread->mg->machine;
> + const char *arch = perf_env__arch(machine->env);
> +
> + ts->kernel_start = machine__kernel_start(machine);
> + if (!strcmp(arch, "x86"))
> + ts->rstate = X86_RETPOLINE_POSSIBLE;
> + } else {
> ts->kernel_start = 1ULL << 63;
> + }
> ts->crp = crp;
>
> return 0;
> @@ -733,6 +755,70 @@ static int thread_stack__trace_end(struct thread_stack *ts,
> false, true);
> }
>
> +static bool is_x86_retpoline(const char *name)
> +{
> + const char *p = strstr(name, "__x86_indirect_thunk_");
> +
> + return p == name || !strcmp(name, "__indirect_thunk_start");
> +}
> +
> +/*
> + * x86 retpoline functions pollute the call graph. This function removes them.
> + * This does not handle function return thunks, nor is there any improvement
> + * for the handling of inline thunks or extern thunks.
> + */
> +static int thread_stack__x86_retpoline(struct thread_stack *ts,
> + struct perf_sample *sample,
> + struct addr_location *to_al)
> +{
> + struct thread_stack_entry *tse = &ts->stack[ts->cnt - 1];
> + struct call_path_root *cpr = ts->crp->cpr;
> + struct symbol *sym = tse->cp->sym;
> + struct symbol *tsym = to_al->sym;
> + struct call_path *cp;
> +
> + if (sym && sym->name && is_x86_retpoline(sym->name)) {
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o
CC /tmp/build/perf/util/intel-pt.o
CC /tmp/build/perf/util/intel-pt-decoder/intel-pt-log.o
util/thread-stack.c:780:18: error: address of array 'sym->name' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
if (sym && sym->name && is_x86_retpoline(sym->name)) {
~~ ~~~~~^~~~
1 error generated.
mv: cannot stat '/tmp/build/perf/util/.thread-stack.o.tmp': No such file or directory
make[4]: *** [/git/linux/tools/build/Makefile.build:96: /tmp/build/perf/util/thread-stack.o] Error 1
[acme@quaco perf]$ pahole -C symbol ~/bin/perf
struct symbol {
struct rb_node rb_node; /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
u16 namelen; /* 40 2 */
u8 type:4; /* 42: 4 1 */
u8 binding:4; /* 42: 0 1 */
u8 idle:1; /* 43: 7 1 */
u8 ignore:1; /* 43: 6 1 */
u8 inlined:1; /* 43: 5 1 */
/* XXX 5 bits hole, try to pack */
u8 arch_sym; /* 44 1 */
_Bool annotate2; /* 45 1 */
char name[0]; /* 46 0 */
/* size: 48, cachelines: 1, members: 12 */
/* bit holes: 1, sum bit holes: 5 bits */
/* padding: 2 */
/* last cacheline: 48 bytes */
};
[acme@quaco perf]$
I'm removing that sym->name test.
> + /*
> + * This is a x86 retpoline fn. It pollutes the call graph by
> + * showing up everywhere there is an indirect branch, but does
> + * not itself mean anything. Here the top-of-stack is removed,
> + * by decrementing the stack count, and then further down, the
> + * resulting top-of-stack is replaced with the actual target.
> + * The result is that the retpoline functions will no longer
> + * appear in the call graph. Note this only affects the call
> + * graph, since all the original branches are left unchanged.
> + */
> + ts->cnt -= 1;
> + sym = ts->stack[ts->cnt - 2].cp->sym;
> + if (sym && sym == tsym && to_al->addr != tsym->start) {
> + /*
> + * Target is back to the middle of the symbol we came
> + * from so assume it is an indirect jmp and forget it
> + * altogether.
> + */
> + ts->cnt -= 1;
> + return 0;
> + }
> + } else if (sym && sym == tsym) {
> + /*
> + * Target is back to the symbol we came from so assume it is an
> + * indirect jmp and forget it altogether.
> + */
> + ts->cnt -= 1;
> + return 0;
> + }
> +
> + cp = call_path__findnew(cpr, ts->stack[ts->cnt - 2].cp, tsym,
> + sample->addr, ts->kernel_start);
> + if (!cp)
> + return -ENOMEM;
> +
> + /* Replace the top-of-stack with the actual target */
> + ts->stack[ts->cnt - 1].cp = cp;
> +
> + return 0;
> +}
> +
> int thread_stack__process(struct thread *thread, struct comm *comm,
> struct perf_sample *sample,
> struct addr_location *from_al,
> @@ -740,6 +826,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
> struct call_return_processor *crp)
> {
> struct thread_stack *ts = thread__stack(thread, sample->cpu);
> + enum retpoline_state_t rstate;
> int err = 0;
>
> if (ts && !ts->crp) {
> @@ -755,6 +842,10 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
> ts->comm = comm;
> }
>
> + rstate = ts->rstate;
> + if (rstate == X86_RETPOLINE_DETECTED)
> + ts->rstate = X86_RETPOLINE_POSSIBLE;
> +
> /* Flush stack on exec */
> if (ts->comm != comm && thread->pid_ == thread->tid) {
> err = __thread_stack__flush(thread, ts);
> @@ -791,10 +882,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
> ts->kernel_start);
> err = thread_stack__push_cp(ts, ret_addr, sample->time, ref,
> cp, false, trace_end);
> +
> + /*
> + * A call to the same symbol but not the start of the symbol,
> + * may be the start of a x86 retpoline.
> + */
> + if (!err && rstate == X86_RETPOLINE_POSSIBLE && to_al->sym &&
> + from_al->sym == to_al->sym &&
> + to_al->addr != to_al->sym->start)
> + ts->rstate = X86_RETPOLINE_DETECTED;
> +
> } else if (sample->flags & PERF_IP_FLAG_RETURN) {
> if (!sample->ip || !sample->addr)
> return 0;
>
> + /* x86 retpoline 'return' doesn't match the stack */
> + if (rstate == X86_RETPOLINE_DETECTED && ts->cnt > 2 &&
> + ts->stack[ts->cnt - 1].ret_addr != sample->addr)
> + return thread_stack__x86_retpoline(ts, sample, to_al);
> +
> err = thread_stack__pop_cp(thread, ts, sample->addr,
> sample->time, ref, from_al->sym);
> if (err) {
> --
> 2.17.1
--
- Arnaldo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] perf thread-stack: Hide x86 retpolines
2019-02-22 19:42 ` Arnaldo Carvalho de Melo
@ 2019-02-22 20:53 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-22 20:53 UTC (permalink / raw)
To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel
Em Fri, Feb 22, 2019 at 04:42:44PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Jan 09, 2019 at 11:18:35AM +0200, Adrian Hunter escreveu:
> > +static int thread_stack__x86_retpoline(struct thread_stack *ts,
> > + struct perf_sample *sample,
> > + struct addr_location *to_al)
> > +{
> > + struct thread_stack_entry *tse = &ts->stack[ts->cnt - 1];
> > + struct call_path_root *cpr = ts->crp->cpr;
> > + struct symbol *sym = tse->cp->sym;
> > + struct symbol *tsym = to_al->sym;
> > + struct call_path *cp;
> > +
> > + if (sym && sym->name && is_x86_retpoline(sym->name)) {
>
>
> CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o
> CC /tmp/build/perf/util/intel-pt.o
> CC /tmp/build/perf/util/intel-pt-decoder/intel-pt-log.o
> util/thread-stack.c:780:18: error: address of array 'sym->name' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
> if (sym && sym->name && is_x86_retpoline(sym->name)) {
> ~~ ~~~~~^~~~
> 1 error generated.
> mv: cannot stat '/tmp/build/perf/util/.thread-stack.o.tmp': No such file or directory
> make[4]: *** [/git/linux/tools/build/Makefile.build:96: /tmp/build/perf/util/thread-stack.o] Error 1
>
>
> [acme@quaco perf]$ pahole -C symbol ~/bin/perf
> struct symbol {
> struct rb_node rb_node; /* 0 24 */
> u64 start; /* 24 8 */
> u64 end; /* 32 8 */
> u16 namelen; /* 40 2 */
> u8 type:4; /* 42: 4 1 */
> u8 binding:4; /* 42: 0 1 */
> u8 idle:1; /* 43: 7 1 */
> u8 ignore:1; /* 43: 6 1 */
> u8 inlined:1; /* 43: 5 1 */
>
> /* XXX 5 bits hole, try to pack */
>
> u8 arch_sym; /* 44 1 */
> _Bool annotate2; /* 45 1 */
> char name[0]; /* 46 0 */
>
> /* size: 48, cachelines: 1, members: 12 */
> /* bit holes: 1, sum bit holes: 5 bits */
> /* padding: 2 */
> /* last cacheline: 48 bytes */
> };
> [acme@quaco perf]$
>
> I'm removing that sym->name test.
For completeness sake, this was one of the compilers where this failed:
clang version 7.0.1 (Fedora 7.0.1-2.fc30)
The systems where it broke:
18 debian:9 : FAIL
19 debian:experimental : FAIL
29 fedora:25 : FAIL
30 fedora:26 : FAIL
31 fedora:27 : FAIL
32 fedora:28 : FAIL
33 fedora:29 : FAIL
34 fedora:30 : FAIL
35 fedora:rawhide : FAIL
45 opensuse:tumbleweed : FAIL
51 ubuntu:16.04 : FAIL
58 ubuntu:17.10 : FAIL
59 ubuntu:18.04 : FAIL
70 ubuntu:18.10 : FAIL
71 ubuntu:19.04 : FAIL
- Arnaldo
^ permalink raw reply [flat|nested] 23+ messages in thread
* [tip:perf/core] perf thread-stack: Improve thread_stack__no_call_return()
2019-01-09 9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
2019-02-22 9:48 ` Adrian Hunter
@ 2019-02-28 7:49 ` tip-bot for Adrian Hunter
1 sibling, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-28 7:49 UTC (permalink / raw)
To: linux-tip-commits
Cc: jolsa, linux-kernel, acme, adrian.hunter, tglx, mingo, hpa
Commit-ID: 1f35cd65386e02430546727bd3cc2508d052e3ec
Gitweb: https://git.kernel.org/tip/1f35cd65386e02430546727bd3cc2508d052e3ec
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:34 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 22 Feb 2019 11:42:34 -0300
perf thread-stack: Improve thread_stack__no_call_return()
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
int foo(void)
{
return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
int (*volatile fn)(void) = foo;
fn();
return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> bar
After:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
Committer testing:
Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
expanding:
Before:
simple-retpolin
PID:PID
_start
_start
__libc_start_main
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
bar
After:
Remove the "simple.retpoline.db" file, run again the 'perf script' line
to regenerate the .db file and run the exported-sql-viewer.py again to
get the same all the way to 'main', then, from there, including 'main':
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
foo
bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread-stack.c | 49 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f52c0f90915d..632c07a125ab 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -638,14 +638,57 @@ static int thread_stack__no_call_return(struct thread *thread,
else
parent = root;
- /* This 'return' had no 'call', so push and pop top of stack */
- cp = call_path__findnew(cpr, parent, fsym, ip, ks);
+ if (parent->sym == from_al->sym) {
+ /*
+ * At the bottom of the stack, assume the missing 'call' was
+ * before the trace started. So, pop the current symbol and push
+ * the 'to' symbol.
+ */
+ if (ts->cnt == 1) {
+ err = thread_stack__call_return(thread, ts, --ts->cnt,
+ tm, ref, false);
+ if (err)
+ return err;
+ }
+
+ if (!ts->cnt) {
+ cp = call_path__findnew(cpr, root, tsym, addr, ks);
+
+ return thread_stack__push_cp(ts, addr, tm, ref, cp,
+ true, false);
+ }
+
+ /*
+ * Otherwise assume the 'return' is being used as a jump (e.g.
+ * retpoline) and just push the 'to' symbol.
+ */
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
+
+ err = thread_stack__push_cp(ts, 0, tm, ref, cp, true, false);
+ if (!err)
+ ts->stack[ts->cnt - 1].non_call = true;
+
+ return err;
+ }
+
+ /*
+ * Assume 'parent' has not yet returned, so push 'to', and then push and
+ * pop 'from'.
+ */
+
+ cp = call_path__findnew(cpr, parent, tsym, addr, ks);
err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
if (err)
return err;
- return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
+ cp = call_path__findnew(cpr, cp, fsym, ip, ks);
+
+ err = thread_stack__push_cp(ts, ip, tm, ref, cp, true, false);
+ if (err)
+ return err;
+
+ return thread_stack__call_return(thread, ts, --ts->cnt, tm, ref, false);
}
static int thread_stack__trace_begin(struct thread *thread,
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:perf/core] perf thread-stack: Hide x86 retpolines
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
2019-01-09 15:38 ` Jiri Olsa
2019-02-22 19:42 ` Arnaldo Carvalho de Melo
@ 2019-02-28 7:49 ` tip-bot for Adrian Hunter
2 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Adrian Hunter @ 2019-02-28 7:49 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, jolsa, linux-kernel, mingo, adrian.hunter, tglx, hpa
Commit-ID: 3c0cd952cf051903929cd57b89034cc5f71f451d
Gitweb: https://git.kernel.org/tip/3c0cd952cf051903929cd57b89034cc5f71f451d
Author: Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 9 Jan 2019 11:18:35 +0200
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 22 Feb 2019 16:49:49 -0300
perf thread-stack: Hide x86 retpolines
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.
This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
return -1;
}
int foo(void)
{
return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
int (*volatile fn)(void) = foo;
fn();
return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
After:
main
-> foo
-> bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread-stack.c | 112 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 109 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 632c07a125ab..a8b45168513c 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -20,6 +20,7 @@
#include "thread.h"
#include "event.h"
#include "machine.h"
+#include "env.h"
#include "util.h"
#include "debug.h"
#include "symbol.h"
@@ -29,6 +30,19 @@
#define STACK_GROWTH 2048
+/*
+ * State of retpoline detection.
+ *
+ * RETPOLINE_NONE: no retpoline detection
+ * X86_RETPOLINE_POSSIBLE: x86 retpoline possible
+ * X86_RETPOLINE_DETECTED: x86 retpoline detected
+ */
+enum retpoline_state_t {
+ RETPOLINE_NONE,
+ X86_RETPOLINE_POSSIBLE,
+ X86_RETPOLINE_DETECTED,
+};
+
/**
* struct thread_stack_entry - thread stack entry.
* @ret_addr: return address
@@ -64,6 +78,7 @@ struct thread_stack_entry {
* @crp: call/return processor
* @comm: current comm
* @arr_sz: size of array if this is the first element of an array
+ * @rstate: used to detect retpolines
*/
struct thread_stack {
struct thread_stack_entry *stack;
@@ -76,6 +91,7 @@ struct thread_stack {
struct call_return_processor *crp;
struct comm *comm;
unsigned int arr_sz;
+ enum retpoline_state_t rstate;
};
/*
@@ -115,10 +131,16 @@ static int thread_stack__init(struct thread_stack *ts, struct thread *thread,
if (err)
return err;
- if (thread->mg && thread->mg->machine)
- ts->kernel_start = machine__kernel_start(thread->mg->machine);
- else
+ if (thread->mg && thread->mg->machine) {
+ struct machine *machine = thread->mg->machine;
+ const char *arch = perf_env__arch(machine->env);
+
+ ts->kernel_start = machine__kernel_start(machine);
+ if (!strcmp(arch, "x86"))
+ ts->rstate = X86_RETPOLINE_POSSIBLE;
+ } else {
ts->kernel_start = 1ULL << 63;
+ }
ts->crp = crp;
return 0;
@@ -733,6 +755,70 @@ static int thread_stack__trace_end(struct thread_stack *ts,
false, true);
}
+static bool is_x86_retpoline(const char *name)
+{
+ const char *p = strstr(name, "__x86_indirect_thunk_");
+
+ return p == name || !strcmp(name, "__indirect_thunk_start");
+}
+
+/*
+ * x86 retpoline functions pollute the call graph. This function removes them.
+ * This does not handle function return thunks, nor is there any improvement
+ * for the handling of inline thunks or extern thunks.
+ */
+static int thread_stack__x86_retpoline(struct thread_stack *ts,
+ struct perf_sample *sample,
+ struct addr_location *to_al)
+{
+ struct thread_stack_entry *tse = &ts->stack[ts->cnt - 1];
+ struct call_path_root *cpr = ts->crp->cpr;
+ struct symbol *sym = tse->cp->sym;
+ struct symbol *tsym = to_al->sym;
+ struct call_path *cp;
+
+ if (sym && is_x86_retpoline(sym->name)) {
+ /*
+ * This is a x86 retpoline fn. It pollutes the call graph by
+ * showing up everywhere there is an indirect branch, but does
+ * not itself mean anything. Here the top-of-stack is removed,
+ * by decrementing the stack count, and then further down, the
+ * resulting top-of-stack is replaced with the actual target.
+ * The result is that the retpoline functions will no longer
+ * appear in the call graph. Note this only affects the call
+ * graph, since all the original branches are left unchanged.
+ */
+ ts->cnt -= 1;
+ sym = ts->stack[ts->cnt - 2].cp->sym;
+ if (sym && sym == tsym && to_al->addr != tsym->start) {
+ /*
+ * Target is back to the middle of the symbol we came
+ * from so assume it is an indirect jmp and forget it
+ * altogether.
+ */
+ ts->cnt -= 1;
+ return 0;
+ }
+ } else if (sym && sym == tsym) {
+ /*
+ * Target is back to the symbol we came from so assume it is an
+ * indirect jmp and forget it altogether.
+ */
+ ts->cnt -= 1;
+ return 0;
+ }
+
+ cp = call_path__findnew(cpr, ts->stack[ts->cnt - 2].cp, tsym,
+ sample->addr, ts->kernel_start);
+ if (!cp)
+ return -ENOMEM;
+
+ /* Replace the top-of-stack with the actual target */
+ ts->stack[ts->cnt - 1].cp = cp;
+
+ return 0;
+}
+
int thread_stack__process(struct thread *thread, struct comm *comm,
struct perf_sample *sample,
struct addr_location *from_al,
@@ -740,6 +826,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
struct call_return_processor *crp)
{
struct thread_stack *ts = thread__stack(thread, sample->cpu);
+ enum retpoline_state_t rstate;
int err = 0;
if (ts && !ts->crp) {
@@ -755,6 +842,10 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
ts->comm = comm;
}
+ rstate = ts->rstate;
+ if (rstate == X86_RETPOLINE_DETECTED)
+ ts->rstate = X86_RETPOLINE_POSSIBLE;
+
/* Flush stack on exec */
if (ts->comm != comm && thread->pid_ == thread->tid) {
err = __thread_stack__flush(thread, ts);
@@ -791,10 +882,25 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
ts->kernel_start);
err = thread_stack__push_cp(ts, ret_addr, sample->time, ref,
cp, false, trace_end);
+
+ /*
+ * A call to the same symbol but not the start of the symbol,
+ * may be the start of a x86 retpoline.
+ */
+ if (!err && rstate == X86_RETPOLINE_POSSIBLE && to_al->sym &&
+ from_al->sym == to_al->sym &&
+ to_al->addr != to_al->sym->start)
+ ts->rstate = X86_RETPOLINE_DETECTED;
+
} else if (sample->flags & PERF_IP_FLAG_RETURN) {
if (!sample->ip || !sample->addr)
return 0;
+ /* x86 retpoline 'return' doesn't match the stack */
+ if (rstate == X86_RETPOLINE_DETECTED && ts->cnt > 2 &&
+ ts->stack[ts->cnt - 1].ret_addr != sample->addr)
+ return thread_stack__x86_retpoline(ts, sample, to_al);
+
err = thread_stack__pop_cp(thread, ts, sample->addr,
sample->time, ref, from_al->sym);
if (err) {
^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2019-02-28 7:49 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-09 9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
2019-01-09 9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
2019-02-09 12:57 ` [tip:perf/core] perf tools: Fix split_kallsyms_for_kcore() " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage Adrian Hunter
2019-02-09 12:58 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables Adrian Hunter
2019-02-09 12:58 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
2019-02-06 12:39 ` Arnaldo Carvalho de Melo
2019-02-06 13:25 ` Adrian Hunter
2019-02-09 12:59 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
2019-02-22 9:48 ` Adrian Hunter
2019-02-22 14:39 ` Arnaldo Carvalho de Melo
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09 9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
2019-01-09 15:38 ` Jiri Olsa
2019-01-10 7:52 ` Adrian Hunter
2019-02-22 19:42 ` Arnaldo Carvalho de Melo
2019-02-22 20:53 ` Arnaldo Carvalho de Melo
2019-02-28 7:49 ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-10 9:55 ` [PATCH 0/6] perf thread-stack: " Jiri Olsa
2019-02-06 9:10 ` Adrian Hunter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.