* [PATCH v3 0/2] arm & arm64: perf: Fix callchain parse error with
@ 2015-05-02  5:58 ` Hou Pengyang
  0 siblings, 0 replies; 12+ messages in thread
From: Hou Pengyang @ 2015-05-02  5:58 UTC (permalink / raw)
  To: will.deacon, a.p.zijlstra, paulus, acme, mingo
  Cc: wangnan0, catalin.marinas, linux-kernel, linux-arm-kernel

For arm & arm64, when tracing with tracepoint events, the IP and cpsr
are set to 0, preventing the perf code from parsing the callchain and
resolving the symbols correctly.

These two patches fix this by implementing perf_arch_fetch_caller_regs
for arm and arm64, which fills in the register information needed for
callchain unwinding and symbol resolution.

v2->v3:
 - split the original patch into two, one for arm and the other for arm64;
 - change '|=' to '=' when setting cpsr. 
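
The second changelog item can be illustrated with a small user-space
sketch. It assumes a hypothetical one-field struct (fake_regs) and uses
the ARM SVC_MODE value 0x13; it is not the kernel code. OR-ing keeps any
bits already present in the field, while plain assignment does not depend
on the field having been cleared first:

```c
#include <assert.h>

#define SVC_MODE 0x13UL	/* ARM supervisor mode bits */

/* Hypothetical stand-in for the one pt_regs field of interest. */
struct fake_regs {
	unsigned long cpsr;
};

/* v2 style: OR the mode into whatever the field already holds. */
unsigned long set_mode_or(struct fake_regs *regs)
{
	regs->cpsr |= SVC_MODE;
	return regs->cpsr;
}

/* v3 style: overwrite the field with the mode. */
unsigned long set_mode_assign(struct fake_regs *regs)
{
	regs->cpsr = SVC_MODE;
	return regs->cpsr;
}
```

If the field starts out as 0, both helpers give the same result; they
differ only when leftover bits are present.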

Hou Pengyang (2):
  arm: perf: Fix callchain parse error with kernel tracepoint events
  arm64: perf: Fix callchain parse error with kernel tracepoint events

 arch/arm/include/asm/perf_event.h   | 7 +++++++
 arch/arm64/include/asm/perf_event.h | 7 +++++++
 2 files changed, 14 insertions(+)

-- 
1.8.3.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/2] arm: perf: Fix callchain parse error with kernel tracepoint events
  2015-05-02  5:58 ` Hou Pengyang
@ 2015-05-02  5:58   ` Hou Pengyang
  -1 siblings, 0 replies; 12+ messages in thread
From: Hou Pengyang @ 2015-05-02  5:58 UTC (permalink / raw)
  To: will.deacon, a.p.zijlstra, paulus, acme, mingo
  Cc: wangnan0, catalin.marinas, linux-kernel, linux-arm-kernel

For ARM, when tracing with tracepoint events, the IP and cpsr are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
    [ perf record: Captured and wrote 0.006 MB perf.data ]
 ./perf report -f
    Samples: 5  of event 'sched:sched_switch', Event count (approx.): 5 
    Children      Self    Command  Shared Object     Symbol
    100.00%       100.00%  ls       [unknown]         [.] 00000000

The fix is to implement perf_arch_fetch_caller_regs for ARM, which fills
in the registers needed for callchain unwinding: pc, sp, fp and cpsr.
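
A rough user-space sketch of the idea follows. The struct (fake_regs) and
both helpers are hypothetical, and the stack pointer is approximated by
the frame address because current_stack_pointer only exists in the
kernel. The mode test follows what ARM's user_mode() check appears to do
(low four cpsr bits all zero means user mode), which may be why a zeroed
cpsr leads to a '[.]' (user) symbol in the report above:

```c
#include <assert.h>
#include <stdbool.h>

#define SVC_MODE 0x13UL	/* ARM supervisor mode bits */

/* Hypothetical reduced register set; the real struct pt_regs differs. */
struct fake_regs {
	unsigned long pc, sp, fp, cpsr;
};

/* Assumed shape of the ARM user-mode test: low four mode bits are zero. */
bool looks_like_user_mode(const struct fake_regs *regs)
{
	return (regs->cpsr & 0xfUL) == 0;
}

/* Fill in the four registers the unwinder needs. */
void fill_caller_regs(struct fake_regs *regs, unsigned long ip)
{
	regs->pc = ip;
	regs->fp = (unsigned long)__builtin_frame_address(0);
	/* Stand-in for current_stack_pointer, which is kernel-only. */
	regs->sp = (unsigned long)__builtin_frame_address(0);
	regs->cpsr = SVC_MODE;
}
```

With all fields zero the sample is classified as user mode; once
fill_caller_regs() has run, the mode bits mark it as kernel mode.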

With this patch, the callchain is parsed correctly:
	
   .....
-  100.00%   100.00%  ls       [kernel.kallsyms]  [k] __sched_text_start 
   + __sched_text_start 
+   20.00%     0.00%  ls       libc-2.18.so       [.] _dl_addr 
+   20.00%     0.00%  ls       libc-2.18.so       [.] write    
   .....

Jean Pihet found this on ARM and came up with a patch:
http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280

This patch rewrites Jean's patch in C.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 arch/arm/include/asm/perf_event.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h
index d9cf138..4f9dec4 100644
--- a/arch/arm/include/asm/perf_event.h
+++ b/arch/arm/include/asm/perf_event.h
@@ -19,4 +19,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+	(regs)->ARM_pc = (__ip); \
+	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
+	(regs)->ARM_sp = current_stack_pointer; \
+	(regs)->ARM_cpsr = SVC_MODE; \
+}
+
 #endif /* __ARM_PERF_EVENT_H__ */
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-05-02  5:58 ` Hou Pengyang
@ 2015-05-02  5:58   ` Hou Pengyang
  -1 siblings, 0 replies; 12+ messages in thread
From: Hou Pengyang @ 2015-05-02  5:58 UTC (permalink / raw)
  To: will.deacon, a.p.zijlstra, paulus, acme, mingo
  Cc: wangnan0, catalin.marinas, linux-kernel, linux-arm-kernel

For ARM64, when tracing with tracepoint events, the IP and pstate are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
    [ perf record: Captured and wrote 0.146 MB perf.data ]
 ./perf report -f
    Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
    Children      Self    Command  Shared Object     Symbol
    100.00%       100.00%  ls       [unknown]         [.] 0000000000000000

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding: pc, sp, fp and spsr.

With this patch, the callchain is parsed correctly, as follows:

     ......
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
-    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
     pfkey_send_policy_notify
     pfkey_get
     v9fs_vfs_rename
     page_follow_link_light
     link_path_walk
     el0_svc_naked
    .......

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 arch/arm64/include/asm/perf_event.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..cc92021 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+	(regs)->ARM_pc = (__ip);    \
+	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
+	(regs)->ARM_sp = current_stack_pointer; \
+	(regs)->ARM_cpsr = PSR_MODE_EL1h;	\
+}
+
 #endif
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-05-02  5:58   ` Hou Pengyang
@ 2015-05-05 17:00     ` Will Deacon
  -1 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2015-05-05 17:00 UTC (permalink / raw)
  To: Hou Pengyang
  Cc: a.p.zijlstra, paulus, acme, mingo, wangnan0, Catalin Marinas,
	linux-kernel, linux-arm-kernel

On Sat, May 02, 2015 at 06:58:17AM +0100, Hou Pengyang wrote:
> For ARM64, when tracing with tracepoint events, the IP and pstate are set
> to 0, preventing the perf code from parsing the callchain and resolving
> the symbols correctly.
> 
>  ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>     [ perf record: Captured and wrote 0.146 MB perf.data ]
>  ./perf report -f
>     Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
>     Children      Self    Command  Shared Object     Symbol
>     100.00%       100.00%  ls       [unknown]         [.] 0000000000000000
> 
> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
> in the registers needed for callchain unwinding: pc, sp, fp and spsr.
> 
> With this patch, the callchain is parsed correctly, as follows:
> 
>      ......
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
> -    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
>      pfkey_send_policy_notify
>      pfkey_get
>      v9fs_vfs_rename
>      page_follow_link_light
>      link_path_walk
>      el0_svc_naked
>     .......
> 
> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
> ---
>  arch/arm64/include/asm/perf_event.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
> index d26d1d5..cc92021 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>  #define perf_misc_flags(regs)	perf_misc_flags(regs)
>  #endif
>  
> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
> +	(regs)->ARM_pc = (__ip);    \
> +	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
> +	(regs)->ARM_sp = current_stack_pointer; \
> +	(regs)->ARM_cpsr = PSR_MODE_EL1h;	\
> +}

This can't possibly compile, therefore you can't possibly have tested it.

Please fix the code and actually check that you're getting sensible
callchains before sending a new version of the patch.

Thanks,

Will

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-05-05 17:00     ` Will Deacon
@ 2015-05-06  4:13       ` Hou Pengyang
  -1 siblings, 0 replies; 12+ messages in thread
From: Hou Pengyang @ 2015-05-06  4:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: a.p.zijlstra, paulus, acme, mingo, wangnan0, Catalin Marinas,
	linux-kernel, linux-arm-kernel

On 2015/5/6 1:00, Will Deacon wrote:
> On Sat, May 02, 2015 at 06:58:17AM +0100, Hou Pengyang wrote:
>> For ARM64, when tracing with tracepoint events, the IP and pstate are set
>> to 0, preventing the perf code from parsing the callchain and resolving
>> the symbols correctly.
>>
>>   ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>>      [ perf record: Captured and wrote 0.146 MB perf.data ]
>>   ./perf report -f
>>      Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
>>      Children      Self    Command  Shared Object     Symbol
>>      100.00%       100.00%  ls       [unknown]         [.] 0000000000000000
>>
>> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
>> in the registers needed for callchain unwinding: pc, sp, fp and spsr.
>>
>> With this patch, the callchain is parsed correctly, as follows:
>>
>>       ......
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
>> +    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
>> -    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
>>       pfkey_send_policy_notify
>>       pfkey_get
>>       v9fs_vfs_rename
>>       page_follow_link_light
>>       link_path_walk
>>       el0_svc_naked
>>      .......
>>
>> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
>> ---
>>   arch/arm64/include/asm/perf_event.h | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
>> index d26d1d5..cc92021 100644
>> --- a/arch/arm64/include/asm/perf_event.h
>> +++ b/arch/arm64/include/asm/perf_event.h
>> @@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>>   #define perf_misc_flags(regs)	perf_misc_flags(regs)
>>   #endif
>>
>> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
>> +	(regs)->ARM_pc = (__ip);    \
>> +	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
>> +	(regs)->ARM_sp = current_stack_pointer; \
>> +	(regs)->ARM_cpsr = PSR_MODE_EL1h;	\
>> +}
>
> This can't possibly compile, therefore you can't possibly have tested it.
>
I am so sorry. I did test the patch, but on mainline 4.0 plus
David Long's ARM64 kprobes patches, which are not yet in mainline.
David's patches define macros such as ARM_pc, ARM_fp, ARM_sp and
ARM_cpsr. My patch incorrectly used these macros, which results in
the following compile errors when it is applied to 4.0 directly:
	error: 'struct pt_regs' has no member named 'ARM_pc'
	error: 'struct pt_regs' has no member named 'ARM_fp'
	error: 'struct pt_regs' has no member named 'ARM_sp'
	error: 'struct pt_regs' has no member named 'ARM_cpsr'

I will fix the code and do more testing.
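
For illustration only, here is a user-space sketch of how the macro might
look with arm64's own field names (regs[29] for the frame pointer, plus
sp, pc and pstate). The thread does not show the fixed patch, so this is
an assumption. pt_regs_mock, mock_fetch_caller_regs and capture are
hypothetical names, and the stack pointer is approximated by the frame
address because current_stack_pointer is kernel-only:

```c
#include <assert.h>

#define PSR_MODE_EL1h 0x00000005UL	/* arm64 EL1h mode bits */

/* Reduced mock of the arm64 register layout: x0-x30, then sp, pc, pstate. */
struct pt_regs_mock {
	unsigned long regs[31];
	unsigned long sp;
	unsigned long pc;
	unsigned long pstate;
};

/*
 * Same shape as the macro in the patch, but with arm64 field names.
 * sp uses the frame address as a stand-in for current_stack_pointer.
 */
#define mock_fetch_caller_regs(r, ip) do { \
	(r)->pc = (ip); \
	(r)->regs[29] = (unsigned long)__builtin_frame_address(0); \
	(r)->sp = (unsigned long)__builtin_frame_address(0); \
	(r)->pstate = PSR_MODE_EL1h; \
} while (0)

/* Small wrapper so the macro expands inside a real function frame. */
void capture(struct pt_regs_mock *r, unsigned long ip)
{
	mock_fetch_caller_regs(r, ip);
}
```

None of the ARM_* names are needed here, so a version along these lines
would not depend on the out-of-tree kprobes patches.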


> Please fix the code and actually check that you're getting sensible
> callchains before sending a new version of the patch.
>
> Thanks,
>
> Will
>
> .
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events
  2015-05-02  5:42 [PATCH v3 0/2] arm & arm64: perf: Fix callchain parse error with Hou Pengyang
@ 2015-05-02  5:42   ` Hou Pengyang
  0 siblings, 0 replies; 12+ messages in thread
From: Hou Pengyang @ 2015-05-02  5:42 UTC (permalink / raw)
  To: will.deacon, a.p.zijlstra, paulus, acme, mingo
  Cc: wannan0, catalin.marinas, linux-kernel, linux-arm-kernel

For ARM64, when tracing with tracepoint events, the IP and pstate are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
    [ perf record: Captured and wrote 0.146 MB perf.data ]
 ./perf report -f
    Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
    Children      Self    Command  Shared Object     Symbol
    100.00%       100.00%  ls       [unknown]         [.] 0000000000000000

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding: pc, sp, fp and spsr.

With this patch, the callchain is parsed correctly, as follows:

     ......
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] vfs_symlink
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] follow_down
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_get
+    2.63%     0.00%  ls       [kernel.kallsyms]  [k] do_execveat_common.isra.33
-    2.63%     0.00%  ls       [kernel.kallsyms]  [k] pfkey_send_policy_notify
     pfkey_send_policy_notify
     pfkey_get
     v9fs_vfs_rename
     page_follow_link_light
     link_path_walk
     el0_svc_naked
    .......

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 arch/arm64/include/asm/perf_event.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..cc92021 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+	(regs)->ARM_pc = (__ip);    \
+	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
+	(regs)->ARM_sp = current_stack_pointer; \
+	(regs)->ARM_cpsr = PSR_MODE_EL1h;	\
+}
+
 #endif
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread
