All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
@ 2014-06-05  2:36 Liu ShuoX
  2014-06-05  7:19 ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Liu ShuoX @ 2014-06-05  2:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Zhang Yanmin, yanmin_zhang

From: Zhang Yanmin <yanmin.zhang@intel.com>

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[   71.965351, 1] [       Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[   71.965360, 1] [       Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
[   71.965367, 1] [       Binder_2] PGD 6c65f067 PUD 0
[   71.965375, 1] [       Binder_2] Oops: 0000 [#1] PREEMPT SMP
[   71.965413, 1] [       Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
[   71.965420, 1] [       Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
[   71.965426, 1] [       Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
[   71.965439, 1] [       Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffffæf82012091>] get_segment_base+0x71/0xc0
[   71.965<44, 1] [       Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
[   71.965447, 1] [      !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
[   71.965450, 1] [       Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[   71.965454, 1] [       Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
[   71.965458, 1] [       Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
[   71.965462, 1] [       Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
[   71.965468, 1_ [       Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
[   71.965472, 1] [       Binder_2] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[   71.965476, 1] [       Binder_2] CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
[   71.965480, 1] [       Binder_2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   71.965485, 1] [       Binder_2] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   71.966141, 1] [       Binder_2] Stack:
[   71.966152, 1] [       Binder_2]  ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
[   71.966161, 1] [       Binder_2]  ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
[   71.966171, 1] [       Binder_6]  ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
[   71.966173, 1] [       Binder_2] Call Trace:
[   71.966179, 1] [       Binder_2]  <NMI>
[   71.966190, 1] [       Binder_2]  [<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
[   71.966202, 1] [       Binder_2]  [<ffffffff82160754>] perf_callchain+0x134/0x180
[   71.966210, 1] [       Binder_2]  [<fffff&ff820e0787>] ? local_clock+0x47/0x60
[   71.966221, 1] [       Binder_2]  [<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
[   71.966231, 1] [       Binder_2]  [<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
[   71.966241, 1] [       Binder_2]  [<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
[   71.966252, 1] [       Binder_2]  [<ffffffff8215df24>] perf_event_overflow+0x14/0x20
[   71.966260, 1] [       Binder_2]  [<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
[   71.966270, 1] [       Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[   71.966284, 1] [       Binder_2]$ [<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
[   71.966293, 1] [       Binder_2]  [<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
[   71.966303, 1] [`      Binder_2]  [<ffffffff828af6d8>] default_do_nmi+0x58/0x240
[   71.966312, 1] [    "  Binder_2]  [<ffffffff828af978>] do_nmi+0xb8/0xf0
[   71.966321, 1] [       Binder_2]  [|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
[   71.966332, 1] [       Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[   71.966341, 1] [       Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[   71.966350, 1] [       Binder_2]  [<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it  by moving the calling to trace_hardirqs_on_thunk ahead of
the stack popup.

Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
---
  arch/x86/ia32/ia32entry.S | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..df61fdb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -167,6 +167,7 @@ sysenter_dispatch:
  	testl	$_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
  	jnz	sysexit_audit
  sysexit_from_sys_call:
+	TRACE_IRQS_ON
  	andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
  	/* clear IF, that popfq doesn't enable interrupts early */
  	andl  $~0x200,EFLAGS-R11(%rsp) 
@@ -181,7 +182,6 @@ sysexit_from_sys_call:
  	/*CFI_RESTORE rflags*/
  	popq_cfi %rcx				/* User %esp */
  	CFI_REGISTER rsp,rcx
-	TRACE_IRQS_ON
  	ENABLE_INTERRUPTS_SYSEXIT32
  
  #ifdef CONFIG_AUDITSYSCALL
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  2:36 [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs Liu ShuoX
@ 2014-06-05  7:19 ` Peter Zijlstra
  2014-06-05  7:33   ` Liu ShuoX
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-06-05  7:19 UTC (permalink / raw)
  To: Liu ShuoX
  Cc: linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin, yanmin_zhang

[-- Attachment #1: Type: text/plain, Size: 3703 bytes --]

On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
> From: Zhang Yanmin <yanmin.zhang@intel.com>
> 
> We hit a kernel panic when running perf to collect some performance data.
> kenel is x86_64 and user space apps are 32bit.
> 
> [   71.965351, 1] [       Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [   71.965360, 1] [       Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
> [   71.965367, 1] [       Binder_2] PGD 6c65f067 PUD 0
> [   71.965375, 1] [       Binder_2] Oops: 0000 [#1] PREEMPT SMP
> [   71.965413, 1] [       Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
> [   71.965420, 1] [       Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
> [   71.965426, 1] [       Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
> [   71.965439, 1] [       Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffffæf82012091>] get_segment_base+0x71/0xc0
                                                                              ^
> [   71.965<44, 1] [       Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
            ^                                       ^
> [   71.965447, 1] [      !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
                           ^
> [   71.965450, 1] [       Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
> [   71.965454, 1] [       Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
> [   71.965458, 1] [       Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
> [   71.965462, 1] [       Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
> [   71.965468, 1_ [       Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
                  ^

Are you suffering some serious corruption?

> Basically, ia32 uses sysenter to start system calls.
> 
> sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
> sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
> would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.
> 
> The patch fixes it  by moving the calling to trace_hardirqs_on_thunk ahead of
> the stack popup.
> 
> Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd

No idea what that is, but it needs to go.

I'll leave the actual patch to hpa, this isn't something I'm too
familiar with.

> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
> ---
>  arch/x86/ia32/ia32entry.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb0..df61fdb 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -167,6 +167,7 @@ sysenter_dispatch:
>  	testl	$_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>  	jnz	sysexit_audit
>  sysexit_from_sys_call:
> +	TRACE_IRQS_ON
>  	andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>  	/* clear IF, that popfq doesn't enable interrupts early */
>  	andl  $~0x200,EFLAGS-R11(%rsp) @@ -181,7 +182,6 @@ sysexit_from_sys_call:
>  	/*CFI_RESTORE rflags*/
>  	popq_cfi %rcx				/* User %esp */
>  	CFI_REGISTER rsp,rcx
> -	TRACE_IRQS_ON
>  	ENABLE_INTERRUPTS_SYSEXIT32
>  #ifdef CONFIG_AUDITSYSCALL
> -- 
> 1.8.3.2
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  7:19 ` Peter Zijlstra
@ 2014-06-05  7:33   ` Liu ShuoX
  2014-06-05  7:40     ` [PATCH v2] " Liu ShuoX
  2014-06-05  7:55     ` [PATCH] " Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Liu ShuoX @ 2014-06-05  7:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin, yanmin_zhang

On Thu  5.Jun'14 at  9:19:19 +0200, Peter Zijlstra wrote:
>On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
>> From: Zhang Yanmin <yanmin.zhang@intel.com>
>>
>> We hit a kernel panic when running perf to collect some performance data.
>> kenel is x86_64 and user space apps are 32bit.
>>
>> [   71.965351, 1] [       Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> [   71.965360, 1] [       Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
>> [   71.965367, 1] [       Binder_2] PGD 6c65f067 PUD 0
>> [   71.965375, 1] [       Binder_2] Oops: 0000 [#1] PREEMPT SMP
>> [   71.965413, 1] [       Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
>> [   71.965420, 1] [       Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
>> [   71.965426, 1] [       Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
>> [   71.965439, 1] [       Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffff?f82012091>] get_segment_base+0x71/0xc0
>                                                                              ^
>> [   71.965<44, 1] [       Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
>            ^                                       ^
>> [   71.965447, 1] [      !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
>                           ^
>> [   71.965450, 1] [       Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
>> [   71.965454, 1] [       Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
>> [   71.965458, 1] [       Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
>> [   71.965462, 1] [       Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
>> [   71.965468, 1_ [       Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
>                  ^
>
>Are you suffering some serious corruption?
The log captured by pstore after rebooting, so there are some
corruption. Please ignore those.
>
>> Basically, ia32 uses sysenter to start system calls.
>>
>> sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
>> sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
>> would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.
>>
>> The patch fixes it  by moving the calling to trace_hardirqs_on_thunk ahead of
>> the stack popup.
>>
>> Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd
>
>No idea what that is, but it needs to go.
>
>I'll leave the actual patch to hpa, this isn't something I'm too
>familiar with.
Thanks. I will send out v2. We changed the code in v2.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  7:33   ` Liu ShuoX
@ 2014-06-05  7:40     ` Liu ShuoX
  2014-06-06  1:55       ` [PATCH v3] " Liu ShuoX
  2014-06-05  7:55     ` [PATCH] " Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Liu ShuoX @ 2014-06-05  7:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Zhang Yanmin, yanmin_zhang

From: Zhang Yanmin <yanmin.zhang@intel.com>

ChangeLog V2: 	Before sysexit, perf NMI might arrive. There is
		still a race. Here we change rsp to keep it pointing
		to pt_regs->orig_ax.
		In addition, after sti, before sysexit, an irq might
		arrives. That causes more chances for perf NMI to jump
		in.

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[71.965351, 1] [Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[71.965360, 1] [Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
[71.965367, 1] [Binder_2] PGD 6c65f067 PUD 0
[71.965375, 1] [Binder_2] Oops: 0000 [#1] PREEMPT SMP
[71.965413, 1] [Binder_2] Modules linked in: <...>
[71.965420, 1] [Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
[71.965426, 1] [Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
[71.965439, 1] [Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffffæf82012091>] get_segment_base+0x71/0xc0
[71.965<44, 1] [Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
[71.965447, 1] [Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
[71.965450, 1] [Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[71.965454, 1] [Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
[71.965458, 1] [Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
[71.965462, 1] [Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
[71.965468, 1_ [Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
[71.965472, 1] [Binder_2] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[71.965476, 1] [Binder_2] CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
[71.965480, 1] [Binder_2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[71.965485, 1] [Binder_2] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[71.966141, 1] [Binder_2] Stack:
[71.966152, 1] [Binder_2]  ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
[71.966161, 1] [Binder_2]  ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
[71.966171, 1] [Binder_6]  ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
[71.966173, 1] [Binder_2] Call Trace:
[71.966179, 1] [Binder_2]  <NMI>
[71.966190, 1] [Binder_2]  [<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
[71.966202, 1] [Binder_2]  [<ffffffff82160754>] perf_callchain+0x134/0x180
[71.966210, 1] [Binder_2]  [<fffff&ff820e0787>] ? local_clock+0x47/0x60
[71.966221, 1] [Binder_2]  [<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
[71.966231, 1] [Binder_2]  [<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
[71.966241, 1] [Binder_2]  [<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
[71.966252, 1] [Binder_2]  [<ffffffff8215df24>] perf_event_overflow+0x14/0x20
[71.966260, 1] [Binder_2]  [<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
[71.966270, 1] [Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966284, 1] [Binder_2]$ [<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
[71.966293, 1] [Binder_2]  [<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
[71.966303, 1] [Binder_2]  [<ffffffff828af6d8>] default_do_nmi+0x58/0x240
[71.966312, 1] [Binder_2]  [<ffffffff828af978>] do_nmi+0xb8/0xf0
[71.966321, 1] [Binder_2]  [|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
[71.966332, 1] [Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966341, 1] [Binder_2]  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966350, 1] [Binder_2]  [<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it by keeping rsp pointing under pt_regs->ip.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
---
  arch/x86/ia32/ia32entry.S | 11 ++++++-----
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..a695d8e 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -172,15 +172,16 @@ sysexit_from_sys_call:
  	andl  $~0x200,EFLAGS-R11(%rsp) 
  	movl	RIP-R11(%rsp),%edx		/* User %eip */
  	CFI_REGISTER rip,rdx
-	RESTORE_ARGS 0,24,0,0,0,0
-	xorq	%r8,%r8
+	RESTORE_ARGS 0,0,0,0,0,0
+	movq	24(%rsp),%r8			/* rflags */
+	movq	32(%rsp),%rcx			/* User %esp */
  	xorq	%r9,%r9
  	xorq	%r10,%r10
  	xorq	%r11,%r11
-	popfq_cfi
+	pushq_cfi %r8
  	/*CFI_RESTORE rflags*/
-	popq_cfi %rcx				/* User %esp */
-	CFI_REGISTER rsp,rcx
+	popfq_cfi
+	xorq	%r8,%r8
  	TRACE_IRQS_ON
  	ENABLE_INTERRUPTS_SYSEXIT32
  
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  7:33   ` Liu ShuoX
  2014-06-05  7:40     ` [PATCH v2] " Liu ShuoX
@ 2014-06-05  7:55     ` Peter Zijlstra
  2014-06-05  8:00       ` Zhang, Yanmin
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-06-05  7:55 UTC (permalink / raw)
  To: Liu ShuoX
  Cc: linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin, yanmin_zhang

[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]

On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:
> On Thu  5.Jun'14 at  9:19:19 +0200, Peter Zijlstra wrote:
> >On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
> >>From: Zhang Yanmin <yanmin.zhang@intel.com>
> >>
> >>We hit a kernel panic when running perf to collect some performance data.
> >>kenel is x86_64 and user space apps are 32bit.
> >>
> >>[   71.965351, 1] [       Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> >>[   71.965360, 1] [       Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
> >>[   71.965367, 1] [       Binder_2] PGD 6c65f067 PUD 0
> >>[   71.965375, 1] [       Binder_2] Oops: 0000 [#1] PREEMPT SMP
> >>[   71.965413, 1] [       Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
> >>[   71.965420, 1] [       Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
> >>[   71.965426, 1] [       Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
> >>[   71.965439, 1] [       Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffff?f82012091>] get_segment_base+0x71/0xc0
> >                                                                             ^
> >>[   71.965<44, 1] [       Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
> >           ^                                       ^
> >>[   71.965447, 1] [      !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
> >                          ^
> >>[   71.965450, 1] [       Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
> >>[   71.965454, 1] [       Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
> >>[   71.965458, 1] [       Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
> >>[   71.965462, 1] [       Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
> >>[   71.965468, 1_ [       Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
> >                 ^
> >
> >Are you suffering some serious corruption?
> The log captured by pstore after rebooting, so there are some
> corruption. Please ignore those.

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  7:55     ` [PATCH] " Peter Zijlstra
@ 2014-06-05  8:00       ` Zhang, Yanmin
  2014-06-05  9:15         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Zhang, Yanmin @ 2014-06-05  8:00 UTC (permalink / raw)
  To: Peter Zijlstra, Liu ShuoX
  Cc: linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin

On 2014/6/5 15:55, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:
>> On Thu  5.Jun'14 at  9:19:19 +0200, Peter Zijlstra wrote:
>>> On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
>>>> From: Zhang Yanmin <yanmin.zhang@intel.com>
>>>>
>>>> We hit a kernel panic when running perf to collect some performance data.
>>>> kenel is x86_64 and user space apps are 32bit.
>>>>
>>>> [   71.965351, 1] [       Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>> [   71.965360, 1] [       Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
>>>> [   71.965367, 1] [       Binder_2] PGD 6c65f067 PUD 0
>>>> [   71.965375, 1] [       Binder_2] Oops: 0000 [#1] PREEMPT SMP
>>>> [   71.965413, 1] [       Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
>>>> [   71.965420, 1] [       Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
>>>> [   71.965426, 1] [       Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
>>>> [   71.965439, 1] [       Binder_2] RIP: 0010:[<ffffffff82012091>]  [<ffffff?f82012091>] get_segment_base+0x71/0xc0
>>>                                                                              ^
>>>> [   71.965<44, 1] [       Binder_2] RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
>>>            ^                                       ^
>>>> [   71.965447, 1] [      !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
>>>                           ^
>>>> [   71.965450, 1] [       Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
>>>> [   71.965454, 1] [       Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
>>>> [   71.965458, 1] [       Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
>>>> [   71.965462, 1] [       Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
>>>> [   71.965468, 1_ [       Binder_2] FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
>>>                  ^
>>>
>>> Are you suffering some serious corruption?
>> The log captured by pstore after rebooting, so there are some
>> corruption. Please ignore those.
> Why does pstore cause corruption? I thought that stuff was supposed to
> be 'good' ?
pstore is good if the board is reset by WarmReset as memory content is 
kept across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board 
uses Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some 
data has little change.

Yanmin



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  8:00       ` Zhang, Yanmin
@ 2014-06-05  9:15         ` Peter Zijlstra
  2014-06-05 13:15           ` Zhang, Yanmin
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-06-05  9:15 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Liu ShuoX, linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:
> On 2014/6/5 15:55, Peter Zijlstra wrote:

> >Why does pstore cause corruption? I thought that stuff was supposed to
> >be 'good' ?

> pstore is good if the board is reset by WarmReset as memory content is kept
> across rebooting.
> If it's a ColdReset, memory might lose some or all contents. My board uses
> Coldreset, which is a very
> fast ColdReset. Most memory content can be kept. But sometimes, some data
> has little change.

Oh, I thought pstore was persistent across cold resets or even power
outages. My bad then.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  9:15         ` Peter Zijlstra
@ 2014-06-05 13:15           ` Zhang, Yanmin
  2014-06-05 13:21             ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Zhang, Yanmin @ 2014-06-05 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liu ShuoX, linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin

On 2014/6/5 17:15, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:
>> On 2014/6/5 15:55, Peter Zijlstra wrote:
>>> Why does pstore cause corruption? I thought that stuff was supposed to
>>> be 'good' ?
>> pstore is good if the board is reset by WarmReset as memory content is kept
>> across rebooting.
>> If it's a ColdReset, memory might lose some or all contents. My board uses
>> Coldreset, which is a very
>> fast ColdReset. Most memory content can be kept. But sometimes, some data
>> has little change.
> Oh, I thought pstore was persistent across cold resets or even power
> outages. My bad then.

Sorry for misleading you. It depends on backend. Pstore can use RAM or
other storage, for example, EFI-provided backend. If using EFI backend,
pstore can keep persistent across cold reset or even power outrages.

If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
access (even at non-cache mode) is fast and more flexible, in fact, we often use
RAM as backend of pstore.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05 13:15           ` Zhang, Yanmin
@ 2014-06-05 13:21             ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:21 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Liu ShuoX, linux-kernel, H. Peter Anvin, Ingo Molnar, Zhang Yanmin

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Thu, Jun 05, 2014 at 09:15:36PM +0800, Zhang, Yanmin wrote:

> Sorry for misleading you. It depends on backend. Pstore can use RAM or
> other storage, for example, EFI-provided backend. If using EFI backend,
> pstore can keep persistent across cold reset or even power outrages.
> 
> If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
> access (even at non-cache mode) is fast and more flexible, in fact, we often use
> RAM as backend of pstore.

Funny definition of persistent you end up with in that case... :-) But
yes, that does explain.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3] perf: fix kernel panic when parsing user space CS saved in pt_regs
  2014-06-05  7:40     ` [PATCH v2] " Liu ShuoX
@ 2014-06-06  1:55       ` Liu ShuoX
  0 siblings, 0 replies; 10+ messages in thread
From: Liu ShuoX @ 2014-06-06  1:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Zhang Yanmin, yanmin_zhang

From: Zhang Yanmin <yanmin.zhang@intel.com>

ChangeLog V3: 	Keep rsp pointing to pt_regs before sysexit.

ChangeLog V2: 	Before sysexit, perf NMI might arrive. There is
		still a race. Here we change rsp to keep it pointing
		to pt_regs->orig_ax.
		In addition, after sti, before sysexit, an irq might
		arrives. That causes more chances for perf NMI to jump
		in.

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
PGD 6c65f067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:  <...>
CPU: 1 PID: 304 Comm: Binder_2 Tainted: G        W  O 3.10.20-263902-g184bfbc-dirty #14
task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
RIP: 0010:[<ffffffff82012091>]  [<ffffffæf82012091>] get_segment_base+0x71/0xc0
RSP: 0018:ffff^X8007ea87b98  EFLAGS: 00010092
RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
FS:  0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
  ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
  ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
  ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
Call Trace:
  <NMI>
  [<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
  [<ffffffff82160754>] perf_callchain+0x134/0x180
  [<fffff&ff820e0787>] ? local_clock+0x47/0x60
  [<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
  [<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
  [<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
  [<ffffffff8215df24>] perf_event_overflow+0x14/0x20
  [<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
  [<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
  [<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
  [<ffffffff828af6d8>] default_do_nmi+0x58/0x240
  [<ffffffff828af978>] do_nmi+0xb8/0xf0
  [|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
  [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
  [<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it by keeping rsp pointing to pt_regs.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Liu Shuox <shuox.liu@intel.com>
---
  arch/x86/ia32/ia32entry.S | 11 ++++++-----
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..d2f905b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -172,15 +172,16 @@ sysexit_from_sys_call:
  	andl  $~0x200,EFLAGS-R11(%rsp) 
  	movl	RIP-R11(%rsp),%edx		/* User %eip */
  	CFI_REGISTER rip,rdx
-	RESTORE_ARGS 0,24,0,0,0,0
-	xorq	%r8,%r8
+	RESTORE_ARGS 0,-ARG_SKIP,0,0,0,0
+	movq	EFLAGS-R11(%rsp),%r8		/* rflags */
+	movq	RSP-R11(%rsp),%rcx		/* User %esp */
  	xorq	%r9,%r9
  	xorq	%r10,%r10
  	xorq	%r11,%r11
-	popfq_cfi
+	pushq_cfi %r8
  	/*CFI_RESTORE rflags*/
-	popq_cfi %rcx				/* User %esp */
-	CFI_REGISTER rsp,rcx
+	popfq_cfi
+	xorq	%r8,%r8
  	TRACE_IRQS_ON
  	ENABLE_INTERRUPTS_SYSEXIT32
  
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-06-06  1:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-05  2:36 [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs Liu ShuoX
2014-06-05  7:19 ` Peter Zijlstra
2014-06-05  7:33   ` Liu ShuoX
2014-06-05  7:40     ` [PATCH v2] " Liu ShuoX
2014-06-06  1:55       ` [PATCH v3] " Liu ShuoX
2014-06-05  7:55     ` [PATCH] " Peter Zijlstra
2014-06-05  8:00       ` Zhang, Yanmin
2014-06-05  9:15         ` Peter Zijlstra
2014-06-05 13:15           ` Zhang, Yanmin
2014-06-05 13:21             ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.