* perf: perf_fuzzer triggers vmalloc_fault (then crashes)
@ 2016-10-22  3:05 Vince Weaver
  2016-10-24 10:18 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Vince Weaver @ 2016-10-22  3:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo


This is on an AMD A10 system, with paranoid=1.  I think it's
probably unrelated to the (unresolved) AMD IBS issues.
This is 4.9-rc0, just before rc1 (can't get actual rc1 to boot).

Machine locks hard after this.

[ 8098.085662] BAD LUCK: lost 42 message(s) from NMI context!
[ 8098.085663] ------------[ cut here ]------------
[ 8098.085664] WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
[ 8098.085668] CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
[ 8098.085668] Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
[ 8098.085670] Call Trace:
[ 8098.085670]  <NMI>  [<ffffffff81263c2f>] ? dump_stack+0x46/0x59
[ 8098.085670]  [<ffffffff810499c4>] ? __warn+0xd5/0xee
[ 8098.085671]  [<ffffffff8103b4de>] ? vmalloc_fault+0x58/0x1f0
[ 8098.085671]  [<ffffffff8103c193>] ? __do_page_fault+0x6d/0x48e
[ 8098.085671]  [<ffffffff810eaf5b>] ? perf_log_throttle+0xa4/0xf4
[ 8098.085672]  [<ffffffff8145df72>] ? trace_page_fault+0x22/0x30
[ 8098.085672]  [<ffffffff8103918d>] ? __unwind_start+0x28/0x42
[ 8098.085672]  [<ffffffff81004721>] ? perf_callchain_kernel+0x75/0xac
[ 8098.085672]  [<ffffffff810efb0a>] ? get_perf_callchain+0x13a/0x1f0
[ 8098.085673]  [<ffffffff810efc2a>] ? perf_callchain+0x6a/0x6c
[ 8098.085673]  [<ffffffff810ecf5d>] ? perf_prepare_sample+0x71/0x2eb
[ 8098.085673]  [<ffffffff810ed1f1>] ? perf_event_output_forward+0x1a/0x54
[ 8098.085674]  [<ffffffff810313bc>] ? __default_send_IPI_shortcut+0x10/0x2d
[ 8098.085674]  [<ffffffff810eb247>] ? __perf_event_overflow+0xfb/0x167
[ 8098.085674]  [<ffffffff81004247>] ? x86_pmu_handle_irq+0x113/0x150
[ 8098.085675]  [<ffffffff81003116>] ? native_read_msr+0x6/0x34
[ 8098.085675]  [<ffffffff81002dee>] ? perf_event_nmi_handler+0x22/0x39
[ 8098.085675]  [<ffffffff81006621>] ? perf_ibs_nmi_handler+0x4a/0x51
[ 8098.085676]  [<ffffffff81002dee>] ? perf_event_nmi_handler+0x22/0x39
[ 8098.085676]  [<ffffffff81018493>] ? nmi_handle+0x4d/0xf0
[ 8098.085676]  [<ffffffff810065d7>] ? perf_ibs_handle_irq+0x3d1/0x3d1
[ 8098.085676]  [<ffffffff810186dd>] ? default_do_nmi+0x3c/0xd5
[ 8098.085677]  [<ffffffff81018808>] ? do_nmi+0x92/0x102
[ 8098.085677]  [<ffffffff8145e2a7>] ? end_repeat_nmi+0x1a/0x1e
[ 8098.085677]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
[ 8098.085678]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
[ 8098.085678]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
[ 8098.085678]  <EOE> ^A4---[ end trace 632723104d47d31a ]---
[ 8098.085679] BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
[ 8098.085679] kernel stack overflow (page fault): 0000 [#1] SMP
[ 8098.085683] CPU: 0 PID: 21338 Comm: perf_fuzzer Tainted: G        W       4.8.0+ #37
[ 8098.085683] Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
[ 8098.085684] task: ffff8802265d2080 task.stack: ffffc900084fc000
[ 8098.085684] RIP: 0010:[<ffffffff8103918d>] ^Ac [<ffffffff8103918d>] __unwind_start+0x28/0x42
[ 8098.085684] RSP: 0018:ffff88022ec05af0  EFLAGS: 00010006
[ 8098.085685] RAX: 00000000ffffffea RBX: ffff88022ec05b08 RCX: ffffc90008500000
[ 8098.085685] RDX: ffff88022ec00000 RSI: 0000000000001000 RDI: 000000000000c4d0
[ 8098.085685] RBP: ffffc90008500000 R08: ffff88022ec08000 R09: 0000000000000000
[ 8098.085686] R10: 0000000000000002 R11: 0000000000000206 R12: ffff88022ec05b70
[ 8098.085686] R13: ffff88022ec05ef8 R14: 0000000000000000 R15: 0000000000000001
[ 8098.085687] FS:  00007f06e791c700(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
[ 8098.085687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8098.085687] CR2: ffffc90008500000 CR3: 0000000223c25000 CR4: 00000000000407f0
[ 8098.085688] DR0: 0000000000000000 DR1: 0000000000005fc8 DR2: 0000000000005fc8
[ 8098.085688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 8098.085689] Call Trace:
[ 8098.085690]  <NMI> ^Ad [<ffffffff81004721>] ? perf_callchain_kernel+0x75/0xac
[ 8098.085690]  [<ffffffff8126da5e>] ? vsnprintf+0x380/0x3b4
[ 8098.085690]  [<ffffffff8126db7e>] ? sprintf+0x42/0x4a
[ 8098.085691]  [<ffffffff810a8982>] ? __sprint_symbol+0x9d/0xd1
[ 8098.085691]  [<ffffffff8126be12>] ? symbol_string+0x51/0x5d
[ 8098.085691]  [<ffffffff810a8982>] ? __sprint_symbol+0x9d/0xd1
[ 8098.085692]  [<ffffffff8126be12>] ? symbol_string+0x51/0x5d
[ 8098.085692]  [<ffffffff8126d3ea>] ? pointer+0x85/0x379
[ 8098.085692]  [<ffffffff8126d75e>] ? vsnprintf+0x80/0x3b4
[ 8098.085692]  [<ffffffff810dbe53>] ? irq_work_queue+0xa/0x66
[ 8098.085693]  [<ffffffff81084db5>] ? vprintk_nmi+0x88/0x97
[ 8098.085693]  [<ffffffff81084db5>] ? vprintk_nmi+0x88/0x97
[ 8098.085693]  [<ffffffff810f19b1>] ? printk+0x43/0x4b
[ 8098.085694]  [<ffffffff810a4919>] ? __module_text_address+0x9/0x4f
[ 8098.085694]  [<ffffffff810a8346>] ? is_module_text_address+0x5/0xc
[ 8098.085694]  [<ffffffff81017eed>] ? show_trace_log_lvl+0x108/0x195
[ 8098.085694]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
[ 8098.085695]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
[ 8098.085695]  [<ffffffff8101784a>] ? show_stack_log_lvl+0x15b/0x172
[ 8098.085695]  [<ffffffff810178c5>] ? show_regs+0x64/0x136
[ 8098.085696]  [<ffffffff81017dad>] ? __die+0x8c/0xc4
[ 8098.085696]  [<ffffffff8101802b>] ? die+0x3d/0x56
[ 8098.085696]  [<ffffffff810163ed>] ? handle_stack_overflow+0x47/0x51
[ 8098.085697]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
[ 8098.085697]  <EOE> ^AdCode: ^A1BUG: unable to handle kernel ^AcNULL pointer dereference^Ac at 0000000000000008
[ 8098.085697] IP:^Ac [<0000000000000008>] 0x8
[ 8098.085698] PGD 2231d5067 PUD 225162067 PMD 0 
[ 8098.085698] Oops: 0010 [#2] SMP
[ 8098.085702] 
[ 8098.957250] ---[ end trace 632723104d47d31b ]---
[ 8098.957250] Kernel panic - not syncing: Fatal exception in interrupt
[ 8098.957301] Kernel Offset: disabled
[ 8098.973814] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 8098.981719] ------------[ cut here ]------------
[ 8098.981720] WARNING: CPU: 0 PID: 21338 at arch/x86/kernel/smp.c:127 update_process_times+0x3b/0x45


* Re: perf: perf_fuzzer triggers vmalloc_fault (then crashes)
  2016-10-22  3:05 perf: perf_fuzzer triggers vmalloc_fault (then crashes) Vince Weaver
@ 2016-10-24 10:18 ` Peter Zijlstra
  2016-10-24 11:14   ` Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-10-24 10:18 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andy Lutomirski, Josh Poimboeuf

On Fri, Oct 21, 2016 at 11:05:40PM -0400, Vince Weaver wrote:
> 
> This is on an AMD A10 system, with paranoid=1.  I think it's
> probably unrelated to the (unresolved) AMD IBS issues.
> This is 4.9-rc0, just before rc1 (can't get actual rc1 to boot).
> 
> Machine locks hard after this.
> 
> [ 8098.085662] BAD LUCK: lost 42 message(s) from NMI context!
> [ 8098.085663] ------------[ cut here ]------------
> [ 8098.085664] WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
> [ 8098.085668] CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
> [ 8098.085668] Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
> [ 8098.085670] Call Trace:
> [ 8098.085670]  <NMI>  [<ffffffff81263c2f>] ? dump_stack+0x46/0x59
> [ 8098.085670]  [<ffffffff810499c4>] ? __warn+0xd5/0xee
> [ 8098.085671]  [<ffffffff8103b4de>] ? vmalloc_fault+0x58/0x1f0
> [ 8098.085671]  [<ffffffff8103c193>] ? __do_page_fault+0x6d/0x48e
> [ 8098.085671]  [<ffffffff810eaf5b>] ? perf_log_throttle+0xa4/0xf4
> [ 8098.085672]  [<ffffffff8145df72>] ? trace_page_fault+0x22/0x30
> [ 8098.085672]  [<ffffffff8103918d>] ? __unwind_start+0x28/0x42
> [ 8098.085672]  [<ffffffff81004721>] ? perf_callchain_kernel+0x75/0xac
> [ 8098.085672]  [<ffffffff810efb0a>] ? get_perf_callchain+0x13a/0x1f0
> [ 8098.085673]  [<ffffffff810efc2a>] ? perf_callchain+0x6a/0x6c
> [ 8098.085673]  [<ffffffff810ecf5d>] ? perf_prepare_sample+0x71/0x2eb
> [ 8098.085673]  [<ffffffff810ed1f1>] ? perf_event_output_forward+0x1a/0x54
> [ 8098.085674]  [<ffffffff810313bc>] ? __default_send_IPI_shortcut+0x10/0x2d
> [ 8098.085674]  [<ffffffff810eb247>] ? __perf_event_overflow+0xfb/0x167
> [ 8098.085674]  [<ffffffff81004247>] ? x86_pmu_handle_irq+0x113/0x150
> [ 8098.085675]  [<ffffffff81003116>] ? native_read_msr+0x6/0x34
> [ 8098.085675]  [<ffffffff81002dee>] ? perf_event_nmi_handler+0x22/0x39
> [ 8098.085675]  [<ffffffff81006621>] ? perf_ibs_nmi_handler+0x4a/0x51
> [ 8098.085676]  [<ffffffff81002dee>] ? perf_event_nmi_handler+0x22/0x39
> [ 8098.085676]  [<ffffffff81018493>] ? nmi_handle+0x4d/0xf0
> [ 8098.085676]  [<ffffffff810065d7>] ? perf_ibs_handle_irq+0x3d1/0x3d1
> [ 8098.085676]  [<ffffffff810186dd>] ? default_do_nmi+0x3c/0xd5
> [ 8098.085677]  [<ffffffff81018808>] ? do_nmi+0x92/0x102
> [ 8098.085677]  [<ffffffff8145e2a7>] ? end_repeat_nmi+0x1a/0x1e
> [ 8098.085677]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
> [ 8098.085678]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
> [ 8098.085678]  [<ffffffff8145c2d5>] ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
> [ 8098.085678]  <EOE> ^A4---[ end trace 632723104d47d31a ]---

So an NMI-based stack unwind (without frame pointers) runs past the
end of the actual stack and trips the new stack guard page:

> [ 8098.085679] BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
> [ 8098.085679] kernel stack overflow (page fault): 0000 [#1] SMP
> [ 8098.085683] CPU: 0 PID: 21338 Comm: perf_fuzzer Tainted: G        W       4.8.0+ #37
> [ 8098.085683] Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
> [ 8098.085684] task: ffff8802265d2080 task.stack: ffffc900084fc000
> [ 8098.085684] RIP: 0010:[<ffffffff8103918d>] ^Ac [<ffffffff8103918d>] __unwind_start+0x28/0x42
> [ 8098.085684] RSP: 0018:ffff88022ec05af0  EFLAGS: 00010006
> [ 8098.085685] RAX: 00000000ffffffea RBX: ffff88022ec05b08 RCX: ffffc90008500000
> [ 8098.085685] RDX: ffff88022ec00000 RSI: 0000000000001000 RDI: 000000000000c4d0
> [ 8098.085685] RBP: ffffc90008500000 R08: ffff88022ec08000 R09: 0000000000000000
> [ 8098.085686] R10: 0000000000000002 R11: 0000000000000206 R12: ffff88022ec05b70
> [ 8098.085686] R13: ffff88022ec05ef8 R14: 0000000000000000 R15: 0000000000000001
> [ 8098.085687] FS:  00007f06e791c700(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
> [ 8098.085687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8098.085687] CR2: ffffc90008500000 CR3: 0000000223c25000 CR4: 00000000000407f0
> [ 8098.085688] DR0: 0000000000000000 DR1: 0000000000005fc8 DR2: 0000000000005fc8
> [ 8098.085688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [ 8098.085689] Call Trace:
> [ 8098.085690]  <NMI> ^Ad [<ffffffff81004721>] ? perf_callchain_kernel+0x75/0xac
> [ 8098.085690]  [<ffffffff8126da5e>] ? vsnprintf+0x380/0x3b4
> [ 8098.085690]  [<ffffffff8126db7e>] ? sprintf+0x42/0x4a
> [ 8098.085691]  [<ffffffff810a8982>] ? __sprint_symbol+0x9d/0xd1
> [ 8098.085691]  [<ffffffff8126be12>] ? symbol_string+0x51/0x5d
> [ 8098.085691]  [<ffffffff810a8982>] ? __sprint_symbol+0x9d/0xd1
> [ 8098.085692]  [<ffffffff8126be12>] ? symbol_string+0x51/0x5d
> [ 8098.085692]  [<ffffffff8126d3ea>] ? pointer+0x85/0x379
> [ 8098.085692]  [<ffffffff8126d75e>] ? vsnprintf+0x80/0x3b4
> [ 8098.085692]  [<ffffffff810dbe53>] ? irq_work_queue+0xa/0x66
> [ 8098.085693]  [<ffffffff81084db5>] ? vprintk_nmi+0x88/0x97
> [ 8098.085693]  [<ffffffff81084db5>] ? vprintk_nmi+0x88/0x97
> [ 8098.085693]  [<ffffffff810f19b1>] ? printk+0x43/0x4b
> [ 8098.085694]  [<ffffffff810a4919>] ? __module_text_address+0x9/0x4f
> [ 8098.085694]  [<ffffffff810a8346>] ? is_module_text_address+0x5/0xc
> [ 8098.085694]  [<ffffffff81017eed>] ? show_trace_log_lvl+0x108/0x195
> [ 8098.085694]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
> [ 8098.085695]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
> [ 8098.085695]  [<ffffffff8101784a>] ? show_stack_log_lvl+0x15b/0x172
> [ 8098.085695]  [<ffffffff810178c5>] ? show_regs+0x64/0x136
> [ 8098.085696]  [<ffffffff81017dad>] ? __die+0x8c/0xc4
> [ 8098.085696]  [<ffffffff8101802b>] ? die+0x3d/0x56
> [ 8098.085696]  [<ffffffff810163ed>] ? handle_stack_overflow+0x47/0x51
> [ 8098.085697]  [<ffffffff8103bae3>] ? no_context+0x102/0x36c
> [ 8098.085697]  <EOE> ^AdCode: ^A1BUG: unable to handle kernel ^AcNULL pointer dereference^Ac at 0000000000000008
> [ 8098.085697] IP:^Ac [<0000000000000008>] 0x8
> [ 8098.085698] PGD 2231d5067 PUD 225162067 PMD 0 
> [ 8098.085698] Oops: 0010 [#2] SMP
> [ 8098.085702] 
> [ 8098.957250] ---[ end trace 632723104d47d31b ]---
> [ 8098.957250] Kernel panic - not syncing: Fatal exception in interrupt
> [ 8098.957301] Kernel Offset: disabled
> [ 8098.973814] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> [ 8098.981719] ------------[ cut here ]------------
> [ 8098.981720] WARNING: CPU: 0 PID: 21338 at arch/x86/kernel/smp.c:127 update_process_times+0x3b/0x45

And then the machine (understandably) goes off the rails entirely.

Josh, Andy, any clue on how I should go about fixing this?


* Re: perf: perf_fuzzer triggers vmalloc_fault (then crashes)
  2016-10-24 10:18 ` Peter Zijlstra
@ 2016-10-24 11:14   ` Josh Poimboeuf
  2016-10-24 11:16     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Poimboeuf @ 2016-10-24 11:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Andy Lutomirski

On Mon, Oct 24, 2016 at 12:18:02PM +0200, Peter Zijlstra wrote:
> Josh, Andy, any clue on how I should go about fixing this?

This is a bug in the unwinder.  The NMI hit in the entry code right
after setting up the stack pointer from cpu_current_top_of_stack, so the
kernel stack was empty.  __unwind_start() tried to dereference the
pointer (0xffffc90008500000) at the top of the stack.  I'll make a
patch.
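
To make the address arithmetic concrete, here is a minimal userspace
sketch (not kernel code).  The names stack_range and range_contains are
made up for illustration; only the addresses come from the report above.
It assumes the stack range is stored with an exclusive end, so an empty
stack leaves the stack pointer equal to that end.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace model of a stack range; 'end' is exclusive. */
struct stack_range {
	uint64_t begin;
	uint64_t end;
};

/* True only if the whole object [addr, addr + len) lies inside the range. */
static bool range_contains(const struct stack_range *r, uint64_t addr,
			   size_t len)
{
	return addr >= r->begin && addr < r->end && len <= r->end - addr;
}

/* From the report: "stack is ffffc900084fc000..ffffc900084fffff". */
static const struct stack_range task_stack = {
	.begin = 0xffffc900084fc000ULL,
	.end   = 0xffffc90008500000ULL,
};

/* The faulting address in the report (CR2), i.e. the empty-stack pointer. */
static const uint64_t empty_stack_sp = 0xffffc90008500000ULL;
```

With these values, range_contains(&task_stack, empty_stack_sp,
sizeof(long)) is false, while the last slot of the stack (end minus
sizeof(long)) is inside the range.  So the "top of stack" pointer lands on
the first byte past the stack, which is where the guard page starts.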

-- 
Josh


* Re: perf: perf_fuzzer triggers vmalloc_fault (then crashes)
  2016-10-24 11:14   ` Josh Poimboeuf
@ 2016-10-24 11:16     ` Peter Zijlstra
  2016-10-24 13:31       ` [PATCH] x86/unwind: fix empty stack dereference in guess unwinder Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-10-24 11:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Andy Lutomirski

On Mon, Oct 24, 2016 at 06:14:02AM -0500, Josh Poimboeuf wrote:

> > Josh, Andy, any clue on how I should go about fixing this?
> 
> This is a bug in the unwinder.  The NMI hit in the entry code right
> after setting up the stack pointer from cpu_current_top_of_stack, so the
> kernel stack was empty.  __unwind_start() tried to dereference the
> pointer (0xffffc90008500000) at the top of the stack.  I'll make a
> patch.

Great, thanks!


* [PATCH] x86/unwind: fix empty stack dereference in guess unwinder
  2016-10-24 11:16     ` Peter Zijlstra
@ 2016-10-24 13:31       ` Josh Poimboeuf
  2016-10-25 10:30         ` [tip:x86/urgent] x86/unwind: Fix " tip-bot for Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Poimboeuf @ 2016-10-24 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Andy Lutomirski, x86


Vince reported the following bug:

  WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
  CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
  Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
  Call Trace:
   <NMI>  ? dump_stack+0x46/0x59
   ? __warn+0xd5/0xee
   ? vmalloc_fault+0x58/0x1f0
   ? __do_page_fault+0x6d/0x48e
   ? perf_log_throttle+0xa4/0xf4
   ? trace_page_fault+0x22/0x30
   ? __unwind_start+0x28/0x42
   ? perf_callchain_kernel+0x75/0xac
   ? get_perf_callchain+0x13a/0x1f0
   ? perf_callchain+0x6a/0x6c
   ? perf_prepare_sample+0x71/0x2eb
   ? perf_event_output_forward+0x1a/0x54
   ? __default_send_IPI_shortcut+0x10/0x2d
   ? __perf_event_overflow+0xfb/0x167
   ? x86_pmu_handle_irq+0x113/0x150
   ? native_read_msr+0x6/0x34
   ? perf_event_nmi_handler+0x22/0x39
   ? perf_ibs_nmi_handler+0x4a/0x51
   ? perf_event_nmi_handler+0x22/0x39
   ? nmi_handle+0x4d/0xf0
   ? perf_ibs_handle_irq+0x3d1/0x3d1
   ? default_do_nmi+0x3c/0xd5
   ? do_nmi+0x92/0x102
   ? end_repeat_nmi+0x1a/0x1e
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   <EOE> ^A4---[ end trace 632723104d47d31a ]---
  BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
  kernel stack overflow (page fault): 0000 [#1] SMP
  ...

The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty.  The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.

Add a check in the guess unwinder to deal with an empty stack.  (The
frame pointer unwinder already has such a check.)

Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_guess.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index 9298993..2d721e5 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -47,7 +47,14 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	get_stack_info(first_frame, state->task, &state->stack_info,
 		       &state->stack_mask);
 
-	if (!__kernel_text_address(*first_frame))
+	/*
+	 * The caller can provide the address of the first frame directly
+	 * (first_frame) or indirectly (regs->sp) to indicate which stack frame
+	 * to start unwinding at.  Skip ahead until we reach it.
+	 */
+	if (!unwind_done(state) &&
+	    (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
+	    !__kernel_text_address(*first_frame)))
 		unwind_next_frame(state);
 }
 EXPORT_SYMBOL_GPL(__unwind_start);
-- 
2.7.4
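
The shape of the new condition can be modeled in userspace as a rough
sketch.  This is not the kernel implementation: looks_like_text() is a
hypothetical stand-in for __kernel_text_address(), slot_on_stack() is a
simplified stand-in for on_stack(), and the 'done' flag stands in for
unwind_done(state).  The point it illustrates is ordering: the
dereference is the last operand, so it is only evaluated after the
earlier checks have passed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __kernel_text_address(): nonzero means "text". */
static bool looks_like_text(unsigned long value)
{
	return value != 0;
}

/* Simplified stand-in for on_stack(): is 'slot' inside [begin, end)? */
static bool slot_on_stack(const unsigned long *begin,
			  const unsigned long *end,
			  const unsigned long *slot)
{
	return slot >= begin && slot < end;
}

/*
 * Mirrors the structure of the patched condition.  Thanks to && and ||
 * short-circuiting, *first_frame is read only when unwinding is not
 * already done and first_frame points at a real stack slot.
 */
static bool must_skip_first_frame(bool done,
				  const unsigned long *begin,
				  const unsigned long *end,
				  const unsigned long *first_frame)
{
	return !done &&
	       (!slot_on_stack(begin, end, first_frame) ||
		!looks_like_text(*first_frame));
}
```

In this model, passing the one-past-the-end pointer of an array as
first_frame never reads through it, because the range test fails first.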


* [tip:x86/urgent] x86/unwind: Fix empty stack dereference in guess unwinder
  2016-10-24 13:31       ` [PATCH] x86/unwind: fix empty stack dereference in guess unwinder Josh Poimboeuf
@ 2016-10-25 10:30         ` tip-bot for Josh Poimboeuf
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-10-25 10:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, acme, hpa, tglx, peterz, linux-kernel, vincent.weaver,
	jpoimboe, luto, mingo

Commit-ID:  7fbe6ac02485504b964b283aca62b36b4313ca79
Gitweb:     http://git.kernel.org/tip/7fbe6ac02485504b964b283aca62b36b4313ca79
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Mon, 24 Oct 2016 08:31:27 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 25 Oct 2016 11:36:43 +0200

x86/unwind: Fix empty stack dereference in guess unwinder

Vince Weaver reported the following bug:

  WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
  CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
  Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
  Call Trace:
   <NMI>  ? dump_stack+0x46/0x59
   ? __warn+0xd5/0xee
   ? vmalloc_fault+0x58/0x1f0
   ? __do_page_fault+0x6d/0x48e
   ? perf_log_throttle+0xa4/0xf4
   ? trace_page_fault+0x22/0x30
   ? __unwind_start+0x28/0x42
   ? perf_callchain_kernel+0x75/0xac
   ? get_perf_callchain+0x13a/0x1f0
   ? perf_callchain+0x6a/0x6c
   ? perf_prepare_sample+0x71/0x2eb
   ? perf_event_output_forward+0x1a/0x54
   ? __default_send_IPI_shortcut+0x10/0x2d
   ? __perf_event_overflow+0xfb/0x167
   ? x86_pmu_handle_irq+0x113/0x150
   ? native_read_msr+0x6/0x34
   ? perf_event_nmi_handler+0x22/0x39
   ? perf_ibs_nmi_handler+0x4a/0x51
   ? perf_event_nmi_handler+0x22/0x39
   ? nmi_handle+0x4d/0xf0
   ? perf_ibs_handle_irq+0x3d1/0x3d1
   ? default_do_nmi+0x3c/0xd5
   ? do_nmi+0x92/0x102
   ? end_repeat_nmi+0x1a/0x1e
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   <EOE> ^A4---[ end trace 632723104d47d31a ]---
  BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
  kernel stack overflow (page fault): 0000 [#1] SMP
  ...

The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty.  The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.

Add a check in the guess unwinder to deal with an empty stack.  (The
frame pointer unwinder already has such a check.)

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/unwind_guess.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index 9298993..2d721e5 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -47,7 +47,14 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	get_stack_info(first_frame, state->task, &state->stack_info,
 		       &state->stack_mask);
 
-	if (!__kernel_text_address(*first_frame))
+	/*
+	 * The caller can provide the address of the first frame directly
+	 * (first_frame) or indirectly (regs->sp) to indicate which stack frame
+	 * to start unwinding at.  Skip ahead until we reach it.
+	 */
+	if (!unwind_done(state) &&
+	    (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
+	    !__kernel_text_address(*first_frame)))
 		unwind_next_frame(state);
 }
 EXPORT_SYMBOL_GPL(__unwind_start);
