linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* SIGSEGV when using "perf record -g" with 3.13-rc* kernel
@ 2014-01-10 15:29 Waiman Long
  2014-01-10 16:58 ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2014-01-10 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton

Peter,

I recently encountered a strange problem using 3.13-rc* kernel that
did not happen in a 3.12 kernel. When I ran the high_systime benchmark
of the AIM7 test suite, I saw the following errors:

   Child terminated by signal #11

   core dumped

   Child terminated by signal #11

   Child process called exit(), status = 139

   core dumped

   Child terminated by signal #11

This only happened when I monitored the running of the benchmark using
"perf record -g". There was no problem if callchain was not enabled.
Adding debug code to the kernel showed the following stack trace:

Call Trace:
<NMI>  [<ffffffff815710af>] dump_stack+0x49/0x62
  [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
  [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
  [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
  [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
  [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
  [<ffffffff81042cc9>] no_context+0x159/0x1f0
  [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
  [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
  [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
  [<ffffffff815791be>] do_page_fault+0xe/0x10
  [<ffffffff81575962>] page_fault+0x22/0x30
  [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
  [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
  [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
  [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
  [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
  [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
  [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
  [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
  [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
  [<ffffffff8157664a>] nmi_handle+0x8a/0x170
  [<ffffffff81576848>] default_do_nmi+0x68/0x210
  [<ffffffff81576a80>] do_nmi+0x90/0xe0
  [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
<<EOE>>  [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
  [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
  [<ffffffff815791be>] do_page_fault+0xe/0x10
  [<ffffffff81575962>] page_fault+0x22/0x30
---[ end trace 037bf09d279751ec ]---

So this is a double page faults. Looking at relevant changes in
3.13 kernel, I spotted the following one patch that modified the
perf_callchain_user() function shown up in the stack trace above:

     perf: Fix arch_perf_out_copy_user default

@@ -2041,7 +2041,7 @@ perf_callchain_user(struct perf_callchain_entry 
*entry, struct pt_regs *regs)
          frame.return_address = 0;

          bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
-        if (bytes != sizeof(frame))
+        if (bytes != 0)
              break;

          if (!valid_user_frame(fp, sizeof(frame)))

I wondered if it was the cause of the SIGSEGV. Please let me know your
thought on that.

-Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 15:29 SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
@ 2014-01-10 16:58 ` Peter Zijlstra
  2014-01-10 17:02   ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 16:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
> Peter,
> 
> Call Trace:
> <NMI>  [<ffffffff815710af>] dump_stack+0x49/0x62
>  [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
>  [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
>  [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
>  [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
>  [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
>  [<ffffffff81042cc9>] no_context+0x159/0x1f0
>  [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
>  [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
>  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>  [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
>  [<ffffffff815791be>] do_page_fault+0xe/0x10
>  [<ffffffff81575962>] page_fault+0x22/0x30
>  [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
>  [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
>  [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
>  [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
>  [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
>  [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
>  [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
>  [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
>  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>  [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
>  [<ffffffff8157664a>] nmi_handle+0x8a/0x170
>  [<ffffffff81576848>] default_do_nmi+0x68/0x210
>  [<ffffffff81576a80>] do_nmi+0x90/0xe0
>  [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
>  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> <<EOE>>  [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
>  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>  [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
>  [<ffffffff815791be>] do_page_fault+0xe/0x10
>  [<ffffffff81575962>] page_fault+0x22/0x30
> ---[ end trace 037bf09d279751ec ]---
> 
> So this is a double page faults. Looking at relevant changes in
> 3.13 kernel, I spotted the following one patch that modified the
> perf_callchain_user() function shown up in the stack trace above:
> 

Hurm, that's an expected double fault, not something we should take the
process down for.

I'll have to look at how all that works for a bit.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 16:58 ` Peter Zijlstra
@ 2014-01-10 17:02   ` Peter Zijlstra
  2014-01-10 17:41     ` Peter Zijlstra
  2014-01-10 19:37     ` SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 17:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
> > Peter,
> > 
> > Call Trace:
> > <NMI>  [<ffffffff815710af>] dump_stack+0x49/0x62
> >  [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
> >  [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
> >  [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
> >  [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
> >  [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
> >  [<ffffffff81042cc9>] no_context+0x159/0x1f0
> >  [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
> >  [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
> >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
> >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
> >  [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
> >  [<ffffffff815791be>] do_page_fault+0xe/0x10
> >  [<ffffffff81575962>] page_fault+0x22/0x30
> >  [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
> >  [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
> >  [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
> >  [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
> >  [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
> >  [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
> >  [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
> >  [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
> >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> >  [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
> >  [<ffffffff8157664a>] nmi_handle+0x8a/0x170
> >  [<ffffffff81576848>] default_do_nmi+0x68/0x210
> >  [<ffffffff81576a80>] do_nmi+0x90/0xe0
> >  [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
> >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> > <<EOE>>  [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
> >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
> >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
> >  [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
> >  [<ffffffff815791be>] do_page_fault+0xe/0x10
> >  [<ffffffff81575962>] page_fault+0x22/0x30
> > ---[ end trace 037bf09d279751ec ]---
> > 
> > So this is a double page faults. Looking at relevant changes in
> > 3.13 kernel, I spotted the following one patch that modified the
> > perf_callchain_user() function shown up in the stack trace above:
> > 
> 
> Hurm, that's an expected double fault, not something we should take the
> process down for.
> 
> I'll have to look at how all that works for a bit.

How easily can you reproduce this? Could you test something like the
below, which would allow us to take double faults from NMI context.

---
 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ff85bb8dd69..18c498d4274d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
 	/* Are we prepared to handle this kernel fault? */
 	if (fixup_exception(regs)) {
-		if (current_thread_info()->sig_on_uaccess_error && signal) {
+		if (!in_nmi() && current_thread_info()->sig_on_uaccess_error && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
 			tsk->thread.error_code = error_code | PF_USER;
 			tsk->thread.cr2 = address;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 17:02   ` Peter Zijlstra
@ 2014-01-10 17:41     ` Peter Zijlstra
  2014-01-10 18:54       ` Andy Lutomirski
  2014-01-10 19:37     ` SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 17:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds,
	Andy Lutomirski

On Fri, Jan 10, 2014 at 06:02:23PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
> > > Peter,
> > > 
> > > Call Trace:
> > > <NMI>  [<ffffffff815710af>] dump_stack+0x49/0x62
> > >  [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
> > >  [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
> > >  [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
> > >  [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
> > >  [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
> > >  [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
> > >  [<ffffffff81042cc9>] no_context+0x159/0x1f0
> > >  [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
> > >  [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
> > >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
> > >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
> > >  [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
> > >  [<ffffffff815791be>] do_page_fault+0xe/0x10
> > >  [<ffffffff81575962>] page_fault+0x22/0x30
> > >  [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
> > >  [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
> > >  [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
> > >  [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
> > >  [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
> > >  [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
> > >  [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
> > >  [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> > >  [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
> > >  [<ffffffff8157664a>] nmi_handle+0x8a/0x170
> > >  [<ffffffff81576848>] default_do_nmi+0x68/0x210
> > >  [<ffffffff81576a80>] do_nmi+0x90/0xe0
> > >  [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
> > > <<EOE>>  [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
> > >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
> > >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
> > >  [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
> > >  [<ffffffff815791be>] do_page_fault+0xe/0x10
> > >  [<ffffffff81575962>] page_fault+0x22/0x30
> > > ---[ end trace 037bf09d279751ec ]---
> > > 
> > > So this is a double page faults. Looking at relevant changes in
> > > 3.13 kernel, I spotted the following one patch that modified the
> > > perf_callchain_user() function shown up in the stack trace above:
> > > 
> > 
> > Hurm, that's an expected double fault, not something we should take the
> > process down for.
> > 
> > I'll have to look at how all that works for a bit.

Andy, introduced all this in 4fc3490114bb ("x86-64: Set siginfo and
context on vsyscall emulation faults").

It looks like your initial userspace fault hit the magic button and ends
up in emulate_vsyscall. Right at that point we trigger a PMI, which
tries to do a stack-trace. That stack-trace also stumbles into unmapped
memory (might be the same) and faults again.

Now at that point, we usually just give up on the callchain and proceed
like normal, however because of this double fault emulate-vsyscall
SIGSEGV magic you loose.

So the below might well be a valid fix.. Anybody? Andy?

> How easily can you reproduce this? Could you test something like the
> below, which would allow us to take double faults from NMI context.
> 
> ---
>  arch/x86/mm/fault.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ff85bb8dd69..18c498d4274d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  
>  	/* Are we prepared to handle this kernel fault? */
>  	if (fixup_exception(regs)) {
> -		if (current_thread_info()->sig_on_uaccess_error && signal) {
> +		if (!in_nmi() && current_thread_info()->sig_on_uaccess_error && signal) {
>  			tsk->thread.trap_nr = X86_TRAP_PF;
>  			tsk->thread.error_code = error_code | PF_USER;
>  			tsk->thread.cr2 = address;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 17:41     ` Peter Zijlstra
@ 2014-01-10 18:54       ` Andy Lutomirski
  2014-01-10 19:43         ` Waiman Long
  2014-01-10 20:06         ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-01-10 18:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 9:41 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jan 10, 2014 at 06:02:23PM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>> > On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>> > > Peter,
>> > >
>> > > Call Trace:
>> > > <NMI>  [<ffffffff815710af>] dump_stack+0x49/0x62
>> > >  [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
>> > >  [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
>> > >  [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
>> > >  [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
>> > >  [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
>> > >  [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
>> > >  [<ffffffff81042cc9>] no_context+0x159/0x1f0
>> > >  [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
>> > >  [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
>> > >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>> > >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>> > >  [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
>> > >  [<ffffffff815791be>] do_page_fault+0xe/0x10
>> > >  [<ffffffff81575962>] page_fault+0x22/0x30
>> > >  [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
>> > >  [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
>> > >  [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
>> > >  [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
>> > >  [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
>> > >  [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
>> > >  [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
>> > >  [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
>> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>> > >  [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
>> > >  [<ffffffff8157664a>] nmi_handle+0x8a/0x170
>> > >  [<ffffffff81576848>] default_do_nmi+0x68/0x210
>> > >  [<ffffffff81576a80>] do_nmi+0x90/0xe0
>> > >  [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
>> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>> > >  [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>> > > <<EOE>>  [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
>> > >  [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>> > >  [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>> > >  [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
>> > >  [<ffffffff815791be>] do_page_fault+0xe/0x10
>> > >  [<ffffffff81575962>] page_fault+0x22/0x30
>> > > ---[ end trace 037bf09d279751ec ]---
>> > >
>> > > So this is a double page faults. Looking at relevant changes in
>> > > 3.13 kernel, I spotted the following one patch that modified the
>> > > perf_callchain_user() function shown up in the stack trace above:
>> > >
>> >
>> > Hurm, that's an expected double fault, not something we should take the
>> > process down for.
>> >
>> > I'll have to look at how all that works for a bit.
>
> Andy, introduced all this in 4fc3490114bb ("x86-64: Set siginfo and
> context on vsyscall emulation faults").
>
> It looks like your initial userspace fault hit the magic button and ends
> up in emulate_vsyscall. Right at that point we trigger a PMI, which
> tries to do a stack-trace. That stack-trace also stumbles into unmapped
> memory (might be the same) and faults again.
>
> Now at that point, we usually just give up on the callchain and proceed
> like normal, however because of this double fault emulate-vsyscall
> SIGSEGV magic you loose.
>
> So the below might well be a valid fix.. Anybody? Andy?

Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
(there's nothing particularly special about NMIs here, I think) would
try to access user memory.  The fix below looks okay, but IMO it needs
a big fat comment explaining what's going on.

Is there a way to ask whether the previous entry into the kernel came
from user space?  The valid "sig_on_uaccess_error" case happens when
the current fault was triggered by a fault from userspace.  The
invalid case (and any invalid case from, say, an int3 that a
tracepoint stuck in there) would be a page fault triggered by a fault
handler that in turn started in kernel space (in particular, in
emulate_vsyscall).

For that matter, why does current_thread_info() work from an NMI at
all?  Does the NMI vector not have its own stack?  The call that
installs it is set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK).

In any case, this at least needs a comment.  I don't see why this same
bug couldn't be triggered by non-NMI based tracing mechanisms, though.

Sigh, corner cases of corner cases...

>
>> How easily can you reproduce this? Could you test something like the
>> below, which would allow us to take double faults from NMI context.
>>
>> ---
>>  arch/x86/mm/fault.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 9ff85bb8dd69..18c498d4274d 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>>
>>       /* Are we prepared to handle this kernel fault? */
>>       if (fixup_exception(regs)) {
>> -             if (current_thread_info()->sig_on_uaccess_error && signal) {
>> +             if (!in_nmi() && current_thread_info()->sig_on_uaccess_error && signal) {
>>                       tsk->thread.trap_nr = X86_TRAP_PF;
>>                       tsk->thread.error_code = error_code | PF_USER;
>>                       tsk->thread.cr2 = address;


--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 17:02   ` Peter Zijlstra
  2014-01-10 17:41     ` Peter Zijlstra
@ 2014-01-10 19:37     ` Waiman Long
  2014-01-10 20:10       ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Waiman Long @ 2014-01-10 19:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds

On 01/10/2014 12:02 PM, Peter Zijlstra wrote:
> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>>> Peter,
>>>
>>> Call Trace:
>>> <NMI>   [<ffffffff815710af>] dump_stack+0x49/0x62
>>>   [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
>>>   [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
>>>   [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
>>>   [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
>>>   [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
>>>   [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
>>>   [<ffffffff81042cc9>] no_context+0x159/0x1f0
>>>   [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
>>>   [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>   [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>>   [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
>>>   [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
>>>   [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
>>>   [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
>>>   [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
>>>   [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
>>>   [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
>>>   [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>   [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
>>>   [<ffffffff8157664a>] nmi_handle+0x8a/0x170
>>>   [<ffffffff81576848>] default_do_nmi+0x68/0x210
>>>   [<ffffffff81576a80>] do_nmi+0x90/0xe0
>>>   [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>> <<EOE>>   [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>   [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>> ---[ end trace 037bf09d279751ec ]---
>>>
>>> So this is a double page faults. Looking at relevant changes in
>>> 3.13 kernel, I spotted the following one patch that modified the
>>> perf_callchain_user() function shown up in the stack trace above:
>>>
>> Hurm, that's an expected double fault, not something we should take the
>> process down for.
>>
>> I'll have to look at how all that works for a bit.
> How easily can you reproduce this? Could you test something like the
> below, which would allow us to take double faults from NMI context.

The error can be readily reproducible in my current setup.

> ---
>   arch/x86/mm/fault.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ff85bb8dd69..18c498d4274d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>
>   	/* Are we prepared to handle this kernel fault? */
>   	if (fixup_exception(regs)) {
> -		if (current_thread_info()->sig_on_uaccess_error&&  signal) {
> +		if (!in_nmi()&&  current_thread_info()->sig_on_uaccess_error&&  signal) {
>   			tsk->thread.trap_nr = X86_TRAP_PF;
>   			tsk->thread.error_code = error_code | PF_USER;
>   			tsk->thread.cr2 = address;

Yes, this change fixed the error that I got. I no longer see SIGSEGV 
when I run the test.

I did tried to back out your "perf: Fix arch_perf_out_copy_user default" 
patch, but it didn't fix the problem.

Thank for the quick turnaround!

-Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 18:54       ` Andy Lutomirski
@ 2014-01-10 19:43         ` Waiman Long
  2014-01-10 19:56           ` Andy Lutomirski
  2014-01-10 20:06         ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Waiman Long @ 2014-01-10 19:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On 01/10/2014 01:54 PM, Andy Lutomirski wrote:
> On Fri, Jan 10, 2014 at 9:41 AM, Peter Zijlstra<peterz@infradead.org>  wrote:
>> On Fri, Jan 10, 2014 at 06:02:23PM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>>>> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>>>>> Peter,
>>>>>
>>>>> Call Trace:
>>>>> <NMI>   [<ffffffff815710af>] dump_stack+0x49/0x62
>>>>>   [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
>>>>>   [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
>>>>>   [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
>>>>>   [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
>>>>>   [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
>>>>>   [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
>>>>>   [<ffffffff81042cc9>] no_context+0x159/0x1f0
>>>>>   [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
>>>>>   [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
>>>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>>>   [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
>>>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>>>>   [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
>>>>>   [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
>>>>>   [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
>>>>>   [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
>>>>>   [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
>>>>>   [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
>>>>>   [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
>>>>>   [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>   [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
>>>>>   [<ffffffff8157664a>] nmi_handle+0x8a/0x170
>>>>>   [<ffffffff81576848>] default_do_nmi+0x68/0x210
>>>>>   [<ffffffff81576a80>] do_nmi+0x90/0xe0
>>>>>   [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>> <<EOE>>   [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
>>>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>>>   [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
>>>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>>>> ---[ end trace 037bf09d279751ec ]---
>>>>>
>>>>> So this is a double page faults. Looking at relevant changes in
>>>>> 3.13 kernel, I spotted the following one patch that modified the
>>>>> perf_callchain_user() function shown up in the stack trace above:
>>>>>
>>>> Hurm, that's an expected double fault, not something we should take the
>>>> process down for.
>>>>
>>>> I'll have to look at how all that works for a bit.
>> Andy, introduced all this in 4fc3490114bb ("x86-64: Set siginfo and
>> context on vsyscall emulation faults").
>>
>> It looks like your initial userspace fault hit the magic button and ends
>> up in emulate_vsyscall. Right at that point we trigger a PMI, which
>> tries to do a stack-trace. That stack-trace also stumbles into unmapped
>> memory (might be the same) and faults again.
>>
>> Now at that point, we usually just give up on the callchain and proceed
>> like normal, however because of this double fault emulate-vsyscall
>> SIGSEGV magic you loose.
>>
>> So the below might well be a valid fix.. Anybody? Andy?
> Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
> (there's nothing particularly special about NMIs here, I think) would
> try to access user memory.  The fix below looks okay, but IMO it needs
> a big fat comment explaining what's going on.
>
> Is there a way to ask whether the previous entry into the kernel came
> from user space?  The valid "sig_on_uaccess_error" case happens when
> the current fault was triggered by a fault from userspace.  The
> invalid case (and any invalid case from, say, an int3 that a
> tracepoint stuck in there) would be a page fault triggered by a fault
> handler that in turn started in kernel space (in particular, in
> emulate_vsyscall).

The processes that got the SIGSEGV were all running shell scripts. I am 
not totally sure that they were running in user space when getting the 
PMIs, but are likely the case.

-Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 19:43         ` Waiman Long
@ 2014-01-10 19:56           ` Andy Lutomirski
  2014-01-10 20:12             ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-01-10 19:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 11:43 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 01/10/2014 01:54 PM, Andy Lutomirski wrote:
>>
>> On Fri, Jan 10, 2014 at 9:41 AM, Peter Zijlstra<peterz@infradead.org>
>> wrote:
>>>
>>> On Fri, Jan 10, 2014 at 06:02:23PM +0100, Peter Zijlstra wrote:
>>>>
>>>> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>>>>>
>>>>> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> Call Trace:
>>>>>> <NMI>   [<ffffffff815710af>] dump_stack+0x49/0x62
>>>>>>   [<ffffffff8104e3bc>] warn_slowpath_common+0x8c/0xc0
>>>>>>   [<ffffffff8104e40a>] warn_slowpath_null+0x1a/0x20
>>>>>>   [<ffffffff8105f1f1>] force_sig_info+0x131/0x140
>>>>>>   [<ffffffff81042a4f>] force_sig_info_fault+0x5f/0x70
>>>>>>   [<ffffffff8106d8da>] ? search_exception_tables+0x2a/0x50
>>>>>>   [<ffffffff81043b3d>] ? fixup_exception+0x1d/0x70
>>>>>>   [<ffffffff81042cc9>] no_context+0x159/0x1f0
>>>>>>   [<ffffffff81042e8d>] __bad_area_nosemaphore+0x12d/0x230
>>>>>>   [<ffffffff81042e8d>] ? __bad_area_nosemaphore+0x12d/0x230
>>>>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>>>>   [<ffffffff81578fc2>] ? __do_page_fault+0x362/0x480
>>>>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>>>>>   [<ffffffff815817e4>] ? bad_to_user+0x5e/0x66b
>>>>>>   [<ffffffff81285316>] copy_from_user_nmi+0x76/0x90
>>>>>>   [<ffffffff81017a20>] perf_callchain_user+0xd0/0x360
>>>>>>   [<ffffffff8111f64f>] perf_callchain+0x1af/0x1f0
>>>>>>   [<ffffffff81117693>] perf_prepare_sample+0x2f3/0x3a0
>>>>>>   [<ffffffff8111a2af>] __perf_event_overflow+0x10f/0x220
>>>>>>   [<ffffffff8111ab14>] perf_event_overflow+0x14/0x20
>>>>>>   [<ffffffff8101f69e>] intel_pmu_handle_irq+0x1de/0x3c0
>>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>>   [<ffffffff81576e64>] perf_event_nmi_handler+0x34/0x60
>>>>>>   [<ffffffff8157664a>] nmi_handle+0x8a/0x170
>>>>>>   [<ffffffff81576848>] default_do_nmi+0x68/0x210
>>>>>>   [<ffffffff81576a80>] do_nmi+0x90/0xe0
>>>>>>   [<ffffffff81575c67>] end_repeat_nmi+0x1e/0x2e
>>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>>   [<ffffffff81008e44>] ? emulate_vsyscall+0x144/0x390
>>>>>> <<EOE>>   [<ffffffff81042f7d>] __bad_area_nosemaphore+0x21d/0x230
>>>>>>   [<ffffffff81042fa3>] bad_area_nosemaphore+0x13/0x20
>>>>>>   [<ffffffff81578fc2>] __do_page_fault+0x362/0x480
>>>>>>   [<ffffffff8113cfbc>] ? vm_mmap_pgoff+0xbc/0xe0
>>>>>>   [<ffffffff815791be>] do_page_fault+0xe/0x10
>>>>>>   [<ffffffff81575962>] page_fault+0x22/0x30
>>>>>> ---[ end trace 037bf09d279751ec ]---
>>>>>>
>>>>>> So this is a double page faults. Looking at relevant changes in
>>>>>> 3.13 kernel, I spotted the following one patch that modified the
>>>>>> perf_callchain_user() function shown up in the stack trace above:
>>>>>>
>>>>> Hurm, that's an expected double fault, not something we should take the
>>>>> process down for.
>>>>>
>>>>> I'll have to look at how all that works for a bit.
>>>
>>> Andy, introduced all this in 4fc3490114bb ("x86-64: Set siginfo and
>>> context on vsyscall emulation faults").
>>>
>>> It looks like your initial userspace fault hit the magic button and ends
>>> up in emulate_vsyscall. Right at that point we trigger a PMI, which
>>> tries to do a stack-trace. That stack-trace also stumbles into unmapped
>>> memory (might be the same) and faults again.
>>>
>>> Now at that point, we usually just give up on the callchain and proceed
>>> like normal, however because of this double fault emulate-vsyscall
>>> SIGSEGV magic you loose.
>>>
>>> So the below might well be a valid fix.. Anybody? Andy?
>>
>> Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
>> (there's nothing particularly special about NMIs here, I think) would
>> try to access user memory.  The fix below looks okay, but IMO it needs
>> a big fat comment explaining what's going on.
>>
>> Is there a way to ask whether the previous entry into the kernel came
>> from user space?  The valid "sig_on_uaccess_error" case happens when
>> the current fault was triggered by a fault from userspace.  The
>> invalid case (and any invalid case from, say, an int3 that a
>> tracepoint stuck in there) would be a page fault triggered by a fault
>> handler that in turn started in kernel space (in particular, in
>> emulate_vsyscall).
>
>
> The processes that got the SIGSEGV were all running shell scripts. I am not
> totally sure that they were running in user space when getting the PMIs, but
> are likely the case.

The error you're seeing is:

 - Userspace code calls into the vsyscall page (e.g. old code calling
gettimeofday).
 - Calls to the vsyscall page trap, so you end up executing in kernel
space in emulate_vsyscall.
 - perf is running, so you end up in an NMI handler that triggers
while sig_on_uaccess_fault is true.
 - The perf callchain code tries to read from a bad userspace pointer
(not sure why -- the ip value in the vsyscall page *is* readable).
That traps (as expected), but the trap handler injects SIGSEGV due to
sig_on_uaccess_fault==1.

The cleanest fix I can think of is to change the condition to
sig_on_uaccess_fault==1 && (the get_user caller was a single kernel
entry away from userspace).  In the failure you're seeing, there was
an NMI in there.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 18:54       ` Andy Lutomirski
  2014-01-10 19:43         ` Waiman Long
@ 2014-01-10 20:06         ` Peter Zijlstra
  2014-01-10 20:28           ` Andy Lutomirski
                             ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 20:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Waiman Long, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 10:54:52AM -0800, Andy Lutomirski wrote:
> Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
> (there's nothing particularly special about NMIs here, I think) would
> try to access user memory.  The fix below looks okay, but IMO it needs
> a big fat comment explaining what's going on.

Agreed on both points, we can equally trigger this using software
timers, so any interrupt must be exempt. And yes a comment!

> Is there a way to ask whether the previous entry into the kernel came
> from user space?

Not afaik, but in_interrupt() gets us any interrupt context, whatever
remains must be task context. Still not quite the same, but close enough
I think.

> The valid "sig_on_uaccess_error" case happens when
> the current fault was triggered by a fault from userspace.  The
> invalid case (and any invalid case from, say, an int3 that a
> tracepoint stuck in there) would be a page fault triggered by a fault
> handler that in turn started in kernel space (in particular, in
> emulate_vsyscall).
> 
> For that matter, why does current_thread_info() work from an NMI at
> all?  Does the NMI vector not have its own stack?  The call that
> installs it is set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK).

NMIs do have their own stack, however x86_64 grabs kernel_stack from a
per-cpu variable, not rsp.

> In any case, this at least needs a comment.  I don't see why this same
> bug couldn't be triggered by non-NMI based tracing mechanisms, though.
> 
> Sigh, corner cases of corner cases...

:-)

Something like this perhaps?

---
Subject: x86, mm: Allow double faults from interrupts

Waiman managed to trigger a PMI while in a emulate_vsyscall() fault, the
PMI in turn managed to trigger a fault while obtaining a stack trace.
This triggered the double fault logic and killed the process dead.

Fix this by explicitly excluding interrupts from the double fault logic.

Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/mm/fault.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ff85bb8dd69..4c8e32986aad 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -641,6 +641,20 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
 	/* Are we prepared to handle this kernel fault? */
 	if (fixup_exception(regs)) {
+		/*
+		 * Any interrupt that takes a fault gets the fixup. This
+		 * makes the below double fault logic only apply to a
+		 * task double faulting from task context.
+		 */
+		if (in_interrupt())
+			return;
+
+		/*
+		 * Per the above we're !in_interrupt(), aka. task context.
+		 *
+		 * In this case we need to make sure we're not double faulting
+		 * through the emulate_vsyscall() logic.
+		 */
 		if (current_thread_info()->sig_on_uaccess_error && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
 			tsk->thread.error_code = error_code | PF_USER;
@@ -649,6 +663,10 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 			/* XXX: hwpoison faults will set the wrong code. */
 			force_sig_info_fault(signal, si_code, address, tsk, 0);
 		}
+
+		/*
+		 * Barring that, we can do the fixup and be happy.
+		 */
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 19:37     ` SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
@ 2014-01-10 20:10       ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 20:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 02:37:10PM -0500, Waiman Long wrote:
> >@@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >
> >  	/* Are we prepared to handle this kernel fault? */
> >  	if (fixup_exception(regs)) {
> >-		if (current_thread_info()->sig_on_uaccess_error&&  signal) {
> >+		if (!in_nmi()&&  current_thread_info()->sig_on_uaccess_error&&  signal) {
> >  			tsk->thread.trap_nr = X86_TRAP_PF;
> >  			tsk->thread.error_code = error_code | PF_USER;
> >  			tsk->thread.cr2 = address;
> 
> Yes, this change fixed the error that I got. I no longer see SIGSEGV when I
> run the test.

Awesome, just send a more elaborate version that should have the same
effect.

> I did tried to back out your "perf: Fix arch_perf_out_copy_user default"
> patch, but it didn't fix the problem.

The culprit was: e00b12e64be9 ("perf/x86: Further optimize
copy_from_user_nmi()")



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 19:56           ` Andy Lutomirski
@ 2014-01-10 20:12             ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2014-01-10 20:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Waiman Long, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 11:56:07AM -0800, Andy Lutomirski wrote:
>  - The perf callchain code tries to read from a bad userspace pointer
> (not sure why -- the ip value in the vsyscall page *is* readable).
> That traps (as expected), but the trap handler injects SIGSEGV due to
> sig_on_uaccess_fault==1.

So the perf thing tries to walk the entire stack in order to obtain a
callchain. Its fairly common to hit crap when attempting this :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 20:06         ` Peter Zijlstra
@ 2014-01-10 20:28           ` Andy Lutomirski
  2014-01-15 15:33           ` Waiman Long
  2014-01-16 13:39           ` [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-01-10 20:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On Fri, Jan 10, 2014 at 12:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jan 10, 2014 at 10:54:52AM -0800, Andy Lutomirski wrote:
>> Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
>> (there's nothing particularly special about NMIs here, I think) would
>> try to access user memory.  The fix below looks okay, but IMO it needs
>> a big fat comment explaining what's going on.
>
> Agreed on both points, we can equally trigger this using software
> timers, so any interrupt must be exempt. And yes a comment!
>
>> Is there a way to ask whether the previous entry into the kernel came
>> from user space?
>
> Not afaik, but in_interrupt() gets us any interrupt context, whatever
> remains must be task context. Still not quite the same, but close enough
> I think.
>
>> The valid "sig_on_uaccess_error" case happens when
>> the current fault was triggered by a fault from userspace.  The
>> invalid case (and any invalid case from, say, an int3 that a
>> tracepoint stuck in there) would be a page fault triggered by a fault
>> handler that in turn started in kernel space (in particular, in
>> emulate_vsyscall).
>>
>> For that matter, why does current_thread_info() work from an NMI at
>> all?  Does the NMI vector not have its own stack?  The call that
>> installs it is set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK).
>
> NMIs do have their own stack, however x86_64 grabs kernel_stack from a
> per-cpu variable, not rsp.
>
>> In any case, this at least needs a comment.  I don't see why this same
>> bug couldn't be triggered by non-NMI based tracing mechanisms, though.
>>
>> Sigh, corner cases of corner cases...
>
> :-)
>
> Something like this perhaps?
>
> ---
> Subject: x86, mm: Allow double faults from interrupts
>
> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault, the
> PMI in turn managed to trigger a fault while obtaining a stack trace.
> This triggered the double fault logic and killed the process dead.
>
> Fix this by explicitly excluding interrupts from the double fault logic.
>
> Reported-by: Waiman Long <waiman.long@hp.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/mm/fault.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ff85bb8dd69..4c8e32986aad 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -641,6 +641,20 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>
>         /* Are we prepared to handle this kernel fault? */
>         if (fixup_exception(regs)) {
> +               /*
> +                * Any interrupt that takes a fault gets the fixup. This
> +                * makes the below double fault logic only apply to a
> +                * task double faulting from task context.
> +                */
> +               if (in_interrupt())
> +                       return;
> +
> +               /*
> +                * Per the above we're !in_interrupt(), aka. task context.
> +                *
> +                * In this case we need to make sure we're not double faulting
> +                * through the emulate_vsyscall() logic.
> +                */
>                 if (current_thread_info()->sig_on_uaccess_error && signal) {
>                         tsk->thread.trap_nr = X86_TRAP_PF;
>                         tsk->thread.error_code = error_code | PF_USER;
> @@ -649,6 +663,10 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>                         /* XXX: hwpoison faults will set the wrong code. */
>                         force_sig_info_fault(signal, si_code, address, tsk, 0);
>                 }
> +
> +               /*
> +                * Barring that, we can do the fixup and be happy.
> +                */
>                 return;
>         }
>

My only real nit is the same thing that confused me the first time I
read your email -- when I see "double fault", I think #DF.  How about
something like "It's possible for an interrupt to occur while
sig_on_uaccess_error is set.  In that case, uaccess faults that are
caused by the interrupt handler should not send signals."?

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
  2014-01-10 20:06         ` Peter Zijlstra
  2014-01-10 20:28           ` Andy Lutomirski
@ 2014-01-15 15:33           ` Waiman Long
  2014-01-16 13:39           ` [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2014-01-15 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linux Kernel Mailing List, Aswin Chandramouleeswaran,
	Scott J Norton, Linus Torvalds

On 01/10/2014 03:06 PM, Peter Zijlstra wrote:
>
> :-)
>
> Something like this perhaps?
>
> ---
> Subject: x86, mm: Allow double faults from interrupts
>
> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault, the
> PMI in turn managed to trigger a fault while obtaining a stack trace.
> This triggered the double fault logic and killed the process dead.
>
> Fix this by explicitly excluding interrupts from the double fault logic.
>
> Reported-by: Waiman Long<waiman.long@hp.com>
> Signed-off-by: Peter Zijlstra<peterz@infradead.org>
> ---
>   arch/x86/mm/fault.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ff85bb8dd69..4c8e32986aad 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -641,6 +641,20 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>
>   	/* Are we prepared to handle this kernel fault? */
>   	if (fixup_exception(regs)) {
> +		/*
> +		 * Any interrupt that takes a fault gets the fixup. This
> +		 * makes the below double fault logic only apply to a
> +		 * task double faulting from task context.
> +		 */
> +		if (in_interrupt())
> +			return;
> +
> +		/*
> +		 * Per the above we're !in_interrupt(), aka. task context.
> +		 *
> +		 * In this case we need to make sure we're not double faulting
> +		 * through the emulate_vsyscall() logic.
> +		 */
>   		if (current_thread_info()->sig_on_uaccess_error&&  signal) {
>   			tsk->thread.trap_nr = X86_TRAP_PF;
>   			tsk->thread.error_code = error_code | PF_USER;
> @@ -649,6 +663,10 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>   			/* XXX: hwpoison faults will set the wrong code. */
>   			force_sig_info_fault(signal, si_code, address, tsk, 0);
>   		}
> +
> +		/*
> +		 * Barring that, we can do the fixup and be happy.
> +		 */
>   		return;
>   	}
>

Are you going to send out an official patch to fix this problem? I 
really like to see it merged into 3.13 before it is released.

-Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts
  2014-01-10 20:06         ` Peter Zijlstra
  2014-01-10 20:28           ` Andy Lutomirski
  2014-01-15 15:33           ` Waiman Long
@ 2014-01-16 13:39           ` tip-bot for Peter Zijlstra
  2014-01-17 18:10             ` Waiman Long
  2 siblings, 1 reply; 18+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-01-16 13:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, waiman.long, acme, peterz,
	luto, akpm, tglx, scott.norton, aswin

Commit-ID:  c026b3591e4f2a4993df773183704bb31634e0bd
Gitweb:     http://git.kernel.org/tip/c026b3591e4f2a4993df773183704bb31634e0bd
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 10 Jan 2014 21:06:03 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Jan 2014 09:19:48 +0100

x86, mm, perf: Allow recursive faults from interrupts

Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
the PMI in turn managed to trigger a fault while obtaining a stack
trace. This triggered the sig_on_uaccess_error recursive fault logic
and killed the process dead.

Fix this by explicitly excluding interrupts from the recursive fault
logic.

Reported-and-Tested-by: Waiman Long <waiman.long@hp.com>
Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/fault.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ff85bb..9d591c8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -641,6 +641,20 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
 	/* Are we prepared to handle this kernel fault? */
 	if (fixup_exception(regs)) {
+		/*
+		 * Any interrupt that takes a fault gets the fixup. This makes
+		 * the below recursive fault logic only apply to a faults from
+		 * task context.
+		 */
+		if (in_interrupt())
+			return;
+
+		/*
+		 * Per the above we're !in_interrupt(), aka. task context.
+		 *
+		 * In this case we need to make sure we're not recursively
+		 * faulting through the emulate_vsyscall() logic.
+		 */
 		if (current_thread_info()->sig_on_uaccess_error && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
 			tsk->thread.error_code = error_code | PF_USER;
@@ -649,6 +663,10 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 			/* XXX: hwpoison faults will set the wrong code. */
 			force_sig_info_fault(signal, si_code, address, tsk, 0);
 		}
+
+		/*
+		 * Barring that, we can do the fixup and be happy.
+		 */
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts
  2014-01-16 13:39           ` [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts tip-bot for Peter Zijlstra
@ 2014-01-17 18:10             ` Waiman Long
  2014-01-17 19:17               ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2014-01-17 18:10 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, waiman.long, torvalds, peterz, acme,
	luto, akpm, tglx, scott.norton, aswin
  Cc: tip-bot for Peter Zijlstra, linux-tip-commits

On 01/16/2014 08:39 AM, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  c026b3591e4f2a4993df773183704bb31634e0bd
> Gitweb:     http://git.kernel.org/tip/c026b3591e4f2a4993df773183704bb31634e0bd
> Author:     Peter Zijlstra<peterz@infradead.org>
> AuthorDate: Fri, 10 Jan 2014 21:06:03 +0100
> Committer:  Ingo Molnar<mingo@kernel.org>
> CommitDate: Thu, 16 Jan 2014 09:19:48 +0100
>
> x86, mm, perf: Allow recursive faults from interrupts
>
> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
> the PMI in turn managed to trigger a fault while obtaining a stack
> trace. This triggered the sig_on_uaccess_error recursive fault logic
> and killed the process dead.
>
> Fix this by explicitly excluding interrupts from the recursive fault
> logic.
>
> Reported-and-Tested-by: Waiman Long<waiman.long@hp.com>
> Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
> Cc: Aswin Chandramouleeswaran<aswin@hp.com>
> Cc: Scott J Norton<scott.norton@hp.com>
> Cc: Linus Torvalds<torvalds@linux-foundation.org>
> Cc: Andy Lutomirski<luto@amacapital.net>
> Cc: Arnaldo Carvalho de Melo<acme@ghostprotocols.net>
> Cc: Andrew Morton<akpm@linux-foundation.org>
> Signed-off-by: Peter Zijlstra<peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
> Signed-off-by: Ingo Molnar<mingo@kernel.org>
> ---
>   arch/x86/mm/fault.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
>

Will that be picked up by Linus as it is a 3.13 regression?

-Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts
  2014-01-17 18:10             ` Waiman Long
@ 2014-01-17 19:17               ` Andy Lutomirski
  2014-01-17 20:08                 ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-01-17 19:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, H. Peter Anvin, linux-kernel, Linus Torvalds,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Andrew Morton,
	Thomas Gleixner, Scott J Norton, Aswin Chandramouleeswaran,
	tip-bot for Peter Zijlstra, linux-tip-commits

On Fri, Jan 17, 2014 at 10:10 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 01/16/2014 08:39 AM, tip-bot for Peter Zijlstra wrote:
>>
>> Commit-ID:  c026b3591e4f2a4993df773183704bb31634e0bd
>> Gitweb:
>> http://git.kernel.org/tip/c026b3591e4f2a4993df773183704bb31634e0bd
>> Author:     Peter Zijlstra<peterz@infradead.org>
>> AuthorDate: Fri, 10 Jan 2014 21:06:03 +0100
>> Committer:  Ingo Molnar<mingo@kernel.org>
>> CommitDate: Thu, 16 Jan 2014 09:19:48 +0100
>>
>> x86, mm, perf: Allow recursive faults from interrupts
>>
>> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
>> the PMI in turn managed to trigger a fault while obtaining a stack
>> trace. This triggered the sig_on_uaccess_error recursive fault logic
>> and killed the process dead.
>>
>> Fix this by explicitly excluding interrupts from the recursive fault
>> logic.
>>
>> Reported-and-Tested-by: Waiman Long<waiman.long@hp.com>
>> Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
>> Cc: Aswin Chandramouleeswaran<aswin@hp.com>
>> Cc: Scott J Norton<scott.norton@hp.com>
>> Cc: Linus Torvalds<torvalds@linux-foundation.org>
>> Cc: Andy Lutomirski<luto@amacapital.net>
>> Cc: Arnaldo Carvalho de Melo<acme@ghostprotocols.net>
>> Cc: Andrew Morton<akpm@linux-foundation.org>
>> Signed-off-by: Peter Zijlstra<peterz@infradead.org>
>> Link:
>> http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
>> Signed-off-by: Ingo Molnar<mingo@kernel.org>
>> ---
>>   arch/x86/mm/fault.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>>
>
> Will that be picked up by Linus as it is a 3.13 regression?

Does anyone actually know why this regressed recently?  The buggy code
has been there for quite a while.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts
  2014-01-17 19:17               ` Andy Lutomirski
@ 2014-01-17 20:08                 ` Waiman Long
  2014-01-17 21:07                   ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2014-01-17 20:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, H. Peter Anvin, linux-kernel, Linus Torvalds,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Andrew Morton,
	Thomas Gleixner, Scott J Norton, Aswin Chandramouleeswaran,
	tip-bot for Peter Zijlstra, linux-tip-commits

On 01/17/2014 02:17 PM, Andy Lutomirski wrote:
> On Fri, Jan 17, 2014 at 10:10 AM, Waiman Long<waiman.long@hp.com>  wrote:
>> On 01/16/2014 08:39 AM, tip-bot for Peter Zijlstra wrote:
>>> Commit-ID:  c026b3591e4f2a4993df773183704bb31634e0bd
>>> Gitweb:
>>> http://git.kernel.org/tip/c026b3591e4f2a4993df773183704bb31634e0bd
>>> Author:     Peter Zijlstra<peterz@infradead.org>
>>> AuthorDate: Fri, 10 Jan 2014 21:06:03 +0100
>>> Committer:  Ingo Molnar<mingo@kernel.org>
>>> CommitDate: Thu, 16 Jan 2014 09:19:48 +0100
>>>
>>> x86, mm, perf: Allow recursive faults from interrupts
>>>
>>> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
>>> the PMI in turn managed to trigger a fault while obtaining a stack
>>> trace. This triggered the sig_on_uaccess_error recursive fault logic
>>> and killed the process dead.
>>>
>>> Fix this by explicitly excluding interrupts from the recursive fault
>>> logic.
>>>
>>> Reported-and-Tested-by: Waiman Long<waiman.long@hp.com>
>>> Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
>>> Cc: Aswin Chandramouleeswaran<aswin@hp.com>
>>> Cc: Scott J Norton<scott.norton@hp.com>
>>> Cc: Linus Torvalds<torvalds@linux-foundation.org>
>>> Cc: Andy Lutomirski<luto@amacapital.net>
>>> Cc: Arnaldo Carvalho de Melo<acme@ghostprotocols.net>
>>> Cc: Andrew Morton<akpm@linux-foundation.org>
>>> Signed-off-by: Peter Zijlstra<peterz@infradead.org>
>>> Link:
>>> http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
>>> Signed-off-by: Ingo Molnar<mingo@kernel.org>
>>> ---
>>>    arch/x86/mm/fault.c | 18 ++++++++++++++++++
>>>    1 file changed, 18 insertions(+)
>>>
>>>
>> Will that be picked up by Linus as it is a 3.13 regression?
> Does anyone actually know why this regressed recently?  The buggy code
> has been there for quite a while.
>
> --Andy

Yes, the bug was there for a while, but a recent change by Peter (see 
the "Fixes:" line above) made it much easier to hit it.

-Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts
  2014-01-17 20:08                 ` Waiman Long
@ 2014-01-17 21:07                   ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-01-17 21:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, H. Peter Anvin, linux-kernel, Linus Torvalds,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Andrew Morton,
	Thomas Gleixner, Scott J Norton, Aswin Chandramouleeswaran,
	tip-bot for Peter Zijlstra, linux-tip-commits

On Fri, Jan 17, 2014 at 12:08 PM, Waiman Long <waiman.long@hp.com> wrote:
> On 01/17/2014 02:17 PM, Andy Lutomirski wrote:
>>
>> On Fri, Jan 17, 2014 at 10:10 AM, Waiman Long<waiman.long@hp.com>  wrote:
>>>
>>> On 01/16/2014 08:39 AM, tip-bot for Peter Zijlstra wrote:
>>>>
>>>> Commit-ID:  c026b3591e4f2a4993df773183704bb31634e0bd
>>>> Gitweb:
>>>> http://git.kernel.org/tip/c026b3591e4f2a4993df773183704bb31634e0bd
>>>> Author:     Peter Zijlstra<peterz@infradead.org>
>>>> AuthorDate: Fri, 10 Jan 2014 21:06:03 +0100
>>>> Committer:  Ingo Molnar<mingo@kernel.org>
>>>> CommitDate: Thu, 16 Jan 2014 09:19:48 +0100
>>>>
>>>> x86, mm, perf: Allow recursive faults from interrupts
>>>>
>>>> Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
>>>> the PMI in turn managed to trigger a fault while obtaining a stack
>>>> trace. This triggered the sig_on_uaccess_error recursive fault logic
>>>> and killed the process dead.
>>>>
>>>> Fix this by explicitly excluding interrupts from the recursive fault
>>>> logic.
>>>>
>>>> Reported-and-Tested-by: Waiman Long<waiman.long@hp.com>
>>>> Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
>>>> Cc: Aswin Chandramouleeswaran<aswin@hp.com>
>>>> Cc: Scott J Norton<scott.norton@hp.com>
>>>> Cc: Linus Torvalds<torvalds@linux-foundation.org>
>>>> Cc: Andy Lutomirski<luto@amacapital.net>
>>>> Cc: Arnaldo Carvalho de Melo<acme@ghostprotocols.net>
>>>> Cc: Andrew Morton<akpm@linux-foundation.org>
>>>> Signed-off-by: Peter Zijlstra<peterz@infradead.org>
>>>> Link:
>>>>
>>>> http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net
>>>> Signed-off-by: Ingo Molnar<mingo@kernel.org>
>>>> ---
>>>>    arch/x86/mm/fault.c | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>>
>>> Will that be picked up by Linus as it is a 3.13 regression?
>>
>> Does anyone actually know why this regressed recently?  The buggy code
>> has been there for quite a while.
>>
>> --Andy
>
>
> Yes, the bug was there for a while, but a recent change by Peter (see the
> "Fixes:" line above) made it much easier to hit it.

Thanks!

So I feel slightly better now -- this particular bug didn't actually
exist when I wrote the offending code :)  But that also means that
this should really be fixed in 3.13.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-01-17 21:07 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-10 15:29 SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
2014-01-10 16:58 ` Peter Zijlstra
2014-01-10 17:02   ` Peter Zijlstra
2014-01-10 17:41     ` Peter Zijlstra
2014-01-10 18:54       ` Andy Lutomirski
2014-01-10 19:43         ` Waiman Long
2014-01-10 19:56           ` Andy Lutomirski
2014-01-10 20:12             ` Peter Zijlstra
2014-01-10 20:06         ` Peter Zijlstra
2014-01-10 20:28           ` Andy Lutomirski
2014-01-15 15:33           ` Waiman Long
2014-01-16 13:39           ` [tip:perf/core] x86, mm, perf: Allow recursive faults from interrupts tip-bot for Peter Zijlstra
2014-01-17 18:10             ` Waiman Long
2014-01-17 19:17               ` Andy Lutomirski
2014-01-17 20:08                 ` Waiman Long
2014-01-17 21:07                   ` Andy Lutomirski
2014-01-10 19:37     ` SIGSEGV when using "perf record -g" with 3.13-rc* kernel Waiman Long
2014-01-10 20:10       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).