linux-kernel.vger.kernel.org archive mirror
* [PATCH] Sometimes, there is OOPS happened when we use oprofile.
@ 2012-10-29  2:33 Zhang, Jun
  2012-10-31 21:05 ` Robert Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Zhang, Jun @ 2012-10-29  2:33 UTC
  To: Robert Richter, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, oprofile-list, linux-kernel
  Cc: Zhang, Jun

From fff479313342940372444797814edee996b18fc9 Mon Sep 17 00:00:00 2001
From: jzha144 <jun.zhang@intel.com>
Date: Mon, 29 Oct 2012 09:07:22 +0800
Subject: [PATCH] Sometimes an oops happens when we use oprofile

The call stack is shown below. It shows that if an NMI arrives in
call_on_stack() between "xchgl %%ebx,%%esp" and "call *%%edi", the
system oopses.

 BUG: unable to handle kernel paging request at ff06383f
 IP: [<c12051cd>] print_context_stack+0x4d/0x100
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211
 compat btwilink atomisp lm3554 mt9m114 mt9e013 videobuf2_memops videobuf2_core st_drv matrix(C)

 Pid: 162, comm: adbd Tainted: G        WC  3.0.34-140446-g9e77874-dirty #1 Intel Corporation
 EIP: 0060:[<c12051cd>] EFLAGS: 00010083 CPU: 1
 EIP is at print_context_stack+0x4d/0x100
 EAX: ff063ffc EBX: ff06383f ECX: f4a0bd74 EDX: ff06383f
 ESI: 00000000 EDI: ffffe000 EBP: f58dbe48 ESP: f58dbe24
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 Process adbd (pid: 162, ti=f58da000 task=f430a730 task.ti=f4a0a000)
 Stack:
  0000000c ff063ffc f4a0bd74 ffffe000 ff062000 f4a0bd74 ff06383f c1b2b1c0
  ff062000 f58dbe74 c120428f c1b2b1c0 f58dbe98 00000000 f58dbe60 00000000
  00000000 f4a0bd74 f58dbfc4 00000005 f58dbebc c172d52f f4a0bd74 c1b2b1c0
 Call Trace:
  [<c120428f>] dump_trace+0x7f/0xf0
  [<c172d52f>] x86_backtrace+0x13f/0x150
  [<c172b504>] ? op_cpu_buffer_write_commit+0x14/0x20
  [<c172b66e>] ? log_sample+0x8e/0xb0
  [<c172b8ca>] oprofile_add_sample+0x9a/0xc0
  [<c172f09e>] ppro_check_ctrs+0x8e/0x110
  [<c12a31ce>] ? rb_reserve_next_event+0x3e/0x370
  [<c172d8d7>] profile_exceptions_notify+0x67/0x70
  [<c18694c7>] notifier_call_chain+0x47/0x90
  [<c1869548>] __atomic_notifier_call_chain+0x38/0x50
  [<c1250930>] ? remote_softirq_receive+0x110/0x110
  [<c186957f>] atomic_notifier_call_chain+0x1f/0x30
  [<c18695bd>] notify_die+0x2d/0x30
  [<c1867390>] do_nmi+0xb0/0x300
  [<c124fcef>] ? __local_bh_enable+0x4f/0xa0
  [<c1866f95>] nmi_stack_correct+0x28/0x2d
  [<c1250930>] ? remote_softirq_receive+0x110/0x110
  [<c120412f>] ? do_softirq+0x8f/0xe0
  <IRQ>
  [<c1250e26>] irq_exit+0x86/0xd0
  [<c186cb49>] smp_apic_timer_interrupt+0x59/0x88
  [<c1496738>] ? trace_hardirqs_off_thunk+0xc/0x14
  [<c1866ca7>] apic_timer_interrupt+0x2f/0x34
  [<c122007b>] ? handle_vm86_fault+0x78b/0x9b0
  [<c186661f>] ? _raw_spin_unlock_irqrestore+0x3f/0x50
  [<c1230d3c>] __wake_up_sync_key+0x4c/0x60
  [<c17353f0>] sock_def_readable+0x40/0x70
  [<c17d050d>] unix_stream_sendmsg+0x22d/0x390
  [<c173103b>] sock_aio_write+0x11b/0x140
  [<c186375d>] ? __schedule+0x23d/0x8d0
  [<c1866f95>] ? nmi_stack_correct+0x28/0x2d
  [<c12feaf9>] do_sync_write+0xa9/0xe0
  [<c186942d>] ? sub_preempt_count+0x3d/0x50
  [<c12ff321>] vfs_write+0x151/0x160
  [<c1300798>] ? fget_light+0x58/0xd0
  [<c12ff53d>] sys_write+0x3d/0x70
  [<c18669a1>] syscall_call+0x7/0xb
 Code: f6 89 4d f0 89 4d e4 89 45 e0 89 7d e8 74 5e 8d b4 26 00 00 00 00 39
 f3 72 0c 8b 45 f0 83 c4 18 5b 5e 5f 5d c3 90 3b 5d e8 72 ef <8b> 3b 89 f8
 89 7d dc e8 c7 07 06 00 85 c0 74 2b 8b 45 f0 83 c0
 EIP: [<c12051cd>] print_context_stack+0x4d/0x100 SS:ESP 0068:f58dbe24
 CR2: 00000000ff06383f

Signed-off-by: jzha144 <jun.zhang@intel.com>
---
 arch/x86/oprofile/backtrace.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d6aa6e8..c1af4f0 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,6 +113,10 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 
 	if (!user_mode_vm(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
+
+		if (!((unsigned long)stack & (THREAD_SIZE - 1)))
+			stack = 0;
+
 		if (depth)
 			dump_trace(NULL, regs, (unsigned long *)stack, 0,
 				   &backtrace_ops, &depth);
-- 
1.7.6
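
The address arithmetic behind the oops and behind the check added above
can be tried in userspace. The following is a minimal sketch, not kernel
code. It assumes 8 KiB kernel stacks (THREAD_SIZE = 0x2000), which the
ffffe000 mask in EDI suggests but the thread does not state. The helper
names stack_base(), at_stack_boundary() and check_masks() are
hypothetical. The register values are taken from the oops; 0xff064000 is
an illustrative boundary address, not a value from the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB stacks, as the ffffe000 mask in the oops suggests. */
#define THREAD_SIZE 0x2000u

/* Base address of the THREAD_SIZE-aligned stack that contains sp. */
static uint32_t stack_base(uint32_t sp)
{
	return sp & ~(THREAD_SIZE - 1);
}

/* Same test as in the patch: true when sp lies exactly on a boundary. */
static int at_stack_boundary(uint32_t sp)
{
	return !(sp & (THREAD_SIZE - 1));
}

/* Checks the masks against values printed in the oops. */
static void check_masks(void)
{
	/* ESP f58dbe24 lies in the stack reported as ti=f58da000. */
	assert(stack_base(0xf58dbe24u) == 0xf58da000u);
	/* ECX f4a0bd74 lies in the stack reported as task.ti=f4a0a000. */
	assert(stack_base(0xf4a0bd74u) == 0xf4a0a000u);
	/* An ordinary in-stack address does not trigger the patch's test. */
	assert(!at_stack_boundary(0xf58dbe24u));
	/* An address exactly on a THREAD_SIZE boundary does. */
	assert(at_stack_boundary(0xff064000u));
}
```

The two different bases (ti versus task.ti) are consistent with the NMI
having arrived while the CPU was running on a stack other than the
task's own.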


* Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.
  2012-10-29  2:33 [PATCH] Sometimes, there is OOPS happened when we use oprofile Zhang, Jun
@ 2012-10-31 21:05 ` Robert Richter
  2012-10-31 21:27   ` H. Peter Anvin
  2012-10-31 21:33   ` H. Peter Anvin
  0 siblings, 2 replies; 5+ messages in thread
From: Robert Richter @ 2012-10-31 21:05 UTC
  To: Zhang, Jun, Ingo Molnar, H. Peter Anvin
  Cc: Thomas Gleixner, x86, oprofile-list, linux-kernel

Jun,

On 29.10.12 02:33:54, Zhang, Jun wrote:
> Sometimes an oops happens when we use oprofile. The call stack is
> shown below. It shows that if an NMI arrives in call_on_stack()
> between "xchgl %%ebx,%%esp" and "call *%%edi", the system oopses.

this should be related and fixed with:

 https://lkml.org/lkml/2012/9/12/269

Ingo, HPA,

please apply the fix of kernel_stack_pointer().

Thanks,

-Robert


* Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.
  2012-10-31 21:05 ` Robert Richter
@ 2012-10-31 21:27   ` H. Peter Anvin
  2012-10-31 21:33   ` H. Peter Anvin
  1 sibling, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2012-10-31 21:27 UTC
  To: Robert Richter
  Cc: Zhang, Jun, Ingo Molnar, Thomas Gleixner, x86, oprofile-list,
	linux-kernel

On 10/31/2012 02:05 PM, Robert Richter wrote:
> Jun,
>
> On 29.10.12 02:33:54, Zhang, Jun wrote:
>> Sometimes an oops happens when we use oprofile. The call stack is
>> shown below. It shows that if an NMI arrives in call_on_stack()
>> between "xchgl %%ebx,%%esp" and "call *%%edi", the system oopses.
>
> this should be related and fixed with:
>
>   https://lkml.org/lkml/2012/9/12/269
>
> Ingo, HPA,
>
> please apply the fix of kernel_stack_pointer().
>

Thanks for the reminder.  Ingo bounced this one to me for review while I 
was away and it fell between the cracks.

	-hpa
-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.
  2012-10-31 21:05 ` Robert Richter
  2012-10-31 21:27   ` H. Peter Anvin
@ 2012-10-31 21:33   ` H. Peter Anvin
  2012-10-31 22:45     ` Robert Richter
  1 sibling, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2012-10-31 21:33 UTC
  To: Robert Richter
  Cc: Zhang, Jun, Ingo Molnar, Thomas Gleixner, x86, oprofile-list,
	linux-kernel

On 10/31/2012 02:05 PM, Robert Richter wrote:
> Jun,
>
> On 29.10.12 02:33:54, Zhang, Jun wrote:
>> Sometimes an oops happens when we use oprofile. The call stack is
>> shown below. It shows that if an NMI arrives in call_on_stack()
>> between "xchgl %%ebx,%%esp" and "call *%%edi", the system oopses.
>
> this should be related and fixed with:
>
>   https://lkml.org/lkml/2012/9/12/269
>
> Ingo, HPA,
>
> please apply the fix of kernel_stack_pointer().
>

I'm vaguely concerned about the following:

+ * To always return a non-null
+ * stack pointer we fall back to regs as stack if no previous stack
+ * exists.

Is the logic that, if there is no previous stack pointer and the stack
is too empty, we simply assume regs points to the top of the stack?
Can this ever actually be seen?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.
  2012-10-31 21:33   ` H. Peter Anvin
@ 2012-10-31 22:45     ` Robert Richter
  0 siblings, 0 replies; 5+ messages in thread
From: Robert Richter @ 2012-10-31 22:45 UTC
  To: H. Peter Anvin
  Cc: Zhang, Jun, Ingo Molnar, Thomas Gleixner, x86, oprofile-list,
	linux-kernel, Steven Rostedt

On 31.10.12 14:33:17, H. Peter Anvin wrote:
> I'm vaguely concerned about the following:
> 
> + * To always return a non-null
> + * stack pointer we fall back to regs as stack if no previous stack
> + * exists.
> 
> Is the logic that, if there is no previous stack pointer and the
> stack is too empty, we simply assume regs points to the top of the
> stack?  Can this ever actually be seen?

I discussed this with Steven too (https://lkml.org/lkml/2012/9/6/322),
and we both had a bad feeling about kernel_stack_pointer() returning a
null pointer (as implemented in version 1 of this patch). It could be
null if tinfo->previous_esp is null (last stack). I am not sure when
this may happen.

So using regs as a fallback seemed OK, since this has been in for
years:

 7b6c6c7 x86, 32-bit: fix kernel_trap_sp()

-Robert
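
The fallback order discussed above can be modelled in userspace. The
following is a minimal sketch built only from what this thread says:
use tinfo->previous_esp when it is set, and fall back to regs so that
the result is never null. It is not the text of the referenced fix. The
"same stack" comparison, the 8 KiB THREAD_SIZE, and the names
model_thread_info, model_stack_pointer() and check_fallback() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 8 KiB stacks with thread_info at the base of each stack. */
#define THREAD_SIZE ((uintptr_t)0x2000)

/* Hypothetical stand-in for the one thread_info field discussed here. */
struct model_thread_info {
	uintptr_t previous_esp;
};

/*
 * regs:    address of the saved register frame
 * sp_slot: address just past the frame, where the interrupted stack
 *          would continue if it is the same stack
 */
static uintptr_t model_stack_pointer(uintptr_t regs, uintptr_t sp_slot)
{
	uintptr_t context = regs & ~(THREAD_SIZE - 1);
	const struct model_thread_info *tinfo;

	/* Frame and slot share a stack: the slot address is usable. */
	if (context == (sp_slot & ~(THREAD_SIZE - 1)))
		return sp_slot;

	/* Otherwise prefer the recorded previous stack pointer. */
	tinfo = (const struct model_thread_info *)context;
	if (tinfo->previous_esp)
		return tinfo->previous_esp;

	/* Last resort: never return null, fall back to regs. */
	return regs;
}

/* Exercises all three outcomes on a simulated, aligned stack. */
static void check_fallback(void)
{
	unsigned char *raw = malloc(2 * THREAD_SIZE);
	uintptr_t base, regs;
	struct model_thread_info *tinfo;

	assert(raw != NULL);
	/* Round up to a THREAD_SIZE boundary inside the allocation. */
	base = ((uintptr_t)raw + THREAD_SIZE - 1) & ~(THREAD_SIZE - 1);
	tinfo = (struct model_thread_info *)base;
	regs = base + THREAD_SIZE - 64;

	tinfo->previous_esp = 0x1234;
	assert(model_stack_pointer(regs, regs + 32) == regs + 32);
	assert(model_stack_pointer(regs, base + THREAD_SIZE) == 0x1234);

	tinfo->previous_esp = 0;
	assert(model_stack_pointer(regs, base + THREAD_SIZE) == regs);

	free(raw);
}
```

In this model the last case corresponds to the situation questioned
above: the frame sits at the very top of an otherwise empty stack and
no previous stack is recorded, so regs itself is returned.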

