linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Alistair John Strachan <s0348365@sms.ed.ac.uk>
Cc: Mikael Pettersson <mikpe@it.uu.se>,
	76306.1226@compuserve.com, akpm@osdl.org, bunk@stusta.de,
	greg@kroah.com, linux-kernel@vger.kernel.org,
	yanmin_zhang@linux.intel.com
Subject: Re: kernel + gcc 4.1 = several problems
Date: Fri, 5 Jan 2007 08:49:24 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0701050827290.3661@woody.osdl.org>
In-Reply-To: <200701051619.54977.s0348365@sms.ed.ac.uk>



On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> (I realise with problems like these it's almost always some sort of obscure 
> hardware problem, but I find that very difficult to believe when I can toggle 
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
> also run extensive stability test programs on the hardware with absolutely no 
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But 
at the same time, I'm almost positive that it's not in "pipe_poll()" 
itself, because that function is just too simple, and looking at the 
assembly code, I don't see how what you describe could happen in THAT 
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and 
that one corrupting the stack under it because of a compiler bug (or a 
kernel bug that just needs a specific compiler to trigger). For example, 
we've had problems before with the compiler thinking it owns the stack 
frame for an "asmlinkage" function, and us having no way to tell the 
compiler to keep its hands off - so the compiler ended up touching 
registers that were actually in the "save area" of the interrupt or system 
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message, 
and shows all the code pointers it can find on the WHOLE stack.
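As a rough userspace illustration of that scan (a sketch only, not the kernel code): round the stack pointer down to the aligned base of the stack block, then walk every word and keep the ones that look like text addresses. The names below (SKETCH_THREAD_SIZE, sketch_stack_base, sketch_count_code_pointers) and the explicit text range are made up for this example; the patch itself relies on THREAD_SIZE, valid_stack_ptr() and __kernel_text_address().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's THREAD_SIZE (assumed 8 KiB here). */
#define SKETCH_THREAD_SIZE 8192u

/* Round an address inside a stack down to the start of its aligned block. */
static uintptr_t sketch_stack_base(uintptr_t sp)
{
	/* The mask trick only works when the size is a power of two. */
	assert((SKETCH_THREAD_SIZE & (SKETCH_THREAD_SIZE - 1)) == 0);
	return sp & ~((uintptr_t)SKETCH_THREAD_SIZE - 1);
}

/*
 * Count the words in words[0..n) whose value lies in [text_start, text_end).
 * The range check is a simplified stand-in for __kernel_text_address().
 */
static size_t sketch_count_code_pointers(const uintptr_t *words, size_t n,
					 uintptr_t text_start, uintptr_t text_end)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (words[i] >= text_start && words[i] < text_end)
			hits++;
	return hits;
}
```

Every word in the block is tested, not just the ones a frame-pointer walk would reach, so stale return addresses left behind by an earlier interrupt show up as well.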

It also makes the raw stack dump print as much of the stack contents 
_under_ the stack pointer as it does above it.
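The window arithmetic can be sketched like this, using word indices instead of real pointers. The helper name and parameters are hypothetical, and the clamp at the bottom of the stack is an addition of this sketch; the patch simply backs up by kstack_depth_to_print words and only stops at the end of the stack.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute the half-open word-index window
 * [*first, *last) that covers `depth` words below sp_index and
 * `depth` words from sp_index upwards, within a stack of
 * `stack_words` words.
 */
static void sketch_dump_window(size_t sp_index, size_t depth,
			       size_t stack_words,
			       size_t *first, size_t *last)
{
	/* Back up by `depth` words, but never below index 0 (sketch-only clamp). */
	*first = sp_index > depth ? sp_index - depth : 0;

	/* Print up to twice the depth, stopping at the end of the stack. */
	*last = *first + 2 * depth;
	if (*last > stack_words)
		*last = stack_words;
}
```

With sp_index = 100 and depth = 24 this gives the window 76..124, i.e. 24 words on either side of the stack pointer.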

However, this patch is mostly useless if you have a separate stack for 
IRQs (since in that case any interrupt will be taken on a different 
stack, which we no longer see), so you should NOT enable the 4KSTACKS 
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is 
probably worth trying.

		Linus

---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
