From: Linus Torvalds <torvalds@osdl.org>
To: Alistair John Strachan <s0348365@sms.ed.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
LKML <linux-kernel@vger.kernel.org>, Greg KH <greg@kroah.com>,
Chuck Ebbert <76306.1226@compuserve.com>,
Andrew Morton <akpm@osdl.org>
Subject: Re: kernel + gcc 4.1 = several problems
Date: Tue, 2 Jan 2007 17:43:00 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0701021640110.4473@woody.osdl.org>
In-Reply-To: <200701022318.11680.s0348365@sms.ed.ac.uk>
On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>
> eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000
> esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c
>
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
> 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
> 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
>
> Chuck observed that the kernel tries to reenter pipe_poll half way through an
> instruction (c0156f5f->c0156f60); it's not a single-bit error but an
> off-by-one.
It's not an off-by-one either (eg say we're taking an exception and
screwing up %eip by one somehow).
The code sequence in question is
	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax    <--
	cmovne %edx,%ecx
	jmp    ...
and it's in the second byte of the "cmp".
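To make the byte arithmetic concrete, here is a small sketch in C. It is not anything from the kernel: the names `modrm_fields`, `disp32` and `insn_start` are made up for illustration, and the only inputs are the six instruction bytes around the marked `<87>` in the quoted Code: line and the EIP from the oops.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod; /* addressing mode, bits 7..6 */
    unsigned reg; /* register operand, bits 5..3 */
    unsigned rm;  /* base register, bits 2..0 */
};

/* Split a ModRM byte into its fields. */
struct modrm modrm_fields(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/* Little-endian 32-bit displacement that follows the ModRM byte. */
uint32_t disp32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The "cmp" from the oops: opcode 3b, the marked byte 87, then disp32. */
const uint8_t cmp_insn[6] = { 0x3b, 0x87, 0x68, 0x01, 0x00, 0x00 };

/* The oops EIP points at the marked byte, so the opcode sits one lower. */
uint32_t insn_start(uint32_t eip)
{
    return eip - 1u;
}
```

With these, `modrm_fields(cmp_insn[1])` gives mod=2 (base register plus 32-bit displacement), reg=0 (%eax) and rm=7 (%edi), `disp32(&cmp_insn[2])` is 0x168, and `insn_start(0xc0156f60)` is 0xc0156f5f. That is the same c0156f5f->c0156f60 step Chuck pointed at.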
And yes, it definitely entered there, because trying other random
entry-points will have either invalid instructions or instructions that
would fault due to NULL pointers. HOWEVER, it's also not as simple as
"took an interrupt, and returned with %eip incremented by one", because
your %edx is zero, so it won't have done that "or $0x10,%edx" and then some
interrupt happened and screwed up just %eip.
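The %edx argument can be spelled out as a tiny sketch. This is plain C standing in for the two instructions that write %edx in the sequence above; the function names are invented for the example, and the register value checked is the one from the oops.

```c
#include <stdint.h>

/* Effect of "mov %ecx,%edx" followed by "or $0x10,%edx" on %edx. */
uint32_t edx_after_or(uint32_t ecx)
{
    uint32_t edx = ecx;   /* mov %ecx,%edx  */
    edx |= 0x10u;         /* or  $0x10,%edx */
    return edx;
}

/* Could a reported %edx be the result of that sequence? Bit 4 must be set. */
int edx_consistent_with_or(uint32_t reported_edx)
{
    return (reported_edx & 0x10u) != 0;
}
```

Whatever %ecx held, `edx_after_or()` returns a value with bit 4 set (0x18 for an input of 8). The oops shows edx: 00000000, so `edx_consistent_with_or(0)` is 0, which is why "ran the or, then got %eip bumped by one" does not fit.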
So it's literally a random %eip, but since you say it's consistently in
that function, it's not truly "random". There's something that triggers it
just _there_.
However, that's a damn simple function. There's _nothing_ there. The
particular code that is involved right there is literally
	if (!pipe->writers && filp->f_version != pipe->w_counter)
		mask |= POLLHUP;
and that's it. There's not even anything half-way interesting around it,
except for the "poll_wait()" call, but even that is about as common as
you can humanly get..
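For anyone wondering how that two-line "if" ends up as a "cmovne", here is a rough sketch of the mapping. The structs are stripped-down stand-ins, not the real kernel definitions, and every name below is hypothetical. It also assumes the `!pipe->writers` test was done before the quoted instructions, which the disassembly in this mail does not show.

```c
/* Same value as the $0x10 in the "or" instruction (POLLHUP on Linux). */
#define POLLHUP_BIT 0x10u

/* Hypothetical stand-ins holding only the fields the condition touches. */
struct fake_pipe { unsigned writers; unsigned w_counter; };
struct fake_file { unsigned f_version; };

/* The condition as written in the C source. */
unsigned hup_branchy(unsigned mask, const struct fake_pipe *pipe,
                     const struct fake_file *filp)
{
    if (!pipe->writers && filp->f_version != pipe->w_counter)
        mask |= POLLHUP_BIT;
    return mask;
}

/* The same logic in the shape the compiler emitted with cmov. */
unsigned hup_cmov_style(unsigned mask, const struct fake_pipe *pipe,
                        const struct fake_file *filp)
{
    if (pipe->writers)          /* assumed to be tested earlier */
        return mask;
    unsigned with_hup = mask | POLLHUP_BIT;   /* mov %ecx,%edx ; or $0x10,%edx */
    /* cmp 0x168(%edi),%eax ; cmovne %edx,%ecx */
    return (filp->f_version != pipe->w_counter) ? with_hup : mask;
}
```

Both functions return the same mask for every input. The second one computes the POLLHUP variant unconditionally and then picks one of the two values, which is exactly the job a conditional move does without a branch.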
Looking at the register set and the stack, I see:
	Stack: 00000000
	       00000000  <- saved %ebx (dunno, seems dead in caller)
	       f70f3e9c  <- saved %esi (== pollfd in do_pollfd)
	       f6e111c0  <- saved %edi (== filp)
	       f70f3fa4  <- outer EBP (looks reasonable)
	       c015d7f3  <- return address (do_sys_poll+0x253/0x480)
and the strange thing is that when the oops happens, it really looks like
%esi _still_ contains the value it had originally (and that is saved on
the stack). But afaik, from your disassembly, it should have been
overwritten by the initial %eax, which should have had the same value as
%edi on entry...
IOW, none of it really makes any sense. The stack frames look fine, so we
_did_ enter at the beginning of the function (and it wasn't the *poll fn
pointer that was corrupt).
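The "stack frames look fine" part can be checked with a bit of arithmetic. The sketch below uses only the esp, ebp, EIP and return address values from the oops; the helper names are made up. The comparison with the function epilogue relies on one reading of the bytes `8b 5d f4 8b 75 f8 8b 7d fc` in the quoted Code: line as restores of %ebx, %esi and %edi from -0xc, -0x8 and -0x4 off %ebp, so treat that part as an assumption.

```c
#include <stdint.h>

/* Register values copied from the oops. */
#define OOPS_ESP 0xf70f3c0cu
#define OOPS_EBP 0xf70f3c1cu

/* Address of the n-th 32-bit word of the stack dump (word 0 is at %esp). */
uint32_t stack_word_addr(unsigned n)
{
    return OOPS_ESP + 4u * n;
}

/* Offset of that word relative to %ebp, as the epilogue would address it. */
int32_t offset_from_ebp(unsigned n)
{
    return (int32_t)(4u * n) - (int32_t)(OOPS_EBP - OOPS_ESP);
}

/* Start of a function, given an address inside it and its "+0x..." offset. */
uint32_t func_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}
```

Words 1, 2 and 3 of the dump land at -12, -8 and -4 from %ebp, word 4 (the outer EBP) sits exactly at %ebp, and word 5 (the return address) at +4, which is the layout a normal frame with three saved registers would have. `func_start(0xc0156f60, 0xa0)` gives 0xc0156ec0 for pipe_poll and `func_start(0xc015d7f3, 0x253)` gives 0xc015d5a0 for do_sys_poll.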
> The suggestions I've had so far which I have not yet tried:
>
> - Select a different x86 CPU in the config.
> - Unfortunately the C3-2 flags seem to simply tell GCC
> to schedule for ppro (like i686) and enabled MMX and SSE
> - Probably useless
Actually, try this one. Try using something that doesn't like "cmov".
Maybe the C3-2 simply has some internal cmov bugginess.
Linus