linux-kernel.vger.kernel.org archive mirror
* 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-25 23:08 UTC (permalink / raw)
  To: LKML

Hi,

2.6.25-rc6 is a strong beast :)
Another[0] BUG is printed and the box is still alive:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0179114>] __d_lookup+0x94/0x150
*pde = 00000000 
Oops: 0000 [#1] 
Modules linked in: fuse sha256_generic xt_tcpudp ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ftp nf_nat nf_conntrack_ftp xt_conntrack nf_conntrack iptable_filter ip_tables ipt_ULOG x_tables nfsd lockd nfs_acl auth_rpcgss exportfs tun sunrpc twofish_i586 twofish_common eeprom w83l785ts asb100 hwmon_vid usb_storage zd1211rw firmware_class mac80211 snd_intel8x0 snd_ac97_codec i2c_nforce2 cfg80211 ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc i2c_core [last unloaded: fuse]
Pid: 15705, comm: imap Not tainted (2.6.25-rc6 #5)
EIP: 0060:[<c0179114>] EFLAGS: 00010286 CPU: 0
EIP is at __d_lookup+0x94/0x150
EAX: 00000000 EBX: 0006bc44 ECX: 00000001 EDX: d60634e8
ESI: c2020a00 EDI: c56ebf30 EBP: c478ad6c ESP: c56ebd7c
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process imap (pid: 15705, ti=c56eb000 task=e153c000 task.ti=c56eb000)
Stack: 00000002 00000001 c0179080 f4826be0 00000246 c56ebe08 0000000b f66a800b
        d60634e8 c56ebe08 0000002f c56ebf30 c56ebe08 c016f388 c56ebe14 f7faff80
        c016ee97 01eb3b48 c56ebe08 0000002f c56ebe14 f66a8017 c0170a70 c56ebf30 
Call Trace:
  [<c0179080>] __d_lookup+0x0/0x150
  [<c016f388>] do_lookup+0x28/0x1a0
  [<c016ee97>] permission+0xb7/0x120
  [<c0170a70>] __link_path_walk+0x140/0xcd0
  [<c043f5e4>] _spin_unlock+0x14/0x20
  [<c02c3e1a>] _atomic_dec_and_lock+0x2a/0x40
  [<c0179855>] dput+0x65/0xf0
  [<c017163a>] link_path_walk+0x3a/0xa0
  [<c043f5e4>] _spin_unlock+0x14/0x20
  [<c01662bb>] get_unused_fd_flags+0xab/0xd0
  [<c017189e>] do_path_lookup+0x6e/0x180
  [<c0169088>] get_empty_filp+0xa8/0x120
  [<c01724b1>] __path_lookup_intent_open+0x51/0xa0
  [<c0172590>] path_lookup_open+0x20/0x30
  [<c0172686>] open_namei+0x66/0x5f0
  [<c01665ae>] do_filp_open+0x2e/0x60
  [<c043f5e4>] _spin_unlock+0x14/0x20
  [<c01662bb>] get_unused_fd_flags+0xab/0xd0
  [<c016662c>] do_sys_open+0x4c/0xe0
  [<c01666fc>] sys_open+0x1c/0x20
  [<c0102dee>] sysenter_past_esp+0x5f/0xa5
  =======================
Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db 
EIP: [<c0179114>] __d_lookup+0x94/0x150 SS:ESP 0068:c56ebd7c
---[ end trace 274145890e21aa9a ]---


I've put some more details (.config, dmesg, some sysrq printouts) on:
http://nerdbynature.de/bits/2.6.25-rc6/Oops_d_lookup/

Please tell me not to worry :)
Christian.

[0] http://lkml.org/lkml/2008/3/23/245
-- 
BOFH excuse #85:

Windows 95 undocumented "feature"


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Andrew Morton @ 2008-03-26  6:33 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, Markus Rehbach, Rafael J. Wysocki

On Wed, 26 Mar 2008 00:08:48 +0100 (CET) Christian Kujau <lists@nerdbynature.de> wrote:

> Hi,
> 
> 2.6.25-rc6 is a strong beast :)
> Another[0] BUG is printed and the box is still alive:
> 
> [...]
> 
> 
> I've put some more details (.config, dmesg, some sysrq printouts) on:
> http://nerdbynature.de/bits/2.6.25-rc6/Oops_d_lookup/
> 
> Please tell me not to worry :)
> Christian.
> 
> [0] http://lkml.org/lkml/2008/3/23/245

Markus reported what looks to be the same thing here:
http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

I guess you've confirmed that this wasn't a mystery
once-off-on-that-machine.

I can't think what we did to cause this.  Were you doing anything unusual
on that machine?  I see the fuse module was loaded - was it being used? 
Were any oddball (ie: non-ext3 ;)) filesystems being used?  etc.



* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Rafael J. Wysocki @ 2008-03-26 21:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christian Kujau, LKML, Markus Rehbach

On Wednesday, 26 of March 2008, Andrew Morton wrote:
> On Wed, 26 Mar 2008 00:08:48 +0100 (CET) Christian Kujau <lists@nerdbynature.de> wrote:
> 
> > [...]
> 
> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
> 
> I guess you've confirmed that this wasn't a mystery
> once-off-on-that-machine.
> 
> I can't think what we did to cause this.  Were you doing anything unusual
> on that machine?  I see the fuse module was loaded - was it being used? 
> Were any oddball (ie: non-ext3 ;)) filesystems being used?  etc.

Well, we seem to get mm-related traces on x86-32 at random places.

http://www.ussg.iu.edu/hypermail/linux/kernel/0803.3/0782.html for example.

I'm starting to think there's some arch-related mm issue lurking in there.


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-26 23:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Markus Rehbach, Rafael J. Wysocki

On Tue, 25 Mar 2008, Andrew Morton wrote:
> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

Yes, I've found 3 more reports for __d_lookup on kerneloops.org, first 
seen for 2.6.25-rc5-git5.

> I can't think what we did to cause this.  Were you doing anything unusual
> on that machine?

Well, I was reading mail...and suddenly alpine complained that the imap 
server was gone - and indeed "imap" was in the Oops message. But apart 
from that, nothing exotic going on.

>  I see the fuse module was loaded - was it being used?

No, it's loaded, but it was not in use.

> Were any oddball (ie: non-ext3 ;)) filesystems being used?  etc.

There's ext2/3/4, jfs, xfs, reiserfs (not reiser4) - the whole family.
The only oddball coming to mind is zd1211rw with its binary firmware. But 
no SMP, no ACPI, no preempt...

Christian.
-- 
BOFH excuse #90:

Budget cuts


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Thomas Gleixner @ 2008-03-27 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Kujau, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On Tue, 25 Mar 2008, Andrew Morton wrote:
> > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db 

It faults in a prefetch.
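The `<0f>` marker in the Code: line points at the faulting instruction. A minimal sketch, loosely in the spirit of the opcode check the kernel's is_prefetch() performs but simplified (only common legacy prefixes are skipped, no CPU-mode handling), shows why the bytes `0f 18 00` count as a prefetch. The names `looks_like_prefetch`, `modrm_reg`, `modrm_mod`, `modrm_rm` and `PREFETCH_HINTS` are hypothetical helpers for illustration, not kernel API.

```c
#include <stddef.h>

/* Mnemonics for the 0F 18 /r group, indexed by the ModRM reg field.
 * reg values 4..7 are reserved hint encodings and are not named here. */
const char *const PREFETCH_HINTS[4] = {
    "prefetchnta", "prefetcht0", "prefetcht1", "prefetcht2"
};

/* Hypothetical helper: returns 1 if insn[0..len) starts, after legacy
 * prefixes, with an x86 prefetch opcode, else 0. 0F 0D is the 3DNow!
 * PREFETCH/PREFETCHW group, 0F 18 the SSE PREFETCHh group. */
int looks_like_prefetch(const unsigned char *insn, size_t len)
{
    size_t i = 0;

    /* Skip segment overrides, operand/address-size, LOCK and REP prefixes. */
    while (i < len) {
        unsigned char b = insn[i];
        if (b == 0x26 || b == 0x2e || b == 0x36 || b == 0x3e ||
            b == 0x64 || b == 0x65 || b == 0x66 || b == 0x67 ||
            b == 0xf0 || b == 0xf2 || b == 0xf3)
            i++;
        else
            break;
    }
    /* Need the two-byte escape 0F plus one more opcode byte. */
    if (i + 1 >= len || insn[i] != 0x0f)
        return 0;
    return insn[i + 1] == 0x0d || insn[i + 1] == 0x18;
}

/* ModRM fields: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0.
 * For 0F 18 the reg field selects the prefetch hint. */
unsigned modrm_mod(unsigned char modrm) { return (modrm >> 6) & 3u; }
unsigned modrm_reg(unsigned char modrm) { return (modrm >> 3) & 7u; }
unsigned modrm_rm(unsigned char modrm)  { return modrm & 7u; }
```

Fed the bytes from the `<0f>` marker (`0f 18 00 ...`), `looks_like_prefetch()` returns 1, and ModRM 0x00 gives reg=0 (`prefetchnta`) with mod=00, rm=000, i.e. `prefetchnta (%eax)` in 32-bit addressing. The register dump shows EAX: 00000000, matching the NULL address in the BUG line; the preceding `8b 02` (`mov (%edx),%eax`) appears to be what loaded it.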

> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

Same here. And both are AMD X2 early stepping machines.
 
> I guess you've confirmed that this wasn't a mystery
> once-off-on-that-machine.
> 
> I can't think what we did to cause this.

I had a lengthy bug decoding session with Ingo and we found the root
cause:

A dropped workaround for the prefetch bug in early X2s and
Opterons. Patch below.

Thanks,

	tglx

--------------->
Subject: x86: fix prefetch workaround
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Mar 27 15:58:28 CET 2008

some early Athlon XP's and Opterons generate bogus faults on prefetch
instructions. The workaround for this regressed over .24 - reinstate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/mm/fault.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/mm/fault.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/fault.c
+++ linux-x86.q/arch/x86/mm/fault.c
@@ -104,7 +104,8 @@ static int is_prefetch(struct pt_regs *r
 	unsigned char *max_instr;
 
 #ifdef CONFIG_X86_32
-	if (!(__supported_pte_mask & _PAGE_NX))
+	/* Catch an obscure case of prefetch inside an NX page: */
+	if ((__supported_pte_mask & _PAGE_NX) && (error_code & 16))
 		return 0;
 #endif
 

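The only functional change is the early-return guard at the top of is_prefetch(). A minimal sketch models the two guards as pure predicates; the function names are hypothetical, `nx_supported` stands in for `(__supported_pte_mask & _PAGE_NX) != 0`, and 16 is bit 4 of the x86 page-fault error code, which is set when the fault was an instruction fetch.

```c
#include <stdbool.h>

/* Bit 4 (value 16) of the x86 page-fault error code: instruction fetch.
 * The patch tests this bit as the literal 16. */
#define PF_INSTR_FETCH 16UL

/* Guard before the fix: true means "return 0, not a prefetch" without
 * looking at the instruction bytes. It fires for every fault on a CPU
 * without NX, so the prefetch workaround never runs there. */
bool skip_check_old(bool nx_supported, unsigned long error_code)
{
    (void)error_code; /* the old guard ignored the error code */
    return !nx_supported;
}

/* Guard after the fix: only bail out for an instruction-fetch fault on an
 * NX-capable CPU (a genuine NX violation, not a prefetch). */
bool skip_check_new(bool nx_supported, unsigned long error_code)
{
    return nx_supported && (error_code & PF_INSTR_FETCH) != 0;
}
```

In this model, for a CPU without NX, `skip_check_old(false, 0)` is true, so the instruction bytes would never be examined, while `skip_check_new(false, 0)` is false and the prefetch check runs again. On an NX-capable CPU the two guards differ only for data faults, which the new guard also lets through to the opcode check.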



* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Ingo Molnar @ 2008-03-27 15:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Christian Kujau, LKML, Markus Rehbach, Rafael J. Wysocki


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 25 Mar 2008, Andrew Morton wrote:
> > > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db 
> 
> It faults in a prefetch.
> 
> > Markus reported what looks to be the same thing here:
> > http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
> 
> Same here. And both are AMD X2 early stepping machines.
>  
> > I guess you've confirmed that this wasn't a mystery
> > once-off-on-that-machine.
> > 
> > I can't think what we did to cause this.
> 
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
> 
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.

can also be tested by picking up x86.git/latest, which has this patch 
included:

   http://people.redhat.com/mingo/x86.git/README

	Ingo


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Markus Rehbach @ 2008-03-27 18:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Christian Kujau, LKML, Rafael J. Wysocki, Ingo Molnar

Thomas Gleixner schrieb:
 > On Tue, 25 Mar 2008, Andrew Morton wrote:

 >> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
 >
 > Same here. And both are AMD X2 early stepping machines.

 > A dropped workaround for the prefetch bug in early X2s and
 > Opterons. Patch below.

The patch cures it. Tested with rc5-git5, and it was 100%
reproducible here.

Markus


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Thomas Gleixner @ 2008-03-27 19:26 UTC (permalink / raw)
  To: Markus Rehbach
  Cc: Andrew Morton, Christian Kujau, LKML, Rafael J. Wysocki, Ingo Molnar

On Thu, 27 Mar 2008, Markus Rehbach wrote:
> Thomas Gleixner schrieb:
> > On Tue, 25 Mar 2008, Andrew Morton wrote:
> 
> >> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
> >
> > Same here. And both are AMD X2 early stepping machines.
> 
> > A dropped workaround for the prefetch bug in early X2s and
> > Opterons. Patch below.
> 
> The patch cures it. Tested with rc5-git5, and it was 100%
> reproducible here.

Thanks for testing. Fix is queued for Linus.

Thanks,

	tglx


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Björn Steinbrink @ 2008-03-27 23:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Christian Kujau, LKML, Markus Rehbach,
	Rafael J. Wysocki, Ingo Molnar

On 2008.03.27 16:20:53 +0100, Thomas Gleixner wrote:
> On Tue, 25 Mar 2008, Andrew Morton wrote:
> > > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db 
> 
> It faults in a prefetch.
> 
> > Markus reported what looks to be the same thing here:
> > http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
> 
> Same here. And both are AMD X2 early stepping machines.
>  
> > I guess you've confirmed that this wasn't a mystery
> > once-off-on-that-machine.
> > 
> > I can't think what we did to cause this.
> 
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
> 
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.
> 
> Thanks,
> 
> 	tglx
> 
> --------------->
> Subject: x86: fix prefetch workaround
> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu Mar 27 15:58:28 CET 2008
> 
> some early Athlon XP's and Opterons generate bogus faults on prefetch
                    ^^

Umh, XP? Didn't you say X2 above? And looking at the patch, X2 seems
more plausible as well, I don't think that the XP supported the NX bit,
did it?

Björn

> instructions. The workaround for this regressed over .24 - reinstate it.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  arch/x86/mm/fault.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-x86.q/arch/x86/mm/fault.c
> ===================================================================
> --- linux-x86.q.orig/arch/x86/mm/fault.c
> +++ linux-x86.q/arch/x86/mm/fault.c
> @@ -104,7 +104,8 @@ static int is_prefetch(struct pt_regs *r
>  	unsigned char *max_instr;
>  
>  #ifdef CONFIG_X86_32
> -	if (!(__supported_pte_mask & _PAGE_NX))
> +	/* Catch an obscure case of prefetch inside an NX page: */
> +	if ((__supported_pte_mask & _PAGE_NX) && (error_code & 16))
>  		return 0;
>  #endif
>  
> 
> 


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-28  1:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On Thu, 27 Mar 2008, Thomas Gleixner wrote:
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.

Although I reported it, I could not reproduce the bug. Anyway, I've applied 
your patch to -rc7 and no BUG so far :)

Thanks!
Christian.
-- 
BOFH excuse #385:

Dyslexics retyping hosts file on servers


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-28  8:50 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Thomas Gleixner, Andrew Morton, LKML, Markus Rehbach,
	Rafael J. Wysocki, Ingo Molnar

On Fri, 28 Mar 2008, Björn Steinbrink wrote:
>> Subject: x86: fix prefetch workaround
>> From: Ingo Molnar <mingo@elte.hu>
>> Date: Thu Mar 27 15:58:28 CET 2008
>>
>> some early Athlon XP's and Opterons generate bogus faults on prefetch
>                    ^^
>
> Umh, XP? Didn't you say X2 above? And looking at the patch, X2 seems
> more plausible as well, I don't think that the XP supported the NX bit,
> did it?

Hm, would be a shame because I have an XP 2600+. /proc/cpuinfo tells me:

flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov pat pse36 
mmx fxsr sse syscall mmxext 3dnowext 3dnow ts

...no NX in there. I wonder why this (already applied) patch should do 
anything on my box at all.
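The check done above by eye amounts to looking for the whole-word `nx` token in the flags line. A minimal sketch, assuming the flags arrive as one whitespace-separated string (on a live system, the `flags` line of /proc/cpuinfo); `has_cpu_flag` is a hypothetical helper, and plain substring search would be wrong because short flags occur inside longer ones.

```c
#include <stdbool.h>
#include <string.h>

/* Separators in a /proc/cpuinfo "flags" line, plus end of string. */
static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

/* Hypothetical helper: true if `flag` occurs in `flags` as a whole
 * whitespace-separated token, so "ts" does not match inside "tsc". */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts_token = (p == flags) || is_sep(p[-1]);
        bool ends_token = is_sep(p[n]);
        if (starts_token && ends_token)
            return true;
        p++; /* partial match; keep scanning */
    }
    return false;
}
```

Applied to the Athlon XP 2600+ flags quoted above, `has_cpu_flag(flags, "nx")` is false while `has_cpu_flag(flags, "pae")` and `has_cpu_flag(flags, "ts")` are true.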

Thanks,
C.
-- 
BOFH excuse #216:

What office are you in? Oh, that one.  Did you know that your building was built over the university's first nuclear research site? And wow, aren't you the lucky one, your office is right over where the core is buried!


* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Ingo Molnar @ 2008-03-28 10:25 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Thomas Gleixner, Andrew Morton, LKML, Markus Rehbach, Rafael J. Wysocki


* Christian Kujau <lists@nerdbynature.de> wrote:

> On Thu, 27 Mar 2008, Thomas Gleixner wrote:
>> I had a lengthy bug decoding session with Ingo and we found the root
>> cause:
>> A dropped workaround for the prefetch bug in early X2s and
>> Opterons. Patch below.
>
> Although I reported it, I could not reproduce the bug. Anyway, I've 
> applied your patch to -rc7 and no BUG so far :)

yeah, the condition would normally be very sporadic and it can easily 
depend on a specific layout of your kernel image, etc.

the (updated) fix is in Linus' latest git tree as well, and in 
x86.git/latest.

	Ingo
