* [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-12 18:09 Prarit Bhargava
From: Prarit Bhargava @ 2009-11-12 18:09 UTC
  To: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3, mingo
  Cc: Prarit Bhargava

Panic seen on some IBM and HP systems on 2.6.32-rc6.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
PGD 2735ba067 PUD 2735d5067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/platform/pcspkr/modalias
CPU 7 
Modules linked in: k8temp(+) pcspkr edac_core serio_raw hwmon shpchp cciss dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
Pid: 616, comm: modprobe Not tainted 2.6.32-rc6 #2 ProLiant DL585 G2   
RIP: 0010:[<ffffffff8120bf3f>]  [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
RSP: 0018:ffff8802736fdd18  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8182f680 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
RBP: ffff8802736fdd18 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81d922e0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffffa007e720 R14: 0000000000000001 R15: 00000000015b19e0
FS:  00007f0a474086f0(0000) GS:ffff880036400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000273cbb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 616, threadinfo ffff8802736fc000, task ffff8802743b5c00)
Stack:
 ffff8802736fdd38 ffffffff8120bbde ffff88027646b0d8 ffff88027646b168
<0> ffff8802736fdd88 ffffffff81225c62 ffffffffa007e720 ffff88027646b0d8
<0> ffffffffa007e930 ffffffff812b9be6 ffff88027646b168 ffffffffa007e780
Call Trace:
 [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
 [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
 [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
 [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
 [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
 [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
 [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
 [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
 [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
 [<ffffffff812ba1e7>] driver_register+0x98/0x109
 [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
 [<ffffffff81072776>] ? up_read+0x26/0x2a
 [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
 [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
 [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
 [<ffffffff8108d765>] sys_init_module+0xd3/0x236
 [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
Code: 49 83 c0 40 eb 14 49 8b 01 48 85 c0 75 39 49 83 c1 08 49 83 c0 40 48 83 ef 40 48 f7 c7 c0 ff ff ff 75 e3 48 85 ff 4c 89 c0 74 23 <49> 8b 01 b9 40 00 00 00 48 83 ca ff 29 f9 48 d3 ea 48 21 d0 75 
RIP  [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
 RSP <ffff8802736fdd18>
CR2: 0000000000000000
---[ end trace a3d7e2941e8a6320 ]---

Hardware may be programmed incorrectly and return a bogus node ID.  Check to
see if the node is actually online before setting the numa node for an AMD
northbridge in quirk_amd_nb_node().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..9308ba7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -507,7 +507,8 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 		return;
 
 	pci_read_config_dword(nb_ht, 0x60, &val);
-	set_dev_node(&dev->dev, val & 7);
+	if (node_online(val & 7))
+		set_dev_node(&dev->dev, val & 7);
 	pci_dev_put(nb_ht);
 }
 


* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-14  0:58 ` Ingo Molnar
From: Ingo Molnar @ 2009-11-14  0:58 UTC
  To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


* Prarit Bhargava <prarit@redhat.com> wrote:

> Panic seen on some IBM and HP systems on 2.6.32-rc6.
> 
> [...]
> 
> Hardware may be programmed incorrectly and return a bogus node ID.  
> Check to see if the node is actually online before setting the numa 
> node for an AMD northbridge in quirk_amd_nb_node().

Hm, could you stick a printk in there, what precise node ID does the 
hardware return?

	Ingo


* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-16 13:39   ` Prarit Bhargava
From: Prarit Bhargava @ 2009-11-16 13:39 UTC
  To: Ingo Molnar; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


>>
>> Hardware may be programmed incorrectly and return a bogus node ID.  
>> Check to see if the node is actually online before setting the numa 
>> node for an AMD northbridge in quirk_amd_nb_node().
>>     
>
> Hm, could you stick a printk in there, what precise node ID does the 
> hardware return?
>
>   

Ingo, yup -- I put in a printk and commented out the set_dev_node() call 
when debugging this and got this output:

quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3

The issue appears to be that the HW has set val to a valid value, 
however, the system is only configured for a single node -- 0.

I realize that I'm working around broken HW ... but I think that a 
quirk, quirk_amd_nb_node(), should at least keep systems booting ...

P.
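
The check discussed above can be sketched in user space as follows. This is
only an illustration under stated assumptions, not the kernel code:
online_nodes, node_is_online() and pick_dev_node() are hypothetical
stand-ins for the kernel's online-node map, node_online() and the
set_dev_node() call in quirk_amd_nb_node(). The idea is that the decoded
node ID (val & 7) is only used when that node is online; otherwise the
device keeps its current node.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's online-node map:
 * bit n set means node n is online. Only node 0 is online here,
 * matching the single-node configuration described in the thread.
 */
static unsigned int online_nodes = 0x1;

/* Illustrative replacement for node_online(); not the kernel helper. */
static int node_is_online(unsigned int node)
{
	assert(node < 8);	/* val & 7 can only yield 0..7 */
	return (online_nodes >> node) & 1u;
}

/*
 * Returns the node the device should end up on: the node decoded
 * from val if it is online, otherwise the unchanged current node.
 */
static int pick_dev_node(int current_node, unsigned int val)
{
	unsigned int node = val & 7;

	if (node_is_online(node))
		return (int)node;
	return current_node;
}
```

With only node 0 online, pick_dev_node(0, val) returns 0 for each of the
four decoded values shown in the debug output (0x0 through 0x3), so the
device is never assigned to an offline node.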



* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-16 14:44     ` Ingo Molnar
From: Ingo Molnar @ 2009-11-16 14:44 UTC
  To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


* Prarit Bhargava <prarit@redhat.com> wrote:

> 
> >>
> >>Hardware may be programmed incorrectly and return a bogus node
> >>ID.  Check to see if the node is actually online before setting
> >>the numa node for an AMD northbridge in quirk_amd_nb_node().
> >
> >Hm, could you stick a printk in there, what precise node ID does
> >the hardware return?
> >
> 
> Ingo, yup -- I put in a printk and commented out the set_dev_node()
> call when debugging this
> and got this output:
> 
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
> 
> The issue appears to be that the HW has set val to a valid value, 
> however, the system is only configured for a single node -- 0.
> 
> I realize that I'm working around broken HW ... but I think that a 
> quirk, quirk_amd_nb_node(), should at least keep systems booting ...

Ok. I cleaned up the patch a bit and added a comment explaining the 
logic - and also expanded the changelog with your new debug data, and 
applied it to tip:x86/urgent. Please check the commit notification email 
whether it's all OK.

Thanks,

	Ingo


* [tip:x86/urgent] x86: AMD Northbridge: Verify NB's node is online
@ 2009-11-16 16:10 ` tip-bot for Prarit Bhargava
From: tip-bot for Prarit Bhargava @ 2009-11-16 16:10 UTC
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, prarit, mingo

Commit-ID:  303fc0870f8fbfabe260c5c32b18e53458d597ea
Gitweb:     http://git.kernel.org/tip/303fc0870f8fbfabe260c5c32b18e53458d597ea
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Thu, 12 Nov 2009 13:09:31 -0500
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 Nov 2009 15:43:05 +0100

x86: AMD Northbridge: Verify NB's node is online

Fix panic seen on some IBM and HP systems on 2.6.32-rc6:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
  [...]
  [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
  [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
  [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
  [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
  [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
  [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
  [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
  [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
  [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
  [<ffffffff812ba1e7>] driver_register+0x98/0x109
  [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
  [<ffffffff81072776>] ? up_read+0x26/0x2a
  [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
  [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
  [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
  [<ffffffff8108d765>] sys_init_module+0xd3/0x236
  [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b

I put in a printk and commented out the set_dev_node()
call when debugging this and got this output:

 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3

I.e. the issue appears to be that the HW has set val to a valid
value, however, the system is only configured for a single
node -- 0, the others are offline.

Check to see if the node is actually online before setting
the numa node for an AMD northbridge in quirk_amd_nb_node().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: bhavna.sarathy@amd.com
Cc: jbarnes@virtuousgeek.org
Cc: andreas.herrmann3@amd.com
LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com>
[ v2: clean up the code and add comments ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/quirks.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..18093d7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -499,6 +499,7 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 {
 	struct pci_dev *nb_ht;
 	unsigned int devfn;
+	u32 node;
 	u32 val;
 
 	devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
@@ -507,7 +508,13 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 		return;
 
 	pci_read_config_dword(nb_ht, 0x60, &val);
-	set_dev_node(&dev->dev, val & 7);
+	node = val & 7;
+	/*
+	 * Some hardware may return an invalid node ID,
+	 * so check it first:
+	 */
+	if (node_online(node))
+		set_dev_node(&dev->dev, node);
 	pci_dev_put(nb_ht);
 }
 

