linux-kernel.vger.kernel.org archive mirror
* Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
@ 2006-10-02 20:10 Martin Bligh
  2006-10-02 20:39 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Bligh @ 2006-10-02 20:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Andy Whitcroft

Panics on boot.

http://test.kernel.org/abat/50728/debug/console.log

Unable to handle kernel NULL pointer dereference at 0000000000000500 RIP:
  [<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 14, comm: events/0 Not tainted 2.6.18-mm2-autokern1 #1
RIP: 0010:[<ffffffff803fa9af>]  [<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
RSP: 0000:ffff8101000e1e20  EFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff810001fea8c0 RCX: 000000000000001f
RDX: 0000000000000000 RSI: ffff810001fea8c0 RDI: 0000000000001fea
RBP: ffff8101000e1e30 R08: ffff8101000e0000 R09: 0000000000000011
R10: ffff810001014820 R11: ffff810001014820 R12: 0000000000000500
R13: ffff810001ef1640 R14: 0000000000000202 R15: ffff810001fea8c0
FS:  0000000000000000(0000) GS:ffffffff80582000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000500 CR3: 0000000000201000 CR4: 00000000000006e0
Process events/0 (pid: 14, threadinfo ffff8101000e0000, task ffff8100816b1040)
Stack:  ffff810001fea8c0 ffff810001fea8c8 ffff8101000e1e70 ffffffff802387e3
  ffffffff803fa99f ffff810001ef1640 ffff810001f0dd40 ffffffff80238827
  00000000fffffffc ffffffff804b0298 ffff8101000e1f00 ffffffff8023891a
Call Trace:
  [<ffffffff802387e3>] run_workqueue+0xa2/0xe6
  [<ffffffff803fa99f>] mptspi_dv_renegotiate_work+0x0/0x4a
  [<ffffffff80238827>] worker_thread+0x0/0x126
  [<ffffffff8023891a>] worker_thread+0xf3/0x126
  [<ffffffff80224498>] default_wake_function+0x0/0xf
  [<ffffffff80224498>] default_wake_function+0x0/0xf
  [<ffffffff80238827>] worker_thread+0x0/0x126
  [<ffffffff8023b984>] kthread+0xd0/0xfc
  [<ffffffff8020a658>] child_rip+0xa/0x12
  [<ffffffff8023b8b4>] kthread+0x0/0xfc
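
The faulting address in the oops (0x500, also shown in CR2) is small and non-zero. That pattern usually means a structure member at offset 0x500 was read through a NULL pointer: the CPU adds the member's offset to a base address of 0. A minimal sketch of that arithmetic, using a purely hypothetical structure padded so that one member sits at offset 0x500 (it is not the layout of any real MPT driver structure):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: the padding places 'state' at offset 0x500.
 * This is NOT the real layout of any MPT driver structure. */
struct example_host {
	unsigned char pad[0x500];
	int state;
};

/* Address that a read of p->state would touch, computed without
 * dereferencing p, so it is safe to call even when p is NULL. */
static uintptr_t member_address(const struct example_host *p)
{
	return (uintptr_t)p + offsetof(struct example_host, state);
}
```

With a NULL base the computed address is 0x500, the same shape as the faulting address above; which real member lives at that offset would have to be checked against the 2.6.18-mm2 structure definitions.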

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-02 20:10 Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2 Martin Bligh
@ 2006-10-02 20:39 ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2006-10-02 20:39 UTC (permalink / raw)
  To: Martin Bligh; +Cc: LKML, Andy Whitcroft, Moore, Eric Dean, linux-scsi

On Mon, 02 Oct 2006 13:10:26 -0700
Martin Bligh <mbligh@google.com> wrote:

> Panics on boot.
> 
> http://test.kernel.org/abat/50728/debug/console.log
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000500 RIP:
>   [<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
> PGD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
> Pid: 14, comm: events/0 Not tainted 2.6.18-mm2-autokern1 #1
> RIP: 0010:[<ffffffff803fa9af>]  [<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
> RSP: 0000:ffff8101000e1e20  EFLAGS: 00010286
> RAX: 0000000000000001 RBX: ffff810001fea8c0 RCX: 000000000000001f
> RDX: 0000000000000000 RSI: ffff810001fea8c0 RDI: 0000000000001fea
> RBP: ffff8101000e1e30 R08: ffff8101000e0000 R09: 0000000000000011
> R10: ffff810001014820 R11: ffff810001014820 R12: 0000000000000500
> R13: ffff810001ef1640 R14: 0000000000000202 R15: ffff810001fea8c0
> FS:  0000000000000000(0000) GS:ffffffff80582000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000500 CR3: 0000000000201000 CR4: 00000000000006e0
> Process events/0 (pid: 14, threadinfo ffff8101000e0000, task ffff8100816b1040)
> Stack:  ffff810001fea8c0 ffff810001fea8c8 ffff8101000e1e70 ffffffff802387e3
>   ffffffff803fa99f ffff810001ef1640 ffff810001f0dd40 ffffffff80238827
>   00000000fffffffc ffffffff804b0298 ffff8101000e1f00 ffffffff8023891a
> Call Trace:
>   [<ffffffff802387e3>] run_workqueue+0xa2/0xe6
>   [<ffffffff803fa99f>] mptspi_dv_renegotiate_work+0x0/0x4a
>   [<ffffffff80238827>] worker_thread+0x0/0x126
>   [<ffffffff8023891a>] worker_thread+0xf3/0x126
>   [<ffffffff80224498>] default_wake_function+0x0/0xf
>   [<ffffffff80224498>] default_wake_function+0x0/0xf
>   [<ffffffff80238827>] worker_thread+0x0/0x126
>   [<ffffffff8023b984>] kthread+0xd0/0xfc
>   [<ffffffff8020a658>] child_rip+0xa/0x12
>   [<ffffffff8023b8b4>] kthread+0x0/0xfc

Yeah, Bryce@osdl is hitting this.  Apparently it can be worked around
by compiling the driver as a module.

* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-03  0:41     ` Andrew Morton
  2006-10-03  0:51       ` Jeff Garzik
@ 2006-10-03  1:35       ` Jeff Garzik
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2006-10-03  1:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Moore, Eric, Martin Bligh, LKML, Andy Whitcroft, linux-scsi

Andrew Morton wrote:
> On Mon, 02 Oct 2006 20:32:13 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> FWIW, I am seeing precisely this problem, in the latest -git.
> 
> I just sent this to Linus.  Fingers crossed, it'll fix...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> 54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
> to fail to map PCI resources.
> 
> Revert it in preparation for addressing the show-APICs-in-/proc/iomem
> requirement in a different manner.
> 
> Cc: Aaron Durbin <adurbin@google.com>
> Cc: Andi Kleen <ak@muc.de>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>

ACK, this fixes sata_mv timeouts and mptsas oopsen here.

FWIW, both sata_mv and mptsas are only accessible on this machine after 
applying my PCI domains patchset.

	Jeff



* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-03  0:41     ` Andrew Morton
@ 2006-10-03  0:51       ` Jeff Garzik
  2006-10-03  1:35       ` Jeff Garzik
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2006-10-03  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Moore, Eric, Martin Bligh, LKML, Andy Whitcroft, linux-scsi

Andrew Morton wrote:
> On Mon, 02 Oct 2006 20:32:13 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> FWIW, I am seeing precisely this problem, in the latest -git.
> 
> I just sent this to Linus.  Fingers crossed, it'll fix...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> 54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
> to fail to map PCI resources.
> 
> Revert it in preparation for addressing the show-APICs-in-/proc/iomem
> requirement in a different manner.
> 
> Cc: Aaron Durbin <adurbin@google.com>
> Cc: Andi Kleen <ak@muc.de>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>

I'll give it a good test.  My sata_mv (requires PCI domains) also died 
with a bunch of timeouts.  Lack of interrupts, or lack of PCI resources, 
is definitely indicative of a cause.

	Jeff




* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-03  0:32   ` Jeff Garzik
@ 2006-10-03  0:41     ` Andrew Morton
  2006-10-03  0:51       ` Jeff Garzik
  2006-10-03  1:35       ` Jeff Garzik
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2006-10-03  0:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Moore, Eric, Martin Bligh, LKML, Andy Whitcroft, linux-scsi

On Mon, 02 Oct 2006 20:32:13 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> FWIW, I am seeing precisely this problem, in the latest -git.

I just sent this to Linus.  Fingers crossed, it'll fix...

From: Andrew Morton <akpm@osdl.org>

54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <adurbin@google.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 arch/x86_64/kernel/apic.c |   54 ------------------------------------
 1 file changed, 54 deletions(-)

diff -puN arch/x86_64/kernel/apic.c~revert-insert-ioapics-and-local-apic-into-resource-map arch/x86_64/kernel/apic.c
--- a/arch/x86_64/kernel/apic.c~revert-insert-ioapics-and-local-apic-into-resource-map
+++ a/arch/x86_64/kernel/apic.c
@@ -25,7 +25,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/sysdev.h>
 #include <linux/module.h>
-#include <linux/ioport.h>
 
 #include <asm/atomic.h>
 #include <asm/smp.h>
@@ -46,11 +45,6 @@ int apic_calibrate_pmtmr __initdata;
 
 int disable_apic_timer __initdata;
 
-static struct resource lapic_resource = {
-	.name = "Local APIC",
-	.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
-};
-
 /*
  * cpu_mask that denotes the CPUs that needs timer interrupt coming in as
  * IPIs in place of local APIC timers
@@ -591,40 +585,6 @@ static int __init detect_init_APIC (void
 	return 0;
 }
 
-#ifdef CONFIG_X86_IO_APIC
-static struct resource * __init ioapic_setup_resources(void)
-{
-#define IOAPIC_RESOURCE_NAME_SIZE 11
-	unsigned long n;
-	struct resource *res;
-	char *mem;
-	int i;
-
-	if (nr_ioapics <= 0)
-		return NULL;
-
-	n = IOAPIC_RESOURCE_NAME_SIZE + sizeof(struct resource);
-	n *= nr_ioapics;
-
-	res = alloc_bootmem(n);
-
-	if (!res)
-		return NULL;
-
-	memset(res, 0, n);
-	mem = (void *)&res[nr_ioapics];
-
-	for (i = 0; i < nr_ioapics; i++) {
-		res[i].name = mem;
-		res[i].flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-		snprintf(mem, IOAPIC_RESOURCE_NAME_SIZE, "IOAPIC %u", i);
-		mem += IOAPIC_RESOURCE_NAME_SIZE;
-	}
-
-	return res;
-}
-#endif
-
 void __init init_apic_mappings(void)
 {
 	unsigned long apic_phys;
@@ -644,11 +604,6 @@ void __init init_apic_mappings(void)
 	apic_mapped = 1;
 	apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys);
 
-	/* Put local APIC into the resource map. */
-	lapic_resource.start = apic_phys;
-	lapic_resource.end = lapic_resource.start + PAGE_SIZE - 1;
-	insert_resource(&iomem_resource, &lapic_resource);
-
 	/*
 	 * Fetch the APIC ID of the BSP in case we have a
 	 * default configuration (or the MP table is broken).
@@ -658,9 +613,7 @@ void __init init_apic_mappings(void)
 	{
 		unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
 		int i;
-		struct resource *ioapic_res;
 
-		ioapic_res = ioapic_setup_resources();
 		for (i = 0; i < nr_ioapics; i++) {
 			if (smp_found_config) {
 				ioapic_phys = mp_ioapics[i].mpc_apicaddr;
@@ -672,13 +625,6 @@ void __init init_apic_mappings(void)
 			apic_printk(APIC_VERBOSE,"mapped IOAPIC to %016lx (%016lx)\n",
 					__fix_to_virt(idx), ioapic_phys);
 			idx++;
-
-			if (ioapic_res) {
-				ioapic_res->start = ioapic_phys;
-				ioapic_res->end = ioapic_phys + (4 * 1024) - 1;
-				insert_resource(&iomem_resource, ioapic_res);
-				ioapic_res++;
-			}
 		}
 	}
 }
_
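
The reverted ioapic_setup_resources() carved one boot-time allocation into nr_ioapics struct resource entries followed by nr_ioapics name buffers of IOAPIC_RESOURCE_NAME_SIZE (11) bytes each; 11 bytes is enough for "IOAPIC " (7 characters), up to three digits and the terminating NUL. The userspace sketch below reproduces that layout and the inclusive [start, end] ranges the patch computed (one page for the local APIC, 4 KiB per IOAPIC). Assumptions: calloc() stands in for alloc_bootmem() plus memset(), example_resource is a cut-down, hypothetical mirror of struct resource, and the 4096-byte page size and the addresses in the usage are illustrative values only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define IOAPIC_RESOURCE_NAME_SIZE 11	/* same constant as the reverted code */
#define EXAMPLE_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Cut-down, hypothetical mirror of the kernel's struct resource. */
struct example_resource {
	const char *name;
	unsigned long start;
	unsigned long end;	/* inclusive, as in struct resource */
};

/* One zeroed allocation: n resources, then n name buffers packed after them. */
static struct example_resource *example_setup_resources(int n)
{
	struct example_resource *res;
	char *mem;
	int i;

	if (n <= 0)
		return NULL;
	res = calloc(n, IOAPIC_RESOURCE_NAME_SIZE + sizeof(*res));
	if (!res)
		return NULL;
	mem = (char *)&res[n];	/* names start right after the array */
	for (i = 0; i < n; i++) {
		res[i].name = mem;
		snprintf(mem, IOAPIC_RESOURCE_NAME_SIZE, "IOAPIC %u", (unsigned)i);
		mem += IOAPIC_RESOURCE_NAME_SIZE;
	}
	return res;
}

/* Inclusive end address of a region of 'size' bytes starting at 'start'. */
static unsigned long example_range_end(unsigned long start, unsigned long size)
{
	return start + size - 1;
}
```

For example, an IOAPIC at the conventional 0xfec00000 base would be recorded as 0xfec00000-0xfec00fff, and a local APIC at 0xfee00000 as 0xfee00000-0xfee00fff.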


* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-02 23:37 ` Andrew Morton
@ 2006-10-03  0:32   ` Jeff Garzik
  2006-10-03  0:41     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2006-10-03  0:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Moore, Eric, Martin Bligh, LKML, Andy Whitcroft, linux-scsi

Andrew Morton wrote:
> On Mon, 2 Oct 2006 17:21:08 -0600
> "Moore, Eric" <Eric.Moore@lsil.com> wrote:
> 
>> On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote: 
>>
>>> Yeah, Bryce@osdl is hitting this.  Apparently it can be worked around
>>> by compiling the driver as a module.
>>>
>> What I saw in Bryce's trace was that the driver was not receiving
>> interrupts for the first command sent after interrupts were enabled.
>> This was a config page request for the SPI port pages.  Since this
>> command timed out, an internal timeout handler was called and we issued
>> an internal host reset.  The host reset called each driver's callback
>> handlers, such as those of mptspi, mptfc and mptsas.  That ended in a
>> panic in mptspi, because we assume ioc->hd is a valid pointer.  We don't
>> allocate ioc->hd until well after mpt_attach, which is where the config
>> page request timed out.  We could prevent the panic in mptspi, but that
>> doesn't fix the underlying problem of why we are not getting interrupts.
>>
>> I have a 2.6.18 gold kernel, and that works fine with modules.
>> There are no changes in the mpt stack since 2.6.18 that would affect
>> interrupts.  Do you know of any changes in the kernel affecting
>> interrupts?  I suspect that building the drivers as modules versus
>> linking them into the kernel would matter, or would it?
> 
> There are lots and lots of interrupt changes, some now in mainline, some
> not.
> 
> There's a known-problematic PCI resource allocation bug now in mainline
> too.  It appears that this can cause devices to not get assigned an
> interrupt.
> 
> So yes, this is probably the trigger.  But as a secondary thing, it appears
> that the driver will crash if something goes wrong with the interrupt
> setup?

FWIW, I am seeing precisely this problem, in the latest -git.

	Jeff




* Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
  2006-10-02 23:21 Moore, Eric
@ 2006-10-02 23:37 ` Andrew Morton
  2006-10-03  0:32   ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2006-10-02 23:37 UTC (permalink / raw)
  To: Moore, Eric; +Cc: Martin Bligh, LKML, Andy Whitcroft, linux-scsi

On Mon, 2 Oct 2006 17:21:08 -0600
"Moore, Eric" <Eric.Moore@lsil.com> wrote:

> On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote: 
> 
> > 
> > Yeah, Bryce@osdl is hitting this.  Apparently it can be worked around
> > by compiling the driver as a module.
> >
> 
> What I saw in Bryce's trace was that the driver was not receiving
> interrupts for the first command sent after interrupts were enabled.
> This was a config page request for the SPI port pages.  Since this
> command timed out, an internal timeout handler was called and we issued
> an internal host reset.  The host reset called each driver's callback
> handlers, such as those of mptspi, mptfc and mptsas.  That ended in a
> panic in mptspi, because we assume ioc->hd is a valid pointer.  We don't
> allocate ioc->hd until well after mpt_attach, which is where the config
> page request timed out.  We could prevent the panic in mptspi, but that
> doesn't fix the underlying problem of why we are not getting interrupts.
> 
> I have a 2.6.18 gold kernel, and that works fine with modules.
> There are no changes in the mpt stack since 2.6.18 that would affect
> interrupts.  Do you know of any changes in the kernel affecting
> interrupts?  I suspect that building the drivers as modules versus
> linking them into the kernel would matter, or would it?

There are lots and lots of interrupt changes, some now in mainline, some
not.

There's a known-problematic PCI resource allocation bug now in mainline
too.  It appears that this can cause devices to not get assigned an
interrupt.

So yes, this is probably the trigger.  But as a secondary thing, it appears
that the driver will crash if something goes wrong with the interrupt
setup?


* RE: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2
@ 2006-10-02 23:21 Moore, Eric
  2006-10-02 23:37 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Moore, Eric @ 2006-10-02 23:21 UTC (permalink / raw)
  To: Andrew Morton, Martin Bligh; +Cc: LKML, Andy Whitcroft, linux-scsi

On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote: 

> 
> Yeah, Bryce@osdl is hitting this.  Apparently it can be worked around
> by compiling the driver as a module.
>

What I saw in Bryce's trace was that the driver was not receiving
interrupts for the first command sent after interrupts were enabled.
This was a config page request for the SPI port pages.  Since this
command timed out, an internal timeout handler was called and we issued
an internal host reset.  The host reset called each driver's callback
handlers, such as those of mptspi, mptfc and mptsas.  That ended in a
panic in mptspi, because we assume ioc->hd is a valid pointer.  We don't
allocate ioc->hd until well after mpt_attach, which is where the config
page request timed out.  We could prevent the panic in mptspi, but that
doesn't fix the underlying problem of why we are not getting interrupts.
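
The failure chain above (a timed-out config page request, an internal host reset, then each registered driver's reset callback running before ioc->hd exists) can be sketched as follows. Everything here is hypothetical: the type names, the callback table and the return values are illustrative stand-ins, not the MPT fusion API. The only point is the guard that keeps a callback from dereferencing an hd that has not been allocated yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the adapter and its per-host state. */
struct example_hd {
	int pending_work;
};

struct example_ioc {
	struct example_hd *hd;	/* allocated only after attach completes */
};

typedef int (*example_reset_cb)(struct example_ioc *ioc);

/* Hypothetical SPI-style reset callback: skip the work if hd does not
 * exist yet instead of assuming it is valid.
 * Returns 0 if it skipped, 1 if it did work. */
static int example_spi_reset(struct example_ioc *ioc)
{
	if (ioc->hd == NULL)
		return 0;	/* reset arrived during early init */
	ioc->hd->pending_work++;	/* stand-in for scheduling renegotiation */
	return 1;
}

/* Hypothetical internal host reset: call every registered callback in turn
 * and count how many actually did work. */
static int example_host_reset(struct example_ioc *ioc,
			      example_reset_cb *cbs, size_t ncbs)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < ncbs; i++)
		handled += cbs[i](ioc);
	return handled;
}
```

A guard like this would only stop the panic; as noted above, it does nothing about the missing interrupts that caused the timeout in the first place.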

I have a 2.6.18 gold kernel, and that works fine with modules.
There are no changes in the mpt stack since 2.6.18 that would affect
interrupts.  Do you know of any changes in the kernel affecting
interrupts?  I suspect that building the drivers as modules versus
linking them into the kernel would matter, or would it?

I've been busy with SAS issues today and have not had time to replicate this.

Eric

end of thread [~2006-10-03  1:35 UTC]

Thread overview: 8+ messages
2006-10-02 20:10 Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2 Martin Bligh
2006-10-02 20:39 ` Andrew Morton
2006-10-02 23:21 Moore, Eric
2006-10-02 23:37 ` Andrew Morton
2006-10-03  0:32   ` Jeff Garzik
2006-10-03  0:41     ` Andrew Morton
2006-10-03  0:51       ` Jeff Garzik
2006-10-03  1:35       ` Jeff Garzik
