All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/pci: Avoid overriding MSI interrupt
@ 2013-06-28 13:10 Gavin Shan
  2013-06-29  7:20 ` Benjamin Herrenschmidt
  2013-06-29 23:09 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 6+ messages in thread
From: Gavin Shan @ 2013-06-28 13:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Yuanquan.Chen, Gavin Shan

The issue was introduced by commit 37f02195 ("powerpc/pci: fix
PCI-e devices rescan issue on powerpc platform"). The field
(struct pci_dev::irq) is reused by PCI core to trace the base
MSI interrupt number if the MSI stuff is enabled on the corresponding
device. When running to pcibios_setup_device(), we possibly still
have enabled MSI interrupt on the device. That means "pci_dev->irq"
still have the base MSI interrupt number and it will be overwritten
if we're going fix "pci_dev->irq" again by pci_read_irq_line().
Eventually, when we enable the device, it runs to kernel crash caused
by fetching the the MSI interrupt descriptor (struct msi_desc) from
non-MSI interrupt and using the NULL descriptor.

The patch adds more check inside pcibios_setup_device() and don't
fix the interrupt number if we already had MSI interrupt enabled on
the device.

Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000004177ac
cpu 0x6: Vector: 300 (Data Access) at [c000000fa24b7690]
    pc: c0000000004177ac: .pci_restore_msi_state+0x30c/0x3b0
    lr: c00000000041777c: .pci_restore_msi_state+0x2dc/0x3b0
    sp: c000000fa24b7910
   msr: 8000000000009032
   dar: 8
 dsisr: 40000000
  current = 0xc000000fb68542c0
  paca    = 0xc00000000ecd1500	 softe: 0	 irq_happened: 0x00
    pid   = 5367, comm = eehd
enter ? for help
[c000000fa24b79b0] c000000000405d2c .pci_restore_state.part.27+0x11c/0x2a0
[c000000fa24b7a40] c0000000005ea128 .e1000_io_slot_reset+0xa8/0x230
[c000000fa24b7ad0] c00000000005fcd4 .eeh_report_reset+0x94/0x120
[c000000fa24b7b60] c00000000005e97c .eeh_pe_dev_traverse+0x9c/0x190
[c000000fa24b7c10] c000000000060078 .eeh_handle_event+0x218/0x330
[c000000fa24b7ca0] c0000000000602c0 .eeh_event_handler+0x130/0x1a0
[c000000fa24b7d30] c0000000000ad6f8 .kthread+0xe8/0xf0
[c000000fa24b7e30] c00000000000a05c .ret_from_kernel_thread+0x5c/0x80

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index eabeec9..d3a00e8 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1009,10 +1009,18 @@ void pcibios_setup_device(struct pci_dev *dev)
 	if (ppc_md.pci_dma_dev_setup)
 		ppc_md.pci_dma_dev_setup(dev);
 
-	/* Read default IRQs and fixup if necessary */
-	pci_read_irq_line(dev);
-	if (ppc_md.pci_irq_fixup)
-		ppc_md.pci_irq_fixup(dev);
+	/*
+	 * Read default IRQs and fixup if necessary. We probably
+	 * has MSI interrupt enabled on the device and that hasn't
+	 * been unloaded yet. For that case, "dev->irq" is tracing
+	 * the base MSI interrupt number and it's going to overrite
+	 * the MSI interrupt number to fix "dev->irq" here.
+	 */
+	if (!dev->msi_enabled) {
+		pci_read_irq_line(dev);
+		if (ppc_md.pci_irq_fixup)
+			ppc_md.pci_irq_fixup(dev);
+	}
 }
 
 void pcibios_setup_bus_devices(struct pci_bus *bus)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
  2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
@ 2013-06-29  7:20 ` Benjamin Herrenschmidt
  2013-06-29 23:09 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-29  7:20 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev

On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field

That "fix" caused more problems than it solved. There's a better
approach floating around that I will merge eventually next week.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
  2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
  2013-06-29  7:20 ` Benjamin Herrenschmidt
@ 2013-06-29 23:09 ` Benjamin Herrenschmidt
  2013-06-30  1:29   ` Guenter Roeck
  2013-06-30  1:51   ` Gavin Shan
  1 sibling, 2 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-29 23:09 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev, Guenter Roeck

On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field
> (struct pci_dev::irq) is reused by PCI core to trace the base
> MSI interrupt number if the MSI stuff is enabled on the corresponding
> device. When running to pcibios_setup_device(), we possibly still
> have enabled MSI interrupt on the device. That means "pci_dev->irq"
> still have the base MSI interrupt number and it will be overwritten
> if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> Eventually, when we enable the device, it runs to kernel crash caused
> by fetching the the MSI interrupt descriptor (struct msi_desc) from
> non-MSI interrupt and using the NULL descriptor.

So finally I decided instead to apply Guenter patch

[PATCH v2] powerpc/pci: Improve device hotplug initialization

Which fixes the underlying problem instead.

I'm running some tests, so far it looks good. However, Gavin, when you
have a chance on vpl3, try injecting errors to other adapters, for
example the VGA adapter (you need to do lspci to trigger the EEH
detection after that since there's no driver and use the "loc code"
variant off errinjct) or eth2 (the cxgb3).

All I get from EEH with these is:

[  362.962564] EEH: Detected PCI bus error on PHB#7-PE#10000
[  362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#10000

and

[  424.381083] EEH: Detected PCI bus error on PHB#6-PE#10000
[  424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#10000

Followed by ... nothing.

This is a tree which has Cascardo patch and Gunther patch (usual
location on vpl3).

Can you have a look ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
  2013-06-29 23:09 ` Benjamin Herrenschmidt
@ 2013-06-30  1:29   ` Guenter Roeck
  2013-06-30  1:51   ` Gavin Shan
  1 sibling, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2013-06-30  1:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Yuanquan.Chen, linuxppc-dev, Gavin Shan

On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> > The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> > PCI-e devices rescan issue on powerpc platform"). The field
> > (struct pci_dev::irq) is reused by PCI core to trace the base
> > MSI interrupt number if the MSI stuff is enabled on the corresponding
> > device. When running to pcibios_setup_device(), we possibly still
> > have enabled MSI interrupt on the device. That means "pci_dev->irq"
> > still have the base MSI interrupt number and it will be overwritten
> > if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> > Eventually, when we enable the device, it runs to kernel crash caused
> > by fetching the the MSI interrupt descriptor (struct msi_desc) from
> > non-MSI interrupt and using the NULL descriptor.
> 
> So finally I decided instead to apply Guenter patch
> 
> [PATCH v2] powerpc/pci: Improve device hotplug initialization
> 
> Which fixes the underlying problem instead.
> 

Guess I am not hitting above bug because I have my own patch applied ;).

Thanks a lot!

Guenter

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
  2013-06-29 23:09 ` Benjamin Herrenschmidt
  2013-06-30  1:29   ` Guenter Roeck
@ 2013-06-30  1:51   ` Gavin Shan
  2013-06-30  4:06     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 6+ messages in thread
From: Gavin Shan @ 2013-06-30  1:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yuanquan.Chen, linuxppc-dev, Gavin Shan, Guenter Roeck

On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
>On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:

.../...

>I'm running some tests, so far it looks good. However, Gavin, when you
>have a chance on vpl3, try injecting errors to other adapters, for
>example the VGA adapter (you need to do lspci to trigger the EEH
>detection after that since there's no driver and use the "loc code"
>variant off errinjct) or eth2 (the cxgb3).
>
>All I get from EEH with these is:
>
>[  362.962564] EEH: Detected PCI bus error on PHB#7-PE#10000
>[  362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#10000
>
>and
>
>[  424.381083] EEH: Detected PCI bus error on PHB#6-PE#10000
>[  424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#10000
>
>Followed by ... nothing.
>

Ben, I think one patch was lost from mainline and that fixes the problem.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d

I had the patch applied to /home/benh/linux-test and have following commands
to inject errors, everything looks good :-)

errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
  2013-06-30  1:51   ` Gavin Shan
@ 2013-06-30  4:06     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-30  4:06 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev, Guenter Roeck

On Sun, 2013-06-30 at 09:51 +0800, Gavin Shan wrote:

> Ben, I think one patch was lost from mainline and that fixes the problem.
> 
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d
> 
> I had the patch applied to /home/benh/linux-test and have following commands
> to inject errors, everything looks good :-)
> 
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0

Ok, thanks, it's in -next and not upstream yet. I'll ask Linus to pull
that one.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-06-30  4:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
2013-06-29  7:20 ` Benjamin Herrenschmidt
2013-06-29 23:09 ` Benjamin Herrenschmidt
2013-06-30  1:29   ` Guenter Roeck
2013-06-30  1:51   ` Gavin Shan
2013-06-30  4:06     ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.