* [PATCH] powerpc/pci: Avoid overriding MSI interrupt
@ 2013-06-28 13:10 Gavin Shan
2013-06-29 7:20 ` Benjamin Herrenschmidt
2013-06-29 23:09 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 6+ messages in thread
From: Gavin Shan @ 2013-06-28 13:10 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Yuanquan.Chen, Gavin Shan
The issue was introduced by commit 37f02195 ("powerpc/pci: fix
PCI-e devices rescan issue on powerpc platform"). The field
(struct pci_dev::irq) is reused by PCI core to trace the base
MSI interrupt number if the MSI stuff is enabled on the corresponding
device. When running to pcibios_setup_device(), we possibly still
have enabled MSI interrupt on the device. That means "pci_dev->irq"
still have the base MSI interrupt number and it will be overwritten
if we're going fix "pci_dev->irq" again by pci_read_irq_line().
Eventually, when we enable the device, it runs to kernel crash caused
by fetching the the MSI interrupt descriptor (struct msi_desc) from
non-MSI interrupt and using the NULL descriptor.
The patch adds more check inside pcibios_setup_device() and don't
fix the interrupt number if we already had MSI interrupt enabled on
the device.
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000004177ac
cpu 0x6: Vector: 300 (Data Access) at [c000000fa24b7690]
pc: c0000000004177ac: .pci_restore_msi_state+0x30c/0x3b0
lr: c00000000041777c: .pci_restore_msi_state+0x2dc/0x3b0
sp: c000000fa24b7910
msr: 8000000000009032
dar: 8
dsisr: 40000000
current = 0xc000000fb68542c0
paca = 0xc00000000ecd1500 softe: 0 irq_happened: 0x00
pid = 5367, comm = eehd
enter ? for help
[c000000fa24b79b0] c000000000405d2c .pci_restore_state.part.27+0x11c/0x2a0
[c000000fa24b7a40] c0000000005ea128 .e1000_io_slot_reset+0xa8/0x230
[c000000fa24b7ad0] c00000000005fcd4 .eeh_report_reset+0x94/0x120
[c000000fa24b7b60] c00000000005e97c .eeh_pe_dev_traverse+0x9c/0x190
[c000000fa24b7c10] c000000000060078 .eeh_handle_event+0x218/0x330
[c000000fa24b7ca0] c0000000000602c0 .eeh_event_handler+0x130/0x1a0
[c000000fa24b7d30] c0000000000ad6f8 .kthread+0xe8/0xf0
[c000000fa24b7e30] c00000000000a05c .ret_from_kernel_thread+0x5c/0x80
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
arch/powerpc/kernel/pci-common.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index eabeec9..d3a00e8 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1009,10 +1009,18 @@ void pcibios_setup_device(struct pci_dev *dev)
if (ppc_md.pci_dma_dev_setup)
ppc_md.pci_dma_dev_setup(dev);
- /* Read default IRQs and fixup if necessary */
- pci_read_irq_line(dev);
- if (ppc_md.pci_irq_fixup)
- ppc_md.pci_irq_fixup(dev);
+ /*
+ * Read default IRQs and fixup if necessary. We probably
+ * has MSI interrupt enabled on the device and that hasn't
+ * been unloaded yet. For that case, "dev->irq" is tracing
+ * the base MSI interrupt number and it's going to overrite
+ * the MSI interrupt number to fix "dev->irq" here.
+ */
+ if (!dev->msi_enabled) {
+ pci_read_irq_line(dev);
+ if (ppc_md.pci_irq_fixup)
+ ppc_md.pci_irq_fixup(dev);
+ }
}
void pcibios_setup_bus_devices(struct pci_bus *bus)
--
1.7.5.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
@ 2013-06-29 7:20 ` Benjamin Herrenschmidt
2013-06-29 23:09 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-29 7:20 UTC (permalink / raw)
To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field
That "fix" caused more problems than it solved. There's a better
approach floating around that I will merge eventually next week.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
2013-06-29 7:20 ` Benjamin Herrenschmidt
@ 2013-06-29 23:09 ` Benjamin Herrenschmidt
2013-06-30 1:29 ` Guenter Roeck
2013-06-30 1:51 ` Gavin Shan
1 sibling, 2 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-29 23:09 UTC (permalink / raw)
To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev, Guenter Roeck
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field
> (struct pci_dev::irq) is reused by PCI core to trace the base
> MSI interrupt number if the MSI stuff is enabled on the corresponding
> device. When running to pcibios_setup_device(), we possibly still
> have enabled MSI interrupt on the device. That means "pci_dev->irq"
> still have the base MSI interrupt number and it will be overwritten
> if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> Eventually, when we enable the device, it runs to kernel crash caused
> by fetching the the MSI interrupt descriptor (struct msi_desc) from
> non-MSI interrupt and using the NULL descriptor.
So finally I decided instead to apply Guenter patch
[PATCH v2] powerpc/pci: Improve device hotplug initialization
Which fixes the underlying problem instead.
I'm running some tests, so far it looks good. However, Gavin, when you
have a chance on vpl3, try injecting errors to other adapters, for
example the VGA adapter (you need to do lspci to trigger the EEH
detection after that since there's no driver and use the "loc code"
variant off errinjct) or eth2 (the cxgb3).
All I get from EEH with these is:
[ 362.962564] EEH: Detected PCI bus error on PHB#7-PE#10000
[ 362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#10000
and
[ 424.381083] EEH: Detected PCI bus error on PHB#6-PE#10000
[ 424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#10000
Followed by ... nothing.
This is a tree which has Cascardo patch and Gunther patch (usual
location on vpl3).
Can you have a look ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
2013-06-29 23:09 ` Benjamin Herrenschmidt
@ 2013-06-30 1:29 ` Guenter Roeck
2013-06-30 1:51 ` Gavin Shan
1 sibling, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2013-06-30 1:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Yuanquan.Chen, linuxppc-dev, Gavin Shan
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> > The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> > PCI-e devices rescan issue on powerpc platform"). The field
> > (struct pci_dev::irq) is reused by PCI core to trace the base
> > MSI interrupt number if the MSI stuff is enabled on the corresponding
> > device. When running to pcibios_setup_device(), we possibly still
> > have enabled MSI interrupt on the device. That means "pci_dev->irq"
> > still have the base MSI interrupt number and it will be overwritten
> > if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> > Eventually, when we enable the device, it runs to kernel crash caused
> > by fetching the the MSI interrupt descriptor (struct msi_desc) from
> > non-MSI interrupt and using the NULL descriptor.
>
> So finally I decided instead to apply Guenter patch
>
> [PATCH v2] powerpc/pci: Improve device hotplug initialization
>
> Which fixes the underlying problem instead.
>
Guess I am not hitting above bug because I have my own patch applied ;).
Thanks a lot!
Guenter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
2013-06-29 23:09 ` Benjamin Herrenschmidt
2013-06-30 1:29 ` Guenter Roeck
@ 2013-06-30 1:51 ` Gavin Shan
2013-06-30 4:06 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 6+ messages in thread
From: Gavin Shan @ 2013-06-30 1:51 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Yuanquan.Chen, linuxppc-dev, Gavin Shan, Guenter Roeck
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
>On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
.../...
>I'm running some tests, so far it looks good. However, Gavin, when you
>have a chance on vpl3, try injecting errors to other adapters, for
>example the VGA adapter (you need to do lspci to trigger the EEH
>detection after that since there's no driver and use the "loc code"
>variant off errinjct) or eth2 (the cxgb3).
>
>All I get from EEH with these is:
>
>[ 362.962564] EEH: Detected PCI bus error on PHB#7-PE#10000
>[ 362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#10000
>
>and
>
>[ 424.381083] EEH: Detected PCI bus error on PHB#6-PE#10000
>[ 424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#10000
>
>Followed by ... nothing.
>
Ben, I think one patch was lost from mainline and that fixes the problem.
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d
I had the patch applied to /home/benh/linux-test and have following commands
to inject errors, everything looks good :-)
errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0
Thanks,
Gavin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
2013-06-30 1:51 ` Gavin Shan
@ 2013-06-30 4:06 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-30 4:06 UTC (permalink / raw)
To: Gavin Shan; +Cc: Yuanquan.Chen, linuxppc-dev, Guenter Roeck
On Sun, 2013-06-30 at 09:51 +0800, Gavin Shan wrote:
> Ben, I think one patch was lost from mainline and that fixes the problem.
>
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d
>
> I had the patch applied to /home/benh/linux-test and have following commands
> to inject errors, everything looks good :-)
>
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0
Ok, thanks, it's in -next and not upstream yet. I'll ask Linus to pull
that one.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-06-30 4:08 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-28 13:10 [PATCH] powerpc/pci: Avoid overriding MSI interrupt Gavin Shan
2013-06-29 7:20 ` Benjamin Herrenschmidt
2013-06-29 23:09 ` Benjamin Herrenschmidt
2013-06-30 1:29 ` Guenter Roeck
2013-06-30 1:51 ` Gavin Shan
2013-06-30 4:06 ` Benjamin Herrenschmidt
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.