* [PATCH] bnx2: Fix IRQ failures during kdump.
@ 2010-05-29  3:24 Michael Chan
  2010-05-29  5:33 ` David Miller
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Michael Chan @ 2010-05-29  3:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-pci

When switching from the crashed kernel to the kdump kernel without going
through PCI reset, IRQs may not work if a different IRQ mode is used on
the kdump kernel.  The original IRQ mode used in the crashed kernel may
still be enabled and the new IRQ mode may not work.  For example, it
will fail when going from MSI-X mode to MSI mode.

We fix this by disabling MSI/MSI-X and enabling INTX in bnx2_init_board().

pci_save_state() is also moved to the end of bnx2_init_board() after
all config register fixups (including the new IRQ fixups) have been done.

Export pci_msi_off() from drivers/pci/pci.c for this purpose.

Update bnx2 version to 2.0.16.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   17 ++++++++++++++---
 drivers/pci/pci.c  |    1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 188e356..1b8ba14 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.15"
-#define DRV_MODULE_RELDATE	"May 4, 2010"
+#define DRV_MODULE_VERSION	"2.0.16"
+#define DRV_MODULE_RELDATE	"May 28, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j15.fw"
@@ -7877,7 +7877,6 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 	}
 
 	pci_set_master(pdev);
-	pci_save_state(pdev);
 
 	bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
 	if (bp->pm_cap == 0) {
@@ -7953,6 +7952,16 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 			bp->flags |= BNX2_FLAG_MSI_CAP;
 	}
 
+	/* When going from a crashed kernel to a kdump kernel without PCI
+	 * reset, MSI/MSI-X may still be enabled.  We need to disable
+	 * MSI/MSI-X and enable INTX because the kdump driver may operate
+	 * the device in a different IRQ mode.
+	 */
+	if (bp->flags & (BNX2_FLAG_MSI_CAP | BNX2_FLAG_MSIX_CAP)) {
+		pci_msi_off(pdev);
+		pci_intx(pdev, 1);
+	}
+
 	/* 5708 cannot support DMA addresses > 40-bit.  */
 	if (CHIP_NUM(bp) == CHIP_NUM_5708)
 		persist_dma_mask = dma_mask = DMA_BIT_MASK(40);
@@ -8188,6 +8197,8 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 	bp->timer.data = (unsigned long) bp;
 	bp->timer.function = bnx2_timer;
 
+	pci_save_state(pdev);
+
 	return 0;
 
 err_out_unmap:
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1df7c50..a46b49d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2294,6 +2294,7 @@ void pci_msi_off(struct pci_dev *dev)
 		pci_write_config_word(dev, pos + PCI_MSIX_FLAGS, control);
 	}
 }
+EXPORT_SYMBOL(pci_msi_off);
 
 #ifndef HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE
 int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size)
-- 
1.6.4.GIT
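
The ordering point in the patch description (saving state only after the IRQ fixups) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the sim_* names and struct sim_dev are hypothetical, and the snapshot copies all 256 bytes of config space, which simplifies what pci_save_state() actually records. The register offsets and bit values are the standard PCI ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config-space offsets and bits. */
#define PCI_COMMAND              0x04
#define PCI_COMMAND_INTX_DISABLE 0x400
#define PCI_STATUS               0x06
#define PCI_STATUS_CAP_LIST      0x10
#define PCI_CAPABILITY_LIST      0x34
#define PCI_CAP_ID_MSI           0x05
#define PCI_CAP_ID_MSIX          0x11
#define PCI_MSI_FLAGS            2	/* MSI and MSI-X flags both sit at +2 */
#define PCI_MSI_FLAGS_ENABLE     0x0001
#define PCI_MSIX_FLAGS_ENABLE    0x8000

/* Hypothetical device model: live config space plus one saved snapshot. */
struct sim_dev {
	uint8_t cfg[256];
	uint8_t saved[256];
};

static uint16_t cfg_read16(const struct sim_dev *d, int off)
{
	assert(off >= 0 && off + 1 < 256);
	return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8));
}

static void cfg_write16(struct sim_dev *d, int off, uint16_t v)
{
	d->cfg[off] = (uint8_t)(v & 0xff);
	d->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Walk the capability list; return the offset of capability id, or 0. */
static int sim_find_cap(const struct sim_dev *d, uint8_t id)
{
	int pos, ttl = 48;

	if (!(cfg_read16(d, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;
	pos = d->cfg[PCI_CAPABILITY_LIST] & ~3;
	while (pos >= 0x40 && ttl-- > 0) {
		if (d->cfg[pos] == id)
			return pos;
		pos = d->cfg[pos + 1] & ~3;
	}
	return 0;
}

/* Clear one bit in the flags word of capability id, if it is present. */
static void sim_clear_cap_flag(struct sim_dev *d, uint8_t id, uint16_t bit)
{
	int pos = sim_find_cap(d, id);

	if (pos)
		cfg_write16(d, pos + PCI_MSI_FLAGS,
			    cfg_read16(d, pos + PCI_MSI_FLAGS) & (uint16_t)~bit);
}

/* Model of the patch's fixup: MSI and MSI-X off, INTx on. */
static void sim_irq_fixup(struct sim_dev *d)
{
	sim_clear_cap_flag(d, PCI_CAP_ID_MSI, PCI_MSI_FLAGS_ENABLE);
	sim_clear_cap_flag(d, PCI_CAP_ID_MSIX, PCI_MSIX_FLAGS_ENABLE);
	cfg_write16(d, PCI_COMMAND,
		    cfg_read16(d, PCI_COMMAND) & (uint16_t)~PCI_COMMAND_INTX_DISABLE);
}

/* State a crashed kernel might leave behind: MSI-X on, INTx disabled. */
static void sim_dev_after_crash(struct sim_dev *d)
{
	memset(d, 0, sizeof(*d));
	cfg_write16(d, PCI_STATUS, PCI_STATUS_CAP_LIST);
	cfg_write16(d, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
	d->cfg[PCI_CAPABILITY_LIST] = 0x40;
	d->cfg[0x40] = PCI_CAP_ID_MSI;
	d->cfg[0x41] = 0x50;			/* next capability */
	d->cfg[0x50] = PCI_CAP_ID_MSIX;		/* last capability, next = 0 */
	cfg_write16(d, 0x50 + PCI_MSI_FLAGS, PCI_MSIX_FLAGS_ENABLE);
}

/* Probe, then restore the snapshot; return 1 if INTx ends up disabled. */
static int sim_probe_then_restore(struct sim_dev *d, int save_late)
{
	if (!save_late)
		memcpy(d->saved, d->cfg, sizeof(d->cfg));	/* old placement */
	sim_irq_fixup(d);
	if (save_late)
		memcpy(d->saved, d->cfg, sizeof(d->cfg));	/* patched placement */
	memcpy(d->cfg, d->saved, sizeof(d->cfg));		/* later restore */
	return (cfg_read16(d, PCI_COMMAND) & PCI_COMMAND_INTX_DISABLE) != 0;
}
```

In this model, a snapshot taken before the fixup brings the stale IRQ configuration back on restore, while a snapshot taken afterwards preserves the fixed-up state.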


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  3:24 [PATCH] bnx2: Fix IRQ failures during kdump Michael Chan
@ 2010-05-29  5:33 ` David Miller
  2010-05-29  6:45 ` Grant Grundler
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2010-05-29  5:33 UTC (permalink / raw)
  To: mchan; +Cc: netdev, linux-pci

From: "Michael Chan" <mchan@broadcom.com>
Date: Fri, 28 May 2010 20:24:22 -0700

> When switching from the crashed kernel to the kdump kernel without going
> through PCI reset, IRQs may not work if a different IRQ mode is used on
> the kdump kernel.  The original IRQ mode used in the crashed kernel may
> still be enabled and the new IRQ mode may not work.  For example, it
> will fail when going from MSI-X mode to MSI mode.
> 
> We fix this by disabling MSI/MSI-X and enabling INTX in bnx2_init_board().
> 
> pci_save_state() is also moved to the end of bnx2_init_board() after
> all config register fixups (including the new IRQ fixups) have been done.
> 
> Export pci_msi_off() from drivers/pci/pci.c for this purpose.
> 
> Update bnx2 version to 2.0.16.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

I sincerely doubt that yours will be the only device which will
ever run into this issue.  Therefore handling it manually in
each and every device driver, which is the trend you will be
setting with this patch, doesn't make much sense.

Any device which uses MSI in any way can run into this scenario,
wherein the device will be left with MSI enabled when we leave the
crash kernel and jump into the kdump kernel.

So this needs to be handled generically in the PCI layer or similar.

I'm not applying this patch.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  3:24 [PATCH] bnx2: Fix IRQ failures during kdump Michael Chan
  2010-05-29  5:33 ` David Miller
@ 2010-05-29  6:45 ` Grant Grundler
  2010-05-29  6:50   ` David Miller
  2010-05-29 16:01 ` Stephen Hemminger
  2010-05-30  9:43 ` Andi Kleen
  3 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2010-05-29  6:45 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, linux-pci

On Fri, May 28, 2010 at 08:24:22PM -0700, Michael Chan wrote:
> When switching from the crashed kernel to the kdump kernel without going
> through PCI reset, IRQs may not work if a different IRQ mode is used on
> the kdump kernel.  The original IRQ mode used in the crashed kernel may
> still be enabled and the new IRQ mode may not work.  For example, it
> will fail when going from MSI-X mode to MSI mode.
> 
> We fix this by disabling MSI/MSI-X and enabling INTX in bnx2_init_board().
> 
> pci_save_state() is also moved to the end of bnx2_init_board() after
> all config register fixups (including the new IRQ fixups) have been done.
> 
> Export pci_msi_off() from drivers/pci/pci.c for this purpose.
> 
> Update bnx2 version to 2.0.16.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> ---
>  drivers/net/bnx2.c |   17 ++++++++++++++---
>  drivers/pci/pci.c  |    1 +
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 188e356..1b8ba14 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -58,8 +58,8 @@
>  #include "bnx2_fw.h"
>  
>  #define DRV_MODULE_NAME		"bnx2"
> -#define DRV_MODULE_VERSION	"2.0.15"
> -#define DRV_MODULE_RELDATE	"May 4, 2010"
> +#define DRV_MODULE_VERSION	"2.0.16"
> +#define DRV_MODULE_RELDATE	"May 28, 2010"
>  #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
>  #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
>  #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j15.fw"
> @@ -7877,7 +7877,6 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
>  	}
>  
>  	pci_set_master(pdev);
> -	pci_save_state(pdev);
>  
>  	bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
>  	if (bp->pm_cap == 0) {
> @@ -7953,6 +7952,16 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
>  			bp->flags |= BNX2_FLAG_MSI_CAP;
>  	}
>  
> +	/* When going from a crashed kernel to a kdump kernel without PCI
> +	 * reset, MSI/MSI-X may still be enabled.  We need to disable
> +	 * MSI/MSI-X and enable INTX because the kdump driver may operate
> +	 * the device in a different IRQ mode.
> +	 */
> +	if (bp->flags & (BNX2_FLAG_MSI_CAP | BNX2_FLAG_MSIX_CAP)) {
> +		pci_msi_off(pdev);
> +		pci_intx(pdev, 1);

Does the driver have to register a different Interrupt handler when
switching from MSI(-X) to IRQ interrupts?

(I'm thinking that IRQ interrupts might have a DMA vs IRQ ordering dependency.)

If that's true, then this can't be handled in the generic PCI layer (as
suggested by davem) unless the device driver could register multiple interrupt
handlers even if only one is active at a time.

thanks,
grant

> +	}
> +
>  	/* 5708 cannot support DMA addresses > 40-bit.  */
>  	if (CHIP_NUM(bp) == CHIP_NUM_5708)
>  		persist_dma_mask = dma_mask = DMA_BIT_MASK(40);
> @@ -8188,6 +8197,8 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
>  	bp->timer.data = (unsigned long) bp;
>  	bp->timer.function = bnx2_timer;
>  
> +	pci_save_state(pdev);
> +
>  	return 0;
>  
>  err_out_unmap:
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1df7c50..a46b49d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2294,6 +2294,7 @@ void pci_msi_off(struct pci_dev *dev)
>  		pci_write_config_word(dev, pos + PCI_MSIX_FLAGS, control);
>  	}
>  }
> +EXPORT_SYMBOL(pci_msi_off);
>  
>  #ifndef HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE
>  int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size)
> -- 
> 1.6.4.GIT
> 
> 


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  6:45 ` Grant Grundler
@ 2010-05-29  6:50   ` David Miller
  2010-05-29 16:22     ` Michael Chan
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2010-05-29  6:50 UTC (permalink / raw)
  To: grundler; +Cc: mchan, netdev, linux-pci

From: Grant Grundler <grundler@parisc-linux.org>
Date: Sat, 29 May 2010 00:45:12 -0600

> If that's true, then this can't be handled in the generic PCI layer (as
> suggested by davem) unless the device driver could register multiple interrupt
> handlers even if only one is active at a time.

The generic PCI layer very well can turn off MSI on all devices
when it starts up or a device is plugged in.

That's all he is doing.

Drivers essentially expect that the device comes up in INTX mode
when the driver probes the device.  All his change is doing
is forcing that to be true, and there is no reason the generic
PCI code can't do that.
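
The generic approach described here (one place that forces every device back to INTx mode before drivers probe) might look roughly like the following userspace model. It is only a sketch: struct sim_fn and the sim_* functions are hypothetical names, each function is reduced to three register words, and no real PCI core interface is implied. The bit values are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400	/* command register, bit 10 */
#define PCI_MSI_FLAGS_ENABLE     0x0001	/* MSI message control, bit 0 */
#define PCI_MSIX_FLAGS_ENABLE    0x8000	/* MSI-X message control, bit 15 */

/* Hypothetical per-function state: only the words that matter here. */
struct sim_fn {
	uint16_t command;
	uint16_t msi_flags;
	uint16_t msix_flags;
};

/* Force one function into INTx mode; return 1 if anything changed. */
static int sim_force_intx(struct sim_fn *f)
{
	uint16_t cmd  = f->command    & (uint16_t)~PCI_COMMAND_INTX_DISABLE;
	uint16_t msi  = f->msi_flags  & (uint16_t)~PCI_MSI_FLAGS_ENABLE;
	uint16_t msix = f->msix_flags & (uint16_t)~PCI_MSIX_FLAGS_ENABLE;
	int changed = cmd != f->command || msi != f->msi_flags ||
		      msix != f->msix_flags;

	f->command = cmd;
	f->msi_flags = msi;
	f->msix_flags = msix;
	return changed;
}

/* Sweep every function once; return how many needed fixing. */
static size_t sim_force_intx_all(struct sim_fn *fns, size_t n)
{
	size_t i, fixed = 0;

	assert(fns != NULL || n == 0);
	for (i = 0; i < n; i++)
		fixed += (size_t)sim_force_intx(&fns[i]);
	return fixed;
}
```

The sweep is idempotent in this model: a second pass over the same functions reports nothing left to fix, which is the property a driver would rely on if it assumed INTx mode at probe time.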


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  3:24 [PATCH] bnx2: Fix IRQ failures during kdump Michael Chan
  2010-05-29  5:33 ` David Miller
  2010-05-29  6:45 ` Grant Grundler
@ 2010-05-29 16:01 ` Stephen Hemminger
  2010-05-30  9:43 ` Andi Kleen
  3 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2010-05-29 16:01 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, linux-pci

On Fri, 28 May 2010 20:24:22 -0700
"Michael Chan" <mchan@broadcom.com> wrote:

> When switching from the crashed kernel to the kdump kernel without going
> through PCI reset, IRQs may not work if a different IRQ mode is used on
> the kdump kernel.  The original IRQ mode used in the crashed kernel may
> still be enabled and the new IRQ mode may not work.  For example, it
> will fail when going from MSI-X mode to MSI mode.
> 
> We fix this by disabling MSI/MSI-X and enabling INTX in bnx2_init_board().
> 
> pci_save_state() is also moved to the end of bnx2_init_board() after
> all config register fixups (including the new IRQ fixups) have been done.
> 
> Export pci_msi_off() from drivers/pci/pci.c for this purpose.
> 
> Update bnx2 version to 2.0.16.

This is probably a generic problem for many drivers. So why not
get the kdump code to fix it for all?


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  6:50   ` David Miller
@ 2010-05-29 16:22     ` Michael Chan
  2010-05-29 23:05       ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Chan @ 2010-05-29 16:22 UTC (permalink / raw)
  To: 'David Miller', grundler; +Cc: netdev, linux-pci

David Miller wrote:

>
> The generic PCI layer very well can turn off MSI on all devices
> when it starts up or a device is plugged in.
>
> That's all he is doing.
>
> Drivers essentially expect that the device comes up in INTX mode
> when the driver probes the device.  All his change is doing
> is forcing that to be true, and there is no reason the generic
> PCI code can't do that.
>

I think there may be more issues after thinking about it some more.
The device is essentially still active at this time.  The PCI
layer can turn off certain things, but enabling INTX can lead to
"irq x: nobody cared" if the driver is not ready for it.  The
device really needs to be reset by the driver to be totally
reliable.
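
The "irq x: nobody cared" failure mentioned above can be modelled with a short userspace sketch of a spurious-interrupt detector. Everything here is an assumption made for illustration: the sim_* names are hypothetical, and the window and threshold values are only believed to resemble the ones in kernel/irq/spurious.c. The point is that a still-active device raising INTx with no handler registered eventually gets its line shut off.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, believed to resemble kernel/irq/spurious.c. */
#define SIM_WINDOW        100000u	/* interrupts per evaluation window */
#define SIM_UNHANDLED_MAX  99900u	/* more unhandled than this: give up */

/* Hypothetical per-line bookkeeping. */
struct sim_irq {
	unsigned int count;	/* interrupts seen in the current window */
	unsigned int unhandled;	/* of those, how many nobody claimed */
	int disabled;		/* set once the line is shut off */
};

/* Account for one interrupt; handled is nonzero if a handler claimed it. */
static void sim_note_interrupt(struct sim_irq *irq, int handled)
{
	assert(irq != NULL);
	if (irq->disabled)
		return;
	if (!handled)
		irq->unhandled++;
	if (++irq->count < SIM_WINDOW)
		return;

	/* End of window: decide whether "nobody cared". */
	irq->count = 0;
	if (irq->unhandled > SIM_UNHANDLED_MAX)
		irq->disabled = 1;
	irq->unhandled = 0;
}

/* Deliver up to n interrupts; return how many arrived before shutoff. */
static unsigned int sim_storm(struct sim_irq *irq, unsigned int n, int handled)
{
	unsigned int delivered = 0;

	while (delivered < n && !irq->disabled) {
		sim_note_interrupt(irq, handled);
		delivered++;
	}
	return delivered;
}
```

In this model a line with a registered handler survives any number of interrupts, while an unclaimed storm is cut off at the end of the first full window.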


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29 16:22     ` Michael Chan
@ 2010-05-29 23:05       ` David Miller
  2010-05-30  1:24         ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2010-05-29 23:05 UTC (permalink / raw)
  To: mchan; +Cc: grundler, netdev, linux-pci

From: "Michael Chan" <mchan@broadcom.com>
Date: Sat, 29 May 2010 09:22:07 -0700

> I think there may be more issues after thinking about it some more.
> The device is essentially still active at this time.  The PCI
> layer can turn off certain things, but enabling INTX can lead to
> "irq x: nobody cared" if the driver is not ready for it.  The
> device really needs to be reset by the driver to be totally
> reliable.

We still have to find some generic way to do this.

My position still stands, and it is entirely ridiculous to
have every single driver have to attend to all of these
esoteric details just to handle interrupts properly.  Drivers
are hard enough to write as-is.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29 23:05       ` David Miller
@ 2010-05-30  1:24         ` Matthew Wilcox
  2010-05-30  3:49           ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2010-05-30  1:24 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, grundler, netdev, linux-pci

On Sat, May 29, 2010 at 04:05:27PM -0700, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Sat, 29 May 2010 09:22:07 -0700
> 
> > I think there may be more issues after thinking about it some more.
> > The device is essentially still active at this time.  The PCI
> > layer can turn off certain things, but enabling INTX can lead to
> > "irq x: nobody cared" if the driver is not ready for it.  The
> > device really needs to be reset by the driver to be totally
> > reliable.
> 
> We still have to find some generic way to do this.
> 
> My position still stands, and it is entirely ridiculous to
> have every single driver have to attend to all of these
> esoteric details just to handle interrupts properly.  Drivers
> are hard enough to write as-is.

I think we can do this generically.  PCI has disable bits for MSI,
MSI-X and pin-based interrupts.  So we can leave MSIs enabled, but disable
interrupt generation.

We should probably set the interrupt type back to pin-based before the
kexec kernel starts, right?  Or do we expect drivers to handle being
initialised with the device still set to MSI mode?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30  1:24         ` Matthew Wilcox
@ 2010-05-30  3:49           ` David Miller
  2010-05-30  9:44             ` Andi Kleen
  2010-05-30 16:32             ` Michael Chan
  0 siblings, 2 replies; 15+ messages in thread
From: David Miller @ 2010-05-30  3:49 UTC (permalink / raw)
  To: matthew; +Cc: mchan, grundler, netdev, linux-pci

From: Matthew Wilcox <matthew@wil.cx>
Date: Sat, 29 May 2010 19:24:01 -0600

> We should probably set the interrupt type back to pin-based before the
> kexec kernel starts, right?  Or do we expect drivers to handle being
> initialised with the device still set to MSI mode?

The expectation is that the device comes up in INTX mode, which is the
default after a PCI reset.

Basically all of these issues tend to be about the fact that unlike on
a normal boot, after a kexec an intermediate PCI reset has not occurred.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-29  3:24 [PATCH] bnx2: Fix IRQ failures during kdump Michael Chan
                   ` (2 preceding siblings ...)
  2010-05-29 16:01 ` Stephen Hemminger
@ 2010-05-30  9:43 ` Andi Kleen
  2010-05-30 16:12   ` Michael Chan
  3 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-05-30  9:43 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, linux-pci

"Michael Chan" <mchan@broadcom.com> writes:

> When switching from the crashed kernel to the kdump kernel without going
> through PCI reset, IRQs may not work if a different IRQ mode is used on

PCIe with AER actually does support per link root port reset
(e.g. used for AER)

I've been wondering for some time if kexec should not simply
use that to reset all the devices, instead of adding hacks
around this to all drivers.

That would fix your problems too, right?

The question is just if AER is widely enough supported for this.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30  3:49           ` David Miller
@ 2010-05-30  9:44             ` Andi Kleen
  2010-05-30 16:32             ` Michael Chan
  1 sibling, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2010-05-30  9:44 UTC (permalink / raw)
  To: David Miller; +Cc: matthew, mchan, grundler, netdev, linux-pci

David Miller <davem@davemloft.net> writes:
>
> Basically all of these issues tend to be about the fact that unlike on
> a normal boot, after a kexec an intermediate PCI reset has not occurred.

This could be fixed, assuming the system has AER capability ...

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30  9:43 ` Andi Kleen
@ 2010-05-30 16:12   ` Michael Chan
  2010-05-30 17:30     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Chan @ 2010-05-30 16:12 UTC (permalink / raw)
  To: 'Andi Kleen'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'

Andi Kleen wrote:

> "Michael Chan" <mchan@broadcom.com> writes:
> 
> > When switching from the crashed kernel to the kdump kernel without
> going
> > through PCI reset, IRQs may not work if a different IRQ mode is used
> on
> 
> PCIe with AER actually does support per link root port reset
> (e.g. used for AER)

Do you mean the slot_reset function in the pci_error_handlers?  This
needs to be called in the context of the crashed kernel, right?

> 
> I've been wondering for some time if kexec should not simply
> use that to reset all the devices, instead of adding hacks
> around this to all drivers.
> 
> That would fix your problems too, right?

If it is called in the context of the crashed kernel, it won't work.
We would reset it and put it back into the same IRQ mode.

> 
> The question is just if AER is widely enough supported for this.
> 

Some newer PCIe devices support Function Level Reset, and that would
be ideal.  But most existing devices including bnx2 devices don't have
this feature.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30  3:49           ` David Miller
  2010-05-30  9:44             ` Andi Kleen
@ 2010-05-30 16:32             ` Michael Chan
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Chan @ 2010-05-30 16:32 UTC (permalink / raw)
  To: 'David Miller', 'matthew@wil.cx'
  Cc: 'grundler@parisc-linux.org',
	'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'

David Miller wrote:

> From: Matthew Wilcox <matthew@wil.cx>
> Date: Sat, 29 May 2010 19:24:01 -0600
> 
> > We should probably set the interrupt type back to pin-based before
> the
> > kexec kernel starts, right?  Or do we expect drivers to handle being
> > initialised with the device still set to MSI mode?
> 
> The expectation is that the device comes up in INTX mode, which is the
> default after a PCI reset.

We need to be very careful because the device may still be active as I
said earlier.  Turning INTX on may lead to an IRQ storm that nobody will
handle.  Some older devices don't have the INTX enable bit, and INTX will
automatically be enabled when MSI is disabled.

> 
> Basically all of these issues tend to be about the fact that unlike on
> a normal boot, after a kexec an intermediate PCI reset has not occurred.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30 16:12   ` Michael Chan
@ 2010-05-30 17:30     ` Andi Kleen
  2010-05-31  4:43       ` Michael Chan
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-05-30 17:30 UTC (permalink / raw)
  To: Michael Chan
  Cc: 'Andi Kleen', 'davem@davemloft.net',
	'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'

On Sun, May 30, 2010 at 09:12:15AM -0700, Michael Chan wrote:
> Andi Kleen wrote:
> 
> > "Michael Chan" <mchan@broadcom.com> writes:
> > 
> > > When switching from the crashed kernel to the kdump kernel without
> > going
> > > through PCI reset, IRQs may not work if a different IRQ mode is used
> > on
> > 
> > PCIe with AER actually does support per link root port reset
> > (e.g. used for AER)
> 
> Do you mean the slot_reset function in the pci_error_handlers?  This

Well the fallback code in the PCIE root port driver 
that does the actual resets.

It could be called directly before kexec.

> needs to be called in the context of the crashed kernel, right?

It could be done on kexec, however of course you would rely
on PCI root port data structures still being intact on a crash
(I guess that's reasonable, they are not very complicated)

> 
> > 
> > I've been wondering for some time if kexec should not simply
> > use that to reset all the devices, instead of adding hacks
> > around this to all drivers.
> > 
> > That would fix your problems too, right?
> 
> If it is called in the context of the crashed kernel, it won't work.
> We would reset it and put it back into the same IRQ mode.

Who would put it back? Your driver wouldn't be called anymore.

> 
> > 
> > The question is just if AER is widely enough supported for this.
> > 
> 
> Some newer PCIe devices support Function Level Reset, and that would
> be ideal.  But most existing devices including bnx2 devices don't have
> this feature.

Root port reset should be fine for this case. Even if some
innocent device on the same root port gets reset too that shouldn't matter. 
Only drawback for the NIC would be that you have to renegotiate links I think. 

Also there are systems without AER support.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
  2010-05-30 17:30     ` Andi Kleen
@ 2010-05-31  4:43       ` Michael Chan
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Chan @ 2010-05-31  4:43 UTC (permalink / raw)
  To: 'Andi Kleen'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'

Andi Kleen wrote:

> On Sun, May 30, 2010 at 09:12:15AM -0700, Michael Chan wrote:
> > Andi Kleen wrote:
> >
> > > "Michael Chan" <mchan@broadcom.com> writes:
> > >
> > > > When switching from the crashed kernel to the kdump kernel
> without
> > > going
> > > > through PCI reset, IRQs may not work if a different IRQ mode is
> used
> > > on
> > >
> > > PCIe with AER actually does support per link root port reset
> > > (e.g. used for AER)
> >
> > Do you mean the slot_reset function in the pci_error_handlers?  This
> 
> Well the fallback code in the PCIE root port driver
> that does the actual resets.

aer_root_reset() in aerdrv.c?

> 
> It could be called directly before kexec.
> 
> > needs to be called in the context of the crashed kernel, right?
> 
> It could be done on kexec, however of course you would rely
> on PCI root port data structures still being intact on a crash
> (I guess that's reasonable, they are not very complicated)
> 
> >
> > >
> > > I've been wondering for some time if kexec should not simply
> > > use that to reset all the devices, instead of adding hacks
> > > around this to all drivers.
> > >
> > > That would fix your problems too, right?
> >
> > If it is called in the context of the crashed kernel, it won't work.
> > We would reset it and put it back into the same IRQ mode.
> 
> Who would put it back? Your driver wouldn't be called anymore.

The bnx2 driver like many other drivers has a slot_reset function in the
pci_driver struct's err_handler.  If the AER code calls this function,
we would reset the chip and put it back to the same IRQ mode.  Without
calling this per driver reset function, I'm not sure if you can reset
the device if the device does not support Function Level Reset.

> 
> >
> > >
> > > The question is just if AER is widely enough supported for this.
> > >
> >
> > Some newer PCIe devices support Function Level Reset, and that would
> > be ideal.  But most existing devices including bnx2 devices don't
> have
> > this feature.
> 
> Root port reset should be fine for this case. Even if some
> innocent device on the same root port gets reset too that shouldn't
> matter.
> Only drawback for the NIC would be that you have to renegotiate links I
> think.
> 
> Also there are systems without AER support.
> 
> -Andi
> --
> ak@linux.intel.com -- Speaking for myself only.




Thread overview: 15+ messages in thread
2010-05-29  3:24 [PATCH] bnx2: Fix IRQ failures during kdump Michael Chan
2010-05-29  5:33 ` David Miller
2010-05-29  6:45 ` Grant Grundler
2010-05-29  6:50   ` David Miller
2010-05-29 16:22     ` Michael Chan
2010-05-29 23:05       ` David Miller
2010-05-30  1:24         ` Matthew Wilcox
2010-05-30  3:49           ` David Miller
2010-05-30  9:44             ` Andi Kleen
2010-05-30 16:32             ` Michael Chan
2010-05-29 16:01 ` Stephen Hemminger
2010-05-30  9:43 ` Andi Kleen
2010-05-30 16:12   ` Michael Chan
2010-05-30 17:30     ` Andi Kleen
2010-05-31  4:43       ` Michael Chan
