linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] PCI: disable SERR for kdump kernel
@ 2017-04-19  0:31 Yinghai Lu
  2017-04-20 17:14 ` Sinan Kaya
  0 siblings, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2017-04-19  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Yinghai Lu

Found one system with infiniband with SRIOV enabled, kdump kernel
SRIOV BAR probing trigger one pci fatal error.
That assert error pin, and host get reset by BMC.

We can just ignore that error to let kernel go on
and kdump to create vmcore.

-v2: add debug print out

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/probe.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -17,6 +17,8 @@
 #include <linux/acpi.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
+#include <linux/crash_dump.h>
+
 #include "pci.h"
 
 #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
@@ -1515,6 +1517,24 @@ static void pci_msi_setup_pci_dev(struct
 		pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 }
 
+static void pci_disable_serr(struct pci_dev *dev)
+{
+	u16 pci_cmd, pci_bctl;
+
+	pci_read_config_word(dev, PCI_COMMAND, &pci_cmd);
+	pci_cmd &= ~PCI_COMMAND_SERR;
+	pci_write_config_word(dev, PCI_COMMAND, pci_cmd);
+	dev_printk(KERN_DEBUG, &dev->dev, "SERR cleared\n");
+
+	/* Program bridge control value */
+	if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &pci_bctl);
+		pci_bctl &= ~PCI_BRIDGE_CTL_SERR;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, pci_bctl);
+		dev_printk(KERN_DEBUG, &dev->dev, "BRIDGE SERR cleared\n");
+	}
+}
+
 /**
  * pci_setup_device - fill in class and map information of a device
  * @dev: the device structure to fill
@@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
 	/* device class may be changed after fixup */
 	class = dev->class >> 8;
 
+	if (is_kdump_kernel())
+		pci_disable_serr(dev);
+
 	if (dev->non_compliant_bars) {
 		pci_read_config_word(dev, PCI_COMMAND, &cmd);
 		if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] PCI: disable SERR for kdump kernel
  2017-04-19  0:31 [PATCH v2] PCI: disable SERR for kdump kernel Yinghai Lu
@ 2017-04-20 17:14 ` Sinan Kaya
  2017-04-20 18:38   ` Bjorn Helgaas
  2017-04-20 23:37   ` Yinghai Lu
  0 siblings, 2 replies; 6+ messages in thread
From: Sinan Kaya @ 2017-04-20 17:14 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas; +Cc: linux-pci, linux-kernel

On 4/18/2017 8:31 PM, Yinghai Lu wrote:
> * pci_setup_device - fill in class and map information of a device
>   * @dev: the device structure to fill
> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>  	/* device class may be changed after fixup */
>  	class = dev->class >> 8;
>  
> +	if (is_kdump_kernel())
> +		pci_disable_serr(dev);
> +

This sounds like something that needs to be done while shutting down
the first kernel as part of the kdump procedure rather than boot of 
the kdump kernel in pci setup.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] PCI: disable SERR for kdump kernel
  2017-04-20 17:14 ` Sinan Kaya
@ 2017-04-20 18:38   ` Bjorn Helgaas
  2017-04-20 18:52     ` Sinan Kaya
  2017-04-20 23:37   ` Yinghai Lu
  1 sibling, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2017-04-20 18:38 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: Yinghai Lu, linux-pci, linux-kernel

On Thu, Apr 20, 2017 at 12:14 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 4/18/2017 8:31 PM, Yinghai Lu wrote:
>> * pci_setup_device - fill in class and map information of a device
>>   * @dev: the device structure to fill
>> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>>       /* device class may be changed after fixup */
>>       class = dev->class >> 8;
>>
>> +     if (is_kdump_kernel())
>> +             pci_disable_serr(dev);
>> +
>
> This sounds like something that needs to be done while shutting down
> the first kernel as part of the kdump procedure rather than boot of
> the kdump kernel in pci setup.

In general, I would rather make the new kernel more tolerant than make
assumptions about how the old kernel shut down.  I don't know if
there's an explicit statement of kexec philosophy on this (it'd be
nice if there were), but it seems like a more robust strategy, e.g.,
less prone to revlock issues between the old/new kernels.

Bjorn

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] PCI: disable SERR for kdump kernel
  2017-04-20 18:38   ` Bjorn Helgaas
@ 2017-04-20 18:52     ` Sinan Kaya
  0 siblings, 0 replies; 6+ messages in thread
From: Sinan Kaya @ 2017-04-20 18:52 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yinghai Lu, linux-pci, linux-kernel

On 4/20/2017 2:38 PM, Bjorn Helgaas wrote:
> On Thu, Apr 20, 2017 at 12:14 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 4/18/2017 8:31 PM, Yinghai Lu wrote:
>>> * pci_setup_device - fill in class and map information of a device
>>>   * @dev: the device structure to fill
>>> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>>>       /* device class may be changed after fixup */
>>>       class = dev->class >> 8;
>>>
>>> +     if (is_kdump_kernel())
>>> +             pci_disable_serr(dev);
>>> +
>>
>> This sounds like something that needs to be done while shutting down
>> the first kernel as part of the kdump procedure rather than boot of
>> the kdump kernel in pci setup.
> 
> In general, I would rather make the new kernel more tolerant than make
> assumptions about how the old kernel shut down.  I don't know if
> there's an explicit statement of kexec philosophy on this (it'd be
> nice if there were), but it seems like a more robust strategy, e.g.,
> less prone to revlock issues between the old/new kernels.
> 

What if the secondary kernel never gets a chance to boot due to excessive
errors? Code might not even make to the point where PCI driver is executed.

If I remember this right, kexec is already doing PCI cleanup operation during
shutdown and it is also calling the shutdown hook of device drivers. 

(I recently added a shutdown hook to my own HIDMA driver for the very same reason)

The requirement for the second kernel boot is not to have any pending DMA
and IRQs so that secondary kernel can boot safely.

Maybe, the right thing is to look for a way to put PCI into some safe mode.
There should be some code there disabling the COMMAND enable bits if not already.

This code could be added to the same place.

http://lxr.free-electrons.com/ident?i=pci_device_shutdown

469         /*
470          * If this is a kexec reboot, turn off Bus Master bit on the
471          * device to tell it to not continue to do DMA. Don't touch
472          * devices in D3cold or unknown states.
473          * If it is not a kexec reboot, firmware will hit the PCI
474          * devices with big hammer and stop their DMA any way.
475          */
476         if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
477                 pci_clear_master(pci_dev);

> Bjorn
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] PCI: disable SERR for kdump kernel
  2017-04-20 17:14 ` Sinan Kaya
  2017-04-20 18:38   ` Bjorn Helgaas
@ 2017-04-20 23:37   ` Yinghai Lu
  2017-04-21  0:04     ` Sinan Kaya
  1 sibling, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2017-04-20 23:37 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On Thu, Apr 20, 2017 at 10:14 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 4/18/2017 8:31 PM, Yinghai Lu wrote:
>> * pci_setup_device - fill in class and map information of a device
>>   * @dev: the device structure to fill
>> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>>       /* device class may be changed after fixup */
>>       class = dev->class >> 8;
>>
>> +     if (is_kdump_kernel())
>> +             pci_disable_serr(dev);
>> +
>
> This sounds like something that needs to be done while shutting down
> the first kernel as part of the kdump procedure rather than boot of
> the kdump kernel in pci setup.

For kdump path, first kernel shutdown path is not called.

We have to do sth in second kernel instead.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] PCI: disable SERR for kdump kernel
  2017-04-20 23:37   ` Yinghai Lu
@ 2017-04-21  0:04     ` Sinan Kaya
  0 siblings, 0 replies; 6+ messages in thread
From: Sinan Kaya @ 2017-04-21  0:04 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On 4/20/2017 7:37 PM, Yinghai Lu wrote:
> On Thu, Apr 20, 2017 at 10:14 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 4/18/2017 8:31 PM, Yinghai Lu wrote:
>>> * pci_setup_device - fill in class and map information of a device
>>>   * @dev: the device structure to fill
>>> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>>>       /* device class may be changed after fixup */
>>>       class = dev->class >> 8;
>>>
>>> +     if (is_kdump_kernel())
>>> +             pci_disable_serr(dev);
>>> +
>>
>> This sounds like something that needs to be done while shutting down
>> the first kernel as part of the kdump procedure rather than boot of
>> the kdump kernel in pci setup.
> 
> For kdump path, first kernel shutdown path is not called.
> 
> We have to do sth in second kernel instead.
> 

I didn't know that. Bjorn's compatibility point is also a concern. 
Ideally you want something in both places.

> Thanks
> 
> Yinghai
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-21  0:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19  0:31 [PATCH v2] PCI: disable SERR for kdump kernel Yinghai Lu
2017-04-20 17:14 ` Sinan Kaya
2017-04-20 18:38   ` Bjorn Helgaas
2017-04-20 18:52     ` Sinan Kaya
2017-04-20 23:37   ` Yinghai Lu
2017-04-21  0:04     ` Sinan Kaya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).