linux-pci.vger.kernel.org archive mirror
From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: unlisted-recipients:; (no To-header on input)
Cc: "Kuppuswamy\,
	Sathyanarayanan"  <sathyanarayanan.kuppuswamy@linux.intel.com>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH] PCI/ERR: Resolve regression in pcie_do_recovery
Date: Wed, 06 May 2020 11:08:35 -0700
Message-ID: <4681.1588788515@famine>
In-Reply-To: <14682.1588279297@famine>

Jay Vosburgh <jay.vosburgh@canonical.com> wrote:

>"Kuppuswamy, Sathyanarayanan" wrote:
>
>>Hi Jay,
>>
>>On 4/29/20 6:15 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>
>>>
>>> On 4/29/20 5:42 PM, Jay Vosburgh wrote:
>>>>     Commit 6d2c89441571 ("PCI/ERR: Update error status after
>>>> reset_link()"), introduced a regression, as pcie_do_recovery will
>>>> discard the status result from report_frozen_detected.  This can cause a
>>>> failure to recover if _NEED_RESET is returned by report_frozen_detected
>>>> and report_slot_reset is not invoked.
>>>>
>>>>     Such an event can be induced for testing purposes by reducing
>>>> the Max_Payload_Size of a PCIe bridge to less than that of a device
>>>> downstream from the bridge, and then initiating I/O through the device,
>>>> resulting in oversize transactions.  In the presence of DPC, this
>>>> results in a containment event and attempted reset and recovery via
>>>> pcie_do_recovery.  After 6d2c89441571 report_slot_reset is not invoked,
>>>> and the device does not recover.
>>>
>>> I think this issue is related to the issue discussed in following
>>> thread (DPC non-hotplug support).
>>>
>>> https://lkml.org/lkml/2020/3/28/328
>>>
>>> If my assumption is correct, you are dealing with devices which are
>>> not hotplug capable. If the devices are hotplug capable then you don't
>>> need to proceed to report_slot_reset(), since hotplug handler will
>>> remove/re-enumerate the devices correctly.
>
>	Correct, this particular device (a network card) is in a
>non-hotplug slot.
>
>>Can you check whether following fix works for you?
>
>	Yes, it does.
>
>	I fixed up the whitespace and made a minor change to add braces
>in what look like the correct places around the "if (reset_link)" block;
>the patch I tested with is below.  I'll also install this on another
>machine with hotplug capable slots to test there as well.
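
For readers following the brace change: a minimal, standalone C sketch of why the braces matter. It is not kernel code. The enum, the function names, and the boolean parameters are hypothetical stand-ins for reset_link being non-NULL and for the outcomes of reset_link(dev) and pci_bus_error_reset(dev). The first function writes the unbraced hunk out the way a C compiler parses it, with the "else" paired to the nearest "if"; the second mirrors the braced version that was tested.

```c
#include <assert.h>   /* handy for checking the cases in the note below */
#include <stdbool.h>

/* Hypothetical two-value stand-in for the relevant pci_ers_result_t values. */
enum result { RES_NEED_RESET, RES_DISCONNECT };

/*
 * The unbraced hunk, written the way the compiler reads it: the "else"
 * belongs to the inner "if", so the bus reset is only reached when a
 * reset_link callback exists and has already succeeded.
 */
enum result as_parsed_without_braces(bool have_reset_link,
				     bool link_reset_ok, bool bus_reset_ok)
{
	enum result status = RES_NEED_RESET;

	if (have_reset_link) {
		if (!link_reset_ok)
			status = RES_DISCONNECT;
		else if (!bus_reset_ok)
			status = RES_DISCONNECT;
	}
	return status;
}

/* The braced form: bus reset is the fallback when no callback is given. */
enum result with_braces(bool have_reset_link,
			bool link_reset_ok, bool bus_reset_ok)
{
	enum result status = RES_NEED_RESET;

	if (have_reset_link) {
		if (!link_reset_ok)
			status = RES_DISCONNECT;
	} else {
		if (!bus_reset_ok)
			status = RES_DISCONNECT;
	}
	return status;
}
```

With no callback and a failing bus reset, as_parsed_without_braces(false, false, false) leaves the status at RES_NEED_RESET because the bus reset is never attempted, while with_braces(false, false, false) yields RES_DISCONNECT. The booleans only model outcomes; in the real function the calls also have side effects, which this sketch ignores.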

	We've tested the below patch on a couple of different machines
and devices (network card, NVMe device) and it appears to solve the
recovery issue in our testing.

	Is there anything further we need to do, or can this be
considered for inclusion upstream at this time?

	-J

>diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>index 14bb8f54723e..db80e1ecb2dc 100644
>--- a/drivers/pci/pcie/err.c
>+++ b/drivers/pci/pcie/err.c
>@@ -165,13 +165,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> 	pci_dbg(dev, "broadcast error_detected message\n");
> 	if (state == pci_channel_io_frozen) {
> 		pci_walk_bus(bus, report_frozen_detected, &status);
>-		status = reset_link(dev);
>-		if (status != PCI_ERS_RESULT_RECOVERED) {
>+		status = PCI_ERS_RESULT_NEED_RESET;
>+	} else {
>+		pci_walk_bus(bus, report_normal_detected, &status);
>+	}
>+
>+	if (status == PCI_ERS_RESULT_NEED_RESET) {
>+		if (reset_link) {
>+			if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED)
>+				status = PCI_ERS_RESULT_DISCONNECT;
>+		} else {
>+			if (pci_bus_error_reset(dev))
>+				status = PCI_ERS_RESULT_DISCONNECT;
>+		}
>+
>+		if (status == PCI_ERS_RESULT_DISCONNECT) {
> 			pci_warn(dev, "link reset failed\n");
> 			goto failed;
> 		}
>-	} else {
>-		pci_walk_bus(bus, report_normal_detected, &status);
> 	}
> 
> 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>
>
>	-J
>
>>This includes support for bus_reset in recovery function itself.
>>
>>index 14bb8f54723e..c9eaab68ab7a 100644
>>--- a/drivers/pci/pcie/err.c
>>+++ b/drivers/pci/pcie/err.c
>>@@ -165,13 +165,23 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>        pci_dbg(dev, "broadcast error_detected message\n");
>>        if (state == pci_channel_io_frozen) {
>>                pci_walk_bus(bus, report_frozen_detected, &status);
>>-               status = reset_link(dev);
>>-               if (status != PCI_ERS_RESULT_RECOVERED) {
>>+               status = PCI_ERS_RESULT_NEED_RESET;
>>+       } else {
>>+               pci_walk_bus(bus, report_normal_detected, &status);
>>+       }
>>+
>>+       if (status == PCI_ERS_RESULT_NEED_RESET) {
>>+               if (reset_link)
>>+                       if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED)
>>+                               status = PCI_ERS_RESULT_DISCONNECT;
>>+               else
>>+                       if (pci_bus_error_reset(dev))
>>+                               status = PCI_ERS_RESULT_DISCONNECT;
>>+
>>+               if (status == PCI_ERS_RESULT_DISCONNECT) {
>>                        pci_warn(dev, "link reset failed\n");
>>                        goto failed;
>>                }
>>-       } else {
>>-               pci_walk_bus(bus, report_normal_detected, &status);
>>        }
>>
>>        if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>
>>
>>>
>>>>
>>>>     Inspection shows a similar path is plausible for a return of
>>>> _CAN_RECOVER and the invocation of report_mmio_enabled.
>>>>
>>>>     Resolve this by preserving the result of report_frozen_detected if
>>>> reset_link does not return _DISCONNECT.
>>>>
>>>> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>>>> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>>>>
>>>> ---
>>>>   drivers/pci/pcie/err.c | 11 +++++++++--
>>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index 14bb8f54723e..e4274562f3a0 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -164,10 +164,17 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>>       pci_dbg(dev, "broadcast error_detected message\n");
>>>>       if (state == pci_channel_io_frozen) {
>>>> +        pci_ers_result_t status2;
>>>> +
>>>>           pci_walk_bus(bus, report_frozen_detected, &status);
>>>> -        status = reset_link(dev);
>>>> -        if (status != PCI_ERS_RESULT_RECOVERED) {
>>>> +        /* preserve status from report_frozen_detected to
>>>> +         * ensure report_mmio_enabled or report_slot_reset are
>>>> +         * invoked even if reset_link returns _RECOVERED.
>>>> +         */
>>>> +        status2 = reset_link(dev);
>>>> +        if (status2 != PCI_ERS_RESULT_RECOVERED) {
>>>>               pci_warn(dev, "link reset failed\n");
>>>> +            status = status2;
>>>>               goto failed;
>>>>           }
>>>>       } else {
>>>>
>
>---
>	-Jay Vosburgh, jay.vosburgh@canonical.com
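
The regression described in the original commit message can be illustrated with a small standalone model of the frozen-state path. This is a simplified sketch, not the kernel function: the enum and both function names are hypothetical, the outcome of reset_link(dev) is passed in as a value, and the report_mmio_enabled step is left out. Each function returns whether the report_slot_reset step would be reached.

```c
#include <assert.h>   /* handy for checking the cases in the note below */
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/*
 * Frozen path as described for 6d2c89441571: the result of reset_link()
 * overwrites whatever report_frozen_detected collected.
 */
bool slot_reset_runs_regressed(enum ers_result detected,
			       enum ers_result link_result)
{
	enum ers_result status = detected;

	status = link_result;		/* the drivers' answer is lost here */
	if (status != ERS_RECOVERED)
		return false;		/* "link reset failed", goto failed */

	/* status can only be ERS_RECOVERED now, so this is never true */
	return status == ERS_NEED_RESET;
}

/*
 * Frozen path as in the tested patch: status is forced to NEED_RESET and
 * only changed (to DISCONNECT) when the reset itself fails.
 */
bool slot_reset_runs_patched(enum ers_result link_result)
{
	enum ers_result status = ERS_NEED_RESET;

	if (link_result != ERS_RECOVERED)
		status = ERS_DISCONNECT;
	if (status == ERS_DISCONNECT)
		return false;		/* "link reset failed", goto failed */

	return status == ERS_NEED_RESET;
}
```

In this model, slot_reset_runs_regressed(ERS_NEED_RESET, ERS_RECOVERED) is false even though the drivers asked for a reset and the link reset succeeded, whereas slot_reset_runs_patched(ERS_RECOVERED) is true. Both return false when the link reset does not report ERS_RECOVERED.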

