From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: sathyanarayanan.kuppuswamy@linux.intel.com
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, ashok.raj@intel.com
Subject: Re: [PATCH v2 1/2] PCI/ERR: Fix fatal error recovery for non-hotplug capable devices
Date: Thu, 04 Jun 2020 21:47:24 -0700
Message-ID: <25283.1591332444@famine>
In-Reply-To: <ce417fbf81a8a46a89535f44b9224ee9fbb55a29.1591307288.git.sathyanarayanan.kuppuswamy@linux.intel.com>

sathyanarayanan.kuppuswamy@linux.intel.com wrote:

>From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
>Fatal (DPC) error recovery is currently broken for non-hotplug
>capable devices. With the current implementation, after a successful
>fatal error recovery, the state of a non-hotplug capable device is
>not restored properly. Related issues are reported at the following
>links:
>
>https://lkml.org/lkml/2020/5/27/290
>https://lore.kernel.org/linux-pci/12115.1588207324@famine/
>https://lkml.org/lkml/2020/3/28/328
>
>The current fatal error recovery implementation relies on the hotplug
>handler to detach and re-enumerate the affected devices and drivers
>on DLLSC (Data Link Layer State Changed) events. As a result, for
>non-hotplug capable devices the recovery code does not restore the
>state of the affected devices correctly. A correct implementation
>should call report_slot_reset() after resetting the link to restore
>the state of the device and its driver.
>
>So use PCI_ERS_RESULT_NEED_RESET as the error status after a
>successful reset_link() operation, and PCI_ERS_RESULT_DISCONNECT on
>failure. The PCI_ERS_RESULT_NEED_RESET state ensures that
>slot_reset() is called after the link reset, which fixes the issue
>described above.
>
>[original patch is from jay.vosburgh@canonical.com]
>[original patch link https://lore.kernel.org/linux-pci/12115.1588207324@famine/]
>Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

	I've tested this patch set on one of our test machines, and it
resolves the issue.  I plan to test with other systems tomorrow.

	-J

>---
> drivers/pci/pcie/err.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>index 14bb8f54723e..5fe8561c7185 100644
>--- a/drivers/pci/pcie/err.c
>+++ b/drivers/pci/pcie/err.c
>@@ -165,8 +165,28 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> 	pci_dbg(dev, "broadcast error_detected message\n");
> 	if (state == pci_channel_io_frozen) {
> 		pci_walk_bus(bus, report_frozen_detected, &status);
>-		status = reset_link(dev);
>-		if (status != PCI_ERS_RESULT_RECOVERED) {
>+		/*
>+		 * After resetting the link using reset_link() call, the
>+		 * possible value of error status is either
>+		 * PCI_ERS_RESULT_DISCONNECT (failure case) or
>+		 * PCI_ERS_RESULT_NEED_RESET (success case).
>+		 * So ignore the return value of report_error_detected()
>+		 * call for fatal errors. Instead use
>+		 * PCI_ERS_RESULT_NEED_RESET as initial status value.
>+		 *
>+		 * Ignoring the status return value of report_error_detected()
>+		 * call will also help in case of EDR mode based error
>+		 * recovery. In EDR mode AER and DPC Capabilities are owned by
>+		 * firmware and hence report_error_detected() call will possibly
>+		 * return PCI_ERS_RESULT_NO_AER_DRIVER. So if we don't ignore
>+		 * the return value of report_error_detected() then
>+		 * pcie_do_recovery() would report incorrect status after
>+		 * successful recovery. Ignoring PCI_ERS_RESULT_NO_AER_DRIVER
>+		 * in non EDR case should not have any functional impact.
>+		 */
>+		status = PCI_ERS_RESULT_NEED_RESET;
>+		if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
>+			status = PCI_ERS_RESULT_DISCONNECT;
> 			pci_warn(dev, "link reset failed\n");
> 			goto failed;
> 		}
>-- 
>2.17.1
>
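The EDR paragraph of the new comment can be modeled the same way. Again
this is a hypothetical sketch: model_edr_reports_success() and its enum
are illustrative stand-ins. The keep_detected case models the
alternative the comment warns against (keeping the status accumulated
by report_error_detected()), not the code the patch replaces, and the
model assumes that any final status other than RECOVERED is reported as
a failed recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pci_ers_result_t values (user-space model). */
enum ers_result {
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER,
};

/*
 * Hypothetical model of the final status after a successful reset_link()
 * in EDR mode, where firmware owns AER/DPC and the error_detected walk is
 * assumed to have produced NO_AER_DRIVER.
 *   keep_detected: true keeps the walked status (what the comment warns
 *                  against); false seeds NEED_RESET as the patch does.
 * Returns true if the model would report the recovery as successful.
 */
static bool model_edr_reports_success(bool keep_detected)
{
	enum ers_result detected = ERS_NO_AER_DRIVER;
	enum ers_result status = keep_detected ? detected : ERS_NEED_RESET;

	/* NEED_RESET triggers slot_reset; assume drivers then report RECOVERED. */
	if (status == ERS_NEED_RESET)
		status = ERS_RECOVERED;

	/* Assumption: anything other than RECOVERED is reported as failure. */
	return status == ERS_RECOVERED;
}
```

Keeping NO_AER_DRIVER would make an otherwise successful EDR recovery
look like a failure, which is why the patch ignores the walked status.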

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
