Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: sathyanarayanan.kuppuswamy@linux.intel.com
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	ashok.raj@intel.com
Subject: Re: [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case
Date: Tue, 24 Mar 2020 18:49:44 -0500
Message-ID: <20200324234944.GA73526@google.com> (raw)
In-Reply-To: <0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com>

On Mon, Mar 23, 2020 at 05:26:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> If hotplug is supported, during DPC event, hotplug
> driver would remove the affected devices and detach
> the drivers on DLLSC link down event and will
> re-enumerate it once the DPC recovery is handled
> and link comes back online (on DLLSC LINK up event).
> Hence we don't depend on .mmio_enabled or .slot_reset
> callbacks in error recovery handler to restore the
> device.
> 
> But if hotplug is not supported/enabled, then we need
> to let the error recovery handler attempt
> the recovery of the devices using slot reset.
> 
> So if hotplug is not supported, then instead of
> returning PCI_ERS_RESULT_RECOVERED, return
> PCI_ERS_RESULT_NEED_RESET.
> 
> Also modify the way error recovery handler processes
> the recovery value.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/pcie/dpc.c | 8 ++++++++
>  drivers/pci/pcie/err.c | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index e06f42f58d3d..0e356ed0d73f 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -13,6 +13,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/init.h>
>  #include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
>  
>  #include "portdrv.h"
>  #include "../pci.h"
> @@ -144,6 +145,13 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  	if (!pcie_wait_for_link(pdev, true))
>  		return PCI_ERS_RESULT_DISCONNECT;
>  
> +	/*
> +	 * If hotplug is not supported/enabled then let the device
> +	 * recover using slot reset.
> +	 */
> +	if (!hotplug_is_native(pdev))
> +		return PCI_ERS_RESULT_NEED_RESET;

I don't understand why hotplug is relevant here.  This path
(dpc_reset_link()) is only used for downstream ports that support DPC.
DPC has already disabled the link, which resets everything below the
port, regardless of whether the port supports hotplug.

I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more
than it actually *does*.  The doc (pci-error-recovery.rst) says
.error_detected() can return PCI_ERS_RESULT_NEED_RESET to *request* a
slot reset.  But if that happens, pcie_do_recovery() doesn't do a
reset at all.  It calls the driver's .slot_reset() method, which tells
the driver "we've reset your device; please re-initialize the
hardware."

I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of
that implementation deficiency in pcie_do_recovery(): we know the
downstream devices have already been reset via DPC, and returning
PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the
driver about that reset.

I can see how this achieves the desired result, but if/when we fix
pcie_do_recovery() to actually *do* the reset promised by
PCI_ERS_RESULT_NEED_RESET, we will be doing *two* resets: the first
via DPC and a second via whatever slot reset mechanism
pcie_do_recovery() would use.

So I guess the real issue (as you allude to in the commit log) is that
we rely on hotplug to unbind/rebind the driver, and without hotplug we
need to at least tell the driver the device was reset.

I'll try to expand the comment here so it reminds me what's going on
when we have to look at this again :)  Let me know if I'm on the right
track.

>  	return PCI_ERS_RESULT_RECOVERED;
>  }
>  
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 1ac57e9e1e71..6e52591a4722 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -178,7 +178,8 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service)
>  		return PCI_ERS_RESULT_DISCONNECT;
>  	}
>  
> -	if (status != PCI_ERS_RESULT_RECOVERED) {
> +	if ((status != PCI_ERS_RESULT_RECOVERED) &&
> +	    (status != PCI_ERS_RESULT_NEED_RESET)) {
>  		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
>  			pci_name(dev));
>  		return PCI_ERS_RESULT_DISCONNECT;
> @@ -206,7 +207,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>  	if (state == pci_channel_io_frozen) {
>  		pci_walk_bus(bus, report_frozen_detected, &status);
>  		status = reset_link(dev, service);
> -		if (status != PCI_ERS_RESULT_RECOVERED)
> +		if (status == PCI_ERS_RESULT_DISCONNECT)
>  			goto failed;
>  	} else {
>  		pci_walk_bus(bus, report_normal_detected, &status);
> -- 
> 2.17.1
> 

  reply index

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24  0:25 [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-24  0:25 ` [PATCH v18 01/11] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
2020-03-24  0:25 ` [PATCH v18 02/11] PCI: move {pciehp,shpchp}_is_native() definitions to pci.c sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 03/11] PCI/DPC: Fix DPC recovery issue in non hotplug case sathyanarayanan.kuppuswamy
2020-03-24 23:49   ` Bjorn Helgaas [this message]
2020-03-25  1:17     ` Kuppuswamy, Sathyanarayanan
2020-03-28 17:10       ` Bjorn Helgaas
2020-03-28 22:04         ` Kuppuswamy, Sathyanarayanan
2020-03-28 22:21           ` Bjorn Helgaas
2020-03-28 22:40             ` Kuppuswamy, Sathyanarayanan
2020-03-24  0:26 ` [PATCH v18 04/11] PCI/DPC: Move DPC data into struct pci_dev sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 05/11] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-28 21:12   ` Kuppuswamy, Sathyanarayanan
2020-03-28 21:32     ` Bjorn Helgaas
2020-03-28 21:55       ` Kuppuswamy, Sathyanarayanan
2020-03-28 22:16         ` Bjorn Helgaas
2020-03-24  0:26 ` [PATCH v18 06/11] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 07/11] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 08/11] PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 09/11] PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR sathyanarayanan.kuppuswamy
2020-03-24  0:26 ` [PATCH v18 10/11] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-24 21:37   ` Bjorn Helgaas
2020-03-25  1:00     ` Kuppuswamy, Sathyanarayanan
2020-03-26 22:36       ` Bjorn Helgaas
2020-03-24  0:26 ` [PATCH v18 11/11] PCI/AER: Rationalize error status register clearing sathyanarayanan.kuppuswamy
2020-03-31 15:28 ` [PATCH v18 00/11] Add Error Disconnect Recover (EDR) support Bjorn Helgaas
2020-03-31 16:28   ` Kuppuswamy, Sathyanarayanan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200324234944.GA73526@google.com \
    --to=helgaas@kernel.org \
    --cc=ashok.raj@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git