linux-pci.vger.kernel.org archive mirror
* [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
@ 2021-05-01  8:29 Lukas Wunner
  2021-05-01  8:38 ` Lukas Wunner
  2021-06-16 22:19 ` Bjorn Helgaas
  0 siblings, 2 replies; 12+ messages in thread
From: Lukas Wunner @ 2021-05-01  8:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Kuppuswamy Sathyanarayanan, Dan Williams, Ethan Zhao, Sinan Kaya,
	Ashok Raj, Keith Busch, Yicong Yang, linux-pci, Russell Currey,
	Oliver OHalloran, Stuart Hayes, Mika Westerberg

Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
link upon an error and attempts to re-enable it when instructed by the
DPC driver.

A slot which is both DPC- and hotplug-capable is currently brought down
by pciehp once DPC is triggered (due to the link change) and brought up
on successful recovery.  That's undesirable; the slot should remain up
so that the hotplugged device remains bound to its driver.  DPC notifies
the driver of the error and of successful recovery in pcie_do_recovery()
and the driver may then restore the device to working state.

Moreover, Sinan points out that turning off slot power by pciehp may
foil recovery by DPC:  Power off/on is a cold reset concurrent with
DPC's warm reset.  Sathyanarayanan reports extended delays or failure
in link retraining by DPC if pciehp brings down the slot.

Fix by detecting whether a Link Down event is caused by DPC and awaiting
recovery if so.  On successful recovery, ignore both the Link Down and
the subsequent Link Up event.

Afterwards, check whether the link is down to detect surprise-removal or
another DPC event immediately after DPC recovery.  Ensure that the
corresponding DLLSC event is not ignored by synthesizing it and
invoking irq_wake_thread() to trigger a re-run of pciehp_ist().

The IRQ threads of the hotplug and DPC drivers, pciehp_ist() and
dpc_handler(), race against each other.  If pciehp is faster than DPC,
it will wait until DPC recovery completes.

Recovery consists of two steps:  The first step (waiting for link
disablement) is recognizable by pciehp through a set DPC Trigger Status
bit.  The second step (waiting for link retraining) is recognizable
through a newly introduced PCI_DPC_RECOVERING flag.

If DPC is faster than pciehp, neither of the two flags will be set and
pciehp may glean the recovery status from the new PCI_DPC_RECOVERED flag.
The flag is zero if DPC didn't occur at all, hence DLLSC events are not
ignored by default.

pciehp waits up to 4 seconds before assuming that DPC recovery failed
and bringing down the slot.  This timeout is not taken from the spec
(it doesn't mandate one) but based on a report from Yicong Yang that
DPC may take a bit more than 3 seconds on HiSilicon's Kunpeng platform.

The timeout is necessary because the DPC Trigger Status bit may never
clear:  On Root Ports which support RP Extensions for DPC, the DPC
driver polls the DPC RP Busy bit for up to 1 second before giving up on
DPC recovery.  Without the timeout, pciehp would then wait indefinitely
for DPC to complete.

This commit draws inspiration from previous attempts to synchronize DPC
with pciehp:

By Sinan Kaya, August 2018:
https://lore.kernel.org/linux-pci/20180818065126.77912-1-okaya@kernel.org/

By Ethan Zhao, October 2020:
https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/

By Kuppuswamy Sathyanarayanan, March 2021:
https://lore.kernel.org/linux-pci/59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@linux.intel.com/

Reported-by: Sinan Kaya <okaya@kernel.org>
Reported-by: Ethan Zhao <haifeng.zhao@intel.com>
Reported-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/hotplug/pciehp_hpc.c | 36 ++++++++++++++++
 drivers/pci/pci.h                |  4 ++
 drivers/pci/pcie/dpc.c           | 74 +++++++++++++++++++++++++++++---
 3 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index fb3840e222ad..9d06939736c0 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -563,6 +563,32 @@ void pciehp_power_off_slot(struct controller *ctrl)
 		 PCI_EXP_SLTCTL_PWR_OFF);
 }
 
+static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
+					  struct pci_dev *pdev, int irq)
+{
+	/*
+	 * Ignore link changes which occurred while waiting for DPC recovery.
+	 * Could be several if DPC triggered multiple times consecutively.
+	 */
+	synchronize_hardirq(irq);
+	atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events);
+	if (pciehp_poll_mode)
+		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
+					   PCI_EXP_SLTSTA_DLLSC);
+	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
+		  slot_name(ctrl));
+
+	/*
+	 * If the link is unexpectedly down after successful recovery,
+	 * the corresponding link change may have been ignored above.
+	 * Synthesize it to ensure that it is acted on.
+	 */
+	down_read(&ctrl->reset_lock);
+	if (!pciehp_check_link_active(ctrl))
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+	up_read(&ctrl->reset_lock);
+}
+
 static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
@@ -706,6 +732,16 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 				      PCI_EXP_SLTCTL_ATTN_IND_ON);
 	}
 
+	/*
+	 * Ignore Link Down/Up events caused by Downstream Port Containment
+	 * if recovery from the error succeeded.
+	 */
+	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
+	    ctrl->state == ON_STATE) {
+		events &= ~PCI_EXP_SLTSTA_DLLSC;
+		pciehp_ignore_dpc_link_change(ctrl, pdev, irq);
+	}
+
 	/*
 	 * Disable requests have higher priority than Presence Detect Changed
 	 * or Data Link Layer State Changed events.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4c13e2ff05eb..587cc92e182d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -385,6 +385,8 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -439,10 +441,12 @@ void pci_restore_dpc_state(struct pci_dev *dev);
 void pci_dpc_init(struct pci_dev *pdev);
 void dpc_process_error(struct pci_dev *pdev);
 pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
+bool pci_dpc_recovered(struct pci_dev *pdev);
 #else
 static inline void pci_save_dpc_state(struct pci_dev *dev) {}
 static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
 static inline void pci_dpc_init(struct pci_dev *pdev) {}
+static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; }
 #endif
 
 #ifdef CONFIG_PCIEPORTBUS
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index e05aba86a317..c556e7beafe3 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -71,6 +71,58 @@ void pci_restore_dpc_state(struct pci_dev *dev)
 	pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap);
 }
 
+static DECLARE_WAIT_QUEUE_HEAD(dpc_completed_waitqueue);
+
+#ifdef CONFIG_HOTPLUG_PCI_PCIE
+static bool dpc_completed(struct pci_dev *pdev)
+{
+	u16 status;
+
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	if ((status != 0xffff) && (status & PCI_EXP_DPC_STATUS_TRIGGER))
+		return false;
+
+	if (test_bit(PCI_DPC_RECOVERING, &pdev->priv_flags))
+		return false;
+
+	return true;
+}
+
+/**
+ * pci_dpc_recovered - whether DPC triggered and has recovered successfully
+ * @pdev: PCI device
+ *
+ * Return true if DPC was triggered for @pdev and has recovered successfully.
+ * Wait for recovery if it hasn't completed yet.  Called from the PCIe hotplug
+ * driver to recognize and ignore Link Down/Up events caused by DPC.
+ */
+bool pci_dpc_recovered(struct pci_dev *pdev)
+{
+	struct pci_host_bridge *host;
+
+	if (!pdev->dpc_cap)
+		return false;
+
+	/*
+	 * Synchronization between hotplug and DPC is not supported
+	 * if DPC is owned by firmware and EDR is not enabled.
+	 */
+	host = pci_find_host_bridge(pdev->bus);
+	if (!host->native_dpc && !IS_ENABLED(CONFIG_PCIE_EDR))
+		return false;
+
+	/*
+	 * Need a timeout in case DPC never completes due to failure of
+	 * dpc_wait_rp_inactive().  The spec doesn't mandate a time limit,
+	 * but reports indicate that DPC completes within 4 seconds.
+	 */
+	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
+			   msecs_to_jiffies(4000));
+
+	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
+}
+#endif /* CONFIG_HOTPLUG_PCI_PCIE */
+
 static int dpc_wait_rp_inactive(struct pci_dev *pdev)
 {
 	unsigned long timeout = jiffies + HZ;
@@ -91,8 +143,11 @@ static int dpc_wait_rp_inactive(struct pci_dev *pdev)
 
 pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
+	pci_ers_result_t ret;
 	u16 cap;
 
+	set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
+
 	/*
 	 * DPC disables the Link automatically in hardware, so it has
 	 * already been reset by the time we get here.
@@ -106,18 +161,27 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	if (!pcie_wait_for_link(pdev, false))
 		pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n");
 
-	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
-		return PCI_ERS_RESULT_DISCONNECT;
+	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) {
+		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
+		ret = PCI_ERS_RESULT_DISCONNECT;
+		goto out;
+	}
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
 
 	if (!pcie_wait_for_link(pdev, true)) {
 		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
-		return PCI_ERS_RESULT_DISCONNECT;
+		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
+		ret = PCI_ERS_RESULT_DISCONNECT;
+	} else {
+		set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
+		ret = PCI_ERS_RESULT_RECOVERED;
 	}
-
-	return PCI_ERS_RESULT_RECOVERED;
+out:
+	clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
+	wake_up_all(&dpc_completed_waitqueue);
+	return ret;
 }
 
 static void dpc_process_rp_pio_error(struct pci_dev *pdev)
-- 
2.30.2



* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-05-01  8:29 [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC Lukas Wunner
@ 2021-05-01  8:38 ` Lukas Wunner
  2021-06-16 22:19 ` Bjorn Helgaas
  1 sibling, 0 replies; 12+ messages in thread
From: Lukas Wunner @ 2021-05-01  8:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Kuppuswamy Sathyanarayanan, Dan Williams, Ethan Zhao, Sinan Kaya,
	Ashok Raj, Keith Busch, Yicong Yang, linux-pci, Russell Currey,
	Oliver OHalloran, Stuart Hayes, Mika Westerberg

On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote:
> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
> link upon an error and attempts to re-enable it when instructed by the
> DPC driver.
[...]

Sorry, forgot to include the changelog v1 -> v2:

* Raise timeout for DPC completion from 3 sec to 4 sec (Yicong Yang)

* Amend commit message to clarify that the timeout is not taken from
  the spec but rather based on reports (Yicong Yang, Ethan Zhao)


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-05-01  8:29 [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC Lukas Wunner
  2021-05-01  8:38 ` Lukas Wunner
@ 2021-06-16 22:19 ` Bjorn Helgaas
  2021-06-20  7:38   ` Lukas Wunner
  1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2021-06-16 22:19 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Kuppuswamy Sathyanarayanan, Dan Williams, Ethan Zhao, Sinan Kaya,
	Ashok Raj, Keith Busch, Yicong Yang, linux-pci, Russell Currey,
	Oliver OHalloran, Stuart Hayes, Mika Westerberg

On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote:
> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
> link upon an error and attempts to re-enable it when instructed by the
> DPC driver.
> 
> A slot which is both DPC- and hotplug-capable is currently brought down
> by pciehp once DPC is triggered (due to the link change) and brought up
> on successful recovery.  That's undesirable, the slot should remain up
> so that the hotplugged device remains bound to its driver.

I think the slot being "brought down" means slot power is turned off,
right?

I reworded it along those lines and applied this to pci/hotplug for
v5.14, thanks!

> DPC notifies
> the driver of the error and of successful recovery in pcie_do_recovery()
> and the driver may then restore the device to working state.
> 
> Moreover, Sinan points out that turning off slot power by pciehp may
> foil recovery by DPC:  Power off/on is a cold reset concurrently to
> DPC's warm reset.  Sathyanarayanan reports extended delays or failure
> in link retraining by DPC if pciehp brings down the slot.
> 
> Fix by detecting whether a Link Down event is caused by DPC and awaiting
> recovery if so.  On successful recovery, ignore both the Link Down and
> the subsequent Link Up event.
> 
> Afterwards, check whether the link is down to detect surprise-removal or
> another DPC event immediately after DPC recovery.  Ensure that the
> corresponding DLLSC event is not ignored by synthesizing it and
> invoking irq_wake_thread() to trigger a re-run of pciehp_ist().
> 
> The IRQ threads of the hotplug and DPC drivers, pciehp_ist() and
> dpc_handler(), race against each other.  If pciehp is faster than DPC,
> it will wait until DPC recovery completes.
> 
> Recovery consists of two steps:  The first step (waiting for link
> disablement) is recognizable by pciehp through a set DPC Trigger Status
> bit.  The second step (waiting for link retraining) is recognizable
> through a newly introduced PCI_DPC_RECOVERING flag.
> 
> If DPC is faster than pciehp, neither of the two flags will be set and
> pciehp may glean the recovery status from the new PCI_DPC_RECOVERED flag.
> The flag is zero if DPC didn't occur at all, hence DLLSC events are not
> ignored by default.
> 
> pciehp waits up to 4 seconds before assuming that DPC recovery failed
> and bringing down the slot.  This timeout is not taken from the spec
> (it doesn't mandate one) but based on a report from Yicong Yang that
> DPC may take a bit more than 3 seconds on HiSilicon's Kunpeng platform.
> 
> The timeout is necessary because the DPC Trigger Status bit may never
> clear:  On Root Ports which support RP Extensions for DPC, the DPC
> driver polls the DPC RP Busy bit for up to 1 second before giving up on
> DPC recovery.  Without the timeout, pciehp would then wait indefinitely
> for DPC to complete.
> 
> This commit draws inspiration from previous attempts to synchronize DPC
> with pciehp:
> 
> By Sinan Kaya, August 2018:
> https://lore.kernel.org/linux-pci/20180818065126.77912-1-okaya@kernel.org/
> 
> By Ethan Zhao, October 2020:
> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/
> 
> By Kuppuswamy Sathyanarayanan, March 2021:
> https://lore.kernel.org/linux-pci/59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@linux.intel.com/
> 
> Reported-by: Sinan Kaya <okaya@kernel.org>
> Reported-by: Ethan Zhao <haifeng.zhao@intel.com>
> Reported-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Tested-by: Yicong Yang <yangyicong@hisilicon.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 36 ++++++++++++++++
>  drivers/pci/pci.h                |  4 ++
>  drivers/pci/pcie/dpc.c           | 74 +++++++++++++++++++++++++++++---
>  3 files changed, 109 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index fb3840e222ad..9d06939736c0 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -563,6 +563,32 @@ void pciehp_power_off_slot(struct controller *ctrl)
>  		 PCI_EXP_SLTCTL_PWR_OFF);
>  }
>  
> +static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> +					  struct pci_dev *pdev, int irq)
> +{
> +	/*
> +	 * Ignore link changes which occurred while waiting for DPC recovery.
> +	 * Could be several if DPC triggered multiple times consecutively.
> +	 */
> +	synchronize_hardirq(irq);
> +	atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events);
> +	if (pciehp_poll_mode)
> +		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> +					   PCI_EXP_SLTSTA_DLLSC);
> +	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
> +		  slot_name(ctrl));
> +
> +	/*
> +	 * If the link is unexpectedly down after successful recovery,
> +	 * the corresponding link change may have been ignored above.
> +	 * Synthesize it to ensure that it is acted on.
> +	 */
> +	down_read(&ctrl->reset_lock);
> +	if (!pciehp_check_link_active(ctrl))
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> +	up_read(&ctrl->reset_lock);
> +}
> +
>  static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  {
>  	struct controller *ctrl = (struct controller *)dev_id;
> @@ -706,6 +732,16 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  				      PCI_EXP_SLTCTL_ATTN_IND_ON);
>  	}
>  
> +	/*
> +	 * Ignore Link Down/Up events caused by Downstream Port Containment
> +	 * if recovery from the error succeeded.
> +	 */
> +	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
> +	    ctrl->state == ON_STATE) {
> +		events &= ~PCI_EXP_SLTSTA_DLLSC;
> +		pciehp_ignore_dpc_link_change(ctrl, pdev, irq);
> +	}
> +
>  	/*
>  	 * Disable requests have higher priority than Presence Detect Changed
>  	 * or Data Link Layer State Changed events.
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4c13e2ff05eb..587cc92e182d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -385,6 +385,8 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  
>  /* pci_dev priv_flags */
>  #define PCI_DEV_ADDED 0
> +#define PCI_DPC_RECOVERED 1
> +#define PCI_DPC_RECOVERING 2
>  
>  static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
>  {
> @@ -439,10 +441,12 @@ void pci_restore_dpc_state(struct pci_dev *dev);
>  void pci_dpc_init(struct pci_dev *pdev);
>  void dpc_process_error(struct pci_dev *pdev);
>  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
> +bool pci_dpc_recovered(struct pci_dev *pdev);
>  #else
>  static inline void pci_save_dpc_state(struct pci_dev *dev) {}
>  static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
>  static inline void pci_dpc_init(struct pci_dev *pdev) {}
> +static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; }
>  #endif
>  
>  #ifdef CONFIG_PCIEPORTBUS
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index e05aba86a317..c556e7beafe3 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -71,6 +71,58 @@ void pci_restore_dpc_state(struct pci_dev *dev)
>  	pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap);
>  }
>  
> +static DECLARE_WAIT_QUEUE_HEAD(dpc_completed_waitqueue);
> +
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE
> +static bool dpc_completed(struct pci_dev *pdev)
> +{
> +	u16 status;
> +
> +	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
> +	if ((status != 0xffff) && (status & PCI_EXP_DPC_STATUS_TRIGGER))
> +		return false;
> +
> +	if (test_bit(PCI_DPC_RECOVERING, &pdev->priv_flags))
> +		return false;
> +
> +	return true;
> +}
> +
> +/**
> + * pci_dpc_recovered - whether DPC triggered and has recovered successfully
> + * @pdev: PCI device
> + *
> + * Return true if DPC was triggered for @pdev and has recovered successfully.
> + * Wait for recovery if it hasn't completed yet.  Called from the PCIe hotplug
> + * driver to recognize and ignore Link Down/Up events caused by DPC.
> + */
> +bool pci_dpc_recovered(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *host;
> +
> +	if (!pdev->dpc_cap)
> +		return false;
> +
> +	/*
> +	 * Synchronization between hotplug and DPC is not supported
> +	 * if DPC is owned by firmware and EDR is not enabled.
> +	 */
> +	host = pci_find_host_bridge(pdev->bus);
> +	if (!host->native_dpc && !IS_ENABLED(CONFIG_PCIE_EDR))
> +		return false;
> +
> +	/*
> +	 * Need a timeout in case DPC never completes due to failure of
> +	 * dpc_wait_rp_inactive().  The spec doesn't mandate a time limit,
> +	 * but reports indicate that DPC completes within 4 seconds.
> +	 */
> +	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
> +			   msecs_to_jiffies(4000));
> +
> +	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> +}
> +#endif /* CONFIG_HOTPLUG_PCI_PCIE */
> +
>  static int dpc_wait_rp_inactive(struct pci_dev *pdev)
>  {
>  	unsigned long timeout = jiffies + HZ;
> @@ -91,8 +143,11 @@ static int dpc_wait_rp_inactive(struct pci_dev *pdev)
>  
>  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  {
> +	pci_ers_result_t ret;
>  	u16 cap;
>  
> +	set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
> +
>  	/*
>  	 * DPC disables the Link automatically in hardware, so it has
>  	 * already been reset by the time we get here.
> @@ -106,18 +161,27 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  	if (!pcie_wait_for_link(pdev, false))
>  		pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n");
>  
> -	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
> -		return PCI_ERS_RESULT_DISCONNECT;
> +	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) {
> +		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> +		ret = PCI_ERS_RESULT_DISCONNECT;
> +		goto out;
> +	}
>  
>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>  			      PCI_EXP_DPC_STATUS_TRIGGER);
>  
>  	if (!pcie_wait_for_link(pdev, true)) {
>  		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
> -		return PCI_ERS_RESULT_DISCONNECT;
> +		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> +		ret = PCI_ERS_RESULT_DISCONNECT;
> +	} else {
> +		set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> +		ret = PCI_ERS_RESULT_RECOVERED;
>  	}
> -
> -	return PCI_ERS_RESULT_RECOVERED;
> +out:
> +	clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
> +	wake_up_all(&dpc_completed_waitqueue);
> +	return ret;
>  }
>  
>  static void dpc_process_rp_pio_error(struct pci_dev *pdev)
> -- 
> 2.30.2
> 


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-06-16 22:19 ` Bjorn Helgaas
@ 2021-06-20  7:38   ` Lukas Wunner
  2021-06-25 20:38     ` stuart hayes
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Wunner @ 2021-06-20  7:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Kuppuswamy Sathyanarayanan, Dan Williams, Ethan Zhao, Sinan Kaya,
	Ashok Raj, Keith Busch, Yicong Yang, linux-pci, Russell Currey,
	Oliver OHalloran, Stuart Hayes, Mika Westerberg

On Wed, Jun 16, 2021 at 05:19:45PM -0500, Bjorn Helgaas wrote:
> On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote:
> > Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
> > link upon an error and attempts to re-enable it when instructed by the
> > DPC driver.
> > 
> > A slot which is both DPC- and hotplug-capable is currently brought down
> > by pciehp once DPC is triggered (due to the link change) and brought up
> > on successful recovery.  That's undesirable, the slot should remain up
> > so that the hotplugged device remains bound to its driver.
> 
> I think the slot being "brought down" means slot power is turned off,
> right?
> 
> I reworded it along those lines and applied this to pci/hotplug for
> v5.14, thanks!

Thanks, the reworded commit message LGTM and is more readable.

"Being brought down" is just a colloquial term for pciehp_disable_slot(),
i.e. unbinding and removal of the pci_dev's below the hotplug port,
removing slot power, turning off the power LED and setting the slot's
state to OFF_STATE.

Indeed, turning off slot power concurrently with DPC recovery is wrong
and likely the biggest contributor to the problems seen.

Another issue is that after bringing down the slot due to the Link Change
event, pciehp will notice that Presence Detect State is set and will try
to bring the slot up again, even though DPC recovery may not have completed
yet.

The commit should solve all those synchronization issues between pciehp
and DPC.
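
For reference, the synchronization added by the patch boils down to the
sketch below (condensed from the diff; error paths and the firmware-first
check omitted):

	/* DPC side, dpc_reset_link(): */
	set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
	/* ... wait for Link Down, clear DPC Trigger Status, wait for Link Up ... */
	set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);	/* cleared on failure */
	clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
	wake_up_all(&dpc_completed_waitqueue);

	/* pciehp side, pciehp_ist() via pci_dpc_recovered(): */
	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
			   msecs_to_jiffies(4000));
	if (test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags))
		events &= ~PCI_EXP_SLTSTA_DLLSC;	/* keep the slot up */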

Thanks,

Lukas


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-06-20  7:38   ` Lukas Wunner
@ 2021-06-25 20:38     ` stuart hayes
  2021-06-26  6:50       ` Lukas Wunner
  2021-07-19 15:10       ` Lukas Wunner
  0 siblings, 2 replies; 12+ messages in thread
From: stuart hayes @ 2021-06-25 20:38 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Kuppuswamy Sathyanarayanan, Dan Williams, Ethan Zhao, Sinan Kaya,
	Ashok Raj, Keith Busch, Yicong Yang, linux-pci, Russell Currey,
	Oliver OHalloran, Mika Westerberg



On 6/20/2021 2:38 AM, Lukas Wunner wrote:
> On Wed, Jun 16, 2021 at 05:19:45PM -0500, Bjorn Helgaas wrote:
>> On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote:
>>> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
>>> link upon an error and attempts to re-enable it when instructed by the
>>> DPC driver.
>>>
>>> A slot which is both DPC- and hotplug-capable is currently brought down
>>> by pciehp once DPC is triggered (due to the link change) and brought up
>>> on successful recovery.  That's undesirable, the slot should remain up
>>> so that the hotplugged device remains bound to its driver.
>>
>> I think the slot being "brought down" means slot power is turned off,
>> right?
>>
>> I reworded it along those lines and applied this to pci/hotplug for
>> v5.14, thanks!
> 
> Thanks, the reworded commit message LGTM and is more readable.
> 
> "Being brought down" is just a colloquial term for pciehp_disable_slot(),
> i.e. unbinding and removal of the pci_dev's below the hotplug port,
> removing slot power, turning off the power LED and setting the slot's
> state to OFF_STATE.
> 
> Indeed, turning off slot power concurrently to DPC recovery is wrong
> and likely the biggest contributor to the problems seen.
> 
> Another issue is that after bringing down the slot due to the Link Change
> event, pciehp will notice that Presence Detect State is set and will try
> to bring the slot up again, even though DPC recovery may not have completed
> yet.
> 
> The commit should solve all those synchronization issues between pciehp
> and DPC.
> 
> Thanks,
> 
> Lukas
> 

Lukas--

I have a system that is failing to recover after an EDR event with (or 
without...) this patch.  It looks like the problem is similar to what 
this patch is trying to fix, except that on my system, the hotplug port 
is downstream of the root port that has DPC, so the "link down" event on 
it is not being ignored.  So the hotplug code disables the slot (which 
contains an NVMe device on this system) while the nvme driver is trying 
to use it, which results in a failed recovery and another EDR event, and 
the kernel ends up with the DPC trigger status bit set in the root port, 
so everything downstream is gone.

I added the hack below so the hotplug code will ignore the "link down" 
events on the ports downstream of the root port during DPC recovery, and 
it recovers no problem.  (I'm not proposing this as a correct fix.)

Does this sound like a real issue, or am I possibly misunderstanding 
something about how this should work?

Thanks
Stuart

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b576aa890c76..dfd983c3c5bf 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
  		!dev->driver->err_handler->slot_reset)
  		goto out;

+	set_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
  	err_handler = dev->driver->err_handler;
  	vote = err_handler->slot_reset(dev);
+	clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
  	*result = merge_result(*result, vote);
  out:
  	device_unlock(&dev->dev);



* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-06-25 20:38     ` stuart hayes
@ 2021-06-26  6:50       ` Lukas Wunner
  2021-07-06 22:15         ` stuart hayes
  2021-07-19 15:10       ` Lukas Wunner
  1 sibling, 1 reply; 12+ messages in thread
From: Lukas Wunner @ 2021-06-26  6:50 UTC (permalink / raw)
  To: stuart hayes
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg

On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
> I have a system that is failing to recover after an EDR event with (or
> without...) this patch.  It looks like the problem is similar to what this
> patch is trying to fix, except that on my system, the hotplug port is
> downstream of the root port that has DPC, so the "link down" event on it is
> not being ignored.  So the hotplug code disables the slot (which contains an
> NVMe device on this system) while the nvme driver is trying to use it, which
> results in a failed recovery and another EDR event, and the kernel ends up
> with the DPC trigger status bit set in the root port, so everything
> downstream is gone.
> 
> I added the hack below so the hotplug code will ignore the "link down"
> events on the ports downstream of the root port during DPC recovery, and it
> recovers no problem.  (I'm not proposing this as a correct fix.)

Please help me understand what's causing the Link Down event in the
first place:

With DPC, the hardware (only) disables the link on the port containing the
error.  Since that's the Root Port above the hotplug port in your case,
the link between the hotplug port and the NVMe drive should remain up.

Since your patch sets the PCI_DPC_RECOVERING flag during invocation
of the dev->driver->err_handler->slot_reset() hook, I assume that's
what's causing the Link Down.  However pcie_portdrv_slot_reset()
only restores and saves PCI config space, I don't think that's
causing a Link Down?

Is maybe nvme_slot_reset() causing the Link Down on the parent hotplug port?

Thanks,

Lukas

> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index b576aa890c76..dfd983c3c5bf 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void
> *data)
>  		!dev->driver->err_handler->slot_reset)
>  		goto out;
> 
> +	set_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
>  	err_handler = dev->driver->err_handler;
>  	vote = err_handler->slot_reset(dev);
> +	clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
>  	*result = merge_result(*result, vote);
>  out:
>  	device_unlock(&dev->dev);


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-06-26  6:50       ` Lukas Wunner
@ 2021-07-06 22:15         ` stuart hayes
  2021-07-18 21:26           ` Lukas Wunner
  0 siblings, 1 reply; 12+ messages in thread
From: stuart hayes @ 2021-07-06 22:15 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg



On 6/26/2021 1:50 AM, Lukas Wunner wrote:
> On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
>> I have a system that is failing to recover after an EDR event with (or
>> without...) this patch.  It looks like the problem is similar to what this
>> patch is trying to fix, except that on my system, the hotplug port is
>> downstream of the root port that has DPC, so the "link down" event on it is
>> not being ignored.  So the hotplug code disables the slot (which contains an
>> NVMe device on this system) while the nvme driver is trying to use it, which
>> results in a failed recovery and another EDR event, and the kernel ends up
>> with the DPC trigger status bit set in the root port, so everything
>> downstream is gone.
>>
>> I added the hack below so the hotplug code will ignore the "link down"
>> events on the ports downstream of the root port during DPC recovery, and it
>> recovers no problem.  (I'm not proposing this as a correct fix.)
> 
> Please help me understand what's causing the Link Down event in the
> first place:
> 
> With DPC, the hardware (only) disables the link on the port containing the
> error.  Since that's the Root Port above the hotplug port in your case,
> the link between the hotplug port and the NVMe drive should remain up.
> 
> Since your patch sets the PCI_DPC_RECOVERING flag during invocation
> of the dev->driver->err_handler->slot_reset() hook, I assume that's
> what's causing the Link Down.  However pcie_portdrv_slot_reset()
> only restores and saves PCI config space, I don't think that's
> causing a Link Down?
> 
> Is maybe nvme_slot_reset() causing the Link Down on the parent hotplug port?
> 
> Thanks,
> 
> Lukas
> 

Sorry for the delayed response--I was out of town.

I believe the Link Down is happening because a hot reset is propagated 
down when the link is lost under the root port 64:02.0.  From the PCIe 
Base Spec 5.0, section 6.6.1 "conventional reset":

• For a Switch, the following must cause a hot reset to be sent on all 
Downstream Ports:
   ...
   ◦ The Data Link Layer of the Upstream Port reporting DL_Down status. 
In Switches that support Link speeds greater than 5.0 GT/s, the Upstream 
Port must direct the LTSSM of each Downstream Port to the Hot Reset 
state, but not hold the LTSSMs in that state. This permits each 
downstream Port to begin Link training immediately after its hot reset 
completes. This behavior is recommended for all Switches.
   ◦ Receiving a hot reset on the Upstream Port
(end of paste from pci spec)

For reference, here's the "lspci -t" output covering the root port 
64:02.0 that is getting the DPC... there are NVMe drives at 69:00.0, 
6a:00.0, 6c:00.0, and 6e:00.0, and a SAS controller at 79:00.0.

  +-[0000:64]-+-00.0
  |           +-00.1
  |           +-00.2
  |           +-00.4
  |           \-02.0-[65-79]----00.0-[66-79]--+-00.0-[67-70]----00.0-[68-70]--+-00.0-[69]----00.0
  |                                           |                               +-04.0-[6a]----00.0
  |                                           |                               +-08.0-[6b]--
  |                                           |                               +-0c.0-[6c]----00.0
  |                                           |                               +-10.0-[6d]--
  |                                           |                               +-14.0-[6e]----00.0
  |                                           |                               +-18.0-[6f]--
  |                                           |                               \-1c.0-[70]--
  |                                           +-04.0-[71-76]----00.0-[72-76]--+-10.0-[73]--
  |                                           |                               +-14.0-[74]--
  |                                           |                               +-18.0-[75]--
  |                                           |                               \-1c.0-[76]--
  |                                           +-08.0-[77-78]----00.0-[78]--
  |                                           \-1c.0-[79]----00.0

I put in some debug code to printk the config registers before the 
config space is restored.  Before I trigger the DPC, the slot status 
register at 68:00.0 reads 0x40 (presence detected), and after the DPC 
(but before restoring PCI config space for 68:00.0), it reads 0x140 (DLL 
state changed + presence detected).

Before config space is restored to 68:00.0, the command register is 0. 
After config space is restored, I see "pcieport 0000:68:00.0: pciehp: 
pending interrupts 0x0010 from Slot Status" followed by "...pciehp: 
Slot(211): Link Down".  So I assume as soon as it is able to (when its 
config space is restored), 68:00.0 sends the hotplug interrupt, which 
takes down 69:00.0.
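
(The debug read is conceptually just this -- a sketch, not the exact code,
with pdev being the 68:00.0 hotplug port:

	u16 slotsta;

	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slotsta);
	pci_info(pdev, "SLTSTA before restore: %#06x\n", slotsta);

0x140 is PCI_EXP_SLTSTA_PDS | PCI_EXP_SLTSTA_DLLSC, matching the
"DLL state changed + presence detected" reading above.)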


>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index b576aa890c76..dfd983c3c5bf 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void
>> *data)
>>   		!dev->driver->err_handler->slot_reset)
>>   		goto out;
>>
>> +	set_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
>>   	err_handler = dev->driver->err_handler;
>>   	vote = err_handler->slot_reset(dev);
>> +	clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
>>   	*result = merge_result(*result, vote);
>>   out:
>>   	device_unlock(&dev->dev);


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-07-06 22:15         ` stuart hayes
@ 2021-07-18 21:26           ` Lukas Wunner
  0 siblings, 0 replies; 12+ messages in thread
From: Lukas Wunner @ 2021-07-18 21:26 UTC (permalink / raw)
  To: stuart hayes
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg

On Tue, Jul 06, 2021 at 05:15:02PM -0500, stuart hayes wrote:
> I believe the Link Down is happening because a hot reset is propagated down
> when the link is lost under the root port 64:02.0.  From the PCIe Base Spec
> 5.0, section 6.6.1 "conventional reset":
[...]

Hm, sounds plausible.  Just so that I understand this correctly,
the hotplug port at 0000:68:00.0 is DPC-capable, but the error
that is contained by DPC at the Root Port occurs further up in
the hierarchy, right? (I.e. somewhere above the hotplug port.)

The patch you're using to work around the issue would break if
the hotplug port is *not* DPC-capable.  Yes, yes, I understand
that it's not meant as a real patch, but it shows how tricky
it is to fix the issue.  I need to do a little more thinking
what a proper solution could look like.

Thanks,

Lukas


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-06-25 20:38     ` stuart hayes
  2021-06-26  6:50       ` Lukas Wunner
@ 2021-07-19 15:10       ` Lukas Wunner
  2021-07-19 19:00         ` stuart hayes
  1 sibling, 1 reply; 12+ messages in thread
From: Lukas Wunner @ 2021-07-19 15:10 UTC (permalink / raw)
  To: stuart hayes
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg

On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
> I have a system that is failing to recover after an EDR event with (or
> without...) this patch.  It looks like the problem is similar to what this
> patch is trying to fix, except that on my system, the hotplug port is
> downstream of the root port that has DPC, so the "link down" event on it is
> not being ignored.  So the hotplug code disables the slot (which contains an
> NVMe device on this system) while the nvme driver is trying to use it, which
> results in a failed recovery and another EDR event, and the kernel ends up
> with the DPC trigger status bit set in the root port, so everything
> downstream is gone.
> 
> I added the hack below so the hotplug code will ignore the "link down"
> events on the ports downstream of the root port during DPC recovery, and it
> recovers no problem.  (I'm not proposing this as a correct fix.)

Could you test if the below patch fixes the issue?

Note, this is a hack as well, but I can turn it into a proper patch
if it works as expected.

Thanks!

Lukas

-- >8 --

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..893c7ae1a54d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,10 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
 
 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
+	if (dev->is_hotplug_bridge)
+		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
+					   PCI_EXP_SLTSTA_DLLSC);
+
 	pci_restore_state(dev);
 	pci_save_state(dev);
 	return PCI_ERS_RESULT_RECOVERED;


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-07-19 15:10       ` Lukas Wunner
@ 2021-07-19 19:00         ` stuart hayes
  2021-07-20  6:57           ` Lukas Wunner
  0 siblings, 1 reply; 12+ messages in thread
From: stuart hayes @ 2021-07-19 19:00 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg



On 7/19/2021 10:10 AM, Lukas Wunner wrote:
> On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
>> I have a system that is failing to recover after an EDR event with (or
>> without...) this patch.  It looks like the problem is similar to what this
>> patch is trying to fix, except that on my system, the hotplug port is
>> downstream of the root port that has DPC, so the "link down" event on it is
>> not being ignored.  So the hotplug code disables the slot (which contains an
>> NVMe device on this system) while the nvme driver is trying to use it, which
>> results in a failed recovery and another EDR event, and the kernel ends up
>> with the DPC trigger status bit set in the root port, so everything
>> downstream is gone.
>>
>> I added the hack below so the hotplug code will ignore the "link down"
>> events on the ports downstream of the root port during DPC recovery, and it
>> recovers no problem.  (I'm not proposing this as a correct fix.)
> 
> Could you test if the below patch fixes the issue?
> 
> Note, this is a hack as well, but I can turn it into a proper patch
> if it works as expected.
> 
> Thanks!
> 
> Lukas
> 

That does appear to fix the issue, thanks!  Without your patch, the PCIe 
devices under 64:02.0 disappear (the triggered bit is still set in the 
DPC capability).  With your patch, recovery is successful and all of the 
PCIe devices are still there.

If you are interested, here's the log showing the EDR before and after 
the patch was applied:

> [  180.295300] pcieport 0000:64:02.0: EDR: EDR event received
> [  180.325225] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000
> [  180.325229] pcieport 0000:64:02.0: DPC: RP PIO error detected
> [  180.325231] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303
> [  180.325237] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000
> [  180.325240] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000002 0xfc2a3eff 0xbf5fffe0 0x00000000
> [  180.325250] nvme nvme0: frozen state error detected, reset controller
> [  180.343110] nvme nvme1: frozen state error detected, reset controller
> [  180.357158] nvme nvme2: frozen state error detected, reset controller
> [  180.371203] nvme nvme3: frozen state error detected, reset controller
> [  180.385219] mpt3sas_cm0: PCI error: detected callback, state(2)!!
> [  180.534915] pcieport 0000:66:08.0: can't change power state from D3cold to D0 (config space inaccessible)
> [  180.551467] nvme nvme0: restart after slot reset
> [  180.551487] pcieport 0000:68:00.0: pciehp: Slot(211): Link Down
> [  180.551691] pcieport 0000:68:04.0: pciehp: Slot(210): Link Down
> [  180.551727] nvme nvme1: restart after slot reset
> [  180.612127] nvme nvme2: restart after slot reset
> [  180.612153] pcieport 0000:68:0c.0: pciehp: Slot(208): Link Down
> [  180.612422] pcieport 0000:68:14.0: pciehp: Slot(206): Link Down
> [  180.612439] nvme nvme3: restart after slot reset
> [  180.673870] nvme 0000:69:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [  180.675887] nvme nvme0: failed to mark controller CONNECTING
> [  180.675891] nvme nvme0: Removing after probe failure status: -16
> [  180.675893] mpt3sas_cm0: PCI error: slot reset callback!!
> [  180.675946] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15845080 kB)
> [  180.678894] nvme 0000:6a:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [  180.683932] nvme nvme1: Removing after probe failure status: -19
> [  180.685886] nvme 0000:6c:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [  180.685914] nvme 0000:6e:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [  180.686149] nvme nvme2: Removing after probe failure status: -19
> [  180.686174] nvme nvme3: Removing after probe failure status: -19
> [  180.697896] nvme2n1: detected capacity change from 3125627568 to 0
> [  180.698902] nvme3n1: detected capacity change from 15002931888 to 0
> [  180.698949] nvme1n1: detected capacity change from 1562824368 to 0
> [  180.709893] nvme0n1: detected capacity change from 732585168 to 0
> [  188.331419] mpt3sas_cm0: _base_wait_for_doorbell_ack: failed due to timeout count(5000), int_status(ffffffff)!
> [  188.331433] mpt3sas_cm0: doorbell handshake sending request failed (line=6648)
> [  188.331439] mpt3sas_cm0: _base_get_ioc_facts: handshake failed (r=-14)
> [  188.331506] pcieport 0000:64:02.0: AER: device recovery failed
> [  188.337031] pcieport 0000:64:02.0: EDR: EDR event received
> [  188.369794] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000
> [  188.369797] pcieport 0000:64:02.0: DPC: RP PIO error detected
> [  188.369799] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303
> [  188.369802] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000
> [  188.369806] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000001 0xfc003e0f 0xbf30001c 0x00000000
> [  188.369811] pci 0000:69:00.0: AER: can't recover (no error_detected callback)
> [  188.369812] pci 0000:6a:00.0: AER: can't recover (no error_detected callback)
> [  188.369814] pci 0000:6c:00.0: AER: can't recover (no error_detected callback)
> [  188.369815] pci 0000:6e:00.0: AER: can't recover (no error_detected callback)
> [  188.369819] mpt3sas_cm0: PCI error: detected callback, state(2)!!
> [  188.534982] pcieport 0000:64:02.0: AER: device recovery failed
> [  188.546336] mpt3sas_cm0: remove hba_port entry: 00000000d72416cf port: 0 from hba_port list
> [  188.546343] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x54cd98f310000108)
> [  188.546346] mpt3sas_cm0: removing handle(0x0002), sas_addr(0x54cd98f310000108)
> [  188.546348] mpt3sas_cm0: enclosure logical id(0x54cd98f310000100), slot(0)
> [  188.558192] pcieport 0000:66:08.0: can't change power state from D3cold to D0 (config space inaccessible)
> [  189.790967] pcieport 0000:77:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> [  189.791154] pcieport 0000:72:1c.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.791168] pcieport 0000:72:1c.0: pciehp: pciehp_isr: no response from device
> [  189.791453] pcieport 0000:72:18.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.791463] pcieport 0000:72:18.0: pciehp: pciehp_isr: no response from device
> [  189.791695] pcieport 0000:72:14.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.791705] pcieport 0000:72:14.0: pciehp: pciehp_isr: no response from device
> [  189.791875] pcieport 0000:72:10.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.791884] pcieport 0000:72:10.0: pciehp: pciehp_isr: no response from device
> [  189.792158] pcieport 0000:68:1c.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.792168] pcieport 0000:68:1c.0: pciehp: pciehp_isr: no response from device
> [  189.792348] pcieport 0000:68:18.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.792355] pcieport 0000:68:18.0: pciehp: pciehp_isr: no response from device
> [  189.792526] pcieport 0000:68:14.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.792535] pcieport 0000:68:14.0: pciehp: pciehp_isr: no response from device
> [  189.792719] pcieport 0000:68:10.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.792727] pcieport 0000:68:10.0: pciehp: pciehp_isr: no response from device
> [  189.792890] pcieport 0000:68:0c.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.792898] pcieport 0000:68:0c.0: pciehp: pciehp_isr: no response from device
> [  189.793093] pcieport 0000:68:08.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.793110] pcieport 0000:68:08.0: pciehp: pciehp_isr: no response from device
> [  189.793347] pcieport 0000:68:04.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.793356] pcieport 0000:68:04.0: pciehp: pciehp_isr: no response from device
> [  189.793543] pcieport 0000:68:00.0: pciehp: pcie_do_write_cmd: no response from device
> [  189.793553] pcieport 0000:68:00.0: pciehp: pciehp_isr: no response from device
> [  189.793820] pci_bus 0000:69: busn_res: [bus 69] is released
> [  189.794176] pci_bus 0000:6a: busn_res: [bus 6a] is released
> [  189.794367] pci_bus 0000:6b: busn_res: [bus 6b] is released
> [  189.794532] pci_bus 0000:6c: busn_res: [bus 6c] is released
> [  189.794621] pci_bus 0000:6d: busn_res: [bus 6d] is released
> [  189.794758] pci_bus 0000:6e: busn_res: [bus 6e] is released
> [  189.794938] pci_bus 0000:6f: busn_res: [bus 6f] is released
> [  189.795089] pci_bus 0000:70: busn_res: [bus 70] is released
> [  189.795472] pci_bus 0000:68: busn_res: [bus 68-70] is released
> [  189.795649] pci_bus 0000:67: busn_res: [bus 67-70] is released
> [  189.795867] pci_bus 0000:73: busn_res: [bus 73] is released
> [  189.796096] pci_bus 0000:74: busn_res: [bus 74] is released
> [  189.796267] pci_bus 0000:75: busn_res: [bus 75] is released
> [  189.796450] pci_bus 0000:76: busn_res: [bus 76] is released
> [  189.796555] pci_bus 0000:72: busn_res: [bus 72-76] is released
> [  189.796691] pci_bus 0000:71: busn_res: [bus 71-76] is released
> [  189.796802] pci_bus 0000:78: busn_res: [bus 78] is released
> [  189.796962] pci_bus 0000:77: busn_res: [bus 77-78] is released
> [  189.797296] pci_bus 0000:79: busn_res: [bus 79] is released
> [  189.797410] pci_bus 0000:66: busn_res: [bus 66-79] is released

After your patch:

> [  171.943721] pcieport 0000:64:02.0: EDR: EDR event received
> [  171.974078] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000
> [  171.974084] pcieport 0000:64:02.0: DPC: RP PIO error detected
> [  171.974086] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303
> [  171.974092] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000
> [  171.974095] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000002 0xfc023eff 0xbf5fffe0 0x00000000
> [  171.974105] nvme nvme0: frozen state error detected, reset controller
> [  171.992269] nvme nvme1: frozen state error detected, reset controller
> [  172.006241] nvme nvme2: frozen state error detected, reset controller
> [  172.020233] nvme nvme3: frozen state error detected, reset controller
> [  172.038237] mpt3sas_cm0: PCI error: detected callback, state(2)!!
> [  172.205497] nvme nvme0: restart after slot reset
> [  172.205747] nvme nvme1: restart after slot reset
> [  172.206196] nvme nvme2: restart after slot reset
> [  172.206627] nvme nvme3: restart after slot reset
> [  172.208281] mpt3sas_cm0: PCI error: slot reset callback!!
> [  172.208361] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15845080 kB)
...(cutting a lot of mpt3sas informational messages)...
> [  172.368391] mpt3sas_cm0: scan devices: complete
> [  174.553148] nvme nvme1: 8/0/0 default/read/poll queues
> [  174.554573] mpt3sas_cm0: PCI error: resume callback!!
> [  174.554980] pcieport 0000:64:02.0: AER: device recovery successful




* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-07-19 19:00         ` stuart hayes
@ 2021-07-20  6:57           ` Lukas Wunner
  2021-07-20 22:11             ` stuart hayes
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Wunner @ 2021-07-20  6:57 UTC (permalink / raw)
  To: stuart hayes
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg

On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote:
> On 7/19/2021 10:10 AM, Lukas Wunner wrote:
> > Could you test if the below patch fixes the issue?
> 
> That does appear to fix the issue, thanks!  Without your patch, the PCIe
> devices under 64:02.0 disappear (the triggered bit is still set in the DPC
> capability).  With your patch, recovery is successful and all of the PCIe
> devices are still there.

Thanks for testing.

The test patch clears DLLSC because the Hot Reset that is propagated
down the hierarchy causes the link to flap.  I'm wondering though if
that's sufficient or if PDC needs to be cleared as well.  According
to PCIe Base Spec sec. 4.2.6, LTSSM transitions from "Hot Reset" state
to "Detect", then "Polling".  If I understand the table "Link Status
Mapped to the LTSSM" in the spec correctly, in-band presence is 0b
in Detect state, hence I'd expect PDC to flap as well as a result of
a Hot Reset being propagated down the hierarchy.

Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable?
That would explain why only clearing DLLSC is sufficient.

The problem is, if PDC is cleared as well, we lose the ability to
detect that a device was hot-removed while the reset was ongoing,
which is unfortunate.
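
(For completeness, the "clear PDC as well" variant under discussion would
just extend the test patch, roughly like this -- a sketch only, subject to
the hot-removal caveat above:

	static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
	{
		if (dev->is_hotplug_bridge)
			pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
						   PCI_EXP_SLTSTA_DLLSC |
						   PCI_EXP_SLTSTA_PDC);

		pci_restore_state(dev);
		pci_save_state(dev);
		return PCI_ERS_RESULT_RECOVERED;
	}

i.e. the same clearing write, with PCI_EXP_SLTSTA_PDC added to the mask.)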

If an error is handled by aer_root_reset() (instead of dpc_reset_link())
and the reset is performed at a hotplug port, then pciehp_reset_slot()
is invoked:

aer_root_reset()
  pci_bus_error_reset()
    pci_slot_reset()
      pci_reset_hotplug_slot()
        pciehp_reset_slot()

pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events,
then performs a Secondary Bus Reset at the hotplug port.

If there are further hotplug ports below that hotplug port
where the SBR is performed, my expectation is that the Hot Reset
is likewise propagated down the hierarchy (just as with DPC),
so those cascaded hotplug ports should also see their link go down.

In other words, the issue you're seeing isn't really DPC-specific.
However, the test patch should fix the issue for AER-handled errors
as well.  Do you agree with this analysis or did I miss anything?

Thanks,

Lukas


* Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
  2021-07-20  6:57           ` Lukas Wunner
@ 2021-07-20 22:11             ` stuart hayes
  0 siblings, 0 replies; 12+ messages in thread
From: stuart hayes @ 2021-07-20 22:11 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Dan Williams,
	Ethan Zhao, Sinan Kaya, Ashok Raj, Keith Busch, Yicong Yang,
	linux-pci, Russell Currey, Oliver OHalloran, Mika Westerberg



On 7/20/2021 1:57 AM, Lukas Wunner wrote:
> On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote:
>> On 7/19/2021 10:10 AM, Lukas Wunner wrote:
>>> Could you test if the below patch fixes the issue?
>>
>> That does appear to fix the issue, thanks!  Without your patch, the PCIe
>> devices under 64:02.0 disappear (the triggered bit is still set in the DPC
>> capability).  With your patch, recovery is successful and all of the PCIe
>> devices are still there.
> 
> Thanks for testing.
> 
> The test patch clears DLLSC because the Hot Reset that is propagated
> down the hierarchy causes the link to flap.  I'm wondering though if
> that's sufficient or if PDC needs to be cleared as well.  According
> to PCIe Base Spec sec. 4.2.6, LTSSM transitions from "Hot Reset" state
> to "Detect", then "Polling".  If I understand the table "Link Status
> Mapped to the LTSSM" in the spec correctly, in-band presence is 0b
> in Detect state, hence I'd expect PDC to flap as well as a result of
> a Hot Reset being propagated down the hierarchy.
> 

I think the table "Link Status Mapped to the LTSSM" is saying that when 
in-band presence is 0, the LTSSM state must be "Detect" (not that being 
in "Detect" will force in-band presence to zero).

I would not expect PDC to flap since the presence detect (even in-band) 
should not go away during hot reset.

On the system I'm using, I modified the kernel to read and print the 
slot status register right before your test patch clears DLLSC, and it 
reads 0x140 (link status changed, presence is detected, but PDC is not set).

> Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable?
> That would explain why only clearing DLLSC is sufficient.
> 

No... the slot capabilities 2 register is 0.

> The problem is, if PDC is cleared as well, we lose the ability to
> detect that a device was hot-removed while the reset was ongoing,
> which is unfortunate.
> 

Agreed, but I don't think PDC should get set on hot reset.

> If an error is handled by aer_root_reset() (instead of dpc_reset_link())
> and the reset is performed at a hotplug port, then pciehp_reset_slot()
> is invoked:
> 
> aer_root_reset()
>    pci_bus_error_reset()
>      pci_slot_reset()
>        pci_reset_hotplug_slot()
>          pciehp_reset_slot()
> 
> pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events,
> then performs a Secondary Bus Reset at the hotplug port.
> 
> If there are further hotplug ports below that hotplug port
> where the SBR is performed, my expectation is that the Hot Reset
> is likewise propagated down the hierarchy (just as with DPC),
> so those cascaded hotplug ports should also see their link go down.
> 
> In other words, the issue you're seeing isn't really DPC-specific.
> However, the test patch should fix the issue for AER-handled errors
> as well.  Do you agree with this analysis or did I miss anything?
> 

That looks correct to me.

> Thanks,
> 
> Lukas
> 

