linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] PCI reset delay fixes
@ 2023-01-15  8:20 Lukas Wunner
  2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Lukas Wunner @ 2023-01-15  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, Yang Su

Delay fixes for recovery from Secondary Bus Resets and Downstream Port
Containment, v2:


Changes v1 -> v2:

* [PATCH 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3
  * Add Reviewed-by tags (Mika, Sathyanarayanan)

* [PATCH 2/3] PCI: Unify delay handling for reset and resume
  * Introduce PCI_RESET_WAIT macro for 1 sec timeout prescribed by
    PCIe r6.0 sec 6.6.1 (Bjorn)
  * Note in kernel-doc of pci_bridge_wait_for_secondary_bus()
    that timeout parameter is in milliseconds (Bjorn)
  * Add Reviewed-by tags (Mika, Sathyanarayanan)

* [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset
  * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
    PCI_RESET_WAIT from patch [2/3] and extend its code comment
  * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
  * Avoid first person plural in commit message (Sathyanarayanan)
  * Add Reviewed-by tag (Mika)


Link to v1:

https://lore.kernel.org/linux-pci/cover.1672511016.git.lukas@wunner.de/


Lukas Wunner (3):
  PCI/PM: Observe reset delay irrespective of bridge_d3
  PCI: Unify delay handling for reset and resume
  PCI/DPC: Await readiness of secondary bus after reset

 drivers/pci/pci-driver.c |  2 +-
 drivers/pci/pci.c        | 59 +++++++++++++++++-----------------------
 drivers/pci/pci.h        | 16 ++++++++++-
 drivers/pci/pcie/dpc.c   |  4 +--
 4 files changed, 43 insertions(+), 38 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3
  2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
@ 2023-01-15  8:20 ` Lukas Wunner
  2023-02-18 13:22   ` Yang Su
  2023-01-15  8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Lukas Wunner @ 2023-01-15  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, Yang Su

If a PCI bridge is suspended to D3cold upon entering system sleep,
resuming it entails a Fundamental Reset per PCIe r6.0 sec 5.8.

The delay prescribed after a Fundamental Reset in PCIe r6.0 sec 6.6.1
is sought to be observed by:

  pci_pm_resume_noirq()
    pci_pm_bridge_power_up_actions()
      pci_bridge_wait_for_secondary_bus()

However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
flag is not set.  That flag indicates whether a bridge is allowed to
suspend to D3cold at *runtime*.

Hence *no* delay is observed on resume from system sleep if runtime
D3cold is forbidden.  That doesn't make any sense, so drop the bridge_d3
check from pci_bridge_wait_for_secondary_bus().

The purpose of the bridge_d3 check was probably to avoid delays if a
bridge remained in D0 during suspend.  However the sole caller of
pci_bridge_wait_for_secondary_bus(), pci_pm_bridge_power_up_actions(),
is only invoked if the previous power state was D3cold.  Hence the
additional bridge_d3 check seems superfluous.

Fixes: ad9001f2f411 ("PCI/PM: Add missing link delays required by the PCIe spec")
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v5.5+
---
Changes v1 -> v2:
 * Add Reviewed-by tags (Mika, Sathyanarayanan)

 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fba95486caaf..f43f3e84f634 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4964,7 +4964,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	if (pci_dev_is_disconnected(dev))
 		return;
 
-	if (!pci_is_bridge(dev) || !dev->bridge_d3)
+	if (!pci_is_bridge(dev))
 		return;
 
 	down_read(&pci_bus_sem);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] PCI: Unify delay handling for reset and resume
  2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
  2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
@ 2023-01-15  8:20 ` Lukas Wunner
  2023-02-23 11:01   ` Yang Su
  2023-01-15  8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
  2023-02-07 19:03 ` [PATCH v2 0/3] PCI reset delay fixes Bjorn Helgaas
  3 siblings, 1 reply; 11+ messages in thread
From: Lukas Wunner @ 2023-01-15  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, Yang Su

Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait
for devices on the secondary bus to become accessible after reset:

Although it does call pci_dev_wait(), it erroneously passes the bridge's
pci_dev rather than that of a child.  The bridge of course is always
accessible while its secondary bus is reset, so pci_dev_wait() returns
immediately.

Sheng Bi proposes introducing a new pci_bridge_secondary_bus_wait()
function which is called from pci_bridge_secondary_bus_reset():

https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/

However we already have pci_bridge_wait_for_secondary_bus() which does
almost exactly what we need.  So far it's only called on resume from
D3cold (which implies a Fundamental Reset per PCIe r6.0 sec 5.8).
Re-using it for Secondary Bus Resets is a leaner and more rational
approach than introducing a new function.

That only requires a few minor tweaks:

- Amend pci_bridge_wait_for_secondary_bus() to await accessibility of
  the first device on the secondary bus by calling pci_dev_wait() after
  performing the prescribed delays.  pci_dev_wait() needs two parameters,
  a reset reason and a timeout, which callers must now pass to
  pci_bridge_wait_for_secondary_bus().  The timeout is 1 sec for resume
  (PCIe r6.0 sec 6.6.1) and 60 sec for reset (commit 821cdad5c46c ("PCI:
  Wait up to 60 seconds for device to become ready after FLR")).
  Introduce a PCI_RESET_WAIT macro for the 1 sec timeout.

- Amend pci_bridge_wait_for_secondary_bus() to return 0 on success or
  -ENOTTY on error for consumption by pci_bridge_secondary_bus_reset().

- Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
  is now performed by pci_bridge_wait_for_secondary_bus().  A static
  delay this long is only necessary for Conventional PCI, so modern
  PCIe systems benefit from shorter reset times as a side effect.

Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")
Reported-by: Sheng Bi <windy.bi.enflame@gmail.com>
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v4.17+
---
Changes v1 -> v2:
 * Introduce PCI_RESET_WAIT macro for 1 sec timeout prescribed by
   PCIe r6.0 sec 6.6.1 (Bjorn)
 * Note in kernel-doc of pci_bridge_wait_for_secondary_bus()
   that timeout parameter is in milliseconds (Bjorn)
 * Add Reviewed-by tags (Mika, Sathyanarayanan)

 drivers/pci/pci-driver.c |  2 +-
 drivers/pci/pci.c        | 54 ++++++++++++++++++----------------------
 drivers/pci/pci.h        | 10 +++++++-
 3 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a2ceeacc33eb..7a19f11daca3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -572,7 +572,7 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 
 static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
 {
-	pci_bridge_wait_for_secondary_bus(pci_dev);
+	pci_bridge_wait_for_secondary_bus(pci_dev, "resume", PCI_RESET_WAIT);
 	/*
 	 * When powering on a bridge from D3cold, the whole hierarchy may be
 	 * powered on into D0uninitialized state, resume them to give them a
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f43f3e84f634..509f6b5c9e14 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1174,7 +1174,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 			return -ENOTTY;
 		}
 
-		if (delay > 1000)
+		if (delay > PCI_RESET_WAIT)
 			pci_info(dev, "not ready %dms after %s; waiting\n",
 				 delay - 1, reset_type);
 
@@ -1183,7 +1183,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 		pci_read_config_dword(dev, PCI_COMMAND, &id);
 	}
 
-	if (delay > 1000)
+	if (delay > PCI_RESET_WAIT)
 		pci_info(dev, "ready %dms after %s\n", delay - 1,
 			 reset_type);
 
@@ -4948,24 +4948,31 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
 /**
  * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible
  * @dev: PCI bridge
+ * @reset_type: reset type in human-readable form
+ * @timeout: maximum time to wait for devices on secondary bus (milliseconds)
  *
  * Handle necessary delays before access to the devices on the secondary
- * side of the bridge are permitted after D3cold to D0 transition.
+ * side of the bridge are permitted after D3cold to D0 transition
+ * or Conventional Reset.
  *
  * For PCIe this means the delays in PCIe 5.0 section 6.6.1. For
  * conventional PCI it means Tpvrh + Trhfa specified in PCI 3.0 section
  * 4.3.2.
+ *
+ * Return 0 on success or -ENOTTY if the first device on the secondary bus
+ * failed to become accessible.
  */
-void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
+int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
+				      int timeout)
 {
 	struct pci_dev *child;
 	int delay;
 
 	if (pci_dev_is_disconnected(dev))
-		return;
+		return 0;
 
 	if (!pci_is_bridge(dev))
-		return;
+		return 0;
 
 	down_read(&pci_bus_sem);
 
@@ -4977,14 +4984,14 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	 */
 	if (!dev->subordinate || list_empty(&dev->subordinate->devices)) {
 		up_read(&pci_bus_sem);
-		return;
+		return 0;
 	}
 
 	/* Take d3cold_delay requirements into account */
 	delay = pci_bus_max_d3cold_delay(dev->subordinate);
 	if (!delay) {
 		up_read(&pci_bus_sem);
-		return;
+		return 0;
 	}
 
 	child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
@@ -4993,14 +5000,12 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 
 	/*
 	 * Conventional PCI and PCI-X we need to wait Tpvrh + Trhfa before
-	 * accessing the device after reset (that is 1000 ms + 100 ms). In
-	 * practice this should not be needed because we don't do power
-	 * management for them (see pci_bridge_d3_possible()).
+	 * accessing the device after reset (that is 1000 ms + 100 ms).
 	 */
 	if (!pci_is_pcie(dev)) {
 		pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
 		msleep(1000 + delay);
-		return;
+		return 0;
 	}
 
 	/*
@@ -5017,11 +5022,11 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	 * configuration requests if we only wait for 100 ms (see
 	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
 	 *
-	 * Therefore we wait for 100 ms and check for the device presence.
-	 * If it is still not present give it an additional 100 ms.
+	 * Therefore we wait for 100 ms and check for the device presence
+	 * until the timeout expires.
 	 */
 	if (!pcie_downstream_port(dev))
-		return;
+		return 0;
 
 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
@@ -5032,14 +5037,11 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 		if (!pcie_wait_for_link_delay(dev, true, delay)) {
 			/* Did not train, no need to wait any further */
 			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
-			return;
+			return -ENOTTY;
 		}
 	}
 
-	if (!pci_device_is_present(child)) {
-		pci_dbg(child, "waiting additional %d ms to become accessible\n", delay);
-		msleep(delay);
-	}
+	return pci_dev_wait(child, reset_type, timeout - delay);
 }
 
 void pci_reset_secondary_bus(struct pci_dev *dev)
@@ -5058,15 +5060,6 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
-
-	/*
-	 * Trhfa for conventional PCI is 2^25 clock cycles.
-	 * Assuming a minimum 33MHz clock this results in a 1s
-	 * delay before we can consider subordinate devices to
-	 * be re-initialized.  PCIe has some ways to shorten this,
-	 * but we don't make use of them yet.
-	 */
-	ssleep(1);
 }
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
@@ -5085,7 +5078,8 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
 {
 	pcibios_reset_secondary_bus(dev);
 
-	return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
+	return pci_bridge_wait_for_secondary_bus(dev, "bus reset",
+						 PCIE_RESET_READY_POLL_MS);
 }
 EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9ed3b5550043..ce1fc3a90b3f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -64,6 +64,13 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 #define PCI_PM_D3HOT_WAIT       10	/* msec */
 #define PCI_PM_D3COLD_WAIT      100	/* msec */
 
+/*
+ * Following exit from Conventional Reset, devices must be ready within 1 sec
+ * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
+ * Reset (PCIe r6.0 sec 5.8).
+ */
+#define PCI_RESET_WAIT		1000	/* msec */
+
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
 int pci_power_up(struct pci_dev *dev);
@@ -86,8 +93,9 @@ void pci_msi_init(struct pci_dev *dev);
 void pci_msix_init(struct pci_dev *dev);
 bool pci_bridge_d3_possible(struct pci_dev *dev);
 void pci_bridge_d3_update(struct pci_dev *dev);
-void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev);
 void pci_bridge_reconfigure_ltr(struct pci_dev *dev);
+int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
+				      int timeout);
 
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset
  2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
  2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
  2023-01-15  8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
@ 2023-01-15  8:20 ` Lukas Wunner
  2023-02-18 13:23   ` Yang Su
  2023-02-07 19:03 ` [PATCH v2 0/3] PCI reset delay fixes Bjorn Helgaas
  3 siblings, 1 reply; 11+ messages in thread
From: Lukas Wunner @ 2023-01-15  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, Yang Su

pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
Reset, but not after a DPC-induced Hot Reset.

As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
observed and devices on the secondary bus may be accessed before
they're ready.

One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
PCIe switch whose upstream port is not immediately ready after reset.
Because its config space is restored too early, it remains in
D0uninitialized, its subordinate devices remain inaccessible and DPC
recovery fails with messages such as:

i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
pcieport 0000:89:02.0: AER: device recovery failed

Fix it.

Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
---
Changes v1 -> v2:
 * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
   PCI_RESET_WAIT from patch [2/3] and extend its code comment
 * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
 * Avoid first person plural in commit message (Sathyanarayanan)
 * Add Reviewed-by tag (Mika)

 drivers/pci/pci.c      | 3 ---
 drivers/pci/pci.h      | 6 ++++++
 drivers/pci/pcie/dpc.c | 4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 509f6b5c9e14..d31c21ea9688 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
-/* Time to wait after a reset for device to become responsive */
-#define PCIE_RESET_READY_POLL_MS 60000
-
 /**
  * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
  * @bus: pointer to PCI bus structure to search
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ce1fc3a90b3f..8f5d4bd5b410 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
  * Reset (PCIe r6.0 sec 5.8).
  */
 #define PCI_RESET_WAIT		1000	/* msec */
+/*
+ * Devices may extend the 1 sec period through Request Retry Status completions
+ * (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper limit, but 60 sec
+ * ought to be enough for any device to become responsive.
+ */
+#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
 
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index f5ffea17c7f8..a5d7c69b764e 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
 
-	if (!pcie_wait_for_link(pdev, true)) {
-		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
+	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
+					      PCIE_RESET_READY_POLL_MS)) {
 		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
 		ret = PCI_ERS_RESULT_DISCONNECT;
 	} else {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] PCI reset delay fixes
  2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
                   ` (2 preceding siblings ...)
  2023-01-15  8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
@ 2023-02-07 19:03 ` Bjorn Helgaas
  3 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2023-02-07 19:03 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, Yang Su

On Sun, Jan 15, 2023 at 09:20:30AM +0100, Lukas Wunner wrote:
> Delay fixes for recovery from Secondary Bus Resets and Downstream Port
> Containment, v2:
> 
> 
> Changes v1 -> v2:
> 
> * [PATCH 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3
>   * Add Reviewed-by tags (Mika, Sathyanarayanan)
> 
> * [PATCH 2/3] PCI: Unify delay handling for reset and resume
>   * Introduce PCI_RESET_WAIT macro for 1 sec timeout prescribed by
>     PCIe r6.0 sec 6.6.1 (Bjorn)
>   * Note in kernel-doc of pci_bridge_wait_for_secondary_bus()
>     that timeout parameter is in milliseconds (Bjorn)
>   * Add Reviewed-by tags (Mika, Sathyanarayanan)
> 
> * [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset
>   * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
>     PCI_RESET_WAIT from patch [2/3] and extend its code comment
>   * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
>   * Avoid first person plural in commit message (Sathyanarayanan)
>   * Add Reviewed-by tag (Mika)
> 
> 
> Link to v1:
> 
> https://lore.kernel.org/linux-pci/cover.1672511016.git.lukas@wunner.de/
> 
> 
> Lukas Wunner (3):
>   PCI/PM: Observe reset delay irrespective of bridge_d3
>   PCI: Unify delay handling for reset and resume
>   PCI/DPC: Await readiness of secondary bus after reset
> 
>  drivers/pci/pci-driver.c |  2 +-
>  drivers/pci/pci.c        | 59 +++++++++++++++++-----------------------
>  drivers/pci/pci.h        | 16 ++++++++++-
>  drivers/pci/pcie/dpc.c   |  4 +--
>  4 files changed, 43 insertions(+), 38 deletions(-)

Applied to pci/reset for v6.3, thanks, Lukas!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3
  2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
@ 2023-02-18 13:22   ` Yang Su
  2023-02-19  5:07     ` Lukas Wunner
  0 siblings, 1 reply; 11+ messages in thread
From: Yang Su @ 2023-02-18 13:22 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, shuo.tan

Hi Lucas,

I figue out the reason of pci_bridge_secondary_bus_reset() why not work 
for NVIDIA GPU T4

which bind vfio passthrough hypervisor. I used the original func 
pci_bridge_secondary_bus_reset()

not your patch, your patch remove bridge_d3 flag, the real reason is 
bridge_d3 flag.

> However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
> flag is not set.  That flag indicates whether a bridge is allowed to
> suspend to D3cold at *runtime*.
>
>   drivers/pci/pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fba95486caaf..f43f3e84f634 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4964,7 +4964,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   	if (pci_dev_is_disconnected(dev))
>   		return;
>   
> -	if (!pci_is_bridge(dev) || !dev->bridge_d3)
> +	if (!pci_is_bridge(dev))
>   		return;
>   
>   	down_read(&pci_bus_sem);

When I test the original func pci_bridge_secondary_bus_reset() in 
different machine node

which all consist of the same type NVIDIA GPU T4, I found 
pci_bridge_wait_for_secondary_bus()

bails out if the bridge_d3 flag is not set, but I still confused why 
same gpu some machine node not set

the bridge_d3 flag.

I find the linux kernel only two func will init bridge_d3 which is func 
pci_pm_init() and pci_bridge_d3_update().

If you know, please give me some hint.

On 2023/1/15 16:20, Lukas Wunner wrote:
> If a PCI bridge is suspended to D3cold upon entering system sleep,
> resuming it entails a Fundamental Reset per PCIe r6.0 sec 5.8.
>
> The delay prescribed after a Fundamental Reset in PCIe r6.0 sec 6.6.1
> is sought to be observed by:
>
>    pci_pm_resume_noirq()
>      pci_pm_bridge_power_up_actions()
>        pci_bridge_wait_for_secondary_bus()
>
> However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
> flag is not set.  That flag indicates whether a bridge is allowed to
> suspend to D3cold at *runtime*.
>
> Hence *no* delay is observed on resume from system sleep if runtime
> D3cold is forbidden.  That doesn't make any sense, so drop the bridge_d3
> check from pci_bridge_wait_for_secondary_bus().
>
> The purpose of the bridge_d3 check was probably to avoid delays if a
> bridge remained in D0 during suspend.  However the sole caller of
> pci_bridge_wait_for_secondary_bus(), pci_pm_bridge_power_up_actions(),
> is only invoked if the previous power state was D3cold.  Hence the
> additional bridge_d3 check seems superfluous.
>
> Fixes: ad9001f2f411 ("PCI/PM: Add missing link delays required by the PCIe spec")
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan<sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc:stable@vger.kernel.org  # v5.5+
> ---
> Changes v1 -> v2:
>   * Add Reviewed-by tags (Mika, Sathyanarayanan)
>
>   drivers/pci/pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fba95486caaf..f43f3e84f634 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4964,7 +4964,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   	if (pci_dev_is_disconnected(dev))
>   		return;
>   
> -	if (!pci_is_bridge(dev) || !dev->bridge_d3)
> +	if (!pci_is_bridge(dev))
>   		return;
>   
>   	down_read(&pci_bus_sem);



Below is reply for your another email about the device invoke secondary 
bus reset.

Yes, as your previous email describes the endpoint can not reset its 
secondary bus.

> Normally pci_bridge_secondary_bus_reset() should never be executed for
> an endpoint device such as the Nvidia GPU T4.  It should only be executed
> for one of the bridges above the GPU.  An endpoint cannot reset its
> secondary bus, it doesn't have one.

As previous email I give the pci topology for the NVIDIA GPU T4, here I 
copy as below,

you say that means:

0000:17:00.0 = Root Port
0000:18:00.0 = Switch Upstream Port of PLX 9797
0000:19:04.0 = Switch Downstream Port of PLX 9797
0000:19:08.0 = Switch Downstream Port of PLX 9797
0000:4f:00.0 = Nvidia GPU T4

I apply your debug patch, the pci_info print the device is 0000:19:08.0 
which is the parent of

the endpoint device 0000:4f:00.0.

> pci_parent_bus_reset() invokes pci_bridge_secondary_bus_reset() on the
> parent of the device to be reset.  In this case that's the Switch Downstream
> Port 0000:19:08.0.  It finds the parent by following dev->bus->self.
>
> Perhaps you can apply the small debug patch below.  It should print a message
> when entering pci_bridge_secondary_bus_reset() and the message contains the
> pci_name() of the device for which the function is executed.  If this is
> not the Switch Downstream Port, you've found a bug.
>
> Usually a better way of finding the parent device is to call
> pci_upstream_bridge() instead of following dev->bus->self.  As you can
> see in the definition of pci_upstream_bridge() in include/linux/pci.h,
> this will find the parent of pci_phys_fn(dev).  So in virtualization
> scenarios, the result may indeed be different from dev->bus->self,
> but I still don't understand how.  You may want to try replacing
> dev->bus->self with pci_upstream_bridge(dev) in pci_parent_bus_reset()
> and see if that fixes the issue for you.
>
> Thanks,
>
> Lukas
>
> -- >8 --
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 19fe0ef0e583..f383e5d29bb1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5073,6 +5073,9 @@ void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
>    */
>   int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
>   {
> +	pci_info(dev, "%s\n", __func__);
> +	dump_stack();
> +
>   	pcibios_reset_secondary_bus(dev);
>   
>   	return pci_bridge_wait_for_secondary_bus(dev, "bus reset",



Thanks,

Yang




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset
  2023-01-15  8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
@ 2023-02-18 13:23   ` Yang Su
  2023-02-19  5:12     ` Lukas Wunner
  0 siblings, 1 reply; 11+ messages in thread
From: Yang Su @ 2023-02-18 13:23 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, shuo.tan

Hi Lucas,

I do not understand why pci_bridge_wait_for_secondary_bus() can fix 
Intel's Ponte Vecchio HPC GPU

after a DPC-induced Hot Reset.


The func pci_bridge_wait_for_secondary_bus() also use 
pcie_wait_for_link_delay() which time depends on

the max device delay time of one bus, for the GPU which bus only one 
device, I think the time is 100ms as

the input parater in pcie_wait_for_link_delay().


pcie_wait_for_link() also wait fixed 100ms and then wait the device data 
link is ready. So another wait time

is pci_dev_wait() in your patch? pci_dev_wait() to receive the CRS from 
the device to check the device

whether is ready.


Please help me understand which difference work.


On 2023/1/15 16:20, Lukas Wunner wrote:
> pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
> Reset, but not after a DPC-induced Hot Reset.
>
> As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
> observed and devices on the secondary bus may be accessed before
> they're ready.
>
> One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
> PCIe switch whose upstream port is not immediately ready after reset.
> Because its config space is restored too early, it remains in
> D0uninitialized, its subordinate devices remain inaccessible and DPC
> recovery fails with messages such as:
>
> i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> pcieport 0000:89:02.0: AER: device recovery failed
>
> Fix it.
>
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Cc:stable@vger.kernel.org
> ---
> Changes v1 -> v2:
>   * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
>     PCI_RESET_WAIT from patch [2/3] and extend its code comment
>   * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
>   * Avoid first person plural in commit message (Sathyanarayanan)
>   * Add Reviewed-by tag (Mika)
>
>   drivers/pci/pci.c      | 3 ---
>   drivers/pci/pci.h      | 6 ++++++
>   drivers/pci/pcie/dpc.c | 4 ++--
>   3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 509f6b5c9e14..d31c21ea9688 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str)
>   }
>   __setup("pcie_port_pm=", pcie_port_pm_setup);
>   
> -/* Time to wait after a reset for device to become responsive */
> -#define PCIE_RESET_READY_POLL_MS 60000
> -
>   /**
>    * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
>    * @bus: pointer to PCI bus structure to search
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ce1fc3a90b3f..8f5d4bd5b410 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
>    * Reset (PCIe r6.0 sec 5.8).
>    */
>   #define PCI_RESET_WAIT		1000	/* msec */
> +/*
> + * Devices may extend the 1 sec period through Request Retry Status completions
> + * (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper limit, but 60 sec
> + * ought to be enough for any device to become responsive.
> + */
> +#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
>   
>   void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
>   void pci_refresh_power_state(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index f5ffea17c7f8..a5d7c69b764e 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>   	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>   			      PCI_EXP_DPC_STATUS_TRIGGER);
>   
> -	if (!pcie_wait_for_link(pdev, true)) {
> -		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
> +	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
> +					      PCIE_RESET_READY_POLL_MS)) {
>   		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>   		ret = PCI_ERS_RESULT_DISCONNECT;
>   	} else {


Thanks,


Yang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3
  2023-02-18 13:22   ` Yang Su
@ 2023-02-19  5:07     ` Lukas Wunner
  0 siblings, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2023-02-19  5:07 UTC (permalink / raw)
  To: Yang Su
  Cc: Bjorn Helgaas, linux-pci, Keith Busch, Ashok Raj,
	Sathyanarayanan Kuppuswamy, Ravi Kishore Koppuravuri,
	Mika Westerberg, Sheng Bi, Stanislav Spassov, shuo.tan

On Sat, Feb 18, 2023 at 09:22:37PM +0800, Yang Su wrote:
> I figue out the reason of pci_bridge_secondary_bus_reset() why not work
> for NVIDIA GPU T4 which bind vfio passthrough hypervisor. I used the
> original func pci_bridge_secondary_bus_reset() not your patch, your
> patch remove bridge_d3 flag, the real reason is bridge_d3 flag.
> 
> When I test the original func pci_bridge_secondary_bus_reset() in
> different machine node which all consist of the same type NVIDIA GPU T4,
> I found pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
> flag is not set, but I still confused why same gpu some machine node
> not set the bridge_d3 flag.
> 
> I find the linux kernel only two func will init bridge_d3 which is func
> pci_pm_init() and pci_bridge_d3_update().
> 
> If you know, please give me some hint.

It sounds like patch [1/3] in this series fixes the issue your seeing.
That's good to hear.

The bridge_d3 flag indicates whether a PCIe port is allowed to
runtime suspend to D3:

First of all, pci_bridge_d3_possible() decides whether the PCIe port
may runtime suspend at all.  E.g. hotplug ports are generally not
allowed to runtime suspend except in certain known-to-work situations,
such as Thunderbolt or when specific ACPI properties are present.

Second, a device below the PCIe port may block the port from runtime
suspending to D3.  That is decided in pci_dev_check_d3cold().  E.g. if
user space disallows D3 via the "d3cold_allowed" attribute in sysfs,
that will block D3 on PCIe ports in the ancestry.

If you're seeing different values for bridge_d3 on different machines,
even though the device below the PCIe port is always the same type of
GPU, then either pci_bridge_d3_possible() returns a different result
for the PCIe port in question (because it's a hotplug port or has
different ACPI properties etc), or one of its children blocks runtime
suspend to D3 (e.g. via sysfs).

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset
  2023-02-18 13:23   ` Yang Su
@ 2023-02-19  5:12     ` Lukas Wunner
  0 siblings, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2023-02-19  5:12 UTC (permalink / raw)
  To: Yang Su
  Cc: Bjorn Helgaas, linux-pci, Keith Busch, Ashok Raj,
	Sathyanarayanan Kuppuswamy, Ravi Kishore Koppuravuri,
	Mika Westerberg, Sheng Bi, Stanislav Spassov, shuo.tan

On Sat, Feb 18, 2023 at 09:23:47PM +0800, Yang Su wrote:
> I do not understand why pci_bridge_wait_for_secondary_bus() can fix
> Intel's Ponte Vecchio HPC GPU after a DPC-induced Hot Reset.
> 
> The func pci_bridge_wait_for_secondary_bus() also use
> pcie_wait_for_link_delay() which time depends on the max device delay
> time of one bus, for the GPU which bus only one device, I think the
> time is 100ms as the input parater in pcie_wait_for_link_delay().
> 
> pcie_wait_for_link() also wait fixed 100ms and then wait the device data
> link is ready. So another wait time is pci_dev_wait() in your patch?
> pci_dev_wait() to receive the CRS from the device to check the device
> whether is ready.
> 
> Please help me understand which difference work.

The crucial difference is the invocation of pci_dev_wait(), which waits
up to 60 seconds for the device to come out of reset.

The spec allows 1 second but that may be extended via CRS.  Ponte Vecchio
has been witnessed to take more than 4 seconds in some cases, hence the
need to wait longer than 1 second.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] PCI: Unify delay handling for reset and resume
  2023-01-15  8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
@ 2023-02-23 11:01   ` Yang Su
  2023-03-01  6:31     ` Lukas Wunner
  0 siblings, 1 reply; 11+ messages in thread
From: Yang Su @ 2023-02-23 11:01 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Keith Busch, Ashok Raj, Sathyanarayanan Kuppuswamy,
	Ravi Kishore Koppuravuri, Mika Westerberg, Sheng Bi,
	Stanislav Spassov, shuo.tan

Hi Lucas,


As you describe, Sheng Bi report a bug, the pci_dev_wait() check the 
bridge whether ready rather

than that of a child, I agree with that. I used to use original 
pci_bridge_wait_for_secondary_bus()

to test secondary bus reset (SBR), the func will directly return because 
of no setted bridge_d3.


The pci_dev_wait() in pci_bridge_secondary_bus_reset() check the bridge 
whether ready but ignore

the subordinate device such as the Nvidia GPU T4.


But in your patch the pci_bridge_wait_for_secondary_bus() we only check 
the first subordinate device

of the bridge whether ready via pci_dev_wait().

> Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait
> for devices on the secondary bus to become accessible after reset:
>
> Although it does call pci_dev_wait(), it erroneously passes the bridge's
> pci_dev rather than that of a child.  The bridge of course is always
> accessible while its secondary bus is reset, so pci_dev_wait() returns
> immediately.
>   
>   	child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
> @@ -4993,14 +5000,12 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>
>
> -	if (!pci_device_is_present(child)) {
> -		pci_dbg(child, "waiting additional %d ms to become accessible\n", delay);
> -		msleep(delay);
> -	}
> +	return pci_dev_wait(child, reset_type, timeout - delay);


Why not wait all the downstream devices become ready? As Sheng Bi 
Introduce pci_bridge_secondary_bus_wait()

to fix 6b2f1351af56 ("PCI: Wait for device to become ready after 
secondary bus reset"), using list_for_each_entry.

https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/

> +static int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int 
> timeout)
>
> +{
>
> +         struct pci_dev *dev;
>
> +         unsigned long start_jiffies;
>
> +
>
> + down_read(&pci_bus_sem);
>
> +
>
> +         if (!bridge->subordinate || 
> list_empty(&bridge->subordinate->devices)) {
>
> + up_read(&pci_bus_sem);
>
> + return 0;
>
> +         }
>
> +
>
> + list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
>
> + start_jiffies = jiffies;
>
> +
>
> +                 if (timeout < 0 || pci_dev_wait(dev, "bus reset", 
> timeout)) {
>
> +         up_read(&pci_bus_sem);
>
> +         return -ENOTTY;
>
> +                 }
>
> +
>
> + timeout -= jiffies_to_msecs(jiffies - start_jiffies);
>
> +         }
>
> +
>
> + up_read(&pci_bus_sem);
>
> +
>
> +         return 0;
>
> +}
>
> +
>

Last, I want to know if all the downstrem devices are ready, how can we 
ensure pci bridge is ready?

 From now version_2 series patch, there is lack checking of the pci bridge.


On 2023/1/15 16:20, Lukas Wunner wrote:
> Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait
> for devices on the secondary bus to become accessible after reset:
>
> Although it does call pci_dev_wait(), it erroneously passes the bridge's
> pci_dev rather than that of a child.  The bridge of course is always
> accessible while its secondary bus is reset, so pci_dev_wait() returns
> immediately.
>
> Sheng Bi proposes introducing a new pci_bridge_secondary_bus_wait()
> function which is called from pci_bridge_secondary_bus_reset():
>
> https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/
>
> However we already have pci_bridge_wait_for_secondary_bus() which does
> almost exactly what we need.  So far it's only called on resume from
> D3cold (which implies a Fundamental Reset per PCIe r6.0 sec 5.8).
> Re-using it for Secondary Bus Resets is a leaner and more rational
> approach than introducing a new function.
>
> That only requires a few minor tweaks:
>
> - Amend pci_bridge_wait_for_secondary_bus() to await accessibility of
>    the first device on the secondary bus by calling pci_dev_wait() after
>    performing the prescribed delays.  pci_dev_wait() needs two parameters,
>    a reset reason and a timeout, which callers must now pass to
>    pci_bridge_wait_for_secondary_bus().  The timeout is 1 sec for resume
>    (PCIe r6.0 sec 6.6.1) and 60 sec for reset (commit 821cdad5c46c ("PCI:
>    Wait up to 60 seconds for device to become ready after FLR")).
>    Introduce a PCI_RESET_WAIT macro for the 1 sec timeout.
>
> - Amend pci_bridge_wait_for_secondary_bus() to return 0 on success or
>    -ENOTTY on error for consumption by pci_bridge_secondary_bus_reset().
>
> - Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
>    is now performed by pci_bridge_wait_for_secondary_bus().  A static
>    delay this long is only necessary for Conventional PCI, so modern
>    PCIe systems benefit from shorter reset times as a side effect.
>
> Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")
> Reported-by: Sheng Bi<windy.bi.enflame@gmail.com>
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan<sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc:stable@vger.kernel.org  # v4.17+
> ---
> Changes v1 -> v2:
>   * Introduce PCI_RESET_WAIT macro for 1 sec timeout prescribed by
>     PCIe r6.0 sec 6.6.1 (Bjorn)
>   * Note in kernel-doc of pci_bridge_wait_for_secondary_bus()
>     that timeout parameter is in milliseconds (Bjorn)
>   * Add Reviewed-by tags (Mika, Sathyanarayanan)
>
>   drivers/pci/pci-driver.c |  2 +-
>   drivers/pci/pci.c        | 54 ++++++++++++++++++----------------------
>   drivers/pci/pci.h        | 10 +++++++-
>   3 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a2ceeacc33eb..7a19f11daca3 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -572,7 +572,7 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>   
>   static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
>   {
> -	pci_bridge_wait_for_secondary_bus(pci_dev);
> +	pci_bridge_wait_for_secondary_bus(pci_dev, "resume", PCI_RESET_WAIT);
>   	/*
>   	 * When powering on a bridge from D3cold, the whole hierarchy may be
>   	 * powered on into D0uninitialized state, resume them to give them a
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f43f3e84f634..509f6b5c9e14 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1174,7 +1174,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   			return -ENOTTY;
>   		}
>   
> -		if (delay > 1000)
> +		if (delay > PCI_RESET_WAIT)
>   			pci_info(dev, "not ready %dms after %s; waiting\n",
>   				 delay - 1, reset_type);
>   
> @@ -1183,7 +1183,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>   		pci_read_config_dword(dev, PCI_COMMAND, &id);
>   	}
>   
> -	if (delay > 1000)
> +	if (delay > PCI_RESET_WAIT)
>   		pci_info(dev, "ready %dms after %s\n", delay - 1,
>   			 reset_type);
>   
> @@ -4948,24 +4948,31 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
>   /**
>    * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible
>    * @dev: PCI bridge
> + * @reset_type: reset type in human-readable form
> + * @timeout: maximum time to wait for devices on secondary bus (milliseconds)
>    *
>    * Handle necessary delays before access to the devices on the secondary
> - * side of the bridge are permitted after D3cold to D0 transition.
> + * side of the bridge are permitted after D3cold to D0 transition
> + * or Conventional Reset.
>    *
>    * For PCIe this means the delays in PCIe 5.0 section 6.6.1. For
>    * conventional PCI it means Tpvrh + Trhfa specified in PCI 3.0 section
>    * 4.3.2.
> + *
> + * Return 0 on success or -ENOTTY if the first device on the secondary bus
> + * failed to become accessible.
>    */
> -void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
> +int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
> +				      int timeout)
>   {
>   	struct pci_dev *child;
>   	int delay;
>   
>   	if (pci_dev_is_disconnected(dev))
> -		return;
> +		return 0;
>   
>   	if (!pci_is_bridge(dev))
> -		return;
> +		return 0;
>   
>   	down_read(&pci_bus_sem);
>   
> @@ -4977,14 +4984,14 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   	 */
>   	if (!dev->subordinate || list_empty(&dev->subordinate->devices)) {
>   		up_read(&pci_bus_sem);
> -		return;
> +		return 0;
>   	}
>   
>   	/* Take d3cold_delay requirements into account */
>   	delay = pci_bus_max_d3cold_delay(dev->subordinate);
>   	if (!delay) {
>   		up_read(&pci_bus_sem);
> -		return;
> +		return 0;
>   	}
>   
>   	child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
> @@ -4993,14 +5000,12 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   
>   	/*
>   	 * Conventional PCI and PCI-X we need to wait Tpvrh + Trhfa before
> -	 * accessing the device after reset (that is 1000 ms + 100 ms). In
> -	 * practice this should not be needed because we don't do power
> -	 * management for them (see pci_bridge_d3_possible()).
> +	 * accessing the device after reset (that is 1000 ms + 100 ms).
>   	 */
>   	if (!pci_is_pcie(dev)) {
>   		pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
>   		msleep(1000 + delay);
> -		return;
> +		return 0;
>   	}
>   
>   	/*
> @@ -5017,11 +5022,11 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   	 * configuration requests if we only wait for 100 ms (see
>   	 *https://bugzilla.kernel.org/show_bug.cgi?id=203885).
>   	 *
> -	 * Therefore we wait for 100 ms and check for the device presence.
> -	 * If it is still not present give it an additional 100 ms.
> +	 * Therefore we wait for 100 ms and check for the device presence
> +	 * until the timeout expires.
>   	 */
>   	if (!pcie_downstream_port(dev))
> -		return;
> +		return 0;
>   
>   	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
>   		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
> @@ -5032,14 +5037,11 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>   		if (!pcie_wait_for_link_delay(dev, true, delay)) {
>   			/* Did not train, no need to wait any further */
>   			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
> -			return;
> +			return -ENOTTY;
>   		}
>   	}
>   
> -	if (!pci_device_is_present(child)) {
> -		pci_dbg(child, "waiting additional %d ms to become accessible\n", delay);
> -		msleep(delay);
> -	}
> +	return pci_dev_wait(child, reset_type, timeout - delay);
>   }
>   
>   void pci_reset_secondary_bus(struct pci_dev *dev)
> @@ -5058,15 +5060,6 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>   
>   	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>   	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> -
> -	/*
> -	 * Trhfa for conventional PCI is 2^25 clock cycles.
> -	 * Assuming a minimum 33MHz clock this results in a 1s
> -	 * delay before we can consider subordinate devices to
> -	 * be re-initialized.  PCIe has some ways to shorten this,
> -	 * but we don't make use of them yet.
> -	 */
> -	ssleep(1);
>   }
>   
>   void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
> @@ -5085,7 +5078,8 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
>   {
>   	pcibios_reset_secondary_bus(dev);
>   
> -	return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
> +	return pci_bridge_wait_for_secondary_bus(dev, "bus reset",
> +						 PCIE_RESET_READY_POLL_MS);
>   }
>   EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
>   
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 9ed3b5550043..ce1fc3a90b3f 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -64,6 +64,13 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
>   #define PCI_PM_D3HOT_WAIT       10	/* msec */
>   #define PCI_PM_D3COLD_WAIT      100	/* msec */
>   
> +/*
> + * Following exit from Conventional Reset, devices must be ready within 1 sec
> + * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
> + * Reset (PCIe r6.0 sec 5.8).
> + */
> +#define PCI_RESET_WAIT		1000	/* msec */
> +
>   void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
>   void pci_refresh_power_state(struct pci_dev *dev);
>   int pci_power_up(struct pci_dev *dev);
> @@ -86,8 +93,9 @@ void pci_msi_init(struct pci_dev *dev);
>   void pci_msix_init(struct pci_dev *dev);
>   bool pci_bridge_d3_possible(struct pci_dev *dev);
>   void pci_bridge_d3_update(struct pci_dev *dev);
> -void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev);
>   void pci_bridge_reconfigure_ltr(struct pci_dev *dev);
> +int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
> +				      int timeout);
>   
>   static inline void pci_wakeup_event(struct pci_dev *dev)
>   {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] PCI: Unify delay handling for reset and resume
  2023-02-23 11:01   ` Yang Su
@ 2023-03-01  6:31     ` Lukas Wunner
  0 siblings, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2023-03-01  6:31 UTC (permalink / raw)
  To: Yang Su
  Cc: Bjorn Helgaas, linux-pci, Keith Busch, Ashok Raj,
	Sathyanarayanan Kuppuswamy, Ravi Kishore Koppuravuri,
	Mika Westerberg, Sheng Bi, Stanislav Spassov, shuo.tan

On Thu, Feb 23, 2023 at 07:01:21PM +0800, Yang Su wrote:
> But in your patch the pci_bridge_wait_for_secondary_bus() we only check
> the first subordinate device of the bridge whether ready via
> pci_dev_wait().
> 
> Why not wait all the downstream devices become ready? As Sheng Bi
> Introduce pci_bridge_secondary_bus_wait() to fix 6b2f1351af56
> ("PCI: Wait for device to become ready after secondary bus reset"),
> using list_for_each_entry.
> 
> https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/

At least for PCIe it shouldn't matter as the other pci_devs below
the bridge can only be additional functions of a multifunction
device.  My expectation would be that if the first function
is accessible, all the others are as well.

Checking for accessibility of all pci_devs introduces additional
complexity and I think should only be done if there are actual
real-world use cases that need it.


> Last, I want to know if all the downstrem devices are ready, how can we
> ensure pci bridge is ready?
> 
> From now version_2 series patch, there is lack checking of the pci bridge.

I don't quite follow.  The PCI bridge is the one whose secondary bus
was reset, right?  The PCI bridge's accessibility is unaffected by it
issuing a the Secondary Bus Reset.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-03-01  6:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
2023-02-18 13:22   ` Yang Su
2023-02-19  5:07     ` Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
2023-02-23 11:01   ` Yang Su
2023-03-01  6:31     ` Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
2023-02-18 13:23   ` Yang Su
2023-02-19  5:12     ` Lukas Wunner
2023-02-07 19:03 ` [PATCH v2 0/3] PCI reset delay fixes Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).