All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Spassov <stanspas@amazon.com>
To: <linux-pci@vger.kernel.org>
Cc: "Stanislav Spassov" <stanspas@amazon.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jan H . Schönherr" <jschoenh@amazon.de>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Ashok Raj" <ashok.raj@intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Sinan Kaya" <okaya@kernel.org>,
	"Rajat Jain" <rajatja@google.com>
Subject: [PATCH v2 05/17] PCI: Fix handling of _DSM 8 (avoiding reset delays)
Date: Mon, 2 Mar 2020 19:44:17 +0100	[thread overview]
Message-ID: <20200302184429.12880-6-stanspas@amazon.com> (raw)
In-Reply-To: <20200302184429.12880-1-stanspas@amazon.com>

From: Stanislav Spassov <stanspas@amazon.de>

On Sx Resume (boot from ACPI S5, resume from ACPI S4 or S3), platform
firmware may enforce that sufficient time has passed to cover the
mandatory (per PCI Express Specification) power-on reset delays of
all PCI devices before control is handed over to the operating system.
To avoid duplicated waiting by the OS, the firmware uses _DSM 8 in the
ACPI scope of a PCI host bus to inform the OS that for the whole PCI
subsystem underneath that host bridge, power-on reset delays are being
covered by firmware.

Previously, the kernel used _DSM 8 to override the d3cold_delay value
stored in struct pci_dev to 0. However, the assumption that the delay
is covered by firmware when _DSM 8 returns 1 only holds for the
firmware-initiated resets as part of Sx Resume flows. If an OS has the
ability to apply Conventional Reset to devices without system firmware
involvement, then it still needs to adhere to the usual reset delays.

Runtime Device Power Management triggering a D3cold->D0 transition
is an existing example of the kernel not involving system firmware.
Another example is the OS triggering Secondary Bus Reset by setting
the SBR bit in the Bridge Control register of any Port device.

In preparation for future work to improve post-reset delays, this patch
preserves the value of d3cold_delay even when _DSM 8 returns 1.
The decision not to wait after resume is instead taken at a higher level
in pci_bridge_wait_for_secondary_bus based on a new parameter that
indicates whether the waiting is due to an Sx Resume event.

Signed-off-by: Stanislav Spassov <stanspas@amazon.de>
---
 drivers/pci/pci-acpi.c   | 65 ++++++++++++++++++++++++++++++----------
 drivers/pci/pci-driver.c |  4 +--
 drivers/pci/pci.c        | 11 +++++--
 drivers/pci/pci.h        |  2 +-
 include/linux/pci-acpi.h |  8 ++---
 include/linux/pci.h      |  4 ++-
 6 files changed, 67 insertions(+), 27 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 0c02d500158f..a8fa13d6089d 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1120,21 +1120,35 @@ void acpi_pci_add_bus(struct pci_bus *bus)
 	acpi_pci_slot_enumerate(bus);
 	acpiphp_enumerate_slots(bus);
 
-	/*
-	 * For a host bridge, check its _DSM for function 8 and if
-	 * that is available, mark it in pci_host_bridge.
+	/* For a host bridge, check _DSM function 8 and if that returns 1,
+	 * mark it in the pci_host_bridge.
+	 *
+	 * Function 8, "Avoid Power-On Reset Delay Duplication on Sx Resume"
+	 * applies to the entire hierarchy below a PCI host bridge.
+	 * If it returns one, the OS may assume that all devices in the
+	 * hierarchy have already completed power-on reset delays
+	 * before FW handed over control to the OS on Sx Resume (such as boot
+	 * from ACPI S5, or resume from ACPI S4 or S3 states).
+	 * This _DSM is applicable whether reductions in device readiness
+	 * timing, via Readiness Notification or _DSM function 9, are available
+	 * or not.
+	 *
+	 * This _DSM function is defined by the PCI Firmware Specification
+	 * Revision 3.2 (January 26, 2015), after originally introduced by a
+	 * draft ECN of January 28, 2014, titled "ACPI additions for FW latency
+	 * optimizations."
 	 */
 	if (!pci_is_root_bus(bus))
 		return;
 
 	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 3,
-				RESET_DELAY_DSM, NULL);
+				IGNORE_RESET_DELAY_ON_SX_RESUME_DSM, NULL);
 	if (!obj)
 		return;
 
 	if (obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 1) {
 		bridge = pci_find_host_bridge(bus);
-		bridge->ignore_reset_delay = 1;
+		bridge->ignore_reset_delay_on_sx_resume = 1;
 	}
 	ACPI_FREE(obj);
 }
@@ -1168,19 +1182,38 @@ static struct acpi_device *acpi_pci_find_companion(struct device *dev)
  * @handle: ACPI handle of this device
  *
  * Update the d3_delay and d3cold_delay of a PCI device from the ACPI _DSM
- * control method of either the device itself or the PCI host bridge.
- *
- * Function 8, "Reset Delay," applies to the entire hierarchy below a PCI
- * host bridge.  If it returns one, the OS may assume that all devices in
- * the hierarchy have already completed power-on reset delays.
+ * Function 9 of the device, and cache the parent host bridge's flag for
+ * ignoring reset delay upon Sx Resume (the flag is originally set in
+ * acpi_pci_add_bus through _DSM Function 8).
  *
  * Function 9, "Device Readiness Durations," applies only to the object
  * where it is located.  It returns delay durations required after various
- * events if the device requires less time than the spec requires.  Delays
- * from this function take precedence over the Reset Delay function.
+ * events if the device requires less time than the spec requires.
+ * Values provided by this function can only be used to lower (reduce) the
+ * latency required by specification or values discovered from device.
+ *
+ * This _DSM function is defined by the PCI Firmware Specification Rev 3.2
+ * (January 26, 2015), after originally introduced by a draft ECN of
+ * January 28, 2014, titled "ACPI additions for FW latency optimizations."
  *
- * These _DSM functions are defined by the draft ECN of January 28, 2014,
- * titled "ACPI additions for FW latency optimizations."
+ * XXX The PCI Firmware Specification contradicts itself by stating, in addition
+ * to the above "can only be used to lower (reduce)", that also:
+ * Values must be interpreted as overriding any Configuration Ready indicator
+ * from hardware, whether increasing or decreasing required delays. This
+ * includes ignoring FRS and DRS notifications where overridden by this
+ * _DSM function, as well as ignoring values specified in the Readiness Time
+ * Reporting Extended Capability, if present.
+ * Meanwhile, the PCI Express Base Specification Revision 5.0 Version 1.0
+ * (22 May 2019) states in section 7.9.17 Readiness Time Reporting Extended
+ * Capability:
+ * Software is permitted to issue requests upon the earliest of:
+ * - Receiving a Readiness Notification messages
+ * - Waiting the appropriate time as per relevant specifications
+ * - Waiting the time indicated in the associated field of this capability
+ * - Waiting the time defined by system software or firmware
+ * The kernel does not yet support Readiness Notifications, and does not yet
+ * use a Readiness Time Reporting capability if present, so we do not need to
+ * worry about the prioritization for now.
  */
 static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 				    acpi_handle handle)
@@ -1189,8 +1222,8 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 	int value;
 	union acpi_object *obj, *elements;
 
-	if (bridge->ignore_reset_delay)
-		pdev->d3cold_delay = 0;
+	pdev->ignore_reset_delay_on_sx_resume =
+		bridge->ignore_reset_delay_on_sx_resume;
 
 	obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 3,
 				FUNCTION_DELAY_DSM, NULL);
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 0454ca0e4e3f..7e8ca8115c4f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -917,7 +917,7 @@ static int pci_pm_resume_noirq(struct device *dev)
 	pcie_pme_root_status_cleanup(pci_dev);
 
 	if (!skip_bus_pm && prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_bridge_wait_for_secondary_bus(pci_dev, true);
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return 0;
@@ -1321,7 +1321,7 @@ static int pci_pm_runtime_resume(struct device *dev)
 	pci_pm_default_resume(pci_dev);
 
 	if (prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_bridge_wait_for_secondary_bus(pci_dev, false);
 
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 03103bb15b42..4899b12b5a38 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2843,6 +2843,7 @@ void pci_pm_init(struct pci_dev *dev)
 	}
 
 	dev->pm_cap = pm;
+	dev->ignore_reset_delay_on_sx_resume = 0;
 	dev->d3_delay = PCI_PM_D3_WAIT;
 	dev->d3cold_delay = PCI_PM_D3COLD_WAIT;
 	dev->bridge_d3 = pci_bridge_d3_possible(dev);
@@ -4698,7 +4699,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
  * conventional PCI it means Tpvrh + Trhfa specified in PCI 3.0 section
  * 4.3.2.
  */
-void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
+void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, bool sx_resume)
 {
 	struct pci_dev *child;
 	int delay;
@@ -4723,7 +4724,11 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	}
 
 	/* Take d3cold_delay requirements into account */
-	delay = pci_bus_max_d3cold_delay(dev->subordinate);
+	if (sx_resume && dev->ignore_reset_delay_on_sx_resume)
+		delay = 0;
+	else
+		delay = pci_bus_max_d3cold_delay(dev->subordinate);
+
 	if (!delay) {
 		up_read(&pci_bus_sem);
 		return;
@@ -4800,7 +4805,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
 
-	pci_bridge_wait_for_secondary_bus(dev);
+	pci_bridge_wait_for_secondary_bus(dev, false);
 }
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 659ab3f9042a..c4c3ba926f45 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -107,7 +107,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev);
 void pci_free_cap_save_buffers(struct pci_dev *dev);
 bool pci_bridge_d3_possible(struct pci_dev *dev);
 void pci_bridge_d3_update(struct pci_dev *dev);
-void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev);
+void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, bool sx_resume);
 
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 62b7fdcc661c..99865773d65e 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -107,10 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
 #endif
 
 extern const guid_t pci_acpi_dsm_guid;
-#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
-#define DEVICE_LABEL_DSM		0x07
-#define RESET_DELAY_DSM			0x08
-#define FUNCTION_DELAY_DSM		0x09
+#define IGNORE_PCI_BOOT_CONFIG_DSM		0x05
+#define DEVICE_LABEL_DSM			0x07
+#define IGNORE_RESET_DELAY_ON_SX_RESUME_DSM	0x08
+#define FUNCTION_DELAY_DSM			0x09
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3840a541a9de..22a5b7164c32 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -354,6 +354,8 @@ struct pci_dev {
 						      user sysfs */
 	unsigned int	clear_retrain_link:1;	/* Need to clear Retrain Link
 						   bit manually */
+	unsigned int    ignore_reset_delay_on_sx_resume:1; /* Cached value from
+							      pci_host_bridge */
 	unsigned int	d3_delay;	/* D3->D0 transition time in ms */
 	unsigned int	d3cold_delay;	/* D3cold->D0 transition time in ms */
 
@@ -503,7 +505,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	void		*release_data;
 	struct msi_controller *msi;
-	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
+	unsigned int	ignore_reset_delay_on_sx_resume:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */
 	unsigned int	native_aer:1;		/* OS may use PCIe AER */
 	unsigned int	native_pcie_hotplug:1;	/* OS may use PCIe hotplug */
-- 
2.25.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




  parent reply	other threads:[~2020-03-02 18:46 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-02 18:44 [PATCH v2 00/17] Improve PCI device post-reset readiness polling Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 01/17] PCI: Fall back to slot/bus reset if softer methods timeout Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 02/17] PCI: Remove unused PCI_PM_BUS_WAIT Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 03/17] PCI: Use pci_bridge_wait_for_secondary_bus after SBR Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 04/17] PCI: Do not override delay for D0->D3hot transition Stanislav Spassov
2020-03-02 18:44 ` Stanislav Spassov [this message]
2020-03-02 18:44 ` [PATCH v2 06/17] PCI: Fix us->ms conversion in pci_acpi_optimize_delay Stanislav Spassov
2020-03-03  4:19   ` kbuild test robot
2020-03-03  4:19     ` kbuild test robot
2020-03-03  5:54   ` kbuild test robot
2020-03-03  5:54     ` kbuild test robot
2020-03-02 18:44 ` [PATCH v2 07/17] PCI: Clean up and document PM/reset delays Stanislav Spassov
2020-03-03  1:51   ` kbuild test robot
2020-03-03  1:51     ` kbuild test robot
2020-03-03  2:54   ` kbuild test robot
2020-03-03  2:54     ` kbuild test robot
2020-03-02 18:44 ` [PATCH v2 08/17] PCI: Add more delay overrides to struct pci_dev Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 09/17] PCI: Generalize pci_bus_max_d3cold_delay to pci_bus_max_delay Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 10/17] PCI: Use correct delay in pci_bridge_wait_for_secondary_bus Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 11/17] PCI: Refactor pci_dev_wait to remove timeout parameter Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 12/17] PCI: Refactor pci_dev_wait to take pci_init_event Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 13/17] PCI: Cache CRS Software Visibiliy in struct pci_dev Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 14/17] PCI: Introduce per-device reset_ready_poll override Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 15/17] PCI: Refactor polling loop out of pci_dev_wait Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 16/17] PCI: Add CRS handling to pci_dev_wait() Stanislav Spassov
2020-03-02 18:44 ` [PATCH v2 17/17] PCI: Lower PCIE_RESET_READY_POLL_MS from 1m to 1s Stanislav Spassov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200302184429.12880-6-stanspas@amazon.com \
    --to=stanspas@amazon.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=corbet@lwn.net \
    --cc=jschoenh@amazon.de \
    --cc=linux-pci@vger.kernel.org \
    --cc=okaya@kernel.org \
    --cc=rajatja@google.com \
    --cc=stanspas@amazon.de \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.