linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0
@ 2022-04-09 13:03 Rafael J. Wysocki
  2022-04-09 13:05 ` [PATCH v1 1/7] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:03 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

Hi All,

This series supersedes the one at

https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher

It addresses some potential issues related to PCI device transitions from
low-power states into D0 and makes the related code more straightforward
and so easier to follow.

Please refer to the patch changelogs for details.

Thanks!




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 1/7] PCI/PM: Resume subordinate bus in bus type callbacks
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
@ 2022-04-09 13:05 ` Rafael J. Wysocki
  2022-04-09 13:06 ` [PATCH v1 2/7] PCI/PM: Drop the runtime_d3cold PCI device flag Rafael J. Wysocki
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:05 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling pci_resume_bus() on the secondary bus from pci_power_up() as
it is done now is questionable, because it depends on the mandatory
bridge power-up delays that are only covered by the PCI bus type PM
callbacks.

For this reason, move the subordinate bus resume to those callbacks
too and use the observation that if a bridge is being powered-up
during resume from system-wide suspend, it may be still desirable
to runtime-resume its subordinate bus after completing the system-
wide transition (in case the resume of the devices on that bus is
skipped during it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci-driver.c |   15 +++++++++++++--
 drivers/pci/pci.c        |   15 ---------------
 2 files changed, 13 insertions(+), 17 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -559,6 +559,17 @@ static void pci_pm_default_resume_early(
 	pci_pme_restore(pci_dev);
 }
 
+static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
+{
+	pci_bridge_wait_for_secondary_bus(pci_dev);
+	/*
+	 * When powering on a bridge from D3cold, the whole hierarchy may be
+	 * powered on into D0uninitialized state, resume them to give them a
+	 * chance to suspend again
+	 */
+	pci_resume_bus(pci_dev->subordinate);
+}
+
 #endif
 
 #ifdef CONFIG_PM_SLEEP
@@ -934,7 +945,7 @@ static int pci_pm_resume_noirq(struct de
 	pcie_pme_root_status_cleanup(pci_dev);
 
 	if (!skip_bus_pm && prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return 0;
@@ -1321,7 +1332,7 @@ static int pci_pm_runtime_resume(struct
 	pci_pm_default_resume(pci_dev);
 
 	if (prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1310,21 +1310,6 @@ static int pci_dev_wait(struct pci_dev *
 int pci_power_up(struct pci_dev *dev)
 {
 	pci_platform_power_transition(dev, PCI_D0);
-
-	/*
-	 * Mandatory power management transition delays are handled in
-	 * pci_pm_resume_noirq() and pci_pm_runtime_resume() of the
-	 * corresponding bridge.
-	 */
-	if (dev->runtime_d3cold) {
-		/*
-		 * When powering on a bridge from D3cold, the whole hierarchy
-		 * may be powered on into D0uninitialized state, resume them to
-		 * give them a chance to suspend again
-		 */
-		pci_resume_bus(dev->subordinate);
-	}
-
 	return pci_raw_set_power_state(dev, PCI_D0);
 }
 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 2/7] PCI/PM: Drop the runtime_d3cold PCI device flag
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  2022-04-09 13:05 ` [PATCH v1 1/7] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
@ 2022-04-09 13:06 ` Rafael J. Wysocki
  2022-04-09 13:08 ` [PATCH v1 3/7] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:06 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This flag is not needed any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci-driver.c |    2 --
 drivers/pci/pci.c        |    3 ---
 include/linux/pci.h      |    4 ----
 3 files changed, 9 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1337,8 +1337,6 @@ static int pci_pm_runtime_resume(struct
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
 
-	pci_dev->runtime_d3cold = false;
-
 	return error;
 }
 
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2703,8 +2703,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
-	dev->runtime_d3cold = target_state == PCI_D3cold;
-
 	/*
 	 * There are systems (for example, Intel mobile chips since Coffee
 	 * Lake) where the power drawn while suspended can be significantly
@@ -2722,7 +2720,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (error) {
 		pci_enable_wake(dev, target_state, false);
 		pci_restore_ptm_state(dev);
-		dev->runtime_d3cold = false;
 	}
 
 	return error;
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -379,10 +379,6 @@ struct pci_dev {
 	unsigned int	mmio_always_on:1;	/* Disallow turning off io/mem
 						   decoding during BAR sizing */
 	unsigned int	wakeup_prepared:1;
-	unsigned int	runtime_d3cold:1;	/* Whether go through runtime
-						   D3cold, not set for devices
-						   powered on/off by the
-						   corresponding bridge */
 	unsigned int	skip_bus_pm:1;	/* Internal: Skip bus-level PM */
 	unsigned int	ignore_hotplug:1;	/* Ignore hotplug events */
 	unsigned int	hotplug_user_indicators:1; /* SlotCtl indicators




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 3/7] PCI/PM: Rearrange pci_update_current_state()
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  2022-04-09 13:05 ` [PATCH v1 1/7] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
  2022-04-09 13:06 ` [PATCH v1 2/7] PCI/PM: Drop the runtime_d3cold PCI device flag Rafael J. Wysocki
@ 2022-04-09 13:08 ` Rafael J. Wysocki
  2022-04-09 13:22 ` [PATCH v1 4/7] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:08 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Save one config space access in pci_update_current_state() by
testing the retireved PCI_PM_CTRL register value against
PCI_POSSIBLE_ERROR() instead of invoking pci_device_is_present()
separately.

While at it, drop a pair of unnecessary parens.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1201,14 +1201,17 @@ static int pci_raw_set_power_state(struc
  */
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
 {
-	if (platform_pci_get_power_state(dev) == PCI_D3cold ||
-	    !pci_device_is_present(dev)) {
+	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
 		dev->current_state = PCI_D3cold;
 	} else if (dev->pm_cap) {
 		u16 pmcsr;
 
 		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			return;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 	} else {
 		dev->current_state = state;
 	}




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 4/7] PCI/PM: Rework changing power states of PCI devices
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2022-04-09 13:08 ` [PATCH v1 3/7] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
@ 2022-04-09 13:22 ` Rafael J. Wysocki
  2022-04-09 13:23 ` [PATCH v1 5/7] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:22 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are some issues related to changing power states of PCI
devices, mostly related to carrying out unnecessary actions in some
places, and the code is generally hard to follow.

 1. pci_power_up() has two callers, pci_set_power_state() and
    pci_pm_default_resume_early().  The latter updates the current
    power state of the device right after calling pci_power_up()
    and it restores the entire config space of the device right
    after that, so pci_power_up() itself need not read the
    PCI_PM_CTRL register or restore the BARs after programming the
    device into D0 in that case (which it does currently).
 
 2. It is generally hard to get a clear view of the pci_power_up()
    code flow, especially in some corner cases, due to all of the
    involved PCI_PM_CTRL register reads and writes occurring in
    pci_platform_power_transition() and in pci_raw_set_power_state(),
    some of which are redundant.

 3. The transitions from low-power states to D0 and the other way
    around are unnecessarily tangled in pci_raw_set_power_state()
    which causes it to use a redundant local variable and makes the
    code in it rather hard to follow.

To address the above shortcomings, make the following changes:

 a. Remove the code handling transitions into D0 from
    pci_raw_set_power_state() and rename that function as
    pci_set_low_power_state().

 b. Add the code handling transitions into D0 directly to
    pci_power_up() and to a new function pci_set_full_power_state()
    calling it internally that is only used in pci_set_power_state().

 c. Make pci_power_up() avoid redundant PCI_PM_CTRL register accesses
    and make it work in the same way for transitions from any low-
    power states (transitions from D1 and D2 are handled slightly
    differently, which is not really necessary, before the change).

 d. Put the restoration of the BARs and the PCI_PM_CTRL register read
    confirming the power state change into pci_set_full_power_state()
    to avoid doing these things in pci_pm_default_resume_early() in
    vain.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c |  145 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 100 insertions(+), 45 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_raw_set_power_state - Use PCI PM registers to set the power state of
- *			     given PCI device
+ * pci_set_low_power_state - Program the given device into a low-power state
  * @dev: PCI device to handle.
- * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
-	bool need_restore = false;
 
 	/* Check if we're already there */
 	if (dev->current_state == state)
@@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D0 || state > PCI_D3hot)
+	if (state < PCI_D1 || state > PCI_D3hot)
 		return -EINVAL;
 
 	/*
@@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
 	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
-	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
-	    && dev->current_state > state) {
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
 		pci_err(dev, "invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
@@ -1127,23 +1124,9 @@ static int pci_raw_set_power_state(struc
 	 * This doesn't affect PME_Status, disables PME_En, and
 	 * sets PowerState to 0.
 	 */
-	switch (dev->current_state) {
-	case PCI_D0:
-	case PCI_D1:
-	case PCI_D2:
+	if (dev->current_state <= PCI_D2) {
 		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
 		pmcsr |= state;
-		break;
-	case PCI_D3hot:
-	case PCI_D3cold:
-	case PCI_UNKNOWN: /* Boot-up */
-		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
-			need_restore = true;
-		fallthrough;	/* force to D0 */
-	default:
-		pmcsr = 0;
-		break;
 	}
 
 	/* Enter specified state */
@@ -1153,9 +1136,9 @@ static int pci_raw_set_power_state(struc
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot)
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+	else if (state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
@@ -1165,22 +1148,6 @@ static int pci_raw_set_power_state(struc
 			 pci_power_name(dev->current_state),
 			 pci_power_name(state));
 
-	/*
-	 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
-	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
-	 * from D3hot to D0 _may_ perform an internal reset, thereby
-	 * going to "D0 Uninitialized" rather than "D0 Initialized".
-	 * For example, at least some versions of the 3c905B and the
-	 * 3c556B exhibit this behaviour.
-	 *
-	 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
-	 * devices in a D3hot state at boot.  Consequently, we need to
-	 * restore at least the BARs so that the device will be
-	 * accessible to its driver.
-	 */
-	if (need_restore)
-		pci_restore_bars(dev);
-
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
@@ -1312,8 +1279,54 @@ static int pci_dev_wait(struct pci_dev *
  */
 int pci_power_up(struct pci_dev *dev)
 {
-	pci_platform_power_transition(dev, PCI_D0);
-	return pci_raw_set_power_state(dev, PCI_D0);
+	int ret;
+
+	ret = pci_platform_power_transition(dev, PCI_D0);
+	if (ret) {
+		u16 pmcsr;
+
+		/*
+		 * The PCI_PM_CTRL register has not been read above, so read it
+		 * now and bail out if that fails.
+		 */
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			goto fail;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	} else if (dev->current_state == PCI_D3cold) {
+		/*
+		 * Since current_state is still PCI_D3cold, the power state seen
+		 * by the platform is still D3cold or the PCI_PM_CTRL register
+		 * read in pci_update_current_state() has failed, so assume the
+		 * device to be inaccessible.
+		 */
+		goto fail;
+	}
+
+	/* There's nothing more to do if current_state is D0 at this point. */
+	if (dev->current_state == PCI_D0)
+		return 0;
+
+	/*
+	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
+	 * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
+	 * and wait for the prescribed amount of time.  Assume success.
+	 */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
+
+	if (dev->current_state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (dev->current_state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	dev->current_state = PCI_D0;
+	return 0;
+
+fail:
+	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
+	return -ENODEV;
 }
 
 /**
@@ -1340,6 +1353,48 @@ void pci_bus_set_current_state(struct pc
 		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
 }
 
+static int pci_set_full_power_state(struct pci_dev *dev)
+{
+	pci_power_t old_state = dev->current_state;
+	u16 pmcsr;
+	int ret;
+
+	ret = pci_power_up(dev);
+	if (ret)
+		return ret;
+
+	if (!dev->pm_cap)
+		return 0;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	if (dev->current_state != PCI_D0) {
+		pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
+				     pci_power_name(dev->current_state));
+	} else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		/*
+		 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
+		 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
+		 * from D3hot to D0 _may_ perform an internal reset, thereby
+		 * going to "D0 Uninitialized" rather than "D0 Initialized". For
+		 * example, at least some versions of the 3c905B and the 3c556B
+		 * exhibit this behaviour.
+		 *
+		 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
+		 * devices in a D3hot state at boot. Consequently, we need to
+		 * restore at least the BARs so that the device will be
+		 * accessible to its driver.
+		 */
+		pci_restore_bars(dev);
+	}
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
 /**
  * pci_set_power_state - Set the power state of a PCI device
  * @dev: PCI device to handle.
@@ -1381,7 +1436,7 @@ int pci_set_power_state(struct pci_dev *
 		return 0;
 
 	if (state == PCI_D0)
-		return pci_power_up(dev);
+		return pci_set_full_power_state(dev);
 
 	/*
 	 * This device is quirked not to be put into D3, so don't put it in
@@ -1394,7 +1449,7 @@ int pci_set_power_state(struct pci_dev *
 	 * To put device in D3cold, we put device into D3hot in native
 	 * way, then put device into D3cold with platform ops
 	 */
-	error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
+	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
 					PCI_D3hot : state);
 
 	if (pci_platform_power_transition(dev, state))




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 5/7] PCI/PM: Move pci_set_low_power_state() next to its caller
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2022-04-09 13:22 ` [PATCH v1 4/7] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
@ 2022-04-09 13:23 ` Rafael J. Wysocki
  2022-04-09 13:26 ` [PATCH v1 6/7] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:23 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Because pci_set_power_state() is the only caller of
pci_set_low_power_state(), move the latter next to the former.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c |  174 +++++++++++++++++++++++++++---------------------------
 1 file changed, 87 insertions(+), 87 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,93 +1068,6 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_set_low_power_state - Program the given device into a low-power state
- * @dev: PCI device to handle.
- * @state: PCI power state (D1, D2, D3hot) to put the device into.
- *
- * RETURN VALUE:
- * -EINVAL if the requested state is invalid.
- * -EIO if device does not support PCI PM or its PM capabilities register has a
- * wrong version, or device doesn't support the requested state.
- * 0 if device already is in the requested state.
- * 0 if device's power state has been successfully changed.
- */
-static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
-{
-	u16 pmcsr;
-
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
-	if (!dev->pm_cap)
-		return -EIO;
-
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
-	/*
-	 * Validate transition: We can enter D0 from any state, but if
-	 * we're already in a low-power state, we can only go deeper.  E.g.,
-	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
-	 * we'd have to go from D3 to D0, then to D1.
-	 */
-	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EINVAL;
-	}
-
-	/* Check if this device supports the desired state */
-	if ((state == PCI_D1 && !dev->d1_support)
-	   || (state == PCI_D2 && !dev->d2_support))
-		return -EIO;
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EIO;
-	}
-
-	/*
-	 * If we're (effectively) in D3, force entire word to 0.
-	 * This doesn't affect PME_Status, disables PME_En, and
-	 * sets PowerState to 0.
-	 */
-	if (dev->current_state <= PCI_D2) {
-		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-		pmcsr |= state;
-	}
-
-	/* Enter specified state */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
-
-	/*
-	 * Mandatory power management transition delays; see PCI PM 1.1
-	 * 5.6.1 table 18
-	 */
-	if (state == PCI_D3hot)
-		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2)
-		udelay(PCI_PM_D2_DELAY);
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
-	if (dev->current_state != state)
-		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
-			 pci_power_name(dev->current_state),
-			 pci_power_name(state));
-
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self);
-
-	return 0;
-}
-
-/**
  * pci_update_current_state - Read power state of given device and cache it
  * @dev: PCI device to handle.
  * @state: State to cache in case the device doesn't have the PM capability
@@ -1391,6 +1304,93 @@ static int pci_set_full_power_state(stru
 
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
+/**
+ * pci_set_low_power_state - Program the given device into a low-power state
+ * @dev: PCI device to handle.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
+ *
+ * RETURN VALUE:
+ * -EINVAL if the requested state is invalid.
+ * -EIO if device does not support PCI PM or its PM capabilities register has a
+ * wrong version, or device doesn't support the requested state.
+ * 0 if device already is in the requested state.
+ * 0 if device's power state has been successfully changed.
+ */
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
+{
+	u16 pmcsr;
+
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
+	if (!dev->pm_cap)
+		return -EIO;
+
+	if (state < PCI_D1 || state > PCI_D3hot)
+		return -EINVAL;
+
+	/*
+	 * Validate transition: We can enter D0 from any state, but if
+	 * we're already in a low-power state, we can only go deeper.  E.g.,
+	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
+	 * we'd have to go from D3 to D0, then to D1.
+	 */
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
+		pci_err(dev, "invalid power transition (from %s to %s)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EINVAL;
+	}
+
+	/* Check if this device supports the desired state */
+	if ((state == PCI_D1 && !dev->d1_support)
+	   || (state == PCI_D2 && !dev->d2_support))
+		return -EIO;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	if (PCI_POSSIBLE_ERROR(pmcsr)) {
+		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EIO;
+	}
+
+	/*
+	 * If we're (effectively) in D3, force entire word to 0.
+	 * This doesn't affect PME_Status, disables PME_En, and
+	 * sets PowerState to 0.
+	 */
+	if (dev->current_state <= PCI_D2) {
+		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+		pmcsr |= state;
+	}
+
+	/* Enter specified state */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+
+	/*
+	 * Mandatory power management transition delays; see PCI PM 1.1
+	 * 5.6.1 table 18
+	 */
+	if (state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	if (dev->current_state != state)
+		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
+			 pci_power_name(dev->current_state),
+			 pci_power_name(state));
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 6/7] PCI/PM: Clean up pci_set_low_power_state()
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2022-04-09 13:23 ` [PATCH v1 5/7] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
@ 2022-04-09 13:26 ` Rafael J. Wysocki
  2022-04-09 13:28 ` [PATCH v1 7/7] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:26 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the following assorted non-essential changes in
pci_set_low_power_state():

 1. Drop two redundant checks from it (the caller takes care of these
    conditions).

 2. Modify error messages printed by it to make them more
    consistent with the messages printed by pci_power_up() and
    pci_set_full_power_state().

 3. Change the log level of one of the messages mentioned above to
    "debug", because it only indicates a programming mistake.

 4. Make the function return -ENODEV (instead of -EIO) when the
    device is not accessible.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1324,16 +1324,9 @@ static int pci_set_low_power_state(struc
 {
 	u16 pmcsr;
 
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
 	/*
 	 * Validate transition: We can enter D0 from any state, but if
 	 * we're already in a low-power state, we can only go deeper.  E.g.,
@@ -1341,7 +1334,7 @@ static int pci_set_low_power_state(struc
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
 	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
+		pci_dbg(dev, "Invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
 		return -EINVAL;
@@ -1354,10 +1347,10 @@ static int pci_set_low_power_state(struc
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
-		return -EIO;
+		return -ENODEV;
 	}
 
 	/*




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v1 7/7] PCI/PM: Rearrange pci_set_power_state()
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2022-04-09 13:26 ` [PATCH v1 6/7] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
@ 2022-04-09 13:28 ` Rafael J. Wysocki
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  7 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-09 13:28 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The part of pci_set_power_state() related to transitions into
low-power states is unnecessary convoluted, so clearly divide it
into the D3cold special case and the general case covering all of
the other states.

Also fix a potential issue with calling pci_bus_set_current_state()
to set the current state of all devices on the subordinate to D3cold
without checking if the power state of the parent bridge has really
changed to D3cold.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1438,19 +1438,25 @@ int pci_set_power_state(struct pci_dev *
 	if (state >= PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	/*
-	 * To put device in D3cold, we put device into D3hot in native
-	 * way, then put device into D3cold with platform ops
-	 */
-	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
-					PCI_D3hot : state);
+	if (state == PCI_D3cold) {
+		/*
+		 * To put the device in D3cold, put it into D3hot in the native
+		 * way, then put it into D3cold using platform ops.
+		 */
+		error = pci_set_low_power_state(dev, PCI_D3hot);
 
-	if (pci_platform_power_transition(dev, state))
-		return error;
+		if (pci_platform_power_transition(dev, PCI_D3cold))
+			return error;
 
-	/* Powering off a bridge may power off the whole hierarchy */
-	if (state == PCI_D3cold)
-		pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+		/* Powering off a bridge may power off the whole hierarchy */
+		if (dev->current_state == PCI_D3cold)
+			pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+	} else {
+		error = pci_set_low_power_state(dev, state);
+
+		if (pci_platform_power_transition(dev, state))
+			return error;
+	}
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2022-04-09 13:28 ` [PATCH v1 7/7] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
@ 2022-04-11 14:17 ` Rafael J. Wysocki
  2022-04-11 14:19   ` [PATCH v2 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
                     ` (11 more replies)
  7 siblings, 12 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:17 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

Hi All,

On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> Hi All,
> 
> This series supersedes the one at
> 
> https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> 
> It addresses some potential issues related to PCI device transitions from
> low-power states into D0 and makes the related code more straightforward
> and so easier to follow.
> 
> Please refer to the patch changelogs for details.

Here's a v2 of this patch series which is being sent, because I realized that
one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
and can be dropped, but that affected the last 3 patches in the series and
then I noticed that more improvements were possible and hence the new patches
[2/9].

Thanks!




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 1/9] PCI/PM: Resume subordinate bus in bus type callbacks
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
@ 2022-04-11 14:19   ` Rafael J. Wysocki
  2022-04-11 14:20   ` [PATCH v2 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:19 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling pci_resume_bus() on the secondary bus from pci_power_up() as
it is done now is questionable, because it depends on the mandatory
bridge power-up delays that are only covered by the PCI bus type PM
callbacks.

For this reason, move the subordinate bus resume to those callbacks
too and use the observation that if a bridge is being powered-up
during resume from system-wide suspend, it may be still desirable
to runtime-resume its subordinate bus after completing the system-
wide transition (in case the resume of the devices on that bus is
skipped during it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: No changes.

---
 drivers/pci/pci-driver.c |   15 +++++++++++++--
 drivers/pci/pci.c        |   15 ---------------
 2 files changed, 13 insertions(+), 17 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -559,6 +559,17 @@ static void pci_pm_default_resume_early(
 	pci_pme_restore(pci_dev);
 }
 
+static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
+{
+	pci_bridge_wait_for_secondary_bus(pci_dev);
+	/*
+	 * When powering on a bridge from D3cold, the whole hierarchy may be
+	 * powered on into D0uninitialized state, resume them to give them a
+	 * chance to suspend again
+	 */
+	pci_resume_bus(pci_dev->subordinate);
+}
+
 #endif
 
 #ifdef CONFIG_PM_SLEEP
@@ -934,7 +945,7 @@ static int pci_pm_resume_noirq(struct de
 	pcie_pme_root_status_cleanup(pci_dev);
 
 	if (!skip_bus_pm && prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return 0;
@@ -1321,7 +1332,7 @@ static int pci_pm_runtime_resume(struct
 	pci_pm_default_resume(pci_dev);
 
 	if (prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1310,21 +1310,6 @@ static int pci_dev_wait(struct pci_dev *
 int pci_power_up(struct pci_dev *dev)
 {
 	pci_platform_power_transition(dev, PCI_D0);
-
-	/*
-	 * Mandatory power management transition delays are handled in
-	 * pci_pm_resume_noirq() and pci_pm_runtime_resume() of the
-	 * corresponding bridge.
-	 */
-	if (dev->runtime_d3cold) {
-		/*
-		 * When powering on a bridge from D3cold, the whole hierarchy
-		 * may be powered on into D0uninitialized state, resume them to
-		 * give them a chance to suspend again
-		 */
-		pci_resume_bus(dev->subordinate);
-	}
-
 	return pci_raw_set_power_state(dev, PCI_D0);
 }
 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 2/9] PCI/PM: Drop the runtime_d3cold device flag
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  2022-04-11 14:19   ` [PATCH v2 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
@ 2022-04-11 14:20   ` Rafael J. Wysocki
  2022-04-11 14:21   ` [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:20 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This flag is not needed any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: No changes.

---
 drivers/pci/pci-driver.c |    2 --
 drivers/pci/pci.c        |    3 ---
 include/linux/pci.h      |    4 ----
 3 files changed, 9 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1337,8 +1337,6 @@ static int pci_pm_runtime_resume(struct
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
 
-	pci_dev->runtime_d3cold = false;
-
 	return error;
 }
 
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2703,8 +2703,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
-	dev->runtime_d3cold = target_state == PCI_D3cold;
-
 	/*
 	 * There are systems (for example, Intel mobile chips since Coffee
 	 * Lake) where the power drawn while suspended can be significantly
@@ -2722,7 +2720,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (error) {
 		pci_enable_wake(dev, target_state, false);
 		pci_restore_ptm_state(dev);
-		dev->runtime_d3cold = false;
 	}
 
 	return error;
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -379,10 +379,6 @@ struct pci_dev {
 	unsigned int	mmio_always_on:1;	/* Disallow turning off io/mem
 						   decoding during BAR sizing */
 	unsigned int	wakeup_prepared:1;
-	unsigned int	runtime_d3cold:1;	/* Whether go through runtime
-						   D3cold, not set for devices
-						   powered on/off by the
-						   corresponding bridge */
 	unsigned int	skip_bus_pm:1;	/* Internal: Skip bus-level PM */
 	unsigned int	ignore_hotplug:1;	/* Ignore hotplug events */
 	unsigned int	hotplug_user_indicators:1; /* SlotCtl indicators




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state()
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
  2022-04-11 14:19   ` [PATCH v2 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
  2022-04-11 14:20   ` [PATCH v2 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
@ 2022-04-11 14:21   ` Rafael J. Wysocki
  2022-04-12  9:42     ` Mika Westerberg
  2022-04-11 14:25   ` [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:21 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Save one config space access in pci_update_current_state() by
testing the retireved PCI_PM_CTRL register value against
PCI_POSSIBLE_ERROR() instead of invoking pci_device_is_present()
separately.

While at it, drop a pair of unnecessary parens.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: No changes.

---
 drivers/pci/pci.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1201,14 +1201,17 @@ static int pci_raw_set_power_state(struc
  */
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
 {
-	if (platform_pci_get_power_state(dev) == PCI_D3cold ||
-	    !pci_device_is_present(dev)) {
+	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
 		dev->current_state = PCI_D3cold;
 	} else if (dev->pm_cap) {
 		u16 pmcsr;
 
 		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			return;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 	} else {
 		dev->current_state = state;
 	}




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2022-04-11 14:21   ` [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
@ 2022-04-11 14:25   ` Rafael J. Wysocki
  2022-04-12 10:00     ` Mika Westerberg
  2022-04-11 14:27   ` [PATCH v2 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
                     ` (7 subsequent siblings)
  11 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:25 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are some issues related to changing power states of PCI
devices, mostly related to carrying out unnecessary actions in some
places, and the code is generally hard to follow.

 1. pci_power_up() has two callers, pci_set_power_state() and
    pci_pm_default_resume_early().  The latter updates the current
    power state of the device right after calling pci_power_up()
    and it restores the entire config space of the device right
    after that, so pci_power_up() itself need not read the
    PCI_PM_CTRL register or restore the BARs after programming the
    device into D0 in that case.
 
 2. It is generally hard to get a clear view of the pci_power_up()
    code flow, especially in some corner cases, due to all of the
    involved PCI_PM_CTRL register reads and writes occurring in
    pci_platform_power_transition() and in pci_raw_set_power_state(),
    some of which are redundant.

 3. The transitions from low-power states to D0 and the other way
    around are unnecessarily tangled in pci_raw_set_power_state()
    which causes it to use a redundant local variable and makes it
    rather hard to follow.

To address the above shortcomings, make the following changes:

 a. Remove the code handling transitions into D0
    from pci_raw_set_power_state() and rename it as
    pci_set_low_power_state().

 b. Add the code handling transitions into D0 directly
    to pci_power_up() and to a new wrapper function
    pci_set_full_power_state() calling it internally that is
    only used in pci_set_power_state().

 c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
    and make it work in the same way for transitions from any
    low-power states (transitions from D1 and D2 are handled
    slightly differently before the change).

 d. Put the restoration of the BARs and the PCI_PM_CTRL
    register read confirming the power state change into
    pci_set_full_power_state() to avoid doing that in
    pci_pm_default_resume_early() unnecessarily.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2:
   * Do not add a redundant check to pci_set_low_power_state().

---
 drivers/pci/pci.c |  154 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 101 insertions(+), 53 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_raw_set_power_state - Use PCI PM registers to set the power state of
- *			     given PCI device
+ * pci_set_low_power_state - Program the given device into a low-power state
  * @dev: PCI device to handle.
- * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
-	bool need_restore = false;
 
 	/* Check if we're already there */
 	if (dev->current_state == state)
@@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D0 || state > PCI_D3hot)
+	if (state < PCI_D1 || state > PCI_D3hot)
 		return -EINVAL;
 
 	/*
@@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
 	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
-	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
-	    && dev->current_state > state) {
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
 		pci_err(dev, "invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
@@ -1122,29 +1119,8 @@ static int pci_raw_set_power_state(struc
 		return -EIO;
 	}
 
-	/*
-	 * If we're (effectively) in D3, force entire word to 0.
-	 * This doesn't affect PME_Status, disables PME_En, and
-	 * sets PowerState to 0.
-	 */
-	switch (dev->current_state) {
-	case PCI_D0:
-	case PCI_D1:
-	case PCI_D2:
-		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-		pmcsr |= state;
-		break;
-	case PCI_D3hot:
-	case PCI_D3cold:
-	case PCI_UNKNOWN: /* Boot-up */
-		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
-			need_restore = true;
-		fallthrough;	/* force to D0 */
-	default:
-		pmcsr = 0;
-		break;
-	}
+	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+	pmcsr |= state;
 
 	/* Enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
@@ -1153,9 +1129,9 @@ static int pci_raw_set_power_state(struc
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot)
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+	else if (state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
@@ -1165,22 +1141,6 @@ static int pci_raw_set_power_state(struc
 			 pci_power_name(dev->current_state),
 			 pci_power_name(state));
 
-	/*
-	 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
-	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
-	 * from D3hot to D0 _may_ perform an internal reset, thereby
-	 * going to "D0 Uninitialized" rather than "D0 Initialized".
-	 * For example, at least some versions of the 3c905B and the
-	 * 3c556B exhibit this behaviour.
-	 *
-	 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
-	 * devices in a D3hot state at boot.  Consequently, we need to
-	 * restore at least the BARs so that the device will be
-	 * accessible to its driver.
-	 */
-	if (need_restore)
-		pci_restore_bars(dev);
-
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
@@ -1312,8 +1272,54 @@ static int pci_dev_wait(struct pci_dev *
  */
 int pci_power_up(struct pci_dev *dev)
 {
-	pci_platform_power_transition(dev, PCI_D0);
-	return pci_raw_set_power_state(dev, PCI_D0);
+	int ret;
+
+	ret = pci_platform_power_transition(dev, PCI_D0);
+	if (ret) {
+		u16 pmcsr;
+
+		/*
+		 * The PCI_PM_CTRL register has not been read above, so read it
+		 * now and bail out if that fails.
+		 */
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			goto fail;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	} else if (dev->current_state == PCI_D3cold) {
+		/*
+		 * Since current_state is PCI_D3cold here, the power state seen
+		 * by the platform is still D3cold or the PCI_PM_CTRL register
+		 * read in pci_update_current_state() has failed, so assume the
+		 * device to be inaccessible.
+		 */
+		goto fail;
+	}
+
+	/* There's nothing more to do if current_state is D0 at this point. */
+	if (dev->current_state == PCI_D0)
+		return 0;
+
+	/*
+	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
+	 * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
+	 * and wait for the prescribed amount of time.  Assume success.
+	 */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
+
+	if (dev->current_state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (dev->current_state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	dev->current_state = PCI_D0;
+	return 0;
+
+fail:
+	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
+	return -ENODEV;
 }
 
 /**
@@ -1340,6 +1346,48 @@ void pci_bus_set_current_state(struct pc
 		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
 }
 
+static int pci_set_full_power_state(struct pci_dev *dev)
+{
+	pci_power_t old_state = dev->current_state;
+	u16 pmcsr;
+	int ret;
+
+	ret = pci_power_up(dev);
+	if (ret)
+		return ret;
+
+	if (!dev->pm_cap)
+		return 0;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	if (dev->current_state != PCI_D0) {
+		pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
+				     pci_power_name(dev->current_state));
+	} else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		/*
+		 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
+		 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
+		 * from D3hot to D0 _may_ perform an internal reset, thereby
+		 * going to "D0 Uninitialized" rather than "D0 Initialized". For
+		 * example, at least some versions of the 3c905B and the 3c556B
+		 * exhibit this behaviour.
+		 *
+		 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
+		 * devices in a D3hot state at boot. Consequently, we need to
+		 * restore at least the BARs so that the device will be
+		 * accessible to its driver.
+		 */
+		pci_restore_bars(dev);
+	}
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
 /**
  * pci_set_power_state - Set the power state of a PCI device
  * @dev: PCI device to handle.
@@ -1381,7 +1429,7 @@ int pci_set_power_state(struct pci_dev *
 		return 0;
 
 	if (state == PCI_D0)
-		return pci_power_up(dev);
+		return pci_set_full_power_state(dev);
 
 	/*
 	 * This device is quirked not to be put into D3, so don't put it in
@@ -1394,7 +1442,7 @@ int pci_set_power_state(struct pci_dev *
 	 * To put device in D3cold, we put device into D3hot in native
 	 * way, then put device into D3cold with platform ops
 	 */
-	error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
+	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
 					PCI_D3hot : state);
 
 	if (pci_platform_power_transition(dev, state))




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2022-04-11 14:25   ` [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
@ 2022-04-11 14:27   ` Rafael J. Wysocki
  2022-04-11 14:28   ` [PATCH v2 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:27 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Because pci_set_power_state() is the only caller of
pci_set_low_power_state(), move the latter next to the former.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2:
   * Take v1 -> v2 difference in the previous patch into account.

---
 drivers/pci/pci.c |  160 +++++++++++++++++++++++++++---------------------------
 1 file changed, 80 insertions(+), 80 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,86 +1068,6 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_set_low_power_state - Program the given device into a low-power state
- * @dev: PCI device to handle.
- * @state: PCI power state (D1, D2, D3hot) to put the device into.
- *
- * RETURN VALUE:
- * -EINVAL if the requested state is invalid.
- * -EIO if device does not support PCI PM or its PM capabilities register has a
- * wrong version, or device doesn't support the requested state.
- * 0 if device already is in the requested state.
- * 0 if device's power state has been successfully changed.
- */
-static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
-{
-	u16 pmcsr;
-
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
-	if (!dev->pm_cap)
-		return -EIO;
-
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
-	/*
-	 * Validate transition: We can enter D0 from any state, but if
-	 * we're already in a low-power state, we can only go deeper.  E.g.,
-	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
-	 * we'd have to go from D3 to D0, then to D1.
-	 */
-	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EINVAL;
-	}
-
-	/* Check if this device supports the desired state */
-	if ((state == PCI_D1 && !dev->d1_support)
-	   || (state == PCI_D2 && !dev->d2_support))
-		return -EIO;
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EIO;
-	}
-
-	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-	pmcsr |= state;
-
-	/* Enter specified state */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
-
-	/*
-	 * Mandatory power management transition delays; see PCI PM 1.1
-	 * 5.6.1 table 18
-	 */
-	if (state == PCI_D3hot)
-		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2)
-		udelay(PCI_PM_D2_DELAY);
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
-	if (dev->current_state != state)
-		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
-			 pci_power_name(dev->current_state),
-			 pci_power_name(state));
-
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self);
-
-	return 0;
-}
-
-/**
  * pci_update_current_state - Read power state of given device and cache it
  * @dev: PCI device to handle.
  * @state: State to cache in case the device doesn't have the PM capability
@@ -1384,6 +1304,86 @@ static int pci_set_full_power_state(stru
 
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
+/**
+ * pci_set_low_power_state - Program the given device into a low-power state
+ * @dev: PCI device to handle.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
+ *
+ * RETURN VALUE:
+ * -EINVAL if the requested state is invalid.
+ * -EIO if device does not support PCI PM or its PM capabilities register has a
+ * wrong version, or device doesn't support the requested state.
+ * 0 if device already is in the requested state.
+ * 0 if device's power state has been successfully changed.
+ */
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
+{
+	u16 pmcsr;
+
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
+	if (!dev->pm_cap)
+		return -EIO;
+
+	if (state < PCI_D1 || state > PCI_D3hot)
+		return -EINVAL;
+
+	/*
+	 * Validate transition: We can enter D0 from any state, but if
+	 * we're already in a low-power state, we can only go deeper.  E.g.,
+	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
+	 * we'd have to go from D3 to D0, then to D1.
+	 */
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
+		pci_err(dev, "invalid power transition (from %s to %s)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EINVAL;
+	}
+
+	/* Check if this device supports the desired state */
+	if ((state == PCI_D1 && !dev->d1_support)
+	   || (state == PCI_D2 && !dev->d2_support))
+		return -EIO;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	if (PCI_POSSIBLE_ERROR(pmcsr)) {
+		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EIO;
+	}
+
+	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+	pmcsr |= state;
+
+	/* Enter specified state */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+
+	/*
+	 * Mandatory power management transition delays; see PCI PM 1.1
+	 * 5.6.1 table 18
+	 */
+	if (state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	if (dev->current_state != state)
+		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
+			 pci_power_name(dev->current_state),
+			 pci_power_name(state));
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 6/9] PCI/PM: Clean up pci_set_low_power_state()
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (4 preceding siblings ...)
  2022-04-11 14:27   ` [PATCH v2 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
@ 2022-04-11 14:28   ` Rafael J. Wysocki
  2022-04-11 14:30   ` [PATCH v2 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:28 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the following assorted non-essential changes in
pci_set_low_power_state():

 1. Drop two redundant checks from it (the caller takes care of these
    conditions).

 2. Modify error messages printed by it to make them more consistent
    with the messages printed by pci_power_up() and
    pci_set_full_power_state().

 3. Change the log level of one of the messages to "debug", because
    it only indicates a programming mistake.

 4. Make it return -ENODEV (instead of -EIO) when the device is not
    accessible.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2:
   * Rebase on top of the [4-5/9].

---
 drivers/pci/pci.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1324,16 +1324,9 @@ static int pci_set_low_power_state(struc
 {
 	u16 pmcsr;
 
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
 	/*
 	 * Validate transition: We can enter D0 from any state, but if
 	 * we're already in a low-power state, we can only go deeper.  E.g.,
@@ -1341,7 +1334,7 @@ static int pci_set_low_power_state(struc
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
 	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
+		pci_dbg(dev, "Invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
 		return -EINVAL;
@@ -1354,10 +1347,10 @@ static int pci_set_low_power_state(struc
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
-		return -EIO;
+		return -ENODEV;
 	}
 
 	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 7/9] PCI/PM: Rearrange pci_set_power_state()
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (5 preceding siblings ...)
  2022-04-11 14:28   ` [PATCH v2 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
@ 2022-04-11 14:30   ` Rafael J. Wysocki
  2022-04-11 14:31   ` [PATCH v2 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:30 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The part of pci_set_power_state() related to transitions into
low-power states is unnecessary convoluted, so clearly divide it
into the D3cold special case and the general case covering all of
the other states.

Also fix a potential issue with calling pci_bus_set_current_state()
to set the current state of all devices on the subordinate to D3cold
without checking if the power state of the parent bridge has really
changed to D3cold.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: No changes.

---
 drivers/pci/pci.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1431,19 +1431,25 @@ int pci_set_power_state(struct pci_dev *
 	if (state >= PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	/*
-	 * To put device in D3cold, we put device into D3hot in native
-	 * way, then put device into D3cold with platform ops
-	 */
-	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
-					PCI_D3hot : state);
+	if (state == PCI_D3cold) {
+		/*
+		 * To put the device in D3cold, put it into D3hot in the native
+		 * way, then put it into D3cold using platform ops.
+		 */
+		error = pci_set_low_power_state(dev, PCI_D3hot);
 
-	if (pci_platform_power_transition(dev, state))
-		return error;
+		if (pci_platform_power_transition(dev, PCI_D3cold))
+			return error;
 
-	/* Powering off a bridge may power off the whole hierarchy */
-	if (state == PCI_D3cold)
-		pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+		/* Powering off a bridge may power off the whole hierarchy */
+		if (dev->current_state == PCI_D3cold)
+			pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+	} else {
+		error = pci_set_low_power_state(dev, state);
+
+		if (pci_platform_power_transition(dev, state))
+			return error;
+	}
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 8/9] PCI/PM: Avoid redundant current_state update
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (6 preceding siblings ...)
  2022-04-11 14:30   ` [PATCH v2 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
@ 2022-04-11 14:31   ` Rafael J. Wysocki
  2022-04-11 14:33   ` [PATCH v2 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:31 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Notice that if pci_power_up() returns success early, without updating
the given device's PCI_PM_CTRL register, because it has verified that
the power state of the device is D0 already at that point, the
pci_update_current_state() invocation in pci_pm_default_resume_early()
is redundant.

Avoid that redundant call by making pci_power_up() return 1 when it
has updated the device's PCI_PM_CTRL register and checking its return
value in pci_pm_default_resume_early().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

New patch in v2.

---
 drivers/pci/pci-driver.c |    5 +++--
 drivers/pci/pci.c        |   10 ++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -553,8 +553,9 @@ static void pci_pm_default_resume(struct
 
 static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 {
-	pci_power_up(pci_dev);
-	pci_update_current_state(pci_dev, PCI_D0);
+	if (pci_power_up(pci_dev))
+		pci_update_current_state(pci_dev, PCI_D0);
+
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
 }
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1189,6 +1189,12 @@ static int pci_dev_wait(struct pci_dev *
 /**
  * pci_power_up - Put the given device into D0
  * @dev: PCI device to power up
+ *
+ * Use the platform firmware and the PCI_PM_CTRL register to put @dev into D0.
+ *
+ * Return 0 if invoking the platform firmware was sufficient to put @dev into D0
+ * (and so the PCI_PM_CTRL register was not updated), or 1 if it was necessary to
+ * update the PCI_PM_CTRL register, or -ENODEV if the device was not accessible.
  */
 int pci_power_up(struct pci_dev *dev)
 {
@@ -1235,7 +1241,7 @@ int pci_power_up(struct pci_dev *dev)
 		udelay(PCI_PM_D2_DELAY);
 
 	dev->current_state = PCI_D0;
-	return 0;
+	return 1;
 
 fail:
 	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
@@ -1273,7 +1279,7 @@ static int pci_set_full_power_state(stru
 	int ret;
 
 	ret = pci_power_up(dev);
-	if (ret)
+	if (ret < 0)
 		return ret;
 
 	if (!dev->pm_cap)




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq()
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (7 preceding siblings ...)
  2022-04-11 14:31   ` [PATCH v2 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
@ 2022-04-11 14:33   ` Rafael J. Wysocki
  2022-04-11 14:33   ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:33 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling pci_set_power_state() to put the given device into D0 in
pci_pm_thaw_noirq() may cause it to restore the device's BARs, which
is redundant before calling pci_restore_state(), so replace it with
a direct pci_power_up() call followed by pci_update_current_state()
if it returns a nonzeor value, in analogy with
pci_pm_default_resume_early().

Avoid code duplication by introducing a wrapper function to contain
the repeating pattern and calling it in both places.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

New patch in v2.

---
 drivers/pci/pci-driver.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -551,11 +551,15 @@ static void pci_pm_default_resume(struct
 	pci_enable_wake(pci_dev, PCI_D0, false);
 }
 
-static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
+static void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
 {
 	if (pci_power_up(pci_dev))
 		pci_update_current_state(pci_dev, PCI_D0);
+}
 
+static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
+{
+	pci_pm_power_up_and_verify_state(pci_dev);
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
 }
@@ -1080,7 +1084,7 @@ static int pci_pm_thaw_noirq(struct devi
 	 * in case the driver's "freeze" callbacks put it into a low-power
 	 * state.
 	 */
-	pci_set_power_state(pci_dev, PCI_D0);
+	pci_pm_power_up_and_verify_state(pci_dev);
 	pci_restore_state(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (8 preceding siblings ...)
  2022-04-11 14:33   ` [PATCH v2 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
@ 2022-04-11 14:33   ` Rafael J. Wysocki
  2022-04-12 10:08   ` Mika Westerberg
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
  11 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-11 14:33 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

On Monday, April 11, 2022 4:17:41 PM CEST Rafael J. Wysocki wrote:
> Hi All,
> 
> On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > Hi All,
> > 
> > This series supersedes the one at
> > 
> > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > 
> > It addresses some potential issues related to PCI device transitions from
> > low-power states into D0 and makes the related code more straightforward
> > and so easier to follow.
> > 
> > Please refer to the patch changelogs for details.
> 
> Here's a v2 of this patch series which is being sent, because I realized that
> one of the checks in pci_power_up()

This should be pci_set_low_power_state(), sorry for the confusion.

> added by patch [4/7] in v1 was redundant
> and can be dropped, but that affected the last 3 patches in the series and
> then I noticed that more improvements were possible and hence the new patches
> [2/9].
> 
> Thanks!
> 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state()
  2022-04-11 14:21   ` [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
@ 2022-04-12  9:42     ` Mika Westerberg
  2022-04-12 10:56       ` Rafael J. Wysocki
  0 siblings, 1 reply; 47+ messages in thread
From: Mika Westerberg @ 2022-04-12  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PCI, Linux PM, LKML, Bjorn Helgaas

On Mon, Apr 11, 2022 at 04:21:04PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Save one config space access in pci_update_current_state() by
> testing the retireved PCI_PM_CTRL register value against
              ^^^^^^^^^
retrieved

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-11 14:25   ` [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
@ 2022-04-12 10:00     ` Mika Westerberg
  2022-04-12 11:31       ` Rafael J. Wysocki
  0 siblings, 1 reply; 47+ messages in thread
From: Mika Westerberg @ 2022-04-12 10:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PCI, Linux PM, LKML, Bjorn Helgaas

On Mon, Apr 11, 2022 at 04:25:12PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are some issues related to changing power states of PCI
> devices, mostly related to carrying out unnecessary actions in some
> places, and the code is generally hard to follow.
> 
>  1. pci_power_up() has two callers, pci_set_power_state() and
>     pci_pm_default_resume_early().  The latter updates the current
>     power state of the device right after calling pci_power_up()
>     and it restores the entire config space of the device right
>     after that, so pci_power_up() itself need not read the
>     PCI_PM_CTRL register or restore the BARs after programming the
>     device into D0 in that case.
>  
>  2. It is generally hard to get a clear view of the pci_power_up()
>     code flow, especially in some corner cases, due to all of the
>     involved PCI_PM_CTRL register reads and writes occurring in
>     pci_platform_power_transition() and in pci_raw_set_power_state(),
>     some of which are redundant.
> 
>  3. The transitions from low-power states to D0 and the other way
>     around are unnecessarily tangled in pci_raw_set_power_state()
>     which causes it to use a redundant local variable and makes it
>     rather hard to follow.
> 
> To address the above shortcomings, make the following changes:
> 
>  a. Remove the code handling transitions into D0

Should this be D3?

>     from pci_raw_set_power_state() and rename it as
>     pci_set_low_power_state().
> 
>  b. Add the code handling transitions into D0 directly
>     to pci_power_up() and to a new wrapper function
>     pci_set_full_power_state() calling it internally that is
>     only used in pci_set_power_state().
> 
>  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
>     and make it work in the same way for transitions from any
>     low-power states (transitions from D1 and D2 are handled
>     slightly differently before the change).
> 
>  d. Put the restoration of the BARs and the PCI_PM_CTRL
>     register read confirming the power state change into
>     pci_set_full_power_state() to avoid doing that in
>     pci_pm_default_resume_early() unnecessarily.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v1 -> v2:
>    * Do not add a redundant check to pci_set_low_power_state().
> 
> ---
>  drivers/pci/pci.c |  154 +++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 101 insertions(+), 53 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci.c
> +++ linux-pm/drivers/pci/pci.c
> @@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
>  }
>  
>  /**
> - * pci_raw_set_power_state - Use PCI PM registers to set the power state of
> - *			     given PCI device
> + * pci_set_low_power_state - Program the given device into a low-power state
>   * @dev: PCI device to handle.
> - * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
> + * @state: PCI power state (D1, D2, D3hot) to put the device into.
>   *
>   * RETURN VALUE:
>   * -EINVAL if the requested state is invalid.
> @@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
>   * 0 if device already is in the requested state.
>   * 0 if device's power state has been successfully changed.
>   */
> -static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> +static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
>  {
>  	u16 pmcsr;
> -	bool need_restore = false;
>  
>  	/* Check if we're already there */
>  	if (dev->current_state == state)
> @@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
>  	if (!dev->pm_cap)
>  		return -EIO;
>  
> -	if (state < PCI_D0 || state > PCI_D3hot)
> +	if (state < PCI_D1 || state > PCI_D3hot)
>  		return -EINVAL;
>  
>  	/*
> @@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
>  	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
>  	 * we'd have to go from D3 to D0, then to D1.
>  	 */
> -	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
> -	    && dev->current_state > state) {
> +	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
>  		pci_err(dev, "invalid power transition (from %s to %s)\n",
>  			pci_power_name(dev->current_state),
>  			pci_power_name(state));
> @@ -1122,29 +1119,8 @@ static int pci_raw_set_power_state(struc
>  		return -EIO;
>  	}
>  
> -	/*
> -	 * If we're (effectively) in D3, force entire word to 0.
> -	 * This doesn't affect PME_Status, disables PME_En, and
> -	 * sets PowerState to 0.
> -	 */
> -	switch (dev->current_state) {
> -	case PCI_D0:
> -	case PCI_D1:
> -	case PCI_D2:
> -		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> -		pmcsr |= state;
> -		break;
> -	case PCI_D3hot:
> -	case PCI_D3cold:
> -	case PCI_UNKNOWN: /* Boot-up */
> -		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
> -		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
> -			need_restore = true;
> -		fallthrough;	/* force to D0 */
> -	default:
> -		pmcsr = 0;
> -		break;
> -	}
> +	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> +	pmcsr |= state;
>  
>  	/* Enter specified state */
>  	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> @@ -1153,9 +1129,9 @@ static int pci_raw_set_power_state(struc
>  	 * Mandatory power management transition delays; see PCI PM 1.1
>  	 * 5.6.1 table 18
>  	 */
> -	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> +	if (state == PCI_D3hot)
>  		pci_dev_d3_sleep(dev);
> -	else if (state == PCI_D2 || dev->current_state == PCI_D2)
> +	else if (state == PCI_D2)
>  		udelay(PCI_PM_D2_DELAY);
>  
>  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> @@ -1165,22 +1141,6 @@ static int pci_raw_set_power_state(struc
>  			 pci_power_name(dev->current_state),
>  			 pci_power_name(state));
>  
> -	/*
> -	 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
> -	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
> -	 * from D3hot to D0 _may_ perform an internal reset, thereby
> -	 * going to "D0 Uninitialized" rather than "D0 Initialized".
> -	 * For example, at least some versions of the 3c905B and the
> -	 * 3c556B exhibit this behaviour.
> -	 *
> -	 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
> -	 * devices in a D3hot state at boot.  Consequently, we need to
> -	 * restore at least the BARs so that the device will be
> -	 * accessible to its driver.
> -	 */
> -	if (need_restore)
> -		pci_restore_bars(dev);
> -
>  	if (dev->bus->self)
>  		pcie_aspm_pm_state_change(dev->bus->self);
>  
> @@ -1312,8 +1272,54 @@ static int pci_dev_wait(struct pci_dev *
>   */
>  int pci_power_up(struct pci_dev *dev)
>  {
> -	pci_platform_power_transition(dev, PCI_D0);
> -	return pci_raw_set_power_state(dev, PCI_D0);
> +	int ret;
> +
> +	ret = pci_platform_power_transition(dev, PCI_D0);
> +	if (ret) {

Here pci_platform_power_transition() returned an error so we go and read
back the PM_CTRL to check in which power state the device is in? Perhaps
add a comment here explaining why we need to do this?

> +		u16 pmcsr;
> +
> +		/*
> +		 * The PCI_PM_CTRL register has not been read above, so read it
> +		 * now and bail out if that fails.
> +		 */
> +		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +		if (PCI_POSSIBLE_ERROR(pmcsr)) {
> +			dev->current_state = PCI_D3cold;
> +			goto fail;
> +		}
> +		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> +	} else if (dev->current_state == PCI_D3cold) {
> +		/*
> +		 * Since current_state is PCI_D3cold here, the power state seen
> +		 * by the platform is still D3cold or the PCI_PM_CTRL register
> +		 * read in pci_update_current_state() has failed, so assume the
> +		 * device to be inaccessible.
> +		 */
> +		goto fail;
> +	}
> +
> +	/* There's nothing more to do if current_state is D0 at this point. */
> +	if (dev->current_state == PCI_D0)
> +		return 0;
> +
> +	/*
> +	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
> +	 * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
> +	 * and wait for the prescribed amount of time.  Assume success.
> +	 */
> +	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
> +
> +	if (dev->current_state == PCI_D3hot)
> +		pci_dev_d3_sleep(dev);
> +	else if (dev->current_state == PCI_D2)
> +		udelay(PCI_PM_D2_DELAY);
> +
> +	dev->current_state = PCI_D0;
> +	return 0;
> +
> +fail:
> +	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
> +	return -ENODEV;
>  }
>  
>  /**
> @@ -1340,6 +1346,48 @@ void pci_bus_set_current_state(struct pc
>  		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
>  }
>  

Probably good to add some sort of kernel-doc to explain when this
function should be used since we have pci_power_up() too (that this one
calls).

> +static int pci_set_full_power_state(struct pci_dev *dev)
> +{
> +	pci_power_t old_state = dev->current_state;
> +	u16 pmcsr;
> +	int ret;
> +
> +	ret = pci_power_up(dev);
> +	if (ret)
> +		return ret;
> +
> +	if (!dev->pm_cap)
> +		return 0;
> +
> +	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +
> +	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> +	if (dev->current_state != PCI_D0) {
> +		pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
> +				     pci_power_name(dev->current_state));
> +	} else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
> +		/*
> +		 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
> +		 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
> +		 * from D3hot to D0 _may_ perform an internal reset, thereby
> +		 * going to "D0 Uninitialized" rather than "D0 Initialized". For
> +		 * example, at least some versions of the 3c905B and the 3c556B
> +		 * exhibit this behaviour.
> +		 *
> +		 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
> +		 * devices in a D3hot state at boot. Consequently, we need to
> +		 * restore at least the BARs so that the device will be
> +		 * accessible to its driver.
> +		 */
> +		pci_restore_bars(dev);
> +	}
> +
> +	if (dev->bus->self)
> +		pcie_aspm_pm_state_change(dev->bus->self);
> +
> +	return 0;
> +}
> +
>  /**
>   * pci_set_power_state - Set the power state of a PCI device
>   * @dev: PCI device to handle.
> @@ -1381,7 +1429,7 @@ int pci_set_power_state(struct pci_dev *
>  		return 0;
>  
>  	if (state == PCI_D0)
> -		return pci_power_up(dev);
> +		return pci_set_full_power_state(dev);
>  
>  	/*
>  	 * This device is quirked not to be put into D3, so don't put it in
> @@ -1394,7 +1442,7 @@ int pci_set_power_state(struct pci_dev *
>  	 * To put device in D3cold, we put device into D3hot in native
>  	 * way, then put device into D3cold with platform ops
>  	 */
> -	error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
> +	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
>  					PCI_D3hot : state);
>  
>  	if (pci_platform_power_transition(dev, state))
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (9 preceding siblings ...)
  2022-04-11 14:33   ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
@ 2022-04-12 10:08   ` Mika Westerberg
  2022-04-12 11:34     ` Rafael J. Wysocki
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
  11 siblings, 1 reply; 47+ messages in thread
From: Mika Westerberg @ 2022-04-12 10:08 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PCI, Linux PM, LKML, Bjorn Helgaas

Hi Rafael,

On Mon, Apr 11, 2022 at 04:17:41PM +0200, Rafael J. Wysocki wrote:
> Hi All,
> 
> On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > Hi All,
> > 
> > This series supersedes the one at
> > 
> > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > 
> > It addresses some potential issues related to PCI device transitions from
> > low-power states into D0 and makes the related code more straightforward
> > and so easier to follow.
> > 
> > Please refer to the patch changelogs for details.
> 
> Here's a v2 of this patch series which is being sent, because I realized that
> one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
> and can be dropped, but that affected the last 3 patches in the series and
> then I noticed that more improvements were possible and hence the new patches
> [2/9].

I sent a few minor nits separately. The series looks good to me in
general and certainly improves readability :)

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state()
  2022-04-12  9:42     ` Mika Westerberg
@ 2022-04-12 10:56       ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-12 10:56 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas

On Tue, Apr 12, 2022 at 12:54 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Mon, Apr 11, 2022 at 04:21:04PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Save one config space access in pci_update_current_state() by
> > testing the retireved PCI_PM_CTRL register value against
>               ^^^^^^^^^
> retrieved

Yup, thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-12 10:00     ` Mika Westerberg
@ 2022-04-12 11:31       ` Rafael J. Wysocki
  2022-04-12 11:38         ` Mika Westerberg
  0 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-12 11:31 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas

On Tue, Apr 12, 2022 at 1:17 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Mon, Apr 11, 2022 at 04:25:12PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > There are some issues related to changing power states of PCI
> > devices, mostly related to carrying out unnecessary actions in some
> > places, and the code is generally hard to follow.
> >
> >  1. pci_power_up() has two callers, pci_set_power_state() and
> >     pci_pm_default_resume_early().  The latter updates the current
> >     power state of the device right after calling pci_power_up()
> >     and it restores the entire config space of the device right
> >     after that, so pci_power_up() itself need not read the
> >     PCI_PM_CTRL register or restore the BARs after programming the
> >     device into D0 in that case.
> >
> >  2. It is generally hard to get a clear view of the pci_power_up()
> >     code flow, especially in some corner cases, due to all of the
> >     involved PCI_PM_CTRL register reads and writes occurring in
> >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> >     some of which are redundant.
> >
> >  3. The transitions from low-power states to D0 and the other way
> >     around are unnecessarily tangled in pci_raw_set_power_state()
> >     which causes it to use a redundant local variable and makes it
> >     rather hard to follow.
> >
> > To address the above shortcomings, make the following changes:
> >
> >  a. Remove the code handling transitions into D0
>
> Should this be D3?

No.  Transitions into D0 will be handled by pci_power_up() directly,
so they need not be handled by pci_raw_set_power_state().

> >     from pci_raw_set_power_state() and rename it as
> >     pci_set_low_power_state().
> >
> >  b. Add the code handling transitions into D0 directly
> >     to pci_power_up() and to a new wrapper function
> >     pci_set_full_power_state() calling it internally that is
> >     only used in pci_set_power_state().
> >
> >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> >     and make it work in the same way for transitions from any
> >     low-power states (transitions from D1 and D2 are handled
> >     slightly differently before the change).
> >
> >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> >     register read confirming the power state change into
> >     pci_set_full_power_state() to avoid doing that in
> >     pci_pm_default_resume_early() unnecessarily.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1 -> v2:
> >    * Do not add a redundant check to pci_set_low_power_state().
> >
> > ---
> >  drivers/pci/pci.c |  154 +++++++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 101 insertions(+), 53 deletions(-)
> >
> > Index: linux-pm/drivers/pci/pci.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pci.c
> > +++ linux-pm/drivers/pci/pci.c
> > @@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
> >  }
> >
> >  /**
> > - * pci_raw_set_power_state - Use PCI PM registers to set the power state of
> > - *                        given PCI device
> > + * pci_set_low_power_state - Program the given device into a low-power state
> >   * @dev: PCI device to handle.
> > - * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
> > + * @state: PCI power state (D1, D2, D3hot) to put the device into.
> >   *
> >   * RETURN VALUE:
> >   * -EINVAL if the requested state is invalid.
> > @@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
> >   * 0 if device already is in the requested state.
> >   * 0 if device's power state has been successfully changed.
> >   */
> > -static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> > +static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
> >  {
> >       u16 pmcsr;
> > -     bool need_restore = false;
> >
> >       /* Check if we're already there */
> >       if (dev->current_state == state)
> > @@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
> >       if (!dev->pm_cap)
> >               return -EIO;
> >
> > -     if (state < PCI_D0 || state > PCI_D3hot)
> > +     if (state < PCI_D1 || state > PCI_D3hot)
> >               return -EINVAL;
> >
> >       /*
> > @@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
> >        * we can go from D1 to D3, but we can't go directly from D3 to D1;
> >        * we'd have to go from D3 to D0, then to D1.
> >        */
> > -     if (state != PCI_D0 && dev->current_state <= PCI_D3cold
> > -         && dev->current_state > state) {
> > +     if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
> >               pci_err(dev, "invalid power transition (from %s to %s)\n",
> >                       pci_power_name(dev->current_state),
> >                       pci_power_name(state));
> > @@ -1122,29 +1119,8 @@ static int pci_raw_set_power_state(struc
> >               return -EIO;
> >       }
> >
> > -     /*
> > -      * If we're (effectively) in D3, force entire word to 0.
> > -      * This doesn't affect PME_Status, disables PME_En, and
> > -      * sets PowerState to 0.
> > -      */
> > -     switch (dev->current_state) {
> > -     case PCI_D0:
> > -     case PCI_D1:
> > -     case PCI_D2:
> > -             pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> > -             pmcsr |= state;
> > -             break;
> > -     case PCI_D3hot:
> > -     case PCI_D3cold:
> > -     case PCI_UNKNOWN: /* Boot-up */
> > -             if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
> > -              && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
> > -                     need_restore = true;
> > -             fallthrough;    /* force to D0 */
> > -     default:
> > -             pmcsr = 0;
> > -             break;
> > -     }
> > +     pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> > +     pmcsr |= state;
> >
> >       /* Enter specified state */
> >       pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> > @@ -1153,9 +1129,9 @@ static int pci_raw_set_power_state(struc
> >        * Mandatory power management transition delays; see PCI PM 1.1
> >        * 5.6.1 table 18
> >        */
> > -     if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> > +     if (state == PCI_D3hot)
> >               pci_dev_d3_sleep(dev);
> > -     else if (state == PCI_D2 || dev->current_state == PCI_D2)
> > +     else if (state == PCI_D2)
> >               udelay(PCI_PM_D2_DELAY);
> >
> >       pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > @@ -1165,22 +1141,6 @@ static int pci_raw_set_power_state(struc
> >                        pci_power_name(dev->current_state),
> >                        pci_power_name(state));
> >
> > -     /*
> > -      * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
> > -      * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
> > -      * from D3hot to D0 _may_ perform an internal reset, thereby
> > -      * going to "D0 Uninitialized" rather than "D0 Initialized".
> > -      * For example, at least some versions of the 3c905B and the
> > -      * 3c556B exhibit this behaviour.
> > -      *
> > -      * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
> > -      * devices in a D3hot state at boot.  Consequently, we need to
> > -      * restore at least the BARs so that the device will be
> > -      * accessible to its driver.
> > -      */
> > -     if (need_restore)
> > -             pci_restore_bars(dev);
> > -
> >       if (dev->bus->self)
> >               pcie_aspm_pm_state_change(dev->bus->self);
> >
> > @@ -1312,8 +1272,54 @@ static int pci_dev_wait(struct pci_dev *
> >   */
> >  int pci_power_up(struct pci_dev *dev)
> >  {
> > -     pci_platform_power_transition(dev, PCI_D0);
> > -     return pci_raw_set_power_state(dev, PCI_D0);
> > +     int ret;
> > +
> > +     ret = pci_platform_power_transition(dev, PCI_D0);
> > +     if (ret) {
>
> Here pci_platform_power_transition() returned an error so we go and read
> back the PM_CTRL to check in which power state the device is in? Perhaps
> add a comment here explaining why we need to do this?

That's the comment below, but I gather it is insufficient as is.
Please let me know if rephrasing it this way would help:

"Since pci_platform_power_transition() has returned an error, the
PCI_PM_CTRL register has not been read by it and the current power
state of the device is unknown. Read the PCI_PM_CTRL register now and
bail out if that fails."

And I've just realized that pm_cap should be checked here, because it
is not guaranteed to be set.

> > +             u16 pmcsr;
> > +
> > +             /*
> > +              * The PCI_PM_CTRL register has not been read above, so read it
> > +              * now and bail out if that fails.
> > +              */
> > +             pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > +             if (PCI_POSSIBLE_ERROR(pmcsr)) {
> > +                     dev->current_state = PCI_D3cold;
> > +                     goto fail;
> > +             }
> > +             dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> > +     } else if (dev->current_state == PCI_D3cold) {
> > +             /*
> > +              * Since current_state is PCI_D3cold here, the power state seen
> > +              * by the platform is still D3cold or the PCI_PM_CTRL register
> > +              * read in pci_update_current_state() has failed, so assume the
> > +              * device to be inaccessible.
> > +              */
> > +             goto fail;
> > +     }
> > +
> > +     /* There's nothing more to do if current_state is D0 at this point. */
> > +     if (dev->current_state == PCI_D0)
> > +             return 0;
> > +
> > +     /*
> > +      * Program the device into PCI_D0 by forcing the entire word to 0 (this
> > +      * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
> > +      * and wait for the prescribed amount of time.  Assume success.
> > +      */
> > +     pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
> > +
> > +     if (dev->current_state == PCI_D3hot)
> > +             pci_dev_d3_sleep(dev);
> > +     else if (dev->current_state == PCI_D2)
> > +             udelay(PCI_PM_D2_DELAY);
> > +
> > +     dev->current_state = PCI_D0;
> > +     return 0;
> > +
> > +fail:
> > +     pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
> > +     return -ENODEV;
> >  }
> >
> >  /**
> > @@ -1340,6 +1346,48 @@ void pci_bus_set_current_state(struct pc
> >               pci_walk_bus(bus, __pci_dev_set_current_state, &state);
> >  }
> >
>
> Probably good to add some sort of kernel-doc to explain when this
> function should be used since we have pci_power_up() too (that this one
> calls).

OK

> > +static int pci_set_full_power_state(struct pci_dev *dev)
> > +{
> > +     pci_power_t old_state = dev->current_state;
> > +     u16 pmcsr;
> > +     int ret;
> > +
> > +     ret = pci_power_up(dev);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (!dev->pm_cap)
> > +             return 0;
> > +
> > +     pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > +
> > +     dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
> > +     if (dev->current_state != PCI_D0) {
> > +             pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
> > +                                  pci_power_name(dev->current_state));
> > +     } else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
> > +             /*
> > +              * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
> > +              * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
> > +              * from D3hot to D0 _may_ perform an internal reset, thereby
> > +              * going to "D0 Uninitialized" rather than "D0 Initialized". For
> > +              * example, at least some versions of the 3c905B and the 3c556B
> > +              * exhibit this behaviour.
> > +              *
> > +              * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
> > +              * devices in a D3hot state at boot. Consequently, we need to
> > +              * restore at least the BARs so that the device will be
> > +              * accessible to its driver.
> > +              */
> > +             pci_restore_bars(dev);
> > +     }
> > +
> > +     if (dev->bus->self)
> > +             pcie_aspm_pm_state_change(dev->bus->self);
> > +
> > +     return 0;
> > +}
> > +
> >  /**
> >   * pci_set_power_state - Set the power state of a PCI device
> >   * @dev: PCI device to handle.
> > @@ -1381,7 +1429,7 @@ int pci_set_power_state(struct pci_dev *
> >               return 0;
> >
> >       if (state == PCI_D0)
> > -             return pci_power_up(dev);
> > +             return pci_set_full_power_state(dev);
> >
> >       /*
> >        * This device is quirked not to be put into D3, so don't put it in
> > @@ -1394,7 +1442,7 @@ int pci_set_power_state(struct pci_dev *
> >        * To put device in D3cold, we put device into D3hot in native
> >        * way, then put device into D3cold with platform ops
> >        */
> > -     error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
> > +     error = pci_set_low_power_state(dev, state > PCI_D3hot ?
> >                                       PCI_D3hot : state);
> >
> >       if (pci_platform_power_transition(dev, state))
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-12 10:08   ` Mika Westerberg
@ 2022-04-12 11:34     ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-12 11:34 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas

Hi Mika,

On Tue, Apr 12, 2022 at 1:24 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Hi Rafael,
>
> On Mon, Apr 11, 2022 at 04:17:41PM +0200, Rafael J. Wysocki wrote:
> > Hi All,
> >
> > On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > > Hi All,
> > >
> > > This series supersedes the one at
> > >
> > > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > >
> > > It addresses some potential issues related to PCI device transitions from
> > > low-power states into D0 and makes the related code more straightforward
> > > and so easier to follow.
> > >
> > > Please refer to the patch changelogs for details.
> >
> > Here's a v2 of this patch series which is being sent, because I realized that
> > one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
> > and can be dropped, but that affected the last 3 patches in the series and
> > then I noticed that more improvements were possible and hence the new patches
> > [2/9].
>
> I sent a few minor nits separately. The series looks good to me in
> general and certainly improves readability :)
>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Thank you!

I'll address the issues that you have pointed out and send a v3 of
this series later this week.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-12 11:31       ` Rafael J. Wysocki
@ 2022-04-12 11:38         ` Mika Westerberg
  0 siblings, 0 replies; 47+ messages in thread
From: Mika Westerberg @ 2022-04-12 11:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas

Hi,

On Tue, Apr 12, 2022 at 01:31:57PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 12, 2022 at 1:17 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Mon, Apr 11, 2022 at 04:25:12PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> >
> > Should this be D3?
> 
> No.  Transitions into D0 will be handled by pci_power_up() directly,
> so they need not be handled by pci_raw_set_power_state().

OK.

> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > v1 -> v2:
> > >    * Do not add a redundant check to pci_set_low_power_state().
> > >
> > > ---
> > >  drivers/pci/pci.c |  154 +++++++++++++++++++++++++++++++++++-------------------
> > >  1 file changed, 101 insertions(+), 53 deletions(-)
> > >
> > > Index: linux-pm/drivers/pci/pci.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/pci/pci.c
> > > +++ linux-pm/drivers/pci/pci.c
> > > @@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
> > >  }
> > >
> > >  /**
> > > - * pci_raw_set_power_state - Use PCI PM registers to set the power state of
> > > - *                        given PCI device
> > > + * pci_set_low_power_state - Program the given device into a low-power state
> > >   * @dev: PCI device to handle.
> > > - * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
> > > + * @state: PCI power state (D1, D2, D3hot) to put the device into.
> > >   *
> > >   * RETURN VALUE:
> > >   * -EINVAL if the requested state is invalid.
> > > @@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
> > >   * 0 if device already is in the requested state.
> > >   * 0 if device's power state has been successfully changed.
> > >   */
> > > -static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > +static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
> > >  {
> > >       u16 pmcsr;
> > > -     bool need_restore = false;
> > >
> > >       /* Check if we're already there */
> > >       if (dev->current_state == state)
> > > @@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
> > >       if (!dev->pm_cap)
> > >               return -EIO;
> > >
> > > -     if (state < PCI_D0 || state > PCI_D3hot)
> > > +     if (state < PCI_D1 || state > PCI_D3hot)
> > >               return -EINVAL;
> > >
> > >       /*
> > > @@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
> > >        * we can go from D1 to D3, but we can't go directly from D3 to D1;
> > >        * we'd have to go from D3 to D0, then to D1.
> > >        */
> > > -     if (state != PCI_D0 && dev->current_state <= PCI_D3cold
> > > -         && dev->current_state > state) {
> > > +     if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
> > >               pci_err(dev, "invalid power transition (from %s to %s)\n",
> > >                       pci_power_name(dev->current_state),
> > >                       pci_power_name(state));
> > > @@ -1122,29 +1119,8 @@ static int pci_raw_set_power_state(struc
> > >               return -EIO;
> > >       }
> > >
> > > -     /*
> > > -      * If we're (effectively) in D3, force entire word to 0.
> > > -      * This doesn't affect PME_Status, disables PME_En, and
> > > -      * sets PowerState to 0.
> > > -      */
> > > -     switch (dev->current_state) {
> > > -     case PCI_D0:
> > > -     case PCI_D1:
> > > -     case PCI_D2:
> > > -             pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> > > -             pmcsr |= state;
> > > -             break;
> > > -     case PCI_D3hot:
> > > -     case PCI_D3cold:
> > > -     case PCI_UNKNOWN: /* Boot-up */
> > > -             if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
> > > -              && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
> > > -                     need_restore = true;
> > > -             fallthrough;    /* force to D0 */
> > > -     default:
> > > -             pmcsr = 0;
> > > -             break;
> > > -     }
> > > +     pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> > > +     pmcsr |= state;
> > >
> > >       /* Enter specified state */
> > >       pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> > > @@ -1153,9 +1129,9 @@ static int pci_raw_set_power_state(struc
> > >        * Mandatory power management transition delays; see PCI PM 1.1
> > >        * 5.6.1 table 18
> > >        */
> > > -     if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> > > +     if (state == PCI_D3hot)
> > >               pci_dev_d3_sleep(dev);
> > > -     else if (state == PCI_D2 || dev->current_state == PCI_D2)
> > > +     else if (state == PCI_D2)
> > >               udelay(PCI_PM_D2_DELAY);
> > >
> > >       pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > > @@ -1165,22 +1141,6 @@ static int pci_raw_set_power_state(struc
> > >                        pci_power_name(dev->current_state),
> > >                        pci_power_name(state));
> > >
> > > -     /*
> > > -      * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
> > > -      * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
> > > -      * from D3hot to D0 _may_ perform an internal reset, thereby
> > > -      * going to "D0 Uninitialized" rather than "D0 Initialized".
> > > -      * For example, at least some versions of the 3c905B and the
> > > -      * 3c556B exhibit this behaviour.
> > > -      *
> > > -      * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
> > > -      * devices in a D3hot state at boot.  Consequently, we need to
> > > -      * restore at least the BARs so that the device will be
> > > -      * accessible to its driver.
> > > -      */
> > > -     if (need_restore)
> > > -             pci_restore_bars(dev);
> > > -
> > >       if (dev->bus->self)
> > >               pcie_aspm_pm_state_change(dev->bus->self);
> > >
> > > @@ -1312,8 +1272,54 @@ static int pci_dev_wait(struct pci_dev *
> > >   */
> > >  int pci_power_up(struct pci_dev *dev)
> > >  {
> > > -     pci_platform_power_transition(dev, PCI_D0);
> > > -     return pci_raw_set_power_state(dev, PCI_D0);
> > > +     int ret;
> > > +
> > > +     ret = pci_platform_power_transition(dev, PCI_D0);
> > > +     if (ret) {
> >
> > Here pci_platform_power_transition() returned an error so we go and read
> > back the PM_CTRL to check in which power state the device is in? Perhaps
> > add a comment here explaining why we need to do this?
> 
> That's the comment below, but I gather it is insufficient as is.
> Please let me know if rephrasing it this way would help:
> 
> "Since pci_platform_power_transition() has returned an error, the
> PCI_PM_CTRL register has not been read by it and the current power
> state of the device is unknown. Read the PCI_PM_CTRL register now and
> bail out if that fails."

Yes, that's better, thanks!

> And I've just realized that pm_cap should be checked here, because it
> is not guaranteed to be set.

Good point.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
                     ` (10 preceding siblings ...)
  2022-04-12 10:08   ` Mika Westerberg
@ 2022-04-14 13:00   ` Rafael J. Wysocki
  2022-04-14 13:04     ` [PATCH v3 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
                       ` (9 more replies)
  11 siblings, 10 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:00 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

On Monday, April 11, 2022 4:17:41 PM CEST Rafael J. Wysocki wrote:
> Hi All,
> 
> On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > Hi All,
> > 
> > This series supersedes the one at
> > 
> > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > 
> > It addresses some potential issues related to PCI device transitions from
> > low-power states into D0 and makes the related code more straightforward
> > and so easier to follow.
> > 
> > Please refer to the patch changelogs for details.
> 
> Here's a v2 of this patch series which is being sent, because I realized that
> one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
> and can be dropped, but that affected the last 3 patches in the series and
> then I noticed that more improvements were possible and hence the new patches
> [2/9].

Here's a v3 of this series with some minor review comments addressed and R-by
tags from Mika added.

Thanks!




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 1/9] PCI/PM: Resume subordinate bus in bus type callbacks
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
@ 2022-04-14 13:04     ` Rafael J. Wysocki
  2022-04-14 13:04     ` [PATCH v3 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:04 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling pci_resume_bus() on the secondary bus from pci_power_up() as
it is done now is questionable, because it depends on the mandatory
bridge power-up delays that are only covered by the PCI bus type PM
callbacks.

For this reason, move the subordinate bus resume to those callbacks
too and use the observation that if a bridge is being powered-up
during resume from system-wide suspend, it may be still desirable
to runtime-resume its subordinate bus after completing the system-
wide transition (in case the resume of the devices on that bus is
skipped during it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v1 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci-driver.c |   15 +++++++++++++--
 drivers/pci/pci.c        |   15 ---------------
 2 files changed, 13 insertions(+), 17 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -559,6 +559,17 @@ static void pci_pm_default_resume_early(
 	pci_pme_restore(pci_dev);
 }
 
+static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
+{
+	pci_bridge_wait_for_secondary_bus(pci_dev);
+	/*
+	 * When powering on a bridge from D3cold, the whole hierarchy may be
+	 * powered on into D0uninitialized state, resume them to give them a
+	 * chance to suspend again
+	 */
+	pci_resume_bus(pci_dev->subordinate);
+}
+
 #endif
 
 #ifdef CONFIG_PM_SLEEP
@@ -934,7 +945,7 @@ static int pci_pm_resume_noirq(struct de
 	pcie_pme_root_status_cleanup(pci_dev);
 
 	if (!skip_bus_pm && prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return 0;
@@ -1321,7 +1332,7 @@ static int pci_pm_runtime_resume(struct
 	pci_pm_default_resume(pci_dev);
 
 	if (prev_state == PCI_D3cold)
-		pci_bridge_wait_for_secondary_bus(pci_dev);
+		pci_pm_bridge_power_up_actions(pci_dev);
 
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1310,21 +1310,6 @@ static int pci_dev_wait(struct pci_dev *
 int pci_power_up(struct pci_dev *dev)
 {
 	pci_platform_power_transition(dev, PCI_D0);
-
-	/*
-	 * Mandatory power management transition delays are handled in
-	 * pci_pm_resume_noirq() and pci_pm_runtime_resume() of the
-	 * corresponding bridge.
-	 */
-	if (dev->runtime_d3cold) {
-		/*
-		 * When powering on a bridge from D3cold, the whole hierarchy
-		 * may be powered on into D0uninitialized state, resume them to
-		 * give them a chance to suspend again
-		 */
-		pci_resume_bus(dev->subordinate);
-	}
-
 	return pci_raw_set_power_state(dev, PCI_D0);
 }
 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 2/9] PCI/PM: Drop the runtime_d3cold device flag
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
  2022-04-14 13:04     ` [PATCH v3 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
@ 2022-04-14 13:04     ` Rafael J. Wysocki
  2022-04-14 13:07     ` [PATCH v3 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:04 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This flag is not needed any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v1 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci-driver.c |    2 --
 drivers/pci/pci.c        |    3 ---
 include/linux/pci.h      |    4 ----
 3 files changed, 9 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1337,8 +1337,6 @@ static int pci_pm_runtime_resume(struct
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
 
-	pci_dev->runtime_d3cold = false;
-
 	return error;
 }
 
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2703,8 +2703,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
-	dev->runtime_d3cold = target_state == PCI_D3cold;
-
 	/*
 	 * There are systems (for example, Intel mobile chips since Coffee
 	 * Lake) where the power drawn while suspended can be significantly
@@ -2722,7 +2720,6 @@ int pci_finish_runtime_suspend(struct pc
 	if (error) {
 		pci_enable_wake(dev, target_state, false);
 		pci_restore_ptm_state(dev);
-		dev->runtime_d3cold = false;
 	}
 
 	return error;
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -379,10 +379,6 @@ struct pci_dev {
 	unsigned int	mmio_always_on:1;	/* Disallow turning off io/mem
 						   decoding during BAR sizing */
 	unsigned int	wakeup_prepared:1;
-	unsigned int	runtime_d3cold:1;	/* Whether go through runtime
-						   D3cold, not set for devices
-						   powered on/off by the
-						   corresponding bridge */
 	unsigned int	skip_bus_pm:1;	/* Internal: Skip bus-level PM */
 	unsigned int	ignore_hotplug:1;	/* Ignore hotplug events */
 	unsigned int	hotplug_user_indicators:1; /* SlotCtl indicators




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 3/9] PCI/PM: Rearrange pci_update_current_state()
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
  2022-04-14 13:04     ` [PATCH v3 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
  2022-04-14 13:04     ` [PATCH v3 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
@ 2022-04-14 13:07     ` Rafael J. Wysocki
  2022-04-14 13:11     ` [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:07 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Save one config space access in pci_update_current_state() by
testing the retireved PCI_PM_CTRL register value against
PCI_POSSIBLE_ERROR() instead of invoking pci_device_is_present()
separately.

While at it, drop a pair of unnecessary parens.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3:
   * Fixed typo in the changelog ("retrieved" spelling).
   * Added R-by from Mika.

---
 drivers/pci/pci.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1201,14 +1201,17 @@ static int pci_raw_set_power_state(struc
  */
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
 {
-	if (platform_pci_get_power_state(dev) == PCI_D3cold ||
-	    !pci_device_is_present(dev)) {
+	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
 		dev->current_state = PCI_D3cold;
 	} else if (dev->pm_cap) {
 		u16 pmcsr;
 
 		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			return;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 	} else {
 		dev->current_state = state;
 	}




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (2 preceding siblings ...)
  2022-04-14 13:07     ` [PATCH v3 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
@ 2022-04-14 13:11     ` Rafael J. Wysocki
  2022-05-03 17:59       ` Nathan Chancellor
  2022-04-14 13:14     ` [PATCH v3 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
                       ` (5 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:11 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are some issues related to changing power states of PCI
devices, mostly related to carrying out unnecessary actions in some
places, and the code is generally hard to follow.

 1. pci_power_up() has two callers, pci_set_power_state() and
    pci_pm_default_resume_early().  The latter updates the current
    power state of the device right after calling pci_power_up()
    and it restores the entire config space of the device right
    after that, so pci_power_up() itself need not read the
    PCI_PM_CTRL register or restore the BARs after programming the
    device into D0 in that case.
 
 2. It is generally hard to get a clear view of the pci_power_up()
    code flow, especially in some corner cases, due to all of the
    involved PCI_PM_CTRL register reads and writes occurring in
    pci_platform_power_transition() and in pci_raw_set_power_state(),
    some of which are redundant.

 3. The transitions from low-power states to D0 and the other way
    around are unnecessarily tangled in pci_raw_set_power_state()
    which causes it to use a redundant local variable and makes it
    rather hard to follow.

To address the above shortcomings, make the following changes:

 a. Remove the code handling transitions into D0
    from pci_raw_set_power_state() and rename it as
    pci_set_low_power_state().

 b. Add the code handling transitions into D0 directly
    to pci_power_up() and to a new wrapper function
    pci_set_full_power_state() calling it internally that is
    only used in pci_set_power_state().

 c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
    and make it work in the same way for transitions from any
    low-power states (transitions from D1 and D2 are handled
    slightly differently before the change).

 d. Put the restoration of the BARs and the PCI_PM_CTRL
    register read confirming the power state change into
    pci_set_full_power_state() to avoid doing that in
    pci_pm_default_resume_early() unnecessarily.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v2 -> v3:
   * Make pci_power_up() check dev->pm_cap.
   * Rephrase a comment in pci_power_up() (Mika).
   * Add kerneldoc comment to pci_set_full_power_state() (Mika).
   * Add R-by from Mika.

v1 -> v2:
   * Do not add a redundant check to pci_set_low_power_state().

---
 drivers/pci/pci.c |  175 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 122 insertions(+), 53 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,10 +1068,9 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_raw_set_power_state - Use PCI PM registers to set the power state of
- *			     given PCI device
+ * pci_set_low_power_state - Program the given device into a low-power state
  * @dev: PCI device to handle.
- * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -1080,10 +1079,9 @@ static inline bool platform_pci_bridge_d
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
-	bool need_restore = false;
 
 	/* Check if we're already there */
 	if (dev->current_state == state)
@@ -1092,7 +1090,7 @@ static int pci_raw_set_power_state(struc
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D0 || state > PCI_D3hot)
+	if (state < PCI_D1 || state > PCI_D3hot)
 		return -EINVAL;
 
 	/*
@@ -1101,8 +1099,7 @@ static int pci_raw_set_power_state(struc
 	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
-	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
-	    && dev->current_state > state) {
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
 		pci_err(dev, "invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
@@ -1122,29 +1119,8 @@ static int pci_raw_set_power_state(struc
 		return -EIO;
 	}
 
-	/*
-	 * If we're (effectively) in D3, force entire word to 0.
-	 * This doesn't affect PME_Status, disables PME_En, and
-	 * sets PowerState to 0.
-	 */
-	switch (dev->current_state) {
-	case PCI_D0:
-	case PCI_D1:
-	case PCI_D2:
-		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-		pmcsr |= state;
-		break;
-	case PCI_D3hot:
-	case PCI_D3cold:
-	case PCI_UNKNOWN: /* Boot-up */
-		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
-			need_restore = true;
-		fallthrough;	/* force to D0 */
-	default:
-		pmcsr = 0;
-		break;
-	}
+	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+	pmcsr |= state;
 
 	/* Enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
@@ -1153,9 +1129,9 @@ static int pci_raw_set_power_state(struc
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot)
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+	else if (state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
@@ -1165,22 +1141,6 @@ static int pci_raw_set_power_state(struc
 			 pci_power_name(dev->current_state),
 			 pci_power_name(state));
 
-	/*
-	 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
-	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
-	 * from D3hot to D0 _may_ perform an internal reset, thereby
-	 * going to "D0 Uninitialized" rather than "D0 Initialized".
-	 * For example, at least some versions of the 3c905B and the
-	 * 3c556B exhibit this behaviour.
-	 *
-	 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
-	 * devices in a D3hot state at boot.  Consequently, we need to
-	 * restore at least the BARs so that the device will be
-	 * accessible to its driver.
-	 */
-	if (need_restore)
-		pci_restore_bars(dev);
-
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
@@ -1312,8 +1272,63 @@ static int pci_dev_wait(struct pci_dev *
  */
 int pci_power_up(struct pci_dev *dev)
 {
-	pci_platform_power_transition(dev, PCI_D0);
-	return pci_raw_set_power_state(dev, PCI_D0);
+	int ret;
+
+	ret = pci_platform_power_transition(dev, PCI_D0);
+	if (ret) {
+		u16 pmcsr;
+
+		/*
+		 * If native PM is not supported, pass the error returned by
+		 * pci_platform_power_transition() to the caller.
+		 */
+		if (!dev->pm_cap)
+			return ret;
+
+		/*
+		 * Since pci_platform_power_transition() has returned an error,
+		 * the PCI_PM_CTRL register has not been read by it and the
+		 * current power state of the device is unknown.  Read the
+		 * PCI_PM_CTRL register now and bail out if that fails.
+		 */
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			goto fail;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	} else if (dev->current_state == PCI_D3cold) {
+		/*
+		 * Since current_state is PCI_D3cold here, the power state seen
+		 * by the platform is still D3cold or the PCI_PM_CTRL register
+		 * read in pci_update_current_state() has failed, so assume the
+		 * device to be inaccessible.
+		 */
+		goto fail;
+	}
+
+	/* There's nothing more to do if current_state is D0 at this point. */
+	if (dev->current_state == PCI_D0)
+		return 0;
+
+	/*
+	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
+	 * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
+	 * and wait for the prescribed amount of time.  Assume success.
+	 */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
+
+	if (dev->current_state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (dev->current_state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	dev->current_state = PCI_D0;
+	return 0;
+
+fail:
+	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
+	return -ENODEV;
 }
 
 /**
@@ -1341,6 +1356,60 @@ void pci_bus_set_current_state(struct pc
 }
 
 /**
+ * pci_set_full_power_state - Put a PCI device into D0 and update its state
+ * @dev: PCI device to power up
+ *
+ * Call pci_power_up() to put @dev into D0, read from its PCI_PM_CTRL register
+ * to confirm the state change, restore its BARs if they might be lost and
+ * reconfigure ASPM in acordance with the new power state.
+ *
+ * If pci_restore_state() is going to be called right after a power state change
+ * to D0, it is more efficient to use pci_power_up() directly instead of this
+ * function.
+ */
+static int pci_set_full_power_state(struct pci_dev *dev)
+{
+	pci_power_t old_state = dev->current_state;
+	u16 pmcsr;
+	int ret;
+
+	ret = pci_power_up(dev);
+	if (ret)
+		return ret;
+
+	if (!dev->pm_cap)
+		return 0;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	if (dev->current_state != PCI_D0) {
+		pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
+				     pci_power_name(dev->current_state));
+	} else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		/*
+		 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
+		 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
+		 * from D3hot to D0 _may_ perform an internal reset, thereby
+		 * going to "D0 Uninitialized" rather than "D0 Initialized". For
+		 * example, at least some versions of the 3c905B and the 3c556B
+		 * exhibit this behaviour.
+		 *
+		 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
+		 * devices in a D3hot state at boot. Consequently, we need to
+		 * restore at least the BARs so that the device will be
+		 * accessible to its driver.
+		 */
+		pci_restore_bars(dev);
+	}
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
+/**
  * pci_set_power_state - Set the power state of a PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
@@ -1381,7 +1450,7 @@ int pci_set_power_state(struct pci_dev *
 		return 0;
 
 	if (state == PCI_D0)
-		return pci_power_up(dev);
+		return pci_set_full_power_state(dev);
 
 	/*
 	 * This device is quirked not to be put into D3, so don't put it in
@@ -1394,7 +1463,7 @@ int pci_set_power_state(struct pci_dev *
 	 * To put device in D3cold, we put device into D3hot in native
 	 * way, then put device into D3cold with platform ops
 	 */
-	error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
+	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
 					PCI_D3hot : state);
 
 	if (pci_platform_power_transition(dev, state))




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (3 preceding siblings ...)
  2022-04-14 13:11     ` [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
@ 2022-04-14 13:14     ` Rafael J. Wysocki
  2022-04-14 13:15     ` [PATCH v3 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:14 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Because pci_set_power_state() is the only caller of
pci_set_low_power_state(), move the latter next to the former.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v1 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci.c |  160 +++++++++++++++++++++++++++---------------------------
 1 file changed, 80 insertions(+), 80 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,86 +1068,6 @@ static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_set_low_power_state - Program the given device into a low-power state
- * @dev: PCI device to handle.
- * @state: PCI power state (D1, D2, D3hot) to put the device into.
- *
- * RETURN VALUE:
- * -EINVAL if the requested state is invalid.
- * -EIO if device does not support PCI PM or its PM capabilities register has a
- * wrong version, or device doesn't support the requested state.
- * 0 if device already is in the requested state.
- * 0 if device's power state has been successfully changed.
- */
-static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
-{
-	u16 pmcsr;
-
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
-	if (!dev->pm_cap)
-		return -EIO;
-
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
-	/*
-	 * Validate transition: We can enter D0 from any state, but if
-	 * we're already in a low-power state, we can only go deeper.  E.g.,
-	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
-	 * we'd have to go from D3 to D0, then to D1.
-	 */
-	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EINVAL;
-	}
-
-	/* Check if this device supports the desired state */
-	if ((state == PCI_D1 && !dev->d1_support)
-	   || (state == PCI_D2 && !dev->d2_support))
-		return -EIO;
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
-			pci_power_name(dev->current_state),
-			pci_power_name(state));
-		return -EIO;
-	}
-
-	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-	pmcsr |= state;
-
-	/* Enter specified state */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
-
-	/*
-	 * Mandatory power management transition delays; see PCI PM 1.1
-	 * 5.6.1 table 18
-	 */
-	if (state == PCI_D3hot)
-		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2)
-		udelay(PCI_PM_D2_DELAY);
-
-	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
-	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
-	if (dev->current_state != state)
-		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
-			 pci_power_name(dev->current_state),
-			 pci_power_name(state));
-
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self);
-
-	return 0;
-}
-
-/**
  * pci_update_current_state - Read power state of given device and cache it
  * @dev: PCI device to handle.
  * @state: State to cache in case the device doesn't have the PM capability
@@ -1405,6 +1325,86 @@ static int pci_set_full_power_state(stru
 
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
+/**
+ * pci_set_low_power_state - Program the given device into a low-power state
+ * @dev: PCI device to handle.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
+ *
+ * RETURN VALUE:
+ * -EINVAL if the requested state is invalid.
+ * -EIO if device does not support PCI PM or its PM capabilities register has a
+ * wrong version, or device doesn't support the requested state.
+ * 0 if device already is in the requested state.
+ * 0 if device's power state has been successfully changed.
+ */
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
+{
+	u16 pmcsr;
+
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
+	if (!dev->pm_cap)
+		return -EIO;
+
+	if (state < PCI_D1 || state > PCI_D3hot)
+		return -EINVAL;
+
+	/*
+	 * Validate transition: We can enter D0 from any state, but if
+	 * we're already in a low-power state, we can only go deeper.  E.g.,
+	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
+	 * we'd have to go from D3 to D0, then to D1.
+	 */
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
+		pci_err(dev, "invalid power transition (from %s to %s)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EINVAL;
+	}
+
+	/* Check if this device supports the desired state */
+	if ((state == PCI_D1 && !dev->d1_support)
+	   || (state == PCI_D2 && !dev->d2_support))
+		return -EIO;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	if (PCI_POSSIBLE_ERROR(pmcsr)) {
+		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+			pci_power_name(dev->current_state),
+			pci_power_name(state));
+		return -EIO;
+	}
+
+	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+	pmcsr |= state;
+
+	/* Enter specified state */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+
+	/*
+	 * Mandatory power management transition delays; see PCI PM 1.1
+	 * 5.6.1 table 18
+	 */
+	if (state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	if (dev->current_state != state)
+		pci_info_ratelimited(dev, "refused to change power state from %s to %s\n",
+			 pci_power_name(dev->current_state),
+			 pci_power_name(state));
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 6/9] PCI/PM: Clean up pci_set_low_power_state()
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (4 preceding siblings ...)
  2022-04-14 13:14     ` [PATCH v3 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
@ 2022-04-14 13:15     ` Rafael J. Wysocki
  2022-04-14 13:17     ` [PATCH v3 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:15 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the following assorted non-essential changes in
pci_set_low_power_state():

 1. Drop two redundant checks from it (the caller takes care of these
    conditions).

 2. Modify error messages printed by it to make them more consistent
    with the messages printed by pci_power_up() and
    pci_set_full_power_state().

 3. Change the log level of one of the messages to "debug", because
    it only indicates a programming mistake.

 4. Make it return -ENODEV (instead of -EIO) when the device is not
    accessible.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v1 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1345,16 +1345,9 @@ static int pci_set_low_power_state(struc
 {
 	u16 pmcsr;
 
-	/* Check if we're already there */
-	if (dev->current_state == state)
-		return 0;
-
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D1 || state > PCI_D3hot)
-		return -EINVAL;
-
 	/*
 	 * Validate transition: We can enter D0 from any state, but if
 	 * we're already in a low-power state, we can only go deeper.  E.g.,
@@ -1362,7 +1355,7 @@ static int pci_set_low_power_state(struc
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
 	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
-		pci_err(dev, "invalid power transition (from %s to %s)\n",
+		pci_dbg(dev, "Invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
 		return -EINVAL;
@@ -1375,10 +1368,10 @@ static int pci_set_low_power_state(struc
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
-		pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n",
+		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
-		return -EIO;
+		return -ENODEV;
 	}
 
 	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 7/9] PCI/PM: Rearrange pci_set_power_state()
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (5 preceding siblings ...)
  2022-04-14 13:15     ` [PATCH v3 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
@ 2022-04-14 13:17     ` Rafael J. Wysocki
  2022-04-14 13:18     ` [PATCH v3 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:17 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The part of pci_set_power_state() related to transitions into
low-power states is unnecessary convoluted, so clearly divide it
into the D3cold special case and the general case covering all of
the other states.

Also fix a potential issue with calling pci_bus_set_current_state()
to set the current state of all devices on the subordinate bus to
D3cold without checking if the power state of the parent bridge has
really changed to D3cold.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v1 -> v3:
   * Fixed typo in the changelog (missing word "bus").
   * Added R-by from Mika.

---
 drivers/pci/pci.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1452,19 +1452,25 @@ int pci_set_power_state(struct pci_dev *
 	if (state >= PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	/*
-	 * To put device in D3cold, we put device into D3hot in native
-	 * way, then put device into D3cold with platform ops
-	 */
-	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
-					PCI_D3hot : state);
+	if (state == PCI_D3cold) {
+		/*
+		 * To put the device in D3cold, put it into D3hot in the native
+		 * way, then put it into D3cold using platform ops.
+		 */
+		error = pci_set_low_power_state(dev, PCI_D3hot);
 
-	if (pci_platform_power_transition(dev, state))
-		return error;
+		if (pci_platform_power_transition(dev, PCI_D3cold))
+			return error;
 
-	/* Powering off a bridge may power off the whole hierarchy */
-	if (state == PCI_D3cold)
-		pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+		/* Powering off a bridge may power off the whole hierarchy */
+		if (dev->current_state == PCI_D3cold)
+			pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+	} else {
+		error = pci_set_low_power_state(dev, state);
+
+		if (pci_platform_power_transition(dev, state))
+			return error;
+	}
 
 	return 0;
 }




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 8/9] PCI/PM: Avoid redundant current_state update
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (6 preceding siblings ...)
  2022-04-14 13:17     ` [PATCH v3 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
@ 2022-04-14 13:18     ` Rafael J. Wysocki
  2022-04-14 13:19     ` [PATCH v3 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
  2022-04-20 16:25     ` [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0 Bjorn Helgaas
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:18 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Notice that if pci_power_up() returns success early, without updating
the given device's PCI_PM_CTRL register, because it has verified that
the power state of the device is D0 already at that point, the
pci_update_current_state() invocation in pci_pm_default_resume_early()
is redundant.

Avoid that redundant call by making pci_power_up() return 1 when it
has updated the device's PCI_PM_CTRL register and checking its return
value in pci_pm_default_resume_early().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v2 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci-driver.c |    5 +++--
 drivers/pci/pci.c        |   10 ++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -553,8 +553,9 @@ static void pci_pm_default_resume(struct
 
 static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 {
-	pci_power_up(pci_dev);
-	pci_update_current_state(pci_dev, PCI_D0);
+	if (pci_power_up(pci_dev))
+		pci_update_current_state(pci_dev, PCI_D0);
+
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
 }
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1189,6 +1189,12 @@ static int pci_dev_wait(struct pci_dev *
 /**
  * pci_power_up - Put the given device into D0
  * @dev: PCI device to power up
+ *
+ * Use the platform firmware and the PCI_PM_CTRL register to put @dev into D0.
+ *
+ * Return 0 if invoking the platform firmware was sufficient to put @dev into D0
+ * (and so the PCI_PM_CTRL register was not updated), or 1 if it was necessary to
+ * update the PCI_PM_CTRL register, or -ENODEV if the device was not accessible.
  */
 int pci_power_up(struct pci_dev *dev)
 {
@@ -1244,7 +1250,7 @@ int pci_power_up(struct pci_dev *dev)
 		udelay(PCI_PM_D2_DELAY);
 
 	dev->current_state = PCI_D0;
-	return 0;
+	return 1;
 
 fail:
 	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
@@ -1294,7 +1300,7 @@ static int pci_set_full_power_state(stru
 	int ret;
 
 	ret = pci_power_up(dev);
-	if (ret)
+	if (ret < 0)
 		return ret;
 
 	if (!dev->pm_cap)




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq()
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (7 preceding siblings ...)
  2022-04-14 13:18     ` [PATCH v3 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
@ 2022-04-14 13:19     ` Rafael J. Wysocki
  2022-04-20 16:25     ` [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0 Bjorn Helgaas
  9 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-14 13:19 UTC (permalink / raw)
  To: Linux PCI; +Cc: Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling pci_set_power_state() to put the given device into D0 in
pci_pm_thaw_noirq() may cause it to restore the device's BARs, which
is redundant before calling pci_restore_state(), so replace it with
a direct pci_power_up() call followed by pci_update_current_state()
if it returns a nonzeor value, in analogy with
pci_pm_default_resume_early().

Avoid code duplication by introducing a wrapper function to contain
the repeating pattern and calling it in both places.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v2 -> v3:
   * Added R-by from Mika.

---
 drivers/pci/pci-driver.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -551,11 +551,15 @@ static void pci_pm_default_resume(struct
 	pci_enable_wake(pci_dev, PCI_D0, false);
 }
 
-static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
+static void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
 {
 	if (pci_power_up(pci_dev))
 		pci_update_current_state(pci_dev, PCI_D0);
+}
 
+static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
+{
+	pci_pm_power_up_and_verify_state(pci_dev);
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
 }
@@ -1080,7 +1084,7 @@ static int pci_pm_thaw_noirq(struct devi
 	 * in case the driver's "freeze" callbacks put it into a low-power
 	 * state.
 	 */
-	pci_set_power_state(pci_dev, PCI_D0);
+	pci_pm_power_up_and_verify_state(pci_dev);
 	pci_restore_state(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
                       ` (8 preceding siblings ...)
  2022-04-14 13:19     ` [PATCH v3 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
@ 2022-04-20 16:25     ` Bjorn Helgaas
  2022-04-20 16:28       ` Rafael J. Wysocki
  9 siblings, 1 reply; 47+ messages in thread
From: Bjorn Helgaas @ 2022-04-20 16:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PCI, Linux PM, LKML, Mika Westerberg

On Thu, Apr 14, 2022 at 03:00:23PM +0200, Rafael J. Wysocki wrote:
> On Monday, April 11, 2022 4:17:41 PM CEST Rafael J. Wysocki wrote:
> > Hi All,
> > 
> > On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > > Hi All,
> > > 
> > > This series supersedes the one at
> > > 
> > > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > > 
> > > It addresses some potential issues related to PCI device transitions from
> > > low-power states into D0 and makes the related code more straightforward
> > > and so easier to follow.
> > > 
> > > Please refer to the patch changelogs for details.
> > 
> > Here's a v2 of this patch series which is being sent, because I realized that
> > one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
> > and can be dropped, but that affected the last 3 patches in the series and
> > then I noticed that more improvements were possible and hence the new patches
> > [2/9].
> 
> Here's a v3 of this series with some minor review comments addressed and R-by
> tags from Mika added.

Applied to pci/pm for v5.19, thanks, Rafael!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0
  2022-04-20 16:25     ` [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0 Bjorn Helgaas
@ 2022-04-20 16:28       ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-04-20 16:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Mika Westerberg

On Wed, Apr 20, 2022 at 6:25 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Apr 14, 2022 at 03:00:23PM +0200, Rafael J. Wysocki wrote:
> > On Monday, April 11, 2022 4:17:41 PM CEST Rafael J. Wysocki wrote:
> > > Hi All,
> > >
> > > On Saturday, April 9, 2022 3:03:14 PM CEST Rafael J. Wysocki wrote:
> > > > Hi All,
> > > >
> > > > This series supersedes the one at
> > > >
> > > > https://lore.kernel.org/linux-pm/4198163.ejJDZkT8p0@kreacher
> > > >
> > > > It addresses some potential issues related to PCI device transitions from
> > > > low-power states into D0 and makes the related code more straightforward
> > > > and so easier to follow.
> > > >
> > > > Please refer to the patch changelogs for details.
> > >
> > > Here's a v2 of this patch series which is being sent, because I realized that
> > > one of the checks in pci_power_up() added by patch [4/7] in v1 was redundant
> > > and can be dropped, but that affected the last 3 patches in the series and
> > > then I noticed that more improvements were possible and hence the new patches
> > > [2/9].
> >
> > Here's a v3 of this series with some minor review comments addressed and R-by
> > tags from Mika added.
>
> Applied to pci/pm for v5.19, thanks, Rafael!

Thank you!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-04-14 13:11     ` [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
@ 2022-05-03 17:59       ` Nathan Chancellor
  2022-05-04 12:59         ` Rafael J. Wysocki
  2022-05-04 16:54         ` Bjorn Helgaas
  0 siblings, 2 replies; 47+ messages in thread
From: Nathan Chancellor @ 2022-05-03 17:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PCI, Linux PM, LKML, Bjorn Helgaas, Mika Westerberg

Hi Rafael,

On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are some issues related to changing power states of PCI
> devices, mostly related to carrying out unnecessary actions in some
> places, and the code is generally hard to follow.
> 
>  1. pci_power_up() has two callers, pci_set_power_state() and
>     pci_pm_default_resume_early().  The latter updates the current
>     power state of the device right after calling pci_power_up()
>     and it restores the entire config space of the device right
>     after that, so pci_power_up() itself need not read the
>     PCI_PM_CTRL register or restore the BARs after programming the
>     device into D0 in that case.
>  
>  2. It is generally hard to get a clear view of the pci_power_up()
>     code flow, especially in some corner cases, due to all of the
>     involved PCI_PM_CTRL register reads and writes occurring in
>     pci_platform_power_transition() and in pci_raw_set_power_state(),
>     some of which are redundant.
> 
>  3. The transitions from low-power states to D0 and the other way
>     around are unnecessarily tangled in pci_raw_set_power_state()
>     which causes it to use a redundant local variable and makes it
>     rather hard to follow.
> 
> To address the above shortcomings, make the following changes:
> 
>  a. Remove the code handling transitions into D0
>     from pci_raw_set_power_state() and rename it as
>     pci_set_low_power_state().
> 
>  b. Add the code handling transitions into D0 directly
>     to pci_power_up() and to a new wrapper function
>     pci_set_full_power_state() calling it internally that is
>     only used in pci_set_power_state().
> 
>  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
>     and make it work in the same way for transitions from any
>     low-power states (transitions from D1 and D2 are handled
>     slightly differently before the change).
> 
>  d. Put the restoration of the BARs and the PCI_PM_CTRL
>     register read confirming the power state change into
>     pci_set_full_power_state() to avoid doing that in
>     pci_pm_default_resume_early() unnecessarily.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
states of PCI devices") causes my AMD-based system to fail to fully
boot. As far as I can tell, this might be NVMe related, which might make
getting a full log difficult, as journalctl won't have anywhere to save
it. I see:

nvme nvme0: I/O 8 QID 0 timeout, completion polled

then shortly afterwards:

nvme nvme0: I/O 24 QID 0 timeout, completion polled
nvme nvme0: missing or invalid SUBNQN field

then I am dropped into an emergency shell.

This is a log from the previous commit, which may give some hints about
the configuration of this particular system.

https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log

If there is any additional debugging information I can provide or
patches I can try, please let me know!

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-03 17:59       ` Nathan Chancellor
@ 2022-05-04 12:59         ` Rafael J. Wysocki
  2022-05-04 15:54           ` Anders Roxell
  2022-05-04 16:36           ` Nathan Chancellor
  2022-05-04 16:54         ` Bjorn Helgaas
  1 sibling, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-05-04 12:59 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas,
	Mika Westerberg

On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Rafael,
>
> On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > There are some issues related to changing power states of PCI
> > devices, mostly related to carrying out unnecessary actions in some
> > places, and the code is generally hard to follow.
> >
> >  1. pci_power_up() has two callers, pci_set_power_state() and
> >     pci_pm_default_resume_early().  The latter updates the current
> >     power state of the device right after calling pci_power_up()
> >     and it restores the entire config space of the device right
> >     after that, so pci_power_up() itself need not read the
> >     PCI_PM_CTRL register or restore the BARs after programming the
> >     device into D0 in that case.
> >
> >  2. It is generally hard to get a clear view of the pci_power_up()
> >     code flow, especially in some corner cases, due to all of the
> >     involved PCI_PM_CTRL register reads and writes occurring in
> >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> >     some of which are redundant.
> >
> >  3. The transitions from low-power states to D0 and the other way
> >     around are unnecessarily tangled in pci_raw_set_power_state()
> >     which causes it to use a redundant local variable and makes it
> >     rather hard to follow.
> >
> > To address the above shortcomings, make the following changes:
> >
> >  a. Remove the code handling transitions into D0
> >     from pci_raw_set_power_state() and rename it as
> >     pci_set_low_power_state().
> >
> >  b. Add the code handling transitions into D0 directly
> >     to pci_power_up() and to a new wrapper function
> >     pci_set_full_power_state() calling it internally that is
> >     only used in pci_set_power_state().
> >
> >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> >     and make it work in the same way for transitions from any
> >     low-power states (transitions from D1 and D2 are handled
> >     slightly differently before the change).
> >
> >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> >     register read confirming the power state change into
> >     pci_set_full_power_state() to avoid doing that in
> >     pci_pm_default_resume_early() unnecessarily.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>
> This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> states of PCI devices") causes my AMD-based system to fail to fully
> boot. As far as I can tell, this might be NVMe related, which might make
> getting a full log difficult, as journalctl won't have anywhere to save
> it. I see:
>
> nvme nvme0: I/O 8 QID 0 timeout, completion polled
>
> then shortly afterwards:
>
> nvme nvme0: I/O 24 QID 0 timeout, completion polled
> nvme nvme0: missing or invalid SUBNQN field
>
> then I am dropped into an emergency shell.

Thanks for the report!

> This is a log from the previous commit, which may give some hints about
> the configuration of this particular system.
>
> https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
>
> If there is any additional debugging information I can provide or
> patches I can try, please let me know!

Please see what happens if the "if (dev->current_state == PCI_D0)"
check and the following "return 0" statement in pci_power_up() are
commented out.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 12:59         ` Rafael J. Wysocki
@ 2022-05-04 15:54           ` Anders Roxell
  2022-05-04 16:36           ` Nathan Chancellor
  1 sibling, 0 replies; 47+ messages in thread
From: Anders Roxell @ 2022-05-04 15:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nathan Chancellor, Rafael J. Wysocki, Linux PCI, Linux PM, LKML,
	Bjorn Helgaas, Mika Westerberg

On 2022-05-04 14:59, Rafael J. Wysocki wrote:
> On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Rafael,
> >
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot. As far as I can tell, this might be NVMe related, which might make
> > getting a full log difficult, as journalctl won't have anywhere to save
> > it. I see:
> >
> > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> >
> > then shortly afterwards:
> >
> > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > nvme nvme0: missing or invalid SUBNQN field
> >
> > then I am dropped into an emergency shell.
> 
> Thanks for the report!
> 
> > This is a log from the previous commit, which may give some hints about
> > the configuration of this particular system.
> >
> > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> >
> > If there is any additional debugging information I can provide or
> > patches I can try, please let me know!
> 
> Please see what happens if the "if (dev->current_state == PCI_D0)"
> check and the following "return 0" statement in pci_power_up() are
> commented out.

I've built an arm64 allmodconfig kernel on linux-next tag next-20220503, and tried to boot it.
This is the boot error I see [1].
I bisected down to this patch [2]

When I revert the following patches [3] the kernel boots fine.
I also tried next-20220504 and I saw the same issue.

Cheers,
Anders
[1] http://ix.io/3WT3
[2] http://ix.io/3WXT
[3] http://ix.io/3WXU

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 12:59         ` Rafael J. Wysocki
  2022-05-04 15:54           ` Anders Roxell
@ 2022-05-04 16:36           ` Nathan Chancellor
  2022-05-04 18:00             ` Rafael J. Wysocki
  1 sibling, 1 reply; 47+ messages in thread
From: Nathan Chancellor @ 2022-05-04 16:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas,
	Mika Westerberg

On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Rafael,
> >
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot. As far as I can tell, this might be NVMe related, which might make
> > getting a full log difficult, as journalctl won't have anywhere to save
> > it. I see:
> >
> > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> >
> > then shortly afterwards:
> >
> > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > nvme nvme0: missing or invalid SUBNQN field
> >
> > then I am dropped into an emergency shell.
> 
> Thanks for the report!
> 
> > This is a log from the previous commit, which may give some hints about
> > the configuration of this particular system.
> >
> > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> >
> > If there is any additional debugging information I can provide or
> > patches I can try, please let me know!
> 
> Please see what happens if the "if (dev->current_state == PCI_D0)"
> check and the following "return 0" statement in pci_power_up() are
> commented out.

If I understand you correctly, this? Unfortunately, that does not help.

Cheers,
Nathan

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1e22dc5187e7..9f7a463107f3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1235,8 +1235,10 @@ int pci_power_up(struct pci_dev *dev)
 	}
 
 	/* There's nothing more to do if current_state is D0 at this point. */
+#if 0
 	if (dev->current_state == PCI_D0)
 		return 0;
+#endif
 
 	/*
 	 * Program the device into PCI_D0 by forcing the entire word to 0 (this

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-03 17:59       ` Nathan Chancellor
  2022-05-04 12:59         ` Rafael J. Wysocki
@ 2022-05-04 16:54         ` Bjorn Helgaas
  2022-05-04 17:15           ` Rafael J. Wysocki
  1 sibling, 1 reply; 47+ messages in thread
From: Bjorn Helgaas @ 2022-05-04 16:54 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Mika Westerberg,
	Anders Roxell

[+cc Anders]

On Tue, May 03, 2022 at 10:59:43AM -0700, Nathan Chancellor wrote:
> On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > There are some issues related to changing power states of PCI
> > devices, mostly related to carrying out unnecessary actions in some
> > places, and the code is generally hard to follow.
> > 
> >  1. pci_power_up() has two callers, pci_set_power_state() and
> >     pci_pm_default_resume_early().  The latter updates the current
> >     power state of the device right after calling pci_power_up()
> >     and it restores the entire config space of the device right
> >     after that, so pci_power_up() itself need not read the
> >     PCI_PM_CTRL register or restore the BARs after programming the
> >     device into D0 in that case.
> >  
> >  2. It is generally hard to get a clear view of the pci_power_up()
> >     code flow, especially in some corner cases, due to all of the
> >     involved PCI_PM_CTRL register reads and writes occurring in
> >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> >     some of which are redundant.
> > 
> >  3. The transitions from low-power states to D0 and the other way
> >     around are unnecessarily tangled in pci_raw_set_power_state()
> >     which causes it to use a redundant local variable and makes it
> >     rather hard to follow.
> > 
> > To address the above shortcomings, make the following changes:
> > 
> >  a. Remove the code handling transitions into D0
> >     from pci_raw_set_power_state() and rename it as
> >     pci_set_low_power_state().
> > 
> >  b. Add the code handling transitions into D0 directly
> >     to pci_power_up() and to a new wrapper function
> >     pci_set_full_power_state() calling it internally that is
> >     only used in pci_set_power_state().
> > 
> >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> >     and make it work in the same way for transitions from any
> >     low-power states (transitions from D1 and D2 are handled
> >     slightly differently before the change).
> > 
> >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> >     register read confirming the power state change into
> >     pci_set_full_power_state() to avoid doing that in
> >     pci_pm_default_resume_early() unnecessarily.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> states of PCI devices") causes my AMD-based system to fail to fully
> boot.

I dropped 5bffe4c611f5 and subsequent pci/pm patches temporarily while
this gets worked out.

Bjorn

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 16:54         ` Bjorn Helgaas
@ 2022-05-04 17:15           ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-05-04 17:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Nathan Chancellor, Rafael J. Wysocki, Linux PCI, Linux PM, LKML,
	Mika Westerberg, Anders Roxell

On Wed, May 4, 2022 at 7:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Anders]
>
> On Tue, May 03, 2022 at 10:59:43AM -0700, Nathan Chancellor wrote:
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot.
>
> I dropped 5bffe4c611f5 and subsequent pci/pm patches temporarily while
> this gets worked out.

OK

It looks like I missed something subtle that triggers on a subset of
systems only.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 16:36           ` Nathan Chancellor
@ 2022-05-04 18:00             ` Rafael J. Wysocki
  2022-05-04 19:35               ` Nathan Chancellor
  0 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-05-04 18:00 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas,
	Mika Westerberg

On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > >
> > > Hi Rafael,
> > >
> > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > There are some issues related to changing power states of PCI
> > > > devices, mostly related to carrying out unnecessary actions in some
> > > > places, and the code is generally hard to follow.
> > > >
> > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > >     pci_pm_default_resume_early().  The latter updates the current
> > > >     power state of the device right after calling pci_power_up()
> > > >     and it restores the entire config space of the device right
> > > >     after that, so pci_power_up() itself need not read the
> > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > >     device into D0 in that case.
> > > >
> > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > >     code flow, especially in some corner cases, due to all of the
> > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > >     some of which are redundant.
> > > >
> > > >  3. The transitions from low-power states to D0 and the other way
> > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > >     which causes it to use a redundant local variable and makes it
> > > >     rather hard to follow.
> > > >
> > > > To address the above shortcomings, make the following changes:
> > > >
> > > >  a. Remove the code handling transitions into D0
> > > >     from pci_raw_set_power_state() and rename it as
> > > >     pci_set_low_power_state().
> > > >
> > > >  b. Add the code handling transitions into D0 directly
> > > >     to pci_power_up() and to a new wrapper function
> > > >     pci_set_full_power_state() calling it internally that is
> > > >     only used in pci_set_power_state().
> > > >
> > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > >     and make it work in the same way for transitions from any
> > > >     low-power states (transitions from D1 and D2 are handled
> > > >     slightly differently before the change).
> > > >
> > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > >     register read confirming the power state change into
> > > >     pci_set_full_power_state() to avoid doing that in
> > > >     pci_pm_default_resume_early() unnecessarily.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > >
> > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > states of PCI devices") causes my AMD-based system to fail to fully
> > > boot. As far as I can tell, this might be NVMe related, which might make
> > > getting a full log difficult, as journalctl won't have anywhere to save
> > > it. I see:
> > >
> > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > >
> > > then shortly afterwards:
> > >
> > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > nvme nvme0: missing or invalid SUBNQN field
> > >
> > > then I am dropped into an emergency shell.
> > 
> > Thanks for the report!
> > 
> > > This is a log from the previous commit, which may give some hints about
> > > the configuration of this particular system.
> > >
> > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > >
> > > If there is any additional debugging information I can provide or
> > > patches I can try, please let me know!
> > 
> > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > check and the following "return 0" statement in pci_power_up() are
> > commented out.
> 
> If I understand you correctly, this? Unfortunately, that does not help.

Thanks for testing.

Please check if the patch below makes any difference.

---
 drivers/pci/pci.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1245,7 +1245,7 @@ int pci_power_up(struct pci_dev *dev)
 
 	/* There's nothing more to do if current_state is D0 at this point. */
 	if (dev->current_state == PCI_D0)
-		return 0;
+		goto done;
 
 	/*
 	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
@@ -1260,6 +1260,11 @@ int pci_power_up(struct pci_dev *dev)
 		udelay(PCI_PM_D2_DELAY);
 
 	dev->current_state = PCI_D0;
+
+done:
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
 	return 1;
 
 fail:
@@ -1339,9 +1344,6 @@ static int pci_set_full_power_state(stru
 		pci_restore_bars(dev);
 	}
 
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self);
-
 	return 0;
 }
 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 18:00             ` Rafael J. Wysocki
@ 2022-05-04 19:35               ` Nathan Chancellor
  2022-05-05 11:58                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 47+ messages in thread
From: Nathan Chancellor @ 2022-05-04 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, LKML, Bjorn Helgaas,
	Mika Westerberg

On Wed, May 04, 2022 at 08:00:33PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> > On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > >
> > > > Hi Rafael,
> > > >
> > > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > There are some issues related to changing power states of PCI
> > > > > devices, mostly related to carrying out unnecessary actions in some
> > > > > places, and the code is generally hard to follow.
> > > > >
> > > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > > >     pci_pm_default_resume_early().  The latter updates the current
> > > > >     power state of the device right after calling pci_power_up()
> > > > >     and it restores the entire config space of the device right
> > > > >     after that, so pci_power_up() itself need not read the
> > > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > > >     device into D0 in that case.
> > > > >
> > > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > > >     code flow, especially in some corner cases, due to all of the
> > > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > > >     some of which are redundant.
> > > > >
> > > > >  3. The transitions from low-power states to D0 and the other way
> > > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > > >     which causes it to use a redundant local variable and makes it
> > > > >     rather hard to follow.
> > > > >
> > > > > To address the above shortcomings, make the following changes:
> > > > >
> > > > >  a. Remove the code handling transitions into D0
> > > > >     from pci_raw_set_power_state() and rename it as
> > > > >     pci_set_low_power_state().
> > > > >
> > > > >  b. Add the code handling transitions into D0 directly
> > > > >     to pci_power_up() and to a new wrapper function
> > > > >     pci_set_full_power_state() calling it internally that is
> > > > >     only used in pci_set_power_state().
> > > > >
> > > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > > >     and make it work in the same way for transitions from any
> > > > >     low-power states (transitions from D1 and D2 are handled
> > > > >     slightly differently before the change).
> > > > >
> > > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > > >     register read confirming the power state change into
> > > > >     pci_set_full_power_state() to avoid doing that in
> > > > >     pci_pm_default_resume_early() unnecessarily.
> > > > >
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > >
> > > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > > states of PCI devices") causes my AMD-based system to fail to fully
> > > > boot. As far as I can tell, this might be NVMe related, which might make
> > > > getting a full log difficult, as journalctl won't have anywhere to save
> > > > it. I see:
> > > >
> > > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > > >
> > > > then shortly afterwards:
> > > >
> > > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > > nvme nvme0: missing or invalid SUBNQN field
> > > >
> > > > then I am dropped into an emergency shell.
> > > 
> > > Thanks for the report!
> > > 
> > > > This is a log from the previous commit, which may give some hints about
> > > > the configuration of this particular system.
> > > >
> > > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > > >
> > > > If there is any additional debugging information I can provide or
> > > > patches I can try, please let me know!
> > > 
> > > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > > check and the following "return 0" statement in pci_power_up() are
> > > commented out.
> > 
> > If I understand you correctly, this? Unfortunately, that does not help.
> 
> Thanks for testing.
> 
> Please check if the patch below makes any difference.

Unfortunately, there is still no difference. Even worse, I thought I
might be able to get some information from the emergency shell but I
don't think the HID driver is loaded yet so my keyboard does not work. I
am not sure of how to get any further information from the problematic
kernel; if anyone has any ideas, I am happy to test them! I am more than
happy to continue to test patches or provide information, I just don't
want to be a waste of time :)

Cheers,
Nathan

> ---
>  drivers/pci/pci.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci.c
> +++ linux-pm/drivers/pci/pci.c
> @@ -1245,7 +1245,7 @@ int pci_power_up(struct pci_dev *dev)
>  
>  	/* There's nothing more to do if current_state is D0 at this point. */
>  	if (dev->current_state == PCI_D0)
> -		return 0;
> +		goto done;
>  
>  	/*
>  	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
> @@ -1260,6 +1260,11 @@ int pci_power_up(struct pci_dev *dev)
>  		udelay(PCI_PM_D2_DELAY);
>  
>  	dev->current_state = PCI_D0;
> +
> +done:
> +	if (dev->bus->self)
> +		pcie_aspm_pm_state_change(dev->bus->self);
> +
>  	return 1;
>  
>  fail:
> @@ -1339,9 +1344,6 @@ static int pci_set_full_power_state(stru
>  		pci_restore_bars(dev);
>  	}
>  
> -	if (dev->bus->self)
> -		pcie_aspm_pm_state_change(dev->bus->self);
> -
>  	return 0;
>  }
>  
> 
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices
  2022-05-04 19:35               ` Nathan Chancellor
@ 2022-05-05 11:58                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2022-05-05 11:58 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PCI, Linux PM, LKML,
	Bjorn Helgaas, Mika Westerberg

On Wed, May 4, 2022 at 9:35 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Wed, May 04, 2022 at 08:00:33PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> > > On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > >
> > > > > > There are some issues related to changing power states of PCI
> > > > > > devices, mostly related to carrying out unnecessary actions in some
> > > > > > places, and the code is generally hard to follow.
> > > > > >
> > > > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > > > >     pci_pm_default_resume_early().  The latter updates the current
> > > > > >     power state of the device right after calling pci_power_up()
> > > > > >     and it restores the entire config space of the device right
> > > > > >     after that, so pci_power_up() itself need not read the
> > > > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > > > >     device into D0 in that case.
> > > > > >
> > > > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > > > >     code flow, especially in some corner cases, due to all of the
> > > > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > > > >     some of which are redundant.
> > > > > >
> > > > > >  3. The transitions from low-power states to D0 and the other way
> > > > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > > > >     which causes it to use a redundant local variable and makes it
> > > > > >     rather hard to follow.
> > > > > >
> > > > > > To address the above shortcomings, make the following changes:
> > > > > >
> > > > > >  a. Remove the code handling transitions into D0
> > > > > >     from pci_raw_set_power_state() and rename it as
> > > > > >     pci_set_low_power_state().
> > > > > >
> > > > > >  b. Add the code handling transitions into D0 directly
> > > > > >     to pci_power_up() and to a new wrapper function
> > > > > >     pci_set_full_power_state() calling it internally that is
> > > > > >     only used in pci_set_power_state().
> > > > > >
> > > > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > > > >     and make it work in the same way for transitions from any
> > > > > >     low-power states (transitions from D1 and D2 are handled
> > > > > >     slightly differently before the change).
> > > > > >
> > > > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > > > >     register read confirming the power state change into
> > > > > >     pci_set_full_power_state() to avoid doing that in
> > > > > >     pci_pm_default_resume_early() unnecessarily.
> > > > > >
> > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > >
> > > > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > > > states of PCI devices") causes my AMD-based system to fail to fully
> > > > > boot. As far as I can tell, this might be NVMe related, which might make
> > > > > getting a full log difficult, as journalctl won't have anywhere to save
> > > > > it. I see:
> > > > >
> > > > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > > > >
> > > > > then shortly afterwards:
> > > > >
> > > > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > > > nvme nvme0: missing or invalid SUBNQN field
> > > > >
> > > > > then I am dropped into an emergency shell.
> > > >
> > > > Thanks for the report!
> > > >
> > > > > This is a log from the previous commit, which may give some hints about
> > > > > the configuration of this particular system.
> > > > >
> > > > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > > > >
> > > > > If there is any additional debugging information I can provide or
> > > > > patches I can try, please let me know!
> > > >
> > > > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > > > check and the following "return 0" statement in pci_power_up() are
> > > > commented out.
> > >
> > > If I understand you correctly, this? Unfortunately, that does not help.
> >
> > Thanks for testing.
> >
> > Please check if the patch below makes any difference.
>
> Unfortunately, there is still no difference. Even worse, I thought I
> might be able to get some information from the emergency shell but I
> don't think the HID driver is loaded yet so my keyboard does not work. I
> am not sure of how to get any further information from the problematic
> kernel; if anyone has any ideas, I am happy to test them! I am more than
> happy to continue to test patches or provide information, I just don't
> want to be a waste of time :)

It's not a waste of time if you run tests I ask for.

Anyway, I'm going to change the approach, because we're looking for a
subtle change in behavior that breaks your system and there are quite
a few of these in the problematic patch.

I'll post a new series of patches to replace the commits dropped by
Bjorn later today.

Thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2022-05-05 11:58 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-09 13:03 [PATCH v1 0/7] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
2022-04-09 13:05 ` [PATCH v1 1/7] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
2022-04-09 13:06 ` [PATCH v1 2/7] PCI/PM: Drop the runtime_d3cold PCI device flag Rafael J. Wysocki
2022-04-09 13:08 ` [PATCH v1 3/7] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
2022-04-09 13:22 ` [PATCH v1 4/7] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
2022-04-09 13:23 ` [PATCH v1 5/7] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
2022-04-09 13:26 ` [PATCH v1 6/7] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
2022-04-09 13:28 ` [PATCH v1 7/7] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
2022-04-11 14:17 ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
2022-04-11 14:19   ` [PATCH v2 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
2022-04-11 14:20   ` [PATCH v2 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
2022-04-11 14:21   ` [PATCH v2 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
2022-04-12  9:42     ` Mika Westerberg
2022-04-12 10:56       ` Rafael J. Wysocki
2022-04-11 14:25   ` [PATCH v2 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
2022-04-12 10:00     ` Mika Westerberg
2022-04-12 11:31       ` Rafael J. Wysocki
2022-04-12 11:38         ` Mika Westerberg
2022-04-11 14:27   ` [PATCH v2 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
2022-04-11 14:28   ` [PATCH v2 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
2022-04-11 14:30   ` [PATCH v2 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
2022-04-11 14:31   ` [PATCH v2 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
2022-04-11 14:33   ` [PATCH v2 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
2022-04-11 14:33   ` [PATCH v2 0/9] PCI/PM: Improvements related to device transitions into D0 Rafael J. Wysocki
2022-04-12 10:08   ` Mika Westerberg
2022-04-12 11:34     ` Rafael J. Wysocki
2022-04-14 13:00   ` [PATCH v3 " Rafael J. Wysocki
2022-04-14 13:04     ` [PATCH v3 1/9] PCI/PM: Resume subordinate bus in bus type callbacks Rafael J. Wysocki
2022-04-14 13:04     ` [PATCH v3 2/9] PCI/PM: Drop the runtime_d3cold device flag Rafael J. Wysocki
2022-04-14 13:07     ` [PATCH v3 3/9] PCI/PM: Rearrange pci_update_current_state() Rafael J. Wysocki
2022-04-14 13:11     ` [PATCH v3 4/9] PCI/PM: Rework changing power states of PCI devices Rafael J. Wysocki
2022-05-03 17:59       ` Nathan Chancellor
2022-05-04 12:59         ` Rafael J. Wysocki
2022-05-04 15:54           ` Anders Roxell
2022-05-04 16:36           ` Nathan Chancellor
2022-05-04 18:00             ` Rafael J. Wysocki
2022-05-04 19:35               ` Nathan Chancellor
2022-05-05 11:58                 ` Rafael J. Wysocki
2022-05-04 16:54         ` Bjorn Helgaas
2022-05-04 17:15           ` Rafael J. Wysocki
2022-04-14 13:14     ` [PATCH v3 5/9] PCI/PM: Move pci_set_low_power_state() next to its caller Rafael J. Wysocki
2022-04-14 13:15     ` [PATCH v3 6/9] PCI/PM: Clean up pci_set_low_power_state() Rafael J. Wysocki
2022-04-14 13:17     ` [PATCH v3 7/9] PCI/PM: Rearrange pci_set_power_state() Rafael J. Wysocki
2022-04-14 13:18     ` [PATCH v3 8/9] PCI/PM: Avoid redundant current_state update Rafael J. Wysocki
2022-04-14 13:19     ` [PATCH v3 9/9] PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Rafael J. Wysocki
2022-04-20 16:25     ` [PATCH v3 0/9] PCI/PM: Improvements related to device transitions into D0 Bjorn Helgaas
2022-04-20 16:28       ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).