* [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used
       [not found]                   ` <20190731221956.GB15795@localhost.localdomain>
@ 2019-08-07  9:53                     ` Rafael J. Wysocki
  2019-08-07 10:14                       ` Rafael J. Wysocki
                                         ` (2 more replies)
  2019-08-08  8:36                     ` [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
                                       ` (2 subsequent siblings)
  3 siblings, 3 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-07  9:53 UTC (permalink / raw)
  To: Keith Busch
  Cc: Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Linux PM, Linux Kernel Mailing List,
	Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
host managed power state for suspend") was adding a pci_save_state()
call to nvme_suspend() in order to prevent the PCI bus-level PM from
being applied to the suspended NVMe devices, but if ASPM is not
enabled for the target NVMe device, that causes its PCIe link to stay
up and the platform may not be able to get into its optimum low-power
state because of that.

For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
suspend-to-idle prevents the SoC from reaching package idle states
deeper than PC3, which is way insufficient for system suspend.

To address this shortcoming, make nvme_suspend() check if ASPM is
enabled for the target device and fall back to full device shutdown
and PCI bus-level PM if that is not the case.

Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/nvme/host/pci.c |   14 ++++++++++----
 drivers/pci/pcie/aspm.c |   17 +++++++++++++++++
 include/linux/pci.h     |    2 ++
 3 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/nvme/host/pci.c
===================================================================
--- linux-pm.orig/drivers/nvme/host/pci.c
+++ linux-pm/drivers/nvme/host/pci.c
@@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
-	if (pm_resume_via_firmware() || !ctrl->npss ||
+	if (ndev->last_ps == U32_MAX ||
 	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
 		nvme_reset_ctrl(ctrl);
 	return 0;
@@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 	int ret = -EBUSY;
 
+	ndev->last_ps = U32_MAX;
+
 	/*
 	 * The platform does not remove power for a kernel managed suspend so
 	 * use host managed nvme power settings for lowest idle power if
@@ -2866,8 +2868,13 @@ static int nvme_suspend(struct device *d
 	 * shutdown.  But if the firmware is involved after the suspend or the
 	 * device does not support any non-default power states, shut down the
 	 * device fully.
+	 *
+	 * If ASPM is not enabled for the device, shut down the device and allow
+	 * the PCI bus layer to put it into D3 in order to take the PCIe link
+	 * down, so as to allow the platform to achieve its minimum low-power
+	 * state (which may not be possible if the link is up).
 	 */
-	if (pm_suspend_via_firmware() || !ctrl->npss) {
+	if (pm_suspend_via_firmware() || !ctrl->npss || !pcie_aspm_enabled(pdev)) {
 		nvme_dev_disable(ndev, true);
 		return 0;
 	}
@@ -2880,9 +2887,8 @@ static int nvme_suspend(struct device *d
 	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
 		goto unfreeze;
 
-	ndev->last_ps = 0;
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
-	if (ret < 0)
+	if (ret < 0 || ndev->last_ps == U32_MAX)
 		goto unfreeze;
 
 	ret = nvme_set_power_state(ctrl, ctrl->npss);
Index: linux-pm/drivers/pci/pcie/aspm.c
===================================================================
--- linux-pm.orig/drivers/pci/pcie/aspm.c
+++ linux-pm/drivers/pci/pcie/aspm.c
@@ -1170,6 +1170,23 @@ static int pcie_aspm_get_policy(char *bu
 module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
 	NULL, 0644);
 
+/*
+ * pcie_aspm_enabled - Return the mask of enabled ASPM link states.
+ * @pci_device: Target device.
+ */
+u32 pcie_aspm_enabled(struct pci_dev *pci_device)
+{
+	struct pci_dev *bridge = pci_device->bus->self;
+	u32 aspm_enabled;
+
+	mutex_lock(&aspm_lock);
+	aspm_enabled = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
+	mutex_unlock(&aspm_lock);
+
+	return aspm_enabled;
+}
+
+
 #ifdef CONFIG_PCIEASPM_DEBUG
 static ssize_t link_state_show(struct device *dev,
 		struct device_attribute *attr,
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -1567,8 +1567,10 @@ extern bool pcie_ports_native;
 
 #ifdef CONFIG_PCIEASPM
 bool pcie_aspm_support_enabled(void);
+u32 pcie_aspm_enabled(struct pci_dev *pci_device);
 #else
 static inline bool pcie_aspm_support_enabled(void) { return false; }
+static inline u32 pcie_aspm_enabled(struct pci_dev *pci_device) { return 0; }
 #endif
 
 #ifdef CONFIG_PCIEAER


* Re: [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used
  2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
@ 2019-08-07 10:14                       ` Rafael J. Wysocki
  2019-08-07 10:43                       ` Christoph Hellwig
  2019-08-07 14:37                       ` Keith Busch
  2 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-07 10:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Linux PM, Linux Kernel Mailing List,
	Rajat Jain, Linux PCI, Bjorn Helgaas

On Wednesday, August 7, 2019 11:53:44 AM CEST Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> host managed power state for suspend") was adding a pci_save_state()
> call to nvme_suspend() in order to prevent the PCI bus-level PM from
> being applied to the suspended NVMe devices, but if ASPM is not
> enabled for the target NVMe device, that causes its PCIe link to stay
> up and the platform may not be able to get into its optimum low-power
> state because of that.
> 
> For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> suspend-to-idle prevents the SoC from reaching package idle states
> deeper than PC3, which is way insufficient for system suspend.
> 
> To address this shortcoming, make nvme_suspend() check if ASPM is
> enabled for the target device and fall back to full device shutdown
> and PCI bus-level PM if that is not the case.
> 
> Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

I should have used a better subject for this patch.

I'll resend it with a changed subject later, but for now I would like to collect
opinions about it (if any).

Cheers!


* Re: [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used
  2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
  2019-08-07 10:14                       ` Rafael J. Wysocki
@ 2019-08-07 10:43                       ` Christoph Hellwig
  2019-08-07 14:37                       ` Keith Busch
  2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-08-07 10:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

> +	if (pm_suspend_via_firmware() || !ctrl->npss || !pcie_aspm_enabled(pdev)) {



> +	mutex_lock(&aspm_lock);
> +	aspm_enabled = bridge->link_state ? bridge->link_state->aspm_enabled : 0;

Please fix the overly long lines..


* Re: [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used
  2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
  2019-08-07 10:14                       ` Rafael J. Wysocki
  2019-08-07 10:43                       ` Christoph Hellwig
@ 2019-08-07 14:37                       ` Keith Busch
  2 siblings, 0 replies; 34+ messages in thread
From: Keith Busch @ 2019-08-07 14:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Kai-Heng Feng, Busch, Keith,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

On Wed, Aug 07, 2019 at 02:53:44AM -0700, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> host managed power state for suspend") was adding a pci_save_state()
> call to nvme_suspend() in order to prevent the PCI bus-level PM from
> being applied to the suspended NVMe devices, but if ASPM is not
> enabled for the target NVMe device, that causes its PCIe link to stay
> up and the platform may not be able to get into its optimum low-power
> state because of that.
> 
> For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> suspend-to-idle prevents the SoC from reaching package idle states
> deeper than PC3, which is way insufficient for system suspend.
> 
> To address this shortcoming, make nvme_suspend() check if ASPM is
> enabled for the target device and fall back to full device shutdown
> and PCI bus-level PM if that is not the case.
> 
> Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks for tracking down the cause. Sounds like your earlier assumption
on ASPM's involvement was spot on.

> +/*
> + * pcie_aspm_enabled - Return the mask of enabled ASPM link states.
> + * @pci_device: Target device.
> + */
> +u32 pcie_aspm_enabled(struct pci_dev *pci_device)
> +{
> +	struct pci_dev *bridge = pci_device->bus->self;

You may want to use pci_upstream_bridge() instead, just in case someone
calls this on a virtual function's pci_dev.
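
Something along these lines, perhaps (untested, just to illustrate the idea):

	struct pci_dev *bridge = pci_upstream_bridge(pci_device);

	if (!bridge)	/* no upstream bridge, e.g. a device on a root bus */
		return 0;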

> +	u32 aspm_enabled;
> +
> +	mutex_lock(&aspm_lock);
> +	aspm_enabled = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
> +	mutex_unlock(&aspm_lock);
> +
> +	return aspm_enabled;
> +}


* [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
       [not found]                   ` <20190731221956.GB15795@localhost.localdomain>
  2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
@ 2019-08-08  8:36                     ` Rafael J. Wysocki
  2019-08-08  8:48                       ` Christoph Hellwig
  2019-08-08 10:03                     ` [PATCH v2 0/2] " Rafael J. Wysocki
  2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
  3 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08  8:36 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
host managed power state for suspend") was adding a pci_save_state()
call to nvme_suspend() in order to prevent the PCI bus-level PM from
being applied to the suspended NVMe devices, but if ASPM is not
enabled for the target NVMe device, that causes its PCIe link to stay
up and the platform may not be able to get into its optimum low-power
state because of that.

For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
suspend-to-idle prevents the SoC from reaching package idle states
deeper than PC3, which is way insufficient for system suspend.

To address this shortcoming, make nvme_suspend() check if ASPM is
enabled for the target device and fall back to full device shutdown
and PCI bus-level PM if that is not the case.

Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This is an update of the following patch:

https://patchwork.kernel.org/patch/11081791/

going with the subject matching the changes in the patch.

This also addresses style-related comments from Christoph and follows
Keith's advice to use pci_upstream_bridge() to get to the upstream bridge
of the device.

Thanks!

---
 drivers/nvme/host/pci.c |   15 +++++++++++----
 drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
 include/linux/pci.h     |    2 ++
 3 files changed, 33 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/nvme/host/pci.c
===================================================================
--- linux-pm.orig/drivers/nvme/host/pci.c
+++ linux-pm/drivers/nvme/host/pci.c
@@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
-	if (pm_resume_via_firmware() || !ctrl->npss ||
+	if (ndev->last_ps == U32_MAX ||
 	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
 		nvme_reset_ctrl(ctrl);
 	return 0;
@@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 	int ret = -EBUSY;
 
+	ndev->last_ps = U32_MAX;
+
 	/*
 	 * The platform does not remove power for a kernel managed suspend so
 	 * use host managed nvme power settings for lowest idle power if
@@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
 	 * shutdown.  But if the firmware is involved after the suspend or the
 	 * device does not support any non-default power states, shut down the
 	 * device fully.
+	 *
+	 * If ASPM is not enabled for the device, shut down the device and allow
+	 * the PCI bus layer to put it into D3 in order to take the PCIe link
+	 * down, so as to allow the platform to achieve its minimum low-power
+	 * state (which may not be possible if the link is up).
 	 */
-	if (pm_suspend_via_firmware() || !ctrl->npss) {
+	if (pm_suspend_via_firmware() || !ctrl->npss ||
+	    !pcie_aspm_enabled(pdev)) {
 		nvme_dev_disable(ndev, true);
 		return 0;
 	}
@@ -2880,9 +2888,8 @@ static int nvme_suspend(struct device *d
 	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
 		goto unfreeze;
 
-	ndev->last_ps = 0;
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
-	if (ret < 0)
+	if (ret < 0 || ndev->last_ps == U32_MAX)
 		goto unfreeze;
 
 	ret = nvme_set_power_state(ctrl, ctrl->npss);
Index: linux-pm/drivers/pci/pcie/aspm.c
===================================================================
--- linux-pm.orig/drivers/pci/pcie/aspm.c
+++ linux-pm/drivers/pci/pcie/aspm.c
@@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
 module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
 	NULL, 0644);
 
+/*
+ * pcie_aspm_enabled - Return the mask of enabled ASPM link states.
+ * @pci_device: Target device.
+ */
+u32 pcie_aspm_enabled(struct pci_dev *pci_device)
+{
+	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
+	u32 ret;
+
+	if (!bridge)
+		return 0;
+
+	mutex_lock(&aspm_lock);
+	ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
+	mutex_unlock(&aspm_lock);
+
+	return ret;
+}
+
+
 #ifdef CONFIG_PCIEASPM_DEBUG
 static ssize_t link_state_show(struct device *dev,
 		struct device_attribute *attr,
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -1567,8 +1567,10 @@ extern bool pcie_ports_native;
 
 #ifdef CONFIG_PCIEASPM
 bool pcie_aspm_support_enabled(void);
+u32 pcie_aspm_enabled(struct pci_dev *pci_device);
 #else
 static inline bool pcie_aspm_support_enabled(void) { return false; }
+static inline u32 pcie_aspm_enabled(struct pci_dev *pci_device) { return 0; }
 #endif
 
 #ifdef CONFIG_PCIEAER


* Re: [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08  8:36                     ` [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
@ 2019-08-08  8:48                       ` Christoph Hellwig
  2019-08-08  9:06                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2019-08-08  8:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

> -     ndev->last_ps = 0;
>       ret = nvme_get_power_state(ctrl, &ndev->last_ps);
> -     if (ret < 0)
> +     if (ret < 0 || ndev->last_ps == U32_MAX)

Is the intent of the magic U32_MAX check to see whether
nvme_get_power_state() failed at the NVMe level?  In that case, just
checking for any non-zero return value from nvme_get_power_state() would
be the easier and clearer way to do it.
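
I.e. something like this (sketch only):

	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
	if (ret)	/* covers both negative errnos and NVMe status codes */
		goto unfreeze;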

> Index: linux-pm/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pcie/aspm.c
> +++ linux-pm/drivers/pci/pcie/aspm.c

Shouldn't we split PCI vs nvme in two patches?

> @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
>  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
>  	NULL, 0644);
>  
> +/*
> + * pcie_aspm_enabled - Return the mask of enabled ASPM link states.
> + * @pci_device: Target device.
> + */
> +u32 pcie_aspm_enabled(struct pci_dev *pci_device)

pcie_aspm_enabled sounds like it returns a boolean.  Shouldn't there be
a "mask" or similar in the name to better document what it returns?
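
E.g. something like:

	u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device);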

> +{
> +	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> +	u32 ret;
> +
> +	if (!bridge)
> +		return 0;
> +
> +	mutex_lock(&aspm_lock);
> +	ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
> +	mutex_unlock(&aspm_lock);
> +
> +	return ret;
> +}

I think this will need an EXPORT_SYMBOL_GPL thrown in so that modular
nvme continues working.
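
I.e. something like this right after the closing brace:

	EXPORT_SYMBOL_GPL(pcie_aspm_enabled);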

> +
> +
>  #ifdef CONFIG_PCIEASPM_DEBUG

Nit: double blank line here.


* Re: [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08  8:48                       ` Christoph Hellwig
@ 2019-08-08  9:06                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08  9:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

On Thu, Aug 8, 2019 at 10:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> > -     ndev->last_ps = 0;
> >       ret = nvme_get_power_state(ctrl, &ndev->last_ps);
> > -     if (ret < 0)
> > +     if (ret < 0 || ndev->last_ps == U32_MAX)
>
> Is the intent of the magic U32_MAX check to see if the
> nvme_get_power_state failed at the nvme level?  In that case just
> checking for any non-zero return value from nvme_get_power_state might
> be the easier and more clear way to do it.

Now that I think about it, that check appears redundant.  I'll drop it.

>
> > Index: linux-pm/drivers/pci/pcie/aspm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > +++ linux-pm/drivers/pci/pcie/aspm.c
>
> Shouldn't we split PCI vs nvme in two patches?

That can be done.

> > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> >       NULL, 0644);
> >
> > +/*
> > + * pcie_aspm_enabled - Return the mask of enabled ASPM link states.
> > + * @pci_device: Target device.
> > + */
> > +u32 pcie_aspm_enabled(struct pci_dev *pci_device)
>
> pcie_aspm_enabled sounds like it returns a boolean.  Shouldn't there be
> a mask or so in the name better documenting what it returns?

OK

> > +{
> > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > +     u32 ret;
> > +
> > +     if (!bridge)
> > +             return 0;
> > +
> > +     mutex_lock(&aspm_lock);
> > +     ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
> > +     mutex_unlock(&aspm_lock);
> > +
> > +     return ret;
> > +}
>
> I think this will need a EXPORT_SYMBOL_GPL thrown in so that modular
> nvme continues working.

Right, sorry.

> > +
> > +
> >  #ifdef CONFIG_PCIEASPM_DEBUG
>
> Nit: double blank line here.

Overlooked, will fix.

Thanks!


* [PATCH v2 0/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
       [not found]                   ` <20190731221956.GB15795@localhost.localdomain>
  2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
  2019-08-08  8:36                     ` [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
@ 2019-08-08 10:03                     ` Rafael J. Wysocki
  2019-08-08 10:06                       ` [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask() Rafael J. Wysocki
  2019-08-08 10:10                       ` [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
  2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
  3 siblings, 2 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 10:03 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

Hi All,

This series is equivalent to the following patch:

https://patchwork.kernel.org/patch/11083551/

posted earlier today.

It addresses review comments from Christoph by splitting the PCI/PCIe ASPM part
off into a separate patch (patch [1/2]) and fixing a few defects.

Thanks!


* [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask()
  2019-08-08 10:03                     ` [PATCH v2 0/2] " Rafael J. Wysocki
@ 2019-08-08 10:06                       ` Rafael J. Wysocki
  2019-08-08 13:15                         ` Bjorn Helgaas
  2019-08-08 10:10                       ` [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
  1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 10:06 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add a function returning the mask of currently enabled ASPM link
states for a given device.

It will be used by the NVMe driver to decide how to handle the
device during system suspend.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2:
  * Move the PCI/PCIe ASPM changes to a separate patch.
  * Add the _mask suffix to the new function name.
  * Add EXPORT_SYMBOL_GPL() to the new function.
  * Avoid adding an unnecessary blank line.

---
 drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
 include/linux/pci.h     |    3 +++
 2 files changed, 23 insertions(+)

Index: linux-pm/drivers/pci/pcie/aspm.c
===================================================================
--- linux-pm.orig/drivers/pci/pcie/aspm.c
+++ linux-pm/drivers/pci/pcie/aspm.c
@@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
 module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
 	NULL, 0644);
 
+/*
+ * pcie_aspm_enabled_mask - Return the mask of enabled ASPM link states.
+ * @pci_device: Target device.
+ */
+u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
+{
+	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
+	u32 ret;
+
+	if (!bridge)
+		return 0;
+
+	mutex_lock(&aspm_lock);
+	ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
+	mutex_unlock(&aspm_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pcie_aspm_enabled_mask);
+
 #ifdef CONFIG_PCIEASPM_DEBUG
 static ssize_t link_state_show(struct device *dev,
 		struct device_attribute *attr,
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
 
 #ifdef CONFIG_PCIEASPM
 bool pcie_aspm_support_enabled(void);
+u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device);
 #else
 static inline bool pcie_aspm_support_enabled(void) { return false; }
+static inline u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
+{ return 0; }
 #endif
 
 #ifdef CONFIG_PCIEAER


* [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 10:03                     ` [PATCH v2 0/2] " Rafael J. Wysocki
  2019-08-08 10:06                       ` [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask() Rafael J. Wysocki
@ 2019-08-08 10:10                       ` Rafael J. Wysocki
  2019-08-08 13:43                         ` Bjorn Helgaas
  1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 10:10 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
host managed power state for suspend") was adding a pci_save_state()
call to nvme_suspend() in order to prevent the PCI bus-level PM from
being applied to the suspended NVMe devices, but if ASPM is not
enabled for the target NVMe device, that causes its PCIe link to stay
up and the platform may not be able to get into its optimum low-power
state because of that.

For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
suspend-to-idle prevents the SoC from reaching package idle states
deeper than PC3, which is way insufficient for system suspend.

To address this shortcoming, make nvme_suspend() check if ASPM is
enabled for the target device and fall back to full device shutdown
and PCI bus-level PM if that is not the case.

Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2:
  * Move the PCI/PCIe ASPM changes to a separate patch.
  * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().

---
 drivers/nvme/host/pci.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/nvme/host/pci.c
===================================================================
--- linux-pm.orig/drivers/nvme/host/pci.c
+++ linux-pm/drivers/nvme/host/pci.c
@@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
-	if (pm_resume_via_firmware() || !ctrl->npss ||
+	if (ndev->last_ps == U32_MAX ||
 	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
 		nvme_reset_ctrl(ctrl);
 	return 0;
@@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 	int ret = -EBUSY;
 
+	ndev->last_ps = U32_MAX;
+
 	/*
 	 * The platform does not remove power for a kernel managed suspend so
 	 * use host managed nvme power settings for lowest idle power if
@@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
 	 * shutdown.  But if the firmware is involved after the suspend or the
 	 * device does not support any non-default power states, shut down the
 	 * device fully.
+	 *
+	 * If ASPM is not enabled for the device, shut down the device and allow
+	 * the PCI bus layer to put it into D3 in order to take the PCIe link
+	 * down, so as to allow the platform to achieve its minimum low-power
+	 * state (which may not be possible if the link is up).
 	 */
-	if (pm_suspend_via_firmware() || !ctrl->npss) {
+	if (pm_suspend_via_firmware() || !ctrl->npss ||
+	    !pcie_aspm_enabled_mask(pdev)) {
 		nvme_dev_disable(ndev, true);
 		return 0;
 	}
@@ -2880,7 +2888,6 @@ static int nvme_suspend(struct device *d
 	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
 		goto unfreeze;
 
-	ndev->last_ps = 0;
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
 	if (ret < 0)
 		goto unfreeze;


* Re: [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask()
  2019-08-08 10:06                       ` [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask() Rafael J. Wysocki
@ 2019-08-08 13:15                         ` Bjorn Helgaas
  2019-08-08 14:48                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-08-08 13:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 08, 2019 at 12:06:52PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Add a function returning the mask of currently enabled ASPM link
> states for a given device.
> 
> It will be used by the NVMe driver to decide how to handle the
> device during system suspend.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2:
>   * Move the PCI/PCIe ASPM changes to a separate patch.
>   * Add the _mask suffix to the new function name.
>   * Add EXPORT_SYMBOL_GPL() to the new function.
>   * Avoid adding an unnecessary blank line.
> 
> ---
>  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
>  include/linux/pci.h     |    3 +++
>  2 files changed, 23 insertions(+)
> 
> Index: linux-pm/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pcie/aspm.c
> +++ linux-pm/drivers/pci/pcie/aspm.c
> @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
>  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
>  	NULL, 0644);
>  
> +/*
> + * pcie_aspm_enabled_mask - Return the mask of enabled ASPM link states.
> + * @pci_device: Target device.
> + */
> +u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
> +{
> +	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> +	u32 ret;
> +
> +	if (!bridge)
> +		return 0;
> +
> +	mutex_lock(&aspm_lock);
> +	ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;

This returns the "aspm_enabled" mask, but the values of that mask are
combinations of:

  ASPM_STATE_L0S_UP
  ASPM_STATE_L0S_DW
  ASPM_STATE_L1
  ...

which are defined internally in drivers/pci/pcie/aspm.c and not
visible to the caller of pcie_aspm_enabled_mask().  If there's no need
for the actual mask (the current caller doesn't seem to use it), maybe
this could be a boolean?
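
Something like this, perhaps (just an untested sketch; the pcie_aspm_enabled
name is reused here only for illustration):

	bool pcie_aspm_enabled(struct pci_dev *pdev)
	{
		struct pci_dev *bridge = pci_upstream_bridge(pdev);
		bool enabled = false;

		if (!bridge)
			return false;

		mutex_lock(&aspm_lock);
		/* Any enabled ASPM link state counts as "ASPM enabled". */
		if (bridge->link_state)
			enabled = !!bridge->link_state->aspm_enabled;
		mutex_unlock(&aspm_lock);

		return enabled;
	}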

> +	mutex_unlock(&aspm_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pcie_aspm_enabled_mask);
> +
>  #ifdef CONFIG_PCIEASPM_DEBUG
>  static ssize_t link_state_show(struct device *dev,
>  		struct device_attribute *attr,
> Index: linux-pm/include/linux/pci.h
> ===================================================================
> --- linux-pm.orig/include/linux/pci.h
> +++ linux-pm/include/linux/pci.h
> @@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
>  
>  #ifdef CONFIG_PCIEASPM
>  bool pcie_aspm_support_enabled(void);
> +u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device);
>  #else
>  static inline bool pcie_aspm_support_enabled(void) { return false; }
> +static inline u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
> +{ return 0; }
>  #endif
>  
>  #ifdef CONFIG_PCIEAER
> 

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 10:10                       ` [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
@ 2019-08-08 13:43                         ` Bjorn Helgaas
  2019-08-08 14:47                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-08-08 13:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> host managed power state for suspend") was adding a pci_save_state()
> call to nvme_suspend() in order to prevent the PCI bus-level PM from
> being applied to the suspended NVMe devices, but if ASPM is not
> enabled for the target NVMe device, that causes its PCIe link to stay
> up and the platform may not be able to get into its optimum low-power
> state because of that.
> 
> For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> suspend-to-idle prevents the SoC from reaching package idle states
> deeper than PC3, which is way insufficient for system suspend.

Just curious: I assume the SoC you reference is some part of the NVMe
drive?

> To address this shortcoming, make nvme_suspend() check if ASPM is
> enabled for the target device and fall back to full device shutdown
> and PCI bus-level PM if that is not the case.
> 
> Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2:
>   * Move the PCI/PCIe ASPM changes to a separate patch.
>   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> 
> ---
>  drivers/nvme/host/pci.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/drivers/nvme/host/pci.c
> ===================================================================
> --- linux-pm.orig/drivers/nvme/host/pci.c
> +++ linux-pm/drivers/nvme/host/pci.c
> @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
>  	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
>  	struct nvme_ctrl *ctrl = &ndev->ctrl;
>  
> -	if (pm_resume_via_firmware() || !ctrl->npss ||
> +	if (ndev->last_ps == U32_MAX ||
>  	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
>  		nvme_reset_ctrl(ctrl);
>  	return 0;
> @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
>  	struct nvme_ctrl *ctrl = &ndev->ctrl;
>  	int ret = -EBUSY;
>  
> +	ndev->last_ps = U32_MAX;
> +
>  	/*
>  	 * The platform does not remove power for a kernel managed suspend so
>  	 * use host managed nvme power settings for lowest idle power if
> @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
>  	 * shutdown.  But if the firmware is involved after the suspend or the
>  	 * device does not support any non-default power states, shut down the
>  	 * device fully.
> +	 *
> +	 * If ASPM is not enabled for the device, shut down the device and allow
> +	 * the PCI bus layer to put it into D3 in order to take the PCIe link
> +	 * down, so as to allow the platform to achieve its minimum low-power
> +	 * state (which may not be possible if the link is up).
>  	 */
> -	if (pm_suspend_via_firmware() || !ctrl->npss) {
> +	if (pm_suspend_via_firmware() || !ctrl->npss ||
> +	    !pcie_aspm_enabled_mask(pdev)) {

This seems like a layering violation, in the sense that ASPM is
supposed to be hardware-autonomous and invisible to software.

IIUC the NVMe device will go to the desired package idle state if the
link is in L0s or L1, but not if the link is in L0.  I don't
understand that connection; AFAIK that would be something outside the
scope of the PCIe spec.

The spec (PCIe r5.0, sec 5.4.1.1.1 for L0s, 5.4.1.2.1 for L1) is
careful to say that when the conditions are right, devices "should"
enter L0s (but it is never mandatory) and "may" enter L1.

And this patch assumes that if ASPM is enabled, the link will
eventually go to L0s or L1.  Because the PCIe spec doesn't mandate
that transition, I think this patch makes the driver dependent on
device-specific behavior.

>  		nvme_dev_disable(ndev, true);
>  		return 0;
>  	}
> @@ -2880,7 +2888,6 @@ static int nvme_suspend(struct device *d
>  	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
>  		goto unfreeze;
>  
> -	ndev->last_ps = 0;
>  	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
>  	if (ret < 0)
>  		goto unfreeze;
> 

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 13:43                         ` Bjorn Helgaas
@ 2019-08-08 14:47                           ` Rafael J. Wysocki
  2019-08-08 17:06                             ` Rafael J. Wysocki
  2019-08-08 18:39                             ` Bjorn Helgaas
  0 siblings, 2 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 14:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > host managed power state for suspend") was adding a pci_save_state()
> > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > being applied to the suspended NVMe devices, but if ASPM is not
> > enabled for the target NVMe device, that causes its PCIe link to stay
> > up and the platform may not be able to get into its optimum low-power
> > state because of that.
> >
> > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > suspend-to-idle prevents the SoC from reaching package idle states
> > deeper than PC3, which is way insufficient for system suspend.
>
> Just curious: I assume the SoC you reference is some part of the NVMe
> drive?

No, the SoC is what contains the Intel processor and PCH (formerly "chipset").

> > To address this shortcoming, make nvme_suspend() check if ASPM is
> > enabled for the target device and fall back to full device shutdown
> > and PCI bus-level PM if that is not the case.
> >
> > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > -> v2:
> >   * Move the PCI/PCIe ASPM changes to a separate patch.
> >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> >
> > ---
> >  drivers/nvme/host/pci.c |   13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/nvme/host/pci.c
> > ===================================================================
> > --- linux-pm.orig/drivers/nvme/host/pci.c
> > +++ linux-pm/drivers/nvme/host/pci.c
> > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> >
> > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > +     if (ndev->last_ps == U32_MAX ||
> >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> >               nvme_reset_ctrl(ctrl);
> >       return 0;
> > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> >       int ret = -EBUSY;
> >
> > +     ndev->last_ps = U32_MAX;
> > +
> >       /*
> >        * The platform does not remove power for a kernel managed suspend so
> >        * use host managed nvme power settings for lowest idle power if
> > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> >        * shutdown.  But if the firmware is involved after the suspend or the
> >        * device does not support any non-default power states, shut down the
> >        * device fully.
> > +      *
> > +      * If ASPM is not enabled for the device, shut down the device and allow
> > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > +      * down, so as to allow the platform to achieve its minimum low-power
> > +      * state (which may not be possible if the link is up).
> >        */
> > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > +         !pcie_aspm_enabled_mask(pdev)) {
>
> This seems like a layering violation, in the sense that ASPM is
> supposed to be hardware-autonomous and invisible to software.

But software has to enable it.

If it is not enabled, it will not be used, and that's what the check is about.

> IIUC the NVMe device will go to the desired package idle state if the
> link is in L0s or L1, but not if the link is in L0.  I don't
> understand that connection; AFAIK that would be something outside the
> scope of the PCIe spec.

Yes, it is outside of the PCIe spec.

No, this is not about the NVMe device; it is about the Intel SoC
(System-on-a-Chip) the platform is based on.

The background really is commit d916b1be94b6 and its changelog is kind
of misleading, unfortunately.  What it did, among other things, was to
cause the NVMe driver to prevent the PCI bus type from applying the
standard PCI PM to the devices handled by it in the suspend-to-idle
flow.  The reason for doing that was a (reportedly) widespread failure
to take the PCIe link down during D0 -> D3hot transitions of NVMe
devices, which then prevented the platform from going into a deep
enough low-power state while suspended (because it was not sure
whether or not the NVMe device was really "sufficiently" inactive).
[I guess I should mention that in the changelog of the $subject
patch.]  So the idea was to put the (NVMe) device into a low-power
state internally and then let ASPM take care of the PCIe link.

Of course, that can only work if ASPM is enabled at all for the device
in question, even though it may not be sufficient as you say below.

> The spec (PCIe r5.0, sec 5.4.1.1.1 for L0s, 5.4.1.2.1 for L1) is
> careful to say that when the conditions are right, devices "should"
> enter L0s but it is never mandatory, or "may" enter L1.
>
> And this patch assumes that if ASPM is enabled, the link will
> eventually go to L0s or L1.

No, it doesn't.

It avoids failure in the case in which it is guaranteed to happen
(disabled ASPM) and that's it.

> Because the PCIe spec doesn't mandate that transition, I think this patch makes the
> driver dependent on device-specific behavior.

IMO not really.  It just adds a "don't do it if you are going to fail"
kind of check.

>
> >               nvme_dev_disable(ndev, true);
> >               return 0;
> >       }
> > @@ -2880,7 +2888,6 @@ static int nvme_suspend(struct device *d
> >           ctrl->state != NVME_CTRL_ADMIN_ONLY)
> >               goto unfreeze;
> >
> > -     ndev->last_ps = 0;
> >       ret = nvme_get_power_state(ctrl, &ndev->last_ps);
> >       if (ret < 0)
> >               goto unfreeze;
> >

* Re: [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask()
  2019-08-08 13:15                         ` Bjorn Helgaas
@ 2019-08-08 14:48                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 14:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 8, 2019 at 3:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Aug 08, 2019 at 12:06:52PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Add a function returning the mask of currently enabled ASPM link
> > states for a given device.
> >
> > It will be used by the NVMe driver to decide how to handle the
> > device during system suspend.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > -> v2:
> >   * Move the PCI/PCIe ASPM changes to a separate patch.
> >   * Add the _mask suffix to the new function name.
> >   * Add EXPORT_SYMBOL_GPL() to the new function.
> >   * Avoid adding an unnecessary blank line.
> >
> > ---
> >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> >  include/linux/pci.h     |    3 +++
> >  2 files changed, 23 insertions(+)
> >
> > Index: linux-pm/drivers/pci/pcie/aspm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > +++ linux-pm/drivers/pci/pcie/aspm.c
> > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> >       NULL, 0644);
> >
> > +/*
> > + * pcie_aspm_enabled_mask - Return the mask of enabled ASPM link states.
> > + * @pci_device: Target device.
> > + */
> > +u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
> > +{
> > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > +     u32 ret;
> > +
> > +     if (!bridge)
> > +             return 0;
> > +
> > +     mutex_lock(&aspm_lock);
> > +     ret = bridge->link_state ? bridge->link_state->aspm_enabled : 0;
>
> This returns the "aspm_enabled" mask, but the values of that mask are
> combinations of:
>
>   ASPM_STATE_L0S_UP
>   ASPM_STATE_L0S_DW
>   ASPM_STATE_L1
>   ...
>
> which are defined internally in drivers/pci/pcie/aspm.c and not
> visible to the caller of pcie_aspm_enabled_mask().  If there's no need
> for the actual mask (the current caller doesn't seem to use it), maybe
> this could be a boolean?

Yes, it can be a boolean.

>
> > +     mutex_unlock(&aspm_lock);
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(pcie_aspm_enabled_mask);
> > +
> >  #ifdef CONFIG_PCIEASPM_DEBUG
> >  static ssize_t link_state_show(struct device *dev,
> >               struct device_attribute *attr,
> > Index: linux-pm/include/linux/pci.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pci.h
> > +++ linux-pm/include/linux/pci.h
> > @@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
> >
> >  #ifdef CONFIG_PCIEASPM
> >  bool pcie_aspm_support_enabled(void);
> > +u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device);
> >  #else
> >  static inline bool pcie_aspm_support_enabled(void) { return false; }
> > +static inline u32 pcie_aspm_enabled_mask(struct pci_dev *pci_device)
> > +{ return 0; }
> >  #endif
> >
> >  #ifdef CONFIG_PCIEAER
> >
> >

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 14:47                           ` Rafael J. Wysocki
@ 2019-08-08 17:06                             ` Rafael J. Wysocki
  2019-08-08 18:39                             ` Bjorn Helgaas
  1 sibling, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 17:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 8, 2019 at 4:47 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > > host managed power state for suspend") was adding a pci_save_state()
> > > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > > being applied to the suspended NVMe devices, but if ASPM is not
> > > enabled for the target NVMe device, that causes its PCIe link to stay
> > > up and the platform may not be able to get into its optimum low-power
> > > state because of that.
> > >
> > > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > > suspend-to-idle prevents the SoC from reaching package idle states
> > > deeper than PC3, which is way insufficient for system suspend.
> >
> > Just curious: I assume the SoC you reference is some part of the NVMe
> > drive?
>
> No, the SoC is what contains the Intel processor and PCH (formerly "chipset").
>
> > > To address this shortcoming, make nvme_suspend() check if ASPM is
> > > enabled for the target device and fall back to full device shutdown
> > > and PCI bus-level PM if that is not the case.
> > >
> > > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > -> v2:
> > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> > >
> > > ---
> > >  drivers/nvme/host/pci.c |   13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/drivers/nvme/host/pci.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/nvme/host/pci.c
> > > +++ linux-pm/drivers/nvme/host/pci.c
> > > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> > >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > >
> > > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > > +     if (ndev->last_ps == U32_MAX ||
> > >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> > >               nvme_reset_ctrl(ctrl);
> > >       return 0;
> > > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > >       int ret = -EBUSY;
> > >
> > > +     ndev->last_ps = U32_MAX;
> > > +
> > >       /*
> > >        * The platform does not remove power for a kernel managed suspend so
> > >        * use host managed nvme power settings for lowest idle power if
> > > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> > >        * shutdown.  But if the firmware is involved after the suspend or the
> > >        * device does not support any non-default power states, shut down the
> > >        * device fully.
> > > +      *
> > > +      * If ASPM is not enabled for the device, shut down the device and allow
> > > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > > +      * down, so as to allow the platform to achieve its minimum low-power
> > > +      * state (which may not be possible if the link is up).
> > >        */
> > > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > +         !pcie_aspm_enabled_mask(pdev)) {
> >
> > This seems like a layering violation, in the sense that ASPM is
> > supposed to be hardware-autonomous and invisible to software.
>
> But software has to enable it.
>
> If it is not enabled, it will not be used, and that's what the check is about.
>
> > IIUC the NVMe device will go to the desired package idle state if the
> > link is in L0s or L1, but not if the link is in L0.  I don't
> > understand that connection; AFAIK that would be something outside the
> > scope of the PCIe spec.
>
> Yes, it is outside of the PCIe spec.
>
> No, this is not about the NVMe device, it is about the Intel SoC
> (System-on-a-Chip) the platform is based on.
>
> The background really is commit d916b1be94b6 and its changelog is kind
> of misleading, unfortunately.  What it did, among other things, was to
> cause the NVMe driver to prevent the PCI bus type from applying the
> standard PCI PM to the devices handled by it in the suspend-to-idle
> flow.  The reason for doing that was a (reportedly) widespread failure
> to take the PCIe link down during D0 -> D3hot transitions of NVMe
> devices, which then prevented the platform from going into a deep
> enough low-power state while suspended (because it was not sure
> whether or not the NVMe device was really "sufficiently" inactive).
> [I guess I should mention that in the changelog of the $subject
> patch.]  So the idea was to put the (NVMe) device into a low-power
> state internally and then let ASPM take care of the PCIe link.
>
> Of course, that can only work if ASPM is enabled at all for the device
> in question, even though it may not be sufficient as you say below.
>
> > The spec (PCIe r5.0, sec 5.4.1.1.1 for L0s, 5.4.1.2.1 for L1) is
> > careful to say that when the conditions are right, devices "should"
> > enter L0s but it is never mandatory, or "may" enter L1.
> >
> > And this patch assumes that if ASPM is enabled, the link will
> > eventually go to L0s or L1.
>
> No, it doesn't.
>
> It avoids failure in the case in which it is guaranteed to happen
> (disabled ASPM) and that's it.

IOW, after commit d916b1be94b6 and without this patch, nvme_suspend()
*always* assumes that ASPM will take the device's PCIe link down, which
obviously is not going to happen if ASPM is disabled for that device.

The rationale for this patch is to avoid the obvious failure.


* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 14:47                           ` Rafael J. Wysocki
  2019-08-08 17:06                             ` Rafael J. Wysocki
@ 2019-08-08 18:39                             ` Bjorn Helgaas
  2019-08-08 20:01                               ` Keith Busch
                                                 ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2019-08-08 18:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 08, 2019 at 04:47:45PM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > > host managed power state for suspend") was adding a pci_save_state()
> > > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > > being applied to the suspended NVMe devices, but if ASPM is not
> > > enabled for the target NVMe device, that causes its PCIe link to stay
> > > up and the platform may not be able to get into its optimum low-power
> > > state because of that.
> > >
> > > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > > suspend-to-idle prevents the SoC from reaching package idle states
> > > deeper than PC3, which is way insufficient for system suspend.
> >
> > Just curious: I assume the SoC you reference is some part of the NVMe
> > drive?
> 
> No, the SoC is what contains the Intel processor and PCH (formerly "chipset").
> 
> > > To address this shortcoming, make nvme_suspend() check if ASPM is
> > > enabled for the target device and fall back to full device shutdown
> > > and PCI bus-level PM if that is not the case.
> > >
> > > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > -> v2:
> > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> > >
> > > ---
> > >  drivers/nvme/host/pci.c |   13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/drivers/nvme/host/pci.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/nvme/host/pci.c
> > > +++ linux-pm/drivers/nvme/host/pci.c
> > > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> > >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > >
> > > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > > +     if (ndev->last_ps == U32_MAX ||
> > >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> > >               nvme_reset_ctrl(ctrl);
> > >       return 0;
> > > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > >       int ret = -EBUSY;
> > >
> > > +     ndev->last_ps = U32_MAX;
> > > +
> > >       /*
> > >        * The platform does not remove power for a kernel managed suspend so
> > >        * use host managed nvme power settings for lowest idle power if
> > > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> > >        * shutdown.  But if the firmware is involved after the suspend or the
> > >        * device does not support any non-default power states, shut down the
> > >        * device fully.
> > > +      *
> > > +      * If ASPM is not enabled for the device, shut down the device and allow
> > > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > > +      * down, so as to allow the platform to achieve its minimum low-power
> > > +      * state (which may not be possible if the link is up).
> > >        */
> > > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > +         !pcie_aspm_enabled_mask(pdev)) {
> >
> > This seems like a layering violation, in the sense that ASPM is
> > supposed to be hardware-autonomous and invisible to software.
> 
> But software has to enable it.
> 
> If it is not enabled, it will not be used, and that's what the check
> is about.
> 
> > IIUC the NVMe device will go to the desired package idle state if
> > the link is in L0s or L1, but not if the link is in L0.  I don't
> > understand that connection; AFAIK that would be something outside
> > the scope of the PCIe spec.
> 
> Yes, it is outside of the PCIe spec.
> 
> No, this is not about the NVMe device, it is about the Intel SoC
> (System-on-a-Chip) the platform is based on.

Ah.  So this problem could occur with any device, not just NVMe?  If
so, how do you address that?  Obviously you don't want to patch all
drivers this way.

> The background really is commit d916b1be94b6 and its changelog is
> kind of misleading, unfortunately.  What it did, among other things,
> was to cause the NVMe driver to prevent the PCI bus type from
> applying the standard PCI PM to the devices handled by it in the
> suspend-to-idle flow.  

This is more meaningful to you than to most people because "applying
the standard PCI PM" doesn't tell us what that means in terms of the
device.  Presumably it has something to do with a D-state transition?
I *assume* a suspend might involve the D0 -> D3hot transition you
mention below?

> The reason for doing that was a (reportedly) widespread failure to
> take the PCIe link down during D0 -> D3hot transitions of NVMe
> devices,

I don't know any of the details, but "failure to take the link down
during D0 -> D3hot transitions" is phrased as though it might be a
hardware erratum.  If this *is* related to an NVMe erratum, that would
explain why you only need to patch the nvme driver, and it would be
useful to mention that in the commit log, since otherwise it sounds
like something that might be needed in other drivers, too.

According to PCIe r5.0 sec 5.3.2, the only legal link states for D3hot
are L1, L2/L3 Ready.  So if you put a device in D3hot and its link
stays in L0, that sounds like a defect.  Is that what happens?

Obviously I'm still confused.  I think it would help if you could
describe the problem in terms of the specific PCIe states involved
(D0, D3hot, L0, L1, L2, L3, etc) because then the spec would help
explain what's happening.

> which then prevented the platform from going into a deep enough
> low-power state while suspended (because it was not sure whether or
> not the NVMe device was really "sufficiently" inactive).  [I guess I
> should mention that in the changelog of the $subject patch.]  So the
> idea was to put the (NVMe) device into a low-power state internally
> and then let ASPM take care of the PCIe link.
> 
> Of course, that can only work if ASPM is enabled at all for the
> device in question, even though it may not be sufficient as you say
> below.
> 
> > The spec (PCIe r5.0, sec 5.4.1.1.1 for L0s, 5.4.1.2.1 for L1) is
> > careful to say that when the conditions are right, devices
> > "should" enter L0s but it is never mandatory, or "may" enter L1.
> >
> > And this patch assumes that if ASPM is enabled, the link will
> > eventually go to L0s or L1.
> 
> No, it doesn't.
> 
> It avoids failure in the case in which it is guaranteed to happen
> (disabled ASPM) and that's it.
> 
> > Because the PCIe spec doesn't mandate that transition, I think
> > this patch makes the driver dependent on device-specific behavior.
> 
> IMO not really.  It just adds a "don't do it if you are going to
> fail" kind of check.
> 
> >
> > >               nvme_dev_disable(ndev, true);
> > >               return 0;
> > >       }
> > > @@ -2880,7 +2888,6 @@ static int nvme_suspend(struct device *d
> > >           ctrl->state != NVME_CTRL_ADMIN_ONLY)
> > >               goto unfreeze;
> > >
> > > -     ndev->last_ps = 0;
> > >       ret = nvme_get_power_state(ctrl, &ndev->last_ps);
> > >       if (ret < 0)
> > >               goto unfreeze;
> > >
> > >
> > >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 18:39                             ` Bjorn Helgaas
@ 2019-08-08 20:01                               ` Keith Busch
  2019-08-08 20:05                               ` Mario.Limonciello
  2019-08-08 20:41                               ` Rafael J. Wysocki
  2 siblings, 0 replies; 34+ messages in thread
From: Keith Busch @ 2019-08-08 20:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-nvme,
	Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Linux PM, Linux Kernel Mailing List, Rajat Jain,
	Linux PCI

On Thu, Aug 08, 2019 at 01:39:54PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 08, 2019 at 04:47:45PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > 
> > > IIUC the NVMe device will go to the desired package idle state if
> > > the link is in L0s or L1, but not if the link is in L0.  I don't
> > > understand that connection; AFAIK that would be something outside
> > > the scope of the PCIe spec.
> > 
> > Yes, it is outside of the PCIe spec.
> > 
> > No, this is not about the NVMe device, it is about the Intel SoC
> > (System-on-a-Chip) the platform is based on.
> 
> Ah.  So this problem could occur with any device, not just NVMe?  If
> so, how do you address that?  Obviously you don't want to patch all
> drivers this way.

We discovered this when using an NVMe protocol specific power setting, so
that part is driver specific. We just have to ensure device-generic
dependencies are met in order to achieve our power target. So in
that sense, I think you would need to patch all drivers if they're also
using protocol specific settings incorrectly.

Granted, the NVMe specification doesn't detail what PCIe settings may
prevent NVMe power management from hitting the objective, but I think
requiring ASPM to be enabled makes sense.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 18:39                             ` Bjorn Helgaas
  2019-08-08 20:01                               ` Keith Busch
@ 2019-08-08 20:05                               ` Mario.Limonciello
  2019-08-08 20:41                               ` Rafael J. Wysocki
  2 siblings, 0 replies; 34+ messages in thread
From: Mario.Limonciello @ 2019-08-08 20:05 UTC (permalink / raw)
  To: helgaas, rafael
  Cc: rjw, linux-nvme, kbusch, kai.heng.feng, keith.busch, hch, sagi,
	linux-pm, linux-kernel, rajatja, linux-pci

> This is more meaningful to you than to most people because "applying
> the standard PCI PM" doesn't tell us what that means in terms of the
> device.  Presumably it has something to do with a D-state transition?
> I *assume* a suspend might involve the D0 -> D3hot transition you
> mention below?
> 
> > The reason for doing that was a (reportedly) widespread failure to
> > take the PCIe link down during D0 -> D3hot transitions of NVMe
> > devices,
> 
> I don't know any of the details, but "failure to take the link down
> during D0 -> D3hot transitions" is phrased as though it might be a
> hardware erratum.  If this *is* related to an NVMe erratum, that would
> explain why you only need to patch the nvme driver, and it would be
> useful to mention that in the commit log, since otherwise it sounds
> like something that might be needed in other drivers, too.

NVME is special in this case in that there is other logic being put in
place to set the drive's power state explicitly.

I would also mention that this alternate flow is quicker for s0ix
resume since NVME doesn't go through its shutdown routine.

The unanimous feedback from vendors was to avoid NVME shutdown and to
instead use SetFeatures to go into the deepest power state over S0ix.

> 
> According to PCIe r5.0 sec 5.3.2, the only legal link states for D3hot
> are L1, L2/L3 Ready.  So if you put a device in D3hot and its link
> stays in L0, that sounds like a defect.  Is that what happens?
> 
> Obviously I'm still confused.  I think it would help if you could
> describe the problem in terms of the specific PCIe states involved
> (D0, D3hot, L0, L1, L2, L3, etc) because then the spec would help
> explain what's happening.

Before that commit, the flow for NVME s0ix was:

* Delete IO SQ/CQ
* Shutdown NVME controller
* Save PCI registers
* Go into D3hot
* Read PMCSR

A functioning drive had the link at L1.2 and NVME power state at PS4
at this point.
Resuming looked like this:

* Restore PCI registers
* Enable NVME controller
* Configure NVME controller (IO queues, features, etc).

After that commit the flow for NVME s0ix is:

* Use NVME SetFeatures to put drive into low power mode (PS3 or PS4)
* Save PCI config register
* ASPM is used to bring link into L1.2

The resume flow is:

* Restore PCI registers

"Non-functioning" drives consumed too much power from the old flow.

The root cause varied from manufacturer to manufacturer.
The two I know off hand:

One instance is that when the PM status register is read after the device
has entered L1.2 from D3, the link goes to L0 and then stays there.

Another instance I heard of is that the drive isn't able to service the
D3hot request when NVME was already shut down.
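
To put the two flows side by side in code terms (a simplified sketch, not
the literal driver code; names follow the 5.3-rc1 nvme driver):

	/* Old flow (before d916b1be94b6), simplified: */
	nvme_dev_disable(ndev, true);		/* delete queues, shut down controller */
	/* ...the PCI core then saves state and moves the device to D3hot */

	/* New flow (after d916b1be94b6), simplified: */
	nvme_set_power_state(ctrl, ctrl->npss);	/* SetFeatures: deepest NVME power state */
	pci_save_state(pdev);			/* tell the PCI core to leave the device in D0 */
	/* ...ASPM, if enabled, is then expected to take the link to L1.2 */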



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 18:39                             ` Bjorn Helgaas
  2019-08-08 20:01                               ` Keith Busch
  2019-08-08 20:05                               ` Mario.Limonciello
@ 2019-08-08 20:41                               ` Rafael J. Wysocki
  2019-08-09  4:47                                 ` Bjorn Helgaas
  2 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-nvme, Keith Busch,
	Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Linux PM, Linux Kernel Mailing List, Rajat Jain,
	Linux PCI

On Thu, Aug 8, 2019, 20:39 Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Aug 08, 2019 at 04:47:45PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > > > host managed power state for suspend") was adding a pci_save_state()
> > > > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > > > being applied to the suspended NVMe devices, but if ASPM is not
> > > > enabled for the target NVMe device, that causes its PCIe link to stay
> > > > up and the platform may not be able to get into its optimum low-power
> > > > state because of that.
> > > >
> > > > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > > > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > > > suspend-to-idle prevents the SoC from reaching package idle states
> > > > deeper than PC3, which is way insufficient for system suspend.
> > >
> > > Just curious: I assume the SoC you reference is some part of the NVMe
> > > drive?
> >
> > No, the SoC is what contains the Intel processor and PCH (formerly "chipset").
> >
> > > > To address this shortcoming, make nvme_suspend() check if ASPM is
> > > > enabled for the target device and fall back to full device shutdown
> > > > and PCI bus-level PM if that is not the case.
> > > >
> > > > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > > > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > -> v2:
> > > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > > >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> > > >
> > > > ---
> > > >  drivers/nvme/host/pci.c |   13 ++++++++++---
> > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/nvme/host/pci.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/nvme/host/pci.c
> > > > +++ linux-pm/drivers/nvme/host/pci.c
> > > > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> > > >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > >
> > > > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > > > +     if (ndev->last_ps == U32_MAX ||
> > > >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> > > >               nvme_reset_ctrl(ctrl);
> > > >       return 0;
> > > > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > >       int ret = -EBUSY;
> > > >
> > > > +     ndev->last_ps = U32_MAX;
> > > > +
> > > >       /*
> > > >        * The platform does not remove power for a kernel managed suspend so
> > > >        * use host managed nvme power settings for lowest idle power if
> > > > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> > > >        * shutdown.  But if the firmware is involved after the suspend or the
> > > >        * device does not support any non-default power states, shut down the
> > > >        * device fully.
> > > > +      *
> > > > +      * If ASPM is not enabled for the device, shut down the device and allow
> > > > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > > > +      * down, so as to allow the platform to achieve its minimum low-power
> > > > +      * state (which may not be possible if the link is up).
> > > >        */
> > > > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > > > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > > +         !pcie_aspm_enabled_mask(pdev)) {
> > >
> > > This seems like a layering violation, in the sense that ASPM is
> > > supposed to be hardware-autonomous and invisible to software.
> >
> > But software has to enable it.
> >
> > If it is not enabled, it will not be used, and that's what the check
> > is about.
> >
> > > IIUC the NVMe device will go to the desired package idle state if
> > > the link is in L0s or L1, but not if the link is in L0.  I don't
> > > understand that connection; AFAIK that would be something outside
> > > the scope of the PCIe spec.
> >
> > Yes, it is outside of the PCIe spec.
> >
> > No, this is not about the NVMe device, it is about the Intel SoC
> > (System-on-a-Chip) the platform is based on.
>
> Ah.  So this problem could occur with any device, not just NVMe?  If
> so, how do you address that?  Obviously you don't want to patch all
> drivers this way.

It could, if the device was left in D0 during suspend, but drivers
don't let devices stay in D0 during suspend as a rule, so this is all
academic, except for the NVMe driver that has just started to do it in
5.3-rc1.

It has started to do that because of what can be regarded as a
hardware issue, but this does not even matter here.

>
> > The background really is commit d916b1be94b6 and its changelog is
> > kind of misleading, unfortunately.  What it did, among other things,
> > was to cause the NVMe driver to prevent the PCI bus type from
> > applying the standard PCI PM to the devices handled by it in the
> > suspend-to-idle flow.
>
> This is more meaningful to you than to most people because "applying
> the standard PCI PM" doesn't tell us what that means in terms of the
> device.  Presumably it has something to do with a D-state transition?
> I *assume* a suspend might involve the D0 -> D3hot transition you
> mention below?

By "standard PCI PM" I mean what pci_prepare_to_sleep() does. And yes,
in the vast majority of cases the device goes from D0 to D3hot then.
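
Roughly, the PCI bus type does something like this in its suspend_noirq
phase (a simplified sketch from memory, not the literal code), which is also
why the pci_save_state() call added to nvme_suspend() is enough to skip the
bus-level D3hot transition:

	if (!pci_dev->state_saved) {
		pci_save_state(pci_dev);
		pci_prepare_to_sleep(pci_dev);	/* usually D0 -> D3hot */
	}
	/*
	 * A driver that has already saved the config space itself, like
	 * nvme_suspend() after d916b1be94b6, makes the core skip this and
	 * the device is left in D0.
	 */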

>
> > The reason for doing that was a (reportedly) widespread failure to
> > take the PCIe link down during D0 -> D3hot transitions of NVMe
> > devices,
>
> I don't know any of the details, but "failure to take the link down
> during D0 -> D3hot transitions" is phrased as though it might be a
> hardware erratum.  If this *is* related to an NVMe erratum, that would
> explain why you only need to patch the nvme driver, and it would be
> useful to mention that in the commit log, since otherwise it sounds
> like something that might be needed in other drivers, too.

Yes, that can be considered as an NVMe erratum and the NVMe driver has
been *already* patched because of that in 5.3-rc1. [That's the commit
mentioned in the changelog of the $subject patch.]

It effectively asks the PCI bus type to leave *all* devices handled by
it in D0 during suspend-to-idle.  Already today.

I hope that this clarifies the current situation. :-)

>
> According to PCIe r5.0 sec 5.3.2, the only legal link states for D3hot
> are L1, L2/L3 Ready.  So if you put a device in D3hot and its link
> stays in L0, that sounds like a defect.  Is that what happens?

For some devices that's what happens. For some other devices the state
of the link in D3hot appears to be L1 or L2/L3 Ready (as per the spec)
and that's when the $subject patch makes a difference.

The underlying principle is that the energy used by the system while
suspended depends on the states of all of the PCIe links and the
deeper the link state, the less energy the system will use.

Now, say an NVMe device works in accordance with the spec, so when it
goes from D0 to D3hot, its PCIe link goes into L1 or L2/L3 Ready.  As
of 5.3-rc1 or later it will be left in D0 during suspend-to-idle
(because that's how the NVMe driver works), so its link state will
depend on whether or not ASPM is enabled for it.  If ASPM is enabled
for it, the final state of its link will depend on how deep ASPM is
allowed to go, but if ASPM is not enabled for it, its link will remain
in L0.

This means, however, that by allowing that device to go into D3hot
when ASPM is not enabled for it, the energy used by the system while
suspended can be reduced, because the PCIe link of the device will
then go to L1 or L2/L3 Ready.  That's exactly what the $subject patch
does.

Is this still not convincing enough?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 0/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
       [not found]                   ` <20190731221956.GB15795@localhost.localdomain>
                                       ` (2 preceding siblings ...)
  2019-08-08 10:03                     ` [PATCH v2 0/2] " Rafael J. Wysocki
@ 2019-08-08 21:51                     ` Rafael J. Wysocki
  2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
                                         ` (2 more replies)
  3 siblings, 3 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 21:51 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

Hi All,

> This series is equivalent to the following patch:
> 
> https://patchwork.kernel.org/patch/11083551/
> 
> posted earlier today.
> 
> It addresses review comments from Christoph by splitting the PCI/PCIe ASPM
> part off to a separate patch (patch [1/2]) and fixing a few defects.

Sending v3 to address review comments from Bjorn.

Thanks!




^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
@ 2019-08-08 21:55                       ` Rafael J. Wysocki
  2019-08-09  4:50                         ` Bjorn Helgaas
  2019-10-07 22:34                         ` Bjorn Helgaas
  2019-08-08 21:58                       ` [PATCH v3 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
  2019-08-08 22:13                       ` [PATCH v3 0/2] " Keith Busch
  2 siblings, 2 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 21:55 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add a function checking whether or not PCIe ASPM has been enabled for
a given device.

It will be used by the NVMe driver to decide how to handle the
device during system suspend.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3:
  * Make the new function return bool.
  * Change its name back to pcie_aspm_enabled().
  * Fix kerneldoc comment formatting.

-> v2:
  * Move the PCI/PCIe ASPM changes to a separate patch.
  * Add the _mask suffix to the new function name.
  * Add EXPORT_SYMBOL_GPL() to the new function.
  * Avoid adding an unnecessary blank line.

---
 drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
 include/linux/pci.h     |    3 +++
 2 files changed, 23 insertions(+)

Index: linux-pm/drivers/pci/pcie/aspm.c
===================================================================
--- linux-pm.orig/drivers/pci/pcie/aspm.c
+++ linux-pm/drivers/pci/pcie/aspm.c
@@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
 module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
 	NULL, 0644);
 
+/**
+ * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
+ * @pci_device: Target device.
+ */
+bool pcie_aspm_enabled(struct pci_dev *pci_device)
+{
+	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
+	bool ret;
+
+	if (!bridge)
+		return false;
+
+	mutex_lock(&aspm_lock);
+	ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
+	mutex_unlock(&aspm_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pcie_aspm_enabled);
+
 #ifdef CONFIG_PCIEASPM_DEBUG
 static ssize_t link_state_show(struct device *dev,
 		struct device_attribute *attr,
Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
 
 #ifdef CONFIG_PCIEASPM
 bool pcie_aspm_support_enabled(void);
+bool pcie_aspm_enabled(struct pci_dev *pci_device);
 #else
 static inline bool pcie_aspm_support_enabled(void) { return false; }
+static inline bool pcie_aspm_enabled(struct pci_dev *pci_device)
+{ return false; }
 #endif
 
 #ifdef CONFIG_PCIEAER




^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
  2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
@ 2019-08-08 21:58                       ` Rafael J. Wysocki
  2019-08-08 22:13                       ` [PATCH v3 0/2] " Keith Busch
  2 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-08 21:58 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Mario Limonciello, Kai-Heng Feng, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI, Bjorn Helgaas

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
host managed power state for suspend") was adding a pci_save_state()
call to nvme_suspend() so as to instruct the PCI bus type to leave
devices handled by the nvme driver in D0 during suspend-to-idle.
That was done with the assumption that ASPM would transition the
device's PCIe link into a low-power state when the device became
inactive.  However, if ASPM is disabled for the device, its PCIe
link will stay in L0 and in that case commit d916b1be94b6 is likely
to cause the energy used by the system while suspended to increase.

Namely, if the device in question works in accordance with the PCIe
specification, putting it into D3hot causes its PCIe link to go to
L1 or L2/L3 Ready, which is lower-power than L0.  Since the energy
used by the system while suspended depends on the state of its PCIe
link (as a general rule, the lower-power the state of the link, the
less energy the system will use), putting the device into D3hot
during suspend-to-idle should be more energy-efficient than leaving
it in D0 with disabled ASPM.

For this reason, avoid leaving NVMe devices with disabled ASPM in D0
during suspend-to-idle.  Instead, shut them down entirely and let
the PCI bus type put them into D3.

Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3:
  * Modify the changelog to describe the rationale for this patch in
    a less confusing and more convincing way.

-> v2:
  * Move the PCI/PCIe ASPM changes to a separate patch.
  * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().

---
 drivers/nvme/host/pci.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/nvme/host/pci.c
===================================================================
--- linux-pm.orig/drivers/nvme/host/pci.c
+++ linux-pm/drivers/nvme/host/pci.c
@@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
-	if (pm_resume_via_firmware() || !ctrl->npss ||
+	if (ndev->last_ps == U32_MAX ||
 	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
 		nvme_reset_ctrl(ctrl);
 	return 0;
@@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 	int ret = -EBUSY;
 
+	ndev->last_ps = U32_MAX;
+
 	/*
 	 * The platform does not remove power for a kernel managed suspend so
 	 * use host managed nvme power settings for lowest idle power if
@@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
 	 * shutdown.  But if the firmware is involved after the suspend or the
 	 * device does not support any non-default power states, shut down the
 	 * device fully.
+	 *
+	 * If ASPM is not enabled for the device, shut down the device and allow
+	 * the PCI bus layer to put it into D3 in order to take the PCIe link
+	 * down, so as to allow the platform to achieve its minimum low-power
+	 * state (which may not be possible if the link is up).
 	 */
-	if (pm_suspend_via_firmware() || !ctrl->npss) {
+	if (pm_suspend_via_firmware() || !ctrl->npss ||
+	    !pcie_aspm_enabled(pdev)) {
 		nvme_dev_disable(ndev, true);
 		return 0;
 	}
@@ -2880,7 +2888,6 @@ static int nvme_suspend(struct device *d
 	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
 		goto unfreeze;
 
-	ndev->last_ps = 0;
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
 	if (ret < 0)
 		goto unfreeze;




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
  2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
  2019-08-08 21:58                       ` [PATCH v3 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
@ 2019-08-08 22:13                       ` Keith Busch
  2019-08-09  8:05                         ` Rafael J. Wysocki
  2 siblings, 1 reply; 34+ messages in thread
From: Keith Busch @ 2019-08-08 22:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Sagi Grimberg, Mario Limonciello, Linux PCI,
	Linux PM, Linux Kernel Mailing List, Keith Busch, Kai-Heng Feng,
	Bjorn Helgaas, Rajat Jain, Christoph Hellwig

The v3 series looks good to me.

Reviewed-by: Keith Busch <keith.busch@intel.com>

Bjorn,

If you're okay with the series, we can either take it through nvme,
or you can feel free to apply through pci, whichever you prefer.

Thanks,
Keith

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 20:41                               ` Rafael J. Wysocki
@ 2019-08-09  4:47                                 ` Bjorn Helgaas
  2019-08-09  8:04                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-08-09  4:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Thu, Aug 08, 2019 at 10:41:56PM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 8, 2019, 20:39 Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Aug 08, 2019 at 04:47:45PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > > > > host managed power state for suspend") was adding a pci_save_state()
> > > > > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > > > > being applied to the suspended NVMe devices, but if ASPM is not
> > > > > enabled for the target NVMe device, that causes its PCIe link to stay
> > > > > up and the platform may not be able to get into its optimum low-power
> > > > > state because of that.
> > > > >
> > > > > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > > > > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > > > > suspend-to-idle prevents the SoC from reaching package idle states
> > > > > deeper than PC3, which is way insufficient for system suspend.
> > > >
> > > > Just curious: I assume the SoC you reference is some part of the NVMe
> > > > drive?
> > >
> > > No, the SoC is what contains the Intel processor and PCH (formerly "chipset").
> > >
> > > > > To address this shortcoming, make nvme_suspend() check if ASPM is
> > > > > enabled for the target device and fall back to full device shutdown
> > > > > and PCI bus-level PM if that is not the case.
> > > > >
> > > > > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > > > > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > ---
> > > > >
> > > > > -> v2:
> > > > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > > > >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> > > > >
> > > > > ---
> > > > >  drivers/nvme/host/pci.c |   13 ++++++++++---
> > > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > > >
> > > > > Index: linux-pm/drivers/nvme/host/pci.c
> > > > > ===================================================================
> > > > > --- linux-pm.orig/drivers/nvme/host/pci.c
> > > > > +++ linux-pm/drivers/nvme/host/pci.c
> > > > > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> > > > >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> > > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > > >
> > > > > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > > > > +     if (ndev->last_ps == U32_MAX ||
> > > > >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> > > > >               nvme_reset_ctrl(ctrl);
> > > > >       return 0;
> > > > > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> > > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > > >       int ret = -EBUSY;
> > > > >
> > > > > +     ndev->last_ps = U32_MAX;
> > > > > +
> > > > >       /*
> > > > >        * The platform does not remove power for a kernel managed suspend so
> > > > >        * use host managed nvme power settings for lowest idle power if
> > > > > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> > > > >        * shutdown.  But if the firmware is involved after the suspend or the
> > > > >        * device does not support any non-default power states, shut down the
> > > > >        * device fully.
> > > > > +      *
> > > > > +      * If ASPM is not enabled for the device, shut down the device and allow
> > > > > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > > > > +      * down, so as to allow the platform to achieve its minimum low-power
> > > > > +      * state (which may not be possible if the link is up).
> > > > >        */
> > > > > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > > > > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > > > +         !pcie_aspm_enabled_mask(pdev)) {
> > > >
> > > > This seems like a layering violation, in the sense that ASPM is
> > > > supposed to be hardware-autonomous and invisible to software.
> > >
> > > But software has to enable it.
> > >
> > > If it is not enabled, it will not be used, and that's what the check
> > > is about.
> > >
> > > > IIUC the NVMe device will go to the desired package idle state if
> > > > the link is in L0s or L1, but not if the link is in L0.  I don't
> > > > understand that connection; AFAIK that would be something outside
> > > > the scope of the PCIe spec.
> > >
> > > Yes, it is outside of the PCIe spec.
> > >
> > > No, this is not about the NVMe device, it is about the Intel SoC
> > > (System-on-a-Chip) the platform is based on.
> >
> > Ah.  So this problem could occur with any device, not just NVMe?  If
> > so, how do you address that?  Obviously you don't want to patch all
> > drivers this way.
> 
> It could, if the device was left in D0 during suspend, but drivers
> don't let devices stay in D0 during suspend as a rule, so this is all
> academic, except for the NVMe driver that has just started to do it in
> 5.3-rc1.
> 
> It has started to do that because of what can be regarded as a
> hardware issue, but this does not even matter here.
> 
> > > The background really is commit d916b1be94b6 and its changelog is
> > > kind of misleading, unfortunately.  What it did, among other things,
> > > was to cause the NVMe driver to prevent the PCI bus type from
> > > applying the standard PCI PM to the devices handled by it in the
> > > suspend-to-idle flow.
> >
> > This is more meaningful to you than to most people because "applying
> > the standard PCI PM" doesn't tell us what that means in terms of the
> > device.  Presumably it has something to do with a D-state transition?
> > I *assume* a suspend might involve the D0 -> D3hot transition you
> > mention below?
> 
> By "standard PCI PM" I mean what pci_prepare_to_sleep() does. And yes,
> in the vast majority of cases the device goes from D0 to D3hot then.
> 
> > > The reason for doing that was a (reportedly) widespread failure to
> > > take the PCIe link down during D0 -> D3hot transitions of NVMe
> > > devices,
> >
> > I don't know any of the details, but "failure to take the link down
> > during D0 -> D3hot transitions" is phrased as though it might be a
> > hardware erratum.  If this *is* related to an NVMe erratum, that would
> > explain why you only need to patch the nvme driver, and it would be
> > useful to mention that in the commit log, since otherwise it sounds
> > like something that might be needed in other drivers, too.
> 
> Yes, that can be considered as an NVMe erratum and the NVMe driver has
> been *already* patched because of that in 5.3-rc1. [That's the commit
> mentioned in the changelog of the $subject patch.]
> 
> It effectively asks the PCI bus type to leave *all* devices handled by
> it in D0 during suspend-to-idle.  Already today.
> 
> I hope that this clarifies the current situation. :-)
> 
> > According to PCIe r5.0 sec 5.3.2, the only legal link states for D3hot
> > are L1, L2/L3 Ready.  So if you put a device in D3hot and its link
> > stays in L0, that sounds like a defect.  Is that what happens?
> 
> For some devices that's what happens. For some other devices the state
> of the link in D3hot appears to be L1 or L2/L3 Ready (as per the spec)
> and that's when the $subject patch makes a difference.
> ...

> Now, say an NVMe device works in accordance with the spec, so when it
> goes from D0 to D3hot, its PCIe link goes into L1 or L2/L3 Ready.  As
> of 5.3-rc1 or later it will be left in D0 during suspend-to-idle
> (because that's how the NVMe driver works), so its link state will
> depend on whether or not ASPM is enabled for it.  If ASPM is enabled
> for it, the final state of its link will depend on how deep ASPM is
> allowed to go, but if ASPM is not enabled for it, its link will remain
> in L0.
> 
> This means, however, that by allowing that device to go into D3hot
> when ASPM is not enabled for it, the energy used by the system while
> suspended can be reduced, because the PCIe link of the device will
> then go to L1 or L2/L3 Ready.  That's exactly what the $subject patch
> does.
> 
> Is this still not convincing enough?

It's not a matter of being convincing, it's a matter of dissecting and
analyzing this far enough so it makes sense to someone who hasn't
debugged the problem.  Since we're talking about ASPM being enabled,
that really means making the connections to specific PCIe situations.

I'm not the nvme maintainer, so my only interest in this is that it
was really hard for me to figure out how pcie_aspm_enabled() is
related to pm_suspend_via_firmware() and ctrl->npss.

But I think it has finally percolated through.  Here's my
understanding; see if it has any connection with reality:

  Prior to d916b1be94b6 ("nvme-pci: use host managed power state for
  suspend"), suspend always put the NVMe device in D3hot.

  After d916b1be94b6, when it's possible, suspend keeps the NVMe
  device in D0 and uses NVMe-specific power settings because it's
  faster to change those than to do D0 -> D3hot -> D0 transitions.

  When it's not possible (either the device doesn't support
  NVMe-specific power settings or platform firmware has to be
  involved), we use D3hot as before.

  So now we have these three cases for suspending an NVMe device:

    1  D0 + no ASPM + NVMe power setting
    2  D0 +    ASPM + NVMe power setting
    3  D3hot

  Prior to d916b1be94b6, we always used case 3.  After d916b1be94b6,
  we used case 1 or 2 whenever possible (we didn't know which).  Case
  2 seemed acceptable, but the power consumption in case 1 was too
  high.

  This patch ("nvme-pci: Allow PCI bus-level PM to be used if ASPM is
  disabled") would replace case 1 with case 3 to reduce power
  consumption.

AFAICT we don't have a way to compute the relative power consumption
of these cases.  It's possible that even case 2 would use more power
than case 3.  You can empirically determine that this patch makes the
right trade-offs for the controllers you care about, but I don't think
it's clear that this will *always* be the case, so in that sense I
think pcie_aspm_enabled() is being used as part of a heuristic.
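
In code terms, I read the proposed check as picking case 3 whenever case 1
would otherwise apply (sketching from the patch, not quoting it verbatim):

	if (pm_suspend_via_firmware() ||	/* firmware involved      -> case 3 */
	    !ctrl->npss ||			/* no NVMe power states   -> case 3 */
	    !pcie_aspm_enabled(pdev)) {		/* would have been case 1 -> case 3 */
		nvme_dev_disable(ndev, true);	/* full shutdown, D3hot via bus-level PM */
		return 0;
	}
	/* otherwise case 2: stay in D0 with NVMe power settings plus ASPM */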

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
@ 2019-08-09  4:50                         ` Bjorn Helgaas
  2019-08-09  8:00                           ` Rafael J. Wysocki
  2019-10-07 22:34                         ` Bjorn Helgaas
  1 sibling, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-08-09  4:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI

s|PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()|PCI/ASPM: Add pcie_aspm_enabled()|

to match previous history.

On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Add a function checking whether or not PCIe ASPM has been enabled for
> a given device.
> 
> It will be used by the NVMe driver to decide how to handle the
> device during system suspend.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
> 
> v2 -> v3:
>   * Make the new function return bool.
>   * Change its name back to pcie_aspm_enabled().
>   * Fix kerneldoc comment formatting.
> 
> -> v2:
>   * Move the PCI/PCIe ASPM changes to a separate patch.
>   * Add the _mask suffix to the new function name.
>   * Add EXPORT_SYMBOL_GPL() to the new function.
>   * Avoid adding an unnecessary blank line.
> 
> ---
>  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
>  include/linux/pci.h     |    3 +++
>  2 files changed, 23 insertions(+)
> 
> Index: linux-pm/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pcie/aspm.c
> +++ linux-pm/drivers/pci/pcie/aspm.c
> @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
>  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
>  	NULL, 0644);
>  
> +/**
> + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> + * @pci_device: Target device.
> + */
> +bool pcie_aspm_enabled(struct pci_dev *pci_device)

The typical name in this file is "pdev".

> +{
> +	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> +	bool ret;
> +
> +	if (!bridge)
> +		return false;
> +
> +	mutex_lock(&aspm_lock);
> +	ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> +	mutex_unlock(&aspm_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pcie_aspm_enabled);
> +
>  #ifdef CONFIG_PCIEASPM_DEBUG
>  static ssize_t link_state_show(struct device *dev,
>  		struct device_attribute *attr,
> Index: linux-pm/include/linux/pci.h
> ===================================================================
> --- linux-pm.orig/include/linux/pci.h
> +++ linux-pm/include/linux/pci.h
> @@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
>  
>  #ifdef CONFIG_PCIEASPM
>  bool pcie_aspm_support_enabled(void);
> +bool pcie_aspm_enabled(struct pci_dev *pci_device);
>  #else
>  static inline bool pcie_aspm_support_enabled(void) { return false; }
> +static inline bool pcie_aspm_enabled(struct pci_dev *pci_device)
> +{ return false; }
>  #endif
>  
>  #ifdef CONFIG_PCIEAER
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-08-09  4:50                         ` Bjorn Helgaas
@ 2019-08-09  8:00                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-09  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI

On Fri, Aug 9, 2019 at 6:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> s|PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()|PCI/ASPM: Add pcie_aspm_enabled()|

Will change.

>
> to match previous history.
>
> On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Add a function checking whether or not PCIe ASPM has been enabled for
> > a given device.
> >
> > It will be used by the NVMe driver to decide how to handle the
> > device during system suspend.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thanks!

> > ---
> >
> > v2 -> v3:
> >   * Make the new function return bool.
> >   * Change its name back to pcie_aspm_enabled().
> >   * Fix kerneldoc comment formatting.
> >
> > -> v2:
> >   * Move the PCI/PCIe ASPM changes to a separate patch.
> >   * Add the _mask suffix to the new function name.
> >   * Add EXPORT_SYMBOL_GPL() to the new function.
> >   * Avoid adding an unnecessary blank line.
> >
> > ---
> >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> >  include/linux/pci.h     |    3 +++
> >  2 files changed, 23 insertions(+)
> >
> > Index: linux-pm/drivers/pci/pcie/aspm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > +++ linux-pm/drivers/pci/pcie/aspm.c
> > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> >       NULL, 0644);
> >
> > +/**
> > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > + * @pci_device: Target device.
> > + */
> > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
>
> The typical name in this file is "pdev".

OK, will change.

> > +{
> > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > +     bool ret;
> > +
> > +     if (!bridge)
> > +             return false;
> > +
> > +     mutex_lock(&aspm_lock);
> > +     ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > +     mutex_unlock(&aspm_lock);
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(pcie_aspm_enabled);
> > +
> >  #ifdef CONFIG_PCIEASPM_DEBUG
> >  static ssize_t link_state_show(struct device *dev,
> >               struct device_attribute *attr,
> > Index: linux-pm/include/linux/pci.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pci.h
> > +++ linux-pm/include/linux/pci.h
> > @@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
> >
> >  #ifdef CONFIG_PCIEASPM
> >  bool pcie_aspm_support_enabled(void);
> > +bool pcie_aspm_enabled(struct pci_dev *pci_device);
> >  #else
> >  static inline bool pcie_aspm_support_enabled(void) { return false; }
> > +static inline bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > +{ return false; }
> >  #endif
> >
> >  #ifdef CONFIG_PCIEAER
> >
> >
> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-09  4:47                                 ` Bjorn Helgaas
@ 2019-08-09  8:04                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-09  8:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-nvme, Keith Busch,
	Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Linux PM, Linux Kernel Mailing List, Rajat Jain,
	Linux PCI

On Fri, Aug 9, 2019 at 6:47 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Aug 08, 2019 at 10:41:56PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 8, 2019, 20:39 Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Aug 08, 2019 at 04:47:45PM +0200, Rafael J. Wysocki wrote:
> > > > On Thu, Aug 8, 2019 at 3:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Thu, Aug 08, 2019 at 12:10:06PM +0200, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > >
> > > > > > One of the modifications made by commit d916b1be94b6 ("nvme-pci: use
> > > > > > host managed power state for suspend") was adding a pci_save_state()
> > > > > > call to nvme_suspend() in order to prevent the PCI bus-level PM from
> > > > > > being applied to the suspended NVMe devices, but if ASPM is not
> > > > > > enabled for the target NVMe device, that causes its PCIe link to stay
> > > > > > up and the platform may not be able to get into its optimum low-power
> > > > > > state because of that.
> > > > > >
> > > > > > For example, if ASPM is disabled for the NVMe drive (PC401 NVMe SK
> > > > > > hynix 256GB) in my Dell XPS13 9380, leaving it in D0 during
> > > > > > suspend-to-idle prevents the SoC from reaching package idle states
> > > > > > deeper than PC3, which is way insufficient for system suspend.
> > > > >
> > > > > Just curious: I assume the SoC you reference is some part of the NVMe
> > > > > drive?
> > > >
> > > > No, the SoC is what contains the Intel processor and PCH (formerly "chipset").
> > > >
> > > > > > To address this shortcoming, make nvme_suspend() check if ASPM is
> > > > > > enabled for the target device and fall back to full device shutdown
> > > > > > and PCI bus-level PM if that is not the case.
> > > > > >
> > > > > > Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend")
> > > > > > Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
> > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > ---
> > > > > >
> > > > > > -> v2:
> > > > > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > > > > >   * Do not add a redundant ndev->last_ps == U32_MAX check in nvme_suspend().
> > > > > >
> > > > > > ---
> > > > > >  drivers/nvme/host/pci.c |   13 ++++++++++---
> > > > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > Index: linux-pm/drivers/nvme/host/pci.c
> > > > > > ===================================================================
> > > > > > --- linux-pm.orig/drivers/nvme/host/pci.c
> > > > > > +++ linux-pm/drivers/nvme/host/pci.c
> > > > > > @@ -2846,7 +2846,7 @@ static int nvme_resume(struct device *de
> > > > > >       struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
> > > > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > > > >
> > > > > > -     if (pm_resume_via_firmware() || !ctrl->npss ||
> > > > > > +     if (ndev->last_ps == U32_MAX ||
> > > > > >           nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> > > > > >               nvme_reset_ctrl(ctrl);
> > > > > >       return 0;
> > > > > > @@ -2859,6 +2859,8 @@ static int nvme_suspend(struct device *d
> > > > > >       struct nvme_ctrl *ctrl = &ndev->ctrl;
> > > > > >       int ret = -EBUSY;
> > > > > >
> > > > > > +     ndev->last_ps = U32_MAX;
> > > > > > +
> > > > > >       /*
> > > > > >        * The platform does not remove power for a kernel managed suspend so
> > > > > >        * use host managed nvme power settings for lowest idle power if
> > > > > > @@ -2866,8 +2868,14 @@ static int nvme_suspend(struct device *d
> > > > > >        * shutdown.  But if the firmware is involved after the suspend or the
> > > > > >        * device does not support any non-default power states, shut down the
> > > > > >        * device fully.
> > > > > > +      *
> > > > > > +      * If ASPM is not enabled for the device, shut down the device and allow
> > > > > > +      * the PCI bus layer to put it into D3 in order to take the PCIe link
> > > > > > +      * down, so as to allow the platform to achieve its minimum low-power
> > > > > > +      * state (which may not be possible if the link is up).
> > > > > >        */
> > > > > > -     if (pm_suspend_via_firmware() || !ctrl->npss) {
> > > > > > +     if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > > > > +         !pcie_aspm_enabled_mask(pdev)) {
> > > > >
> > > > > This seems like a layering violation, in the sense that ASPM is
> > > > > supposed to be hardware-autonomous and invisible to software.
> > > >
> > > > But software has to enable it.
> > > >
> > > > If it is not enabled, it will not be used, and that's what the check
> > > > is about.
> > > >
> > > > > IIUC the NVMe device will go to the desired package idle state if
> > > > > the link is in L0s or L1, but not if the link is in L0.  I don't
> > > > > understand that connection; AFAIK that would be something outside
> > > > > the scope of the PCIe spec.
> > > >
> > > > Yes, it is outside of the PCIe spec.
> > > >
> > > > No, this is not about the NVMe device, it is about the Intel SoC
> > > > (System-on-a-Chip) the platform is based on.
> > >
> > > Ah.  So this problem could occur with any device, not just NVMe?  If
> > > so, how do you address that?  Obviously you don't want to patch all
> > > drivers this way.
> >
> > It could, if the device was left in D0 during suspend, but drivers
> > don't let devices stay in D0 during suspend as a rule, so this is all
> > academic, except for the NVMe driver that has just started to do it in
> > 5.3-rc1.
> >
> > It has started to do that because of what can be regarded as a
> > hardware issue, but this does not even matter here.
> >
> > > > The background really is commit d916b1be94b6 and its changelog is
> > > > kind of misleading, unfortunately.  What it did, among other things,
> > > > was to cause the NVMe driver to prevent the PCI bus type from
> > > > applying the standard PCI PM to the devices handled by it in the
> > > > suspend-to-idle flow.
> > >
> > > This is more meaningful to you than to most people because "applying
> > > the standard PCI PM" doesn't tell us what that means in terms of the
> > > device.  Presumably it has something to do with a D-state transition?
> > > I *assume* a suspend might involve the D0 -> D3hot transition you
> > > mention below?
> >
> > By "standard PCI PM" I mean what pci_prepare_to_sleep() does. And yes,
> > in the vast majority of cases the device goes from D0 to D3hot then.
> >
> > > > The reason for doing that was a (reportedly) widespread failure to
> > > > take the PCIe link down during D0 -> D3hot transitions of NVMe
> > > > devices,
> > >
> > > I don't know any of the details, but "failure to take the link down
> > > during D0 -> D3hot transitions" is phrased as though it might be a
> > > hardware erratum.  If this *is* related to an NVMe erratum, that would
> > > explain why you only need to patch the nvme driver, and it would be
> > > useful to mention that in the commit log, since otherwise it sounds
> > > like something that might be needed in other drivers, too.
> >
> > Yes, that can be considered as an NVMe erratum and the NVMe driver has
> > been *already* patched because of that in 5.3-rc1. [That's the commit
> > mentioned in the changelog of the $subject patch.]
> >
> > It effectively asks the PCI bus type to leave *all* devices handled by
> > it in D0 during suspend-to-idle.  Already today.
> >
> > I hope that this clarifies the current situation. :-)
> >
> > > According to PCIe r5.0 sec 5.3.2, the only legal link states for D3hot
> > > are L1, L2/L3 Ready.  So if you put a device in D3hot and its link
> > > stays in L0, that sounds like a defect.  Is that what happens?
> >
> > For some devices that's what happens. For some other devices the state
> > of the link in D3hot appears to be L1 or L2/L3 Ready (as per the spec)
> > and that's when the $subject patch makes a difference.
> > ...
>
> > Now, say an NVMe device works in accordance with the spec, so when it
> > goes from D0 to D3hot, its PCIe link goes into L1 or L2/L3 Ready.  As
> > of 5.3-rc1 or later it will be left in D0 during suspend-to-idle
> > (because that's how the NVMe driver works), so its link state will
> > depend on whether or not ASPM is enabled for it.  If ASPM is enabled
> > for it, the final state of its link will depend on how deep ASPM is
> > allowed to go, but if ASPM is not enabled for it, its link will remain
> > in L0.
> >
> > This means, however, that by allowing that device to go into D3hot
> > when ASPM is not enabled for it, the energy used by the system while
> > suspended can be reduced, because the PCIe link of the device will
> > then go to L1 or L2/L3 Ready.  That's exactly what the $subject patch
> > does.
> >
> > Is this still not convincing enough?
>
> It's not a matter of being convincing, it's a matter of dissecting and
> analyzing this far enough so it makes sense to someone who hasn't
> debugged the problem.  Since we're talking about ASPM being enabled,
> that really means making the connections to specific PCIe situations.
>
> I'm not the nvme maintainer, so my only interest in this is that it
> was really hard for me to figure out how pcie_aspm_enabled() is
> related to pm_suspend_via_firmware() and ctrl->npss.

Fair enough.

> But I think it has finally percolated through.  Here's my
> understanding; see if it has any connection with reality:
>
>   Prior to d916b1be94b6 ("nvme-pci: use host managed power state for
>   suspend"), suspend always put the NVMe device in D3hot.

Right.

>   After d916b1be94b6, when it's possible, suspend keeps the NVMe
>   device in D0 and uses NVMe-specific power settings because it's
>   faster to change those than to do D0 -> D3hot -> D0 transitions.
>
>   When it's not possible (either the device doesn't support
>   NVMe-specific power settings or platform firmware has to be
>   involved), we use D3hot as before.

Right.

>   So now we have these three cases for suspending an NVMe device:
>
>     1  D0 + no ASPM + NVMe power setting
>     2  D0 +    ASPM + NVMe power setting
>     3  D3hot
>
>   Prior to d916b1be94b6, we always used case 3.  After d916b1be94b6,
>   we used case 1 or 2 whenever possible (we didn't know which).  Case
>   2 seemed acceptable, but the power consumption in case 1 was too
>   high.

That's correct.

>   This patch ("nvme-pci: Allow PCI bus-level PM to be used if ASPM is
>   disabled") would replace case 1 with case 3 to reduce power
>   consumption.

Right.
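
To spell that out in code, here is a minimal sketch of the resulting
nvme_suspend() decision (the helper is called pcie_aspm_enabled_mask()
in the v2 posting quoted above and pcie_aspm_enabled() in v3; the
full-shutdown and D0 paths are abbreviated, so this is not the exact
posted diff):

  static int nvme_suspend(struct device *dev)
  {
          struct pci_dev *pdev = to_pci_dev(dev);
          struct nvme_dev *ndev = pci_get_drvdata(pdev);
          struct nvme_ctrl *ctrl = &ndev->ctrl;

          /* Case 3: shut the device down and let the PCI bus type put
           * it into D3hot, taking the PCIe link to L1 or L2/L3 Ready. */
          if (pm_suspend_via_firmware() || !ctrl->npss ||
              !pcie_aspm_enabled(pdev)) {
                  nvme_dev_disable(ndev, true);
                  return 0;
          }

          /* Case 2: leave the device in D0; ASPM manages the link and
           * the NVMe power state (NPSS) gives the lowest idle power. */
          ...
  }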

> AFAICT we don't have a way to compute the relative power consumption
> of these cases.  It's possible that even case 2 would use more power
> than case 3.  You can empirically determine that this patch makes the
> right trade-offs for the controllers you care about, but I don't think
> it's clear that this will *always* be the case, so in that sense I
> think pcie_aspm_enabled() is being used as part of a heuristic.

Fair enough.

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-08 22:13                       ` [PATCH v3 0/2] " Keith Busch
@ 2019-08-09  8:05                         ` Rafael J. Wysocki
  2019-08-09 14:52                           ` Keith Busch
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-08-09  8:05 UTC (permalink / raw)
  To: Keith Busch
  Cc: Rafael J. Wysocki, linux-nvme, Sagi Grimberg, Mario Limonciello,
	Linux PCI, Linux PM, Linux Kernel Mailing List, Keith Busch,
	Kai-Heng Feng, Bjorn Helgaas, Rajat Jain, Christoph Hellwig

On Fri, Aug 9, 2019 at 12:16 AM Keith Busch <kbusch@kernel.org> wrote:
>
> The v3 series looks good to me.
>
> Reviewed-by: Keith Busch <keith.busch@intel.com>
>
> Bjorn,
>
> If you're okay with the series, we can either take it through nvme,
> or you can feel free to apply through pci, whichever you prefer.

Actually, I can apply it too, with your R-by, along with the PCIe patch
ACKed by Bjorn.  Please let me know if that works for you.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  2019-08-09  8:05                         ` Rafael J. Wysocki
@ 2019-08-09 14:52                           ` Keith Busch
  0 siblings, 0 replies; 34+ messages in thread
From: Keith Busch @ 2019-08-09 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, linux-nvme, Sagi Grimberg, Mario Limonciello,
	Linux PCI, Linux PM, Linux Kernel Mailing List, Busch, Keith,
	Kai-Heng Feng, Bjorn Helgaas, Rajat Jain, Christoph Hellwig

On Fri, Aug 09, 2019 at 01:05:42AM -0700, Rafael J. Wysocki wrote:
> On Fri, Aug 9, 2019 at 12:16 AM Keith Busch <kbusch@kernel.org> wrote:
> >
> > The v3 series looks good to me.
> >
> > Reviewed-by: Keith Busch <keith.busch@intel.com>
> >
> > Bjorn,
> >
> > If you're okay with the series, we can either take it through nvme,
> > or you can feel free to apply through pci, whichever you prefer.
> 
> Actually, I can apply it too with your R-by along with the PCIe patch
> ACKed by Bjorn.  Please let me know if that works for you.

Thanks, that sounds good to me.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
  2019-08-09  4:50                         ` Bjorn Helgaas
@ 2019-10-07 22:34                         ` Bjorn Helgaas
  2019-10-08  9:27                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-10-07 22:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-nvme, Keith Busch, Mario Limonciello, Kai-Heng Feng,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Linux PM,
	Linux Kernel Mailing List, Rajat Jain, Linux PCI,
	Heiner Kallweit

[+cc Heiner]

On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Add a function checking whether or not PCIe ASPM has been enabled for
> a given device.
> 
> It will be used by the NVMe driver to decide how to handle the
> device during system suspend.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3:
>   * Make the new function return bool.
>   * Change its name back to pcie_aspm_enabled().
>   * Fix kerneldoc comment formatting.
> 
> -> v2:
>   * Move the PCI/PCIe ASPM changes to a separate patch.
>   * Add the _mask suffix to the new function name.
>   * Add EXPORT_SYMBOL_GPL() to the new function.
>   * Avoid adding an unnecessary blank line.
> 
> ---
>  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
>  include/linux/pci.h     |    3 +++
>  2 files changed, 23 insertions(+)
> 
> Index: linux-pm/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pcie/aspm.c
> +++ linux-pm/drivers/pci/pcie/aspm.c
> @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
>  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
>  	NULL, 0644);
>  
> +/**
> + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> + * @pci_device: Target device.
> + */
> +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> +{
> +	struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> +	bool ret;
> +
> +	if (!bridge)
> +		return false;
> +
> +	mutex_lock(&aspm_lock);
> +	ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> +	mutex_unlock(&aspm_lock);

Why do we need to acquire aspm_lock here?  We aren't modifying
anything, and I don't think we're preventing a race.  If this races
with another thread that changes aspm_enabled, we'll return either the
old state or the new one, and I think that's still the case even if we
don't acquire aspm_lock.

> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pcie_aspm_enabled);
> +
>  #ifdef CONFIG_PCIEASPM_DEBUG
>  static ssize_t link_state_show(struct device *dev,
>  		struct device_attribute *attr,
> Index: linux-pm/include/linux/pci.h
> ===================================================================
> --- linux-pm.orig/include/linux/pci.h
> +++ linux-pm/include/linux/pci.h
> @@ -1567,8 +1567,11 @@ extern bool pcie_ports_native;
>  
>  #ifdef CONFIG_PCIEASPM
>  bool pcie_aspm_support_enabled(void);
> +bool pcie_aspm_enabled(struct pci_dev *pci_device);
>  #else
>  static inline bool pcie_aspm_support_enabled(void) { return false; }
> +static inline bool pcie_aspm_enabled(struct pci_dev *pci_device)
> +{ return false; }
>  #endif
>  
>  #ifdef CONFIG_PCIEAER
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-10-07 22:34                         ` Bjorn Helgaas
@ 2019-10-08  9:27                           ` Rafael J. Wysocki
  2019-10-08 21:16                             ` Bjorn Helgaas
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-10-08  9:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI,
	Heiner Kallweit

On Tue, Oct 8, 2019 at 12:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Heiner]
>
> On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Add a function checking whether or not PCIe ASPM has been enabled for
> > a given device.
> >
> > It will be used by the NVMe driver to decide how to handle the
> > device during system suspend.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v2 -> v3:
> >   * Make the new function return bool.
> >   * Change its name back to pcie_aspm_enabled().
> >   * Fix kerneldoc comment formatting.
> >
> > -> v2:
> >   * Move the PCI/PCIe ASPM changes to a separate patch.
> >   * Add the _mask suffix to the new function name.
> >   * Add EXPORT_SYMBOL_GPL() to the new function.
> >   * Avoid adding an unnecessary blank line.
> >
> > ---
> >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> >  include/linux/pci.h     |    3 +++
> >  2 files changed, 23 insertions(+)
> >
> > Index: linux-pm/drivers/pci/pcie/aspm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > +++ linux-pm/drivers/pci/pcie/aspm.c
> > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> >       NULL, 0644);
> >
> > +/**
> > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > + * @pci_device: Target device.
> > + */
> > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > +{
> > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > +     bool ret;
> > +
> > +     if (!bridge)
> > +             return false;
> > +
> > +     mutex_lock(&aspm_lock);
> > +     ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > +     mutex_unlock(&aspm_lock);
>
> Why do we need to acquire aspm_lock here?  We aren't modifying
> anything, and I don't think we're preventing a race.  If this races
> with another thread that changes aspm_enabled, we'll return either the
> old state or the new one, and I think that's still the case even if we
> don't acquire aspm_lock.

Well, if we can guarantee that pci_remove_bus_device() will never be
called in parallel with this helper, then I agree, but can we
guarantee that?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-10-08  9:27                           ` Rafael J. Wysocki
@ 2019-10-08 21:16                             ` Bjorn Helgaas
  2019-10-08 22:54                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2019-10-08 21:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI,
	Heiner Kallweit

On Tue, Oct 08, 2019 at 11:27:51AM +0200, Rafael J. Wysocki wrote:
> On Tue, Oct 8, 2019 at 12:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Add a function checking whether or not PCIe ASPM has been enabled for
> > > a given device.
> > >
> > > It will be used by the NVMe driver to decide how to handle the
> > > device during system suspend.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > v2 -> v3:
> > >   * Make the new function return bool.
> > >   * Change its name back to pcie_aspm_enabled().
> > >   * Fix kerneldoc comment formatting.
> > >
> > > -> v2:
> > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > >   * Add the _mask suffix to the new function name.
> > >   * Add EXPORT_SYMBOL_GPL() to the new function.
> > >   * Avoid adding an unnecessary blank line.
> > >
> > > ---
> > >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> > >  include/linux/pci.h     |    3 +++
> > >  2 files changed, 23 insertions(+)
> > >
> > > Index: linux-pm/drivers/pci/pcie/aspm.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > > +++ linux-pm/drivers/pci/pcie/aspm.c
> > > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> > >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> > >       NULL, 0644);
> > >
> > > +/**
> > > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > > + * @pci_device: Target device.
> > > + */
> > > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > > +{
> > > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > > +     bool ret;
> > > +
> > > +     if (!bridge)
> > > +             return false;
> > > +
> > > +     mutex_lock(&aspm_lock);
> > > +     ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > > +     mutex_unlock(&aspm_lock);
> >
> > Why do we need to acquire aspm_lock here?  We aren't modifying
> > anything, and I don't think we're preventing a race.  If this races
> > with another thread that changes aspm_enabled, we'll return either the
> > old state or the new one, and I think that's still the case even if we
> > don't acquire aspm_lock.
> 
> Well, if we can guarantee that pci_remove_bus_device() will never be
> called in parallel with this helper, then I agree, but can we
> guarantee that?

Hmm, yeah, I guess that's the question.  It's not a race with another
thread changing aspm_enabled; the potential race is with another
thread removing the last child of "bridge", which will free the
link_state and set bridge->link_state = NULL.

I think it should be safe to call device-related PCI interfaces if
you're holding a reference to the device, e.g., from a driver bound to
the device or a sysfs accessor.  Since we call pcie_aspm_enabled(dev)
from a driver bound to "dev", another thread should not be able to
remove "dev" while we're using it.

I know that's a little hand-wavey, but if it weren't true, I think
we'd have a lot more locking sprinkled everywhere in the PCI core than
we do.

This has implications for Heiner's ASPM sysfs patches because we're
currently doing this in sysfs accessors:

  static ssize_t aspm_attr_show_common(struct device *dev, ...)
  {
    ...
    link = pcie_aspm_get_link(pdev);

    mutex_lock(&aspm_lock);
    enabled = link->aspm_enabled & state;
    mutex_unlock(&aspm_lock);
    ...
  }

I assume sysfs must be holding a reference that guarantees "dev" is
> valid throughout this code, and therefore we should not need to hold
aspm_lock.

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-10-08 21:16                             ` Bjorn Helgaas
@ 2019-10-08 22:54                               ` Rafael J. Wysocki
  2019-10-09 12:49                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2019-10-08 22:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-nvme, Keith Busch,
	Mario Limonciello, Kai-Heng Feng, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Linux PM, Linux Kernel Mailing List, Rajat Jain,
	Linux PCI, Heiner Kallweit

On Tue, Oct 8, 2019 at 11:16 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Oct 08, 2019 at 11:27:51AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Oct 8, 2019 at 12:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > Add a function checking whether or not PCIe ASPM has been enabled for
> > > > a given device.
> > > >
> > > > It will be used by the NVMe driver to decide how to handle the
> > > > device during system suspend.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > v2 -> v3:
> > > >   * Make the new function return bool.
> > > >   * Change its name back to pcie_aspm_enabled().
> > > >   * Fix kerneldoc comment formatting.
> > > >
> > > > -> v2:
> > > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > > >   * Add the _mask suffix to the new function name.
> > > >   * Add EXPORT_SYMBOL_GPL() to the new function.
> > > >   * Avoid adding an unnecessary blank line.
> > > >
> > > > ---
> > > >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> > > >  include/linux/pci.h     |    3 +++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > Index: linux-pm/drivers/pci/pcie/aspm.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > > > +++ linux-pm/drivers/pci/pcie/aspm.c
> > > > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> > > >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> > > >       NULL, 0644);
> > > >
> > > > +/**
> > > > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > > > + * @pci_device: Target device.
> > > > + */
> > > > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > > > +{
> > > > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > > > +     bool ret;
> > > > +
> > > > +     if (!bridge)
> > > > +             return false;
> > > > +
> > > > +     mutex_lock(&aspm_lock);
> > > > +     ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > > > +     mutex_unlock(&aspm_lock);
> > >
> > > Why do we need to acquire aspm_lock here?  We aren't modifying
> > > anything, and I don't think we're preventing a race.  If this races
> > > with another thread that changes aspm_enabled, we'll return either the
> > > old state or the new one, and I think that's still the case even if we
> > > don't acquire aspm_lock.
> >
> > Well, if we can guarantee that pci_remove_bus_device() will never be
> > called in parallel with this helper, then I agree, but can we
> > guarantee that?
>
> Hmm, yeah, I guess that's the question.  It's not a race with another
> thread changing aspm_enabled; the potential race is with another
> thread removing the last child of "bridge", which will free the
> link_state and set bridge->link_state = NULL.
>
> I think it should be safe to call device-related PCI interfaces if
> you're holding a reference to the device, e.g., from a driver bound to
> the device or a sysfs accessor.  Since we call pcie_aspm_enabled(dev)
> from a driver bound to "dev", another thread should not be able to
> remove "dev" while we're using it.
>
> I know that's a little hand-wavey, but if it weren't true, I think
> we'd have a lot more locking sprinkled everywhere in the PCI core than
> we do.
>
> This has implications for Heiner's ASPM sysfs patches because we're
> currently doing this in sysfs accessors:
>
>   static ssize_t aspm_attr_show_common(struct device *dev, ...)
>   {
>     ...
>     link = pcie_aspm_get_link(pdev);
>
>     mutex_lock(&aspm_lock);
>     enabled = link->aspm_enabled & state;
>     mutex_unlock(&aspm_lock);
>     ...
>   }
>
> I assume sysfs must be holding a reference that guarantees "dev" is
> valid throughout this code, and therefore we should not need to hold
> aspm_lock.

In principle, pcie_aspm_enabled() need not be called via sysfs.

In the particular NVMe use case, it is called from the driver's own PM
callback, so it would be safe without the locking AFAICS.

I guess it is safe to drop the locking from there, but then it would
be good to mention in the kerneldoc that calling it is only safe under
the assumption that the link_state object cannot go away while it is
running.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()
  2019-10-08 22:54                               ` Rafael J. Wysocki
@ 2019-10-09 12:49                                 ` Bjorn Helgaas
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2019-10-09 12:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, linux-nvme, Keith Busch, Mario Limonciello,
	Kai-Heng Feng, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Linux PM, Linux Kernel Mailing List, Rajat Jain, Linux PCI,
	Heiner Kallweit

On Wed, Oct 09, 2019 at 12:54:37AM +0200, Rafael J. Wysocki wrote:
> On Tue, Oct 8, 2019 at 11:16 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Tue, Oct 08, 2019 at 11:27:51AM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Oct 8, 2019 at 12:34 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > Add a function checking whether or not PCIe ASPM has been enabled for
> > > > > a given device.
> > > > >
> > > > > It will be used by the NVMe driver to decide how to handle the
> > > > > device during system suspend.
> > > > >
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > ---
> > > > >
> > > > > v2 -> v3:
> > > > >   * Make the new function return bool.
> > > > >   * Change its name back to pcie_aspm_enabled().
> > > > >   * Fix kerneldoc comment formatting.
> > > > >
> > > > > -> v2:
> > > > >   * Move the PCI/PCIe ASPM changes to a separate patch.
> > > > >   * Add the _mask suffix to the new function name.
> > > > >   * Add EXPORT_SYMBOL_GPL() to the new function.
> > > > >   * Avoid adding an unnecessary blank line.
> > > > >
> > > > > ---
> > > > >  drivers/pci/pcie/aspm.c |   20 ++++++++++++++++++++
> > > > >  include/linux/pci.h     |    3 +++
> > > > >  2 files changed, 23 insertions(+)
> > > > >
> > > > > Index: linux-pm/drivers/pci/pcie/aspm.c
> > > > > ===================================================================
> > > > > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > > > > +++ linux-pm/drivers/pci/pcie/aspm.c
> > > > > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> > > > >  module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> > > > >       NULL, 0644);
> > > > >
> > > > > +/**
> > > > > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > > > > + * @pci_device: Target device.
> > > > > + */
> > > > > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > > > > +{
> > > > > +     struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > > > > +     bool ret;
> > > > > +
> > > > > +     if (!bridge)
> > > > > +             return false;
> > > > > +
> > > > > +     mutex_lock(&aspm_lock);
> > > > > +     ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > > > > +     mutex_unlock(&aspm_lock);
> > > >
> > > > Why do we need to acquire aspm_lock here?  We aren't modifying
> > > > anything, and I don't think we're preventing a race.  If this races
> > > > with another thread that changes aspm_enabled, we'll return either the
> > > > old state or the new one, and I think that's still the case even if we
> > > > don't acquire aspm_lock.
> > >
> > > Well, if we can guarantee that pci_remove_bus_device() will never be
> > > called in parallel with this helper, then I agree, but can we
> > > guarantee that?
> >
> > Hmm, yeah, I guess that's the question.  It's not a race with another
> > thread changing aspm_enabled; the potential race is with another
> > thread removing the last child of "bridge", which will free the
> > link_state and set bridge->link_state = NULL.
> >
> > I think it should be safe to call device-related PCI interfaces if
> > you're holding a reference to the device, e.g., from a driver bound to
> > the device or a sysfs accessor.  Since we call pcie_aspm_enabled(dev)
> > from a driver bound to "dev", another thread should not be able to
> > remove "dev" while we're using it.
> >
> > I know that's a little hand-wavey, but if it weren't true, I think
> > we'd have a lot more locking sprinkled everywhere in the PCI core than
> > we do.
> >
> > This has implications for Heiner's ASPM sysfs patches because we're
> > currently doing this in sysfs accessors:
> >
> >   static ssize_t aspm_attr_show_common(struct device *dev, ...)
> >   {
> >     ...
> >     link = pcie_aspm_get_link(pdev);
> >
> >     mutex_lock(&aspm_lock);
> >     enabled = link->aspm_enabled & state;
> >     mutex_unlock(&aspm_lock);
> >     ...
> >   }
> >
> > I assume sysfs must be holding a reference that guarantees "dev" is
> > valid throughout this code, and therefore we should not need to hold
> > aspm_lock.
> 
> In principle, pcie_aspm_enabled() need not be called via sysfs.
> 
> In the particular NVMe use case, it is called from the driver's own PM
> callback, so it would be safe without the locking AFAICS.

Right, pcie_aspm_enabled() is only used by drivers (actually only by
the nvme driver so far).  And aspm_attr_show_common() is only used via
new sysfs code being added by Heiner.

> I guess it is safe to drop the locking from there, but then it would
> be good to mention in the kerneldoc that calling it is only safe under
> the assumption that the link_state object cannot go away while it is
> running.

I'll post a patch to that effect.  Thanks!
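
For the record, that basically means dropping the mutex and spelling
out the caller requirement in the kerneldoc, along these lines (just a
sketch based on the v3 patch quoted above, not the actual change I'll
post):

  /**
   * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
   * @pci_device: Target device.
   *
   * NOTE: Callers must ensure that the link_state of the device's
   * upstream bridge cannot go away while this runs, e.g. by holding a
   * reference to @pci_device (a driver bound to it, or a sysfs accessor
   * for it).
   */
  bool pcie_aspm_enabled(struct pci_dev *pci_device)
  {
          struct pci_dev *bridge = pci_upstream_bridge(pci_device);

          if (!bridge)
                  return false;

          return bridge->link_state ?
                  !!bridge->link_state->aspm_enabled : false;
  }
  EXPORT_SYMBOL_GPL(pcie_aspm_enabled);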

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2019-10-09 12:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4323ed84dd07474eab65699b4d007aaf@AUSX13MPC105.AMER.DELL.COM>
     [not found] ` <CAJZ5v0iDQ4=kTUgW94tKGt7oJzA_3uVU_M6HAMbNCRXwp_do8A@mail.gmail.com>
     [not found]   ` <47415939.KV5G6iaeJG@kreacher>
     [not found]     ` <20190730144134.GA12844@localhost.localdomain>
     [not found]       ` <100ba4aff1c6434a81e47774ab4acddc@AUSX13MPC105.AMER.DELL.COM>
     [not found]         ` <8246360B-F7D9-42EB-94FC-82995A769E28@canonical.com>
     [not found]           ` <20190730191934.GD13948@localhost.localdomain>
     [not found]             ` <7d3e0b8ba1444194a153c93faa1cabb3@AUSX13MPC105.AMER.DELL.COM>
     [not found]               ` <20190730213114.GK13948@localhost.localdomain>
     [not found]                 ` <CAJZ5v0gxfeMN8eCNRjcXmUOkReVsdozb3EccaYMpnmSHu3771g@mail.gmail.com>
     [not found]                   ` <20190731221956.GB15795@localhost.localdomain>
2019-08-07  9:53                     ` [PATCH] nvme-pci: Do not prevent PCI bus-level PM from being used Rafael J. Wysocki
2019-08-07 10:14                       ` Rafael J. Wysocki
2019-08-07 10:43                       ` Christoph Hellwig
2019-08-07 14:37                       ` Keith Busch
2019-08-08  8:36                     ` [PATCH] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
2019-08-08  8:48                       ` Christoph Hellwig
2019-08-08  9:06                         ` Rafael J. Wysocki
2019-08-08 10:03                     ` [PATCH v2 0/2] " Rafael J. Wysocki
2019-08-08 10:06                       ` [PATCH v2 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled_mask() Rafael J. Wysocki
2019-08-08 13:15                         ` Bjorn Helgaas
2019-08-08 14:48                           ` Rafael J. Wysocki
2019-08-08 10:10                       ` [PATCH v2 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
2019-08-08 13:43                         ` Bjorn Helgaas
2019-08-08 14:47                           ` Rafael J. Wysocki
2019-08-08 17:06                             ` Rafael J. Wysocki
2019-08-08 18:39                             ` Bjorn Helgaas
2019-08-08 20:01                               ` Keith Busch
2019-08-08 20:05                               ` Mario.Limonciello
2019-08-08 20:41                               ` Rafael J. Wysocki
2019-08-09  4:47                                 ` Bjorn Helgaas
2019-08-09  8:04                                   ` Rafael J. Wysocki
2019-08-08 21:51                     ` [PATCH v3 0/2] " Rafael J. Wysocki
2019-08-08 21:55                       ` [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled() Rafael J. Wysocki
2019-08-09  4:50                         ` Bjorn Helgaas
2019-08-09  8:00                           ` Rafael J. Wysocki
2019-10-07 22:34                         ` Bjorn Helgaas
2019-10-08  9:27                           ` Rafael J. Wysocki
2019-10-08 21:16                             ` Bjorn Helgaas
2019-10-08 22:54                               ` Rafael J. Wysocki
2019-10-09 12:49                                 ` Bjorn Helgaas
2019-08-08 21:58                       ` [PATCH v3 2/2] nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled Rafael J. Wysocki
2019-08-08 22:13                       ` [PATCH v3 0/2] " Keith Busch
2019-08-09  8:05                         ` Rafael J. Wysocki
2019-08-09 14:52                           ` Keith Busch
