linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
       [not found] <20210520033315.490584-1-koba.ko@canonical.com>
@ 2021-05-25  7:44 ` Christoph Hellwig
  2021-05-25 16:49   ` Kai-Heng Feng
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-05-25  7:44 UTC (permalink / raw)
  To: Koba Ko
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel, Henrik Juul Hansen, Kai-Heng Feng,
	Bjorn Helgaas, linux-pci

On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> After resume, host can't change power state of the closed controller
> from D3cold to D0.

Why?

> For these devices, just avoid to go deeper than d3hot.

What are "these devices"?

> @@ -2958,6 +2959,15 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>  
> +	if (pm_suspend_via_firmware() || !dev->ctrl.npss ||
> +	    !pcie_aspm_enabled(pdev) ||
> +	    dev->nr_host_mem_descs ||
> +	    (dev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND)) {

Before we start open coding this in even more places we really want a
little helper function for these checks, which should be accomodated with
the comment near the existing copy of the checks.

> +		pdev->d3cold_allowed = false;
> +		pci_d3cold_disable(pdev);
> +		pm_runtime_resume(&pdev->dev);

Why do we need to both set d3cold_allowed and call pci_d3cold_disable?

What is the pm_runtime_resume doing here?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-25  7:44 ` [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss Christoph Hellwig
@ 2021-05-25 16:49   ` Kai-Heng Feng
  2021-05-25 20:14   ` Bjorn Helgaas
  2021-05-26  2:02   ` Koba Ko
  2 siblings, 0 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-25 16:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Koba Ko, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	LKML, Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > After resume, host can't change power state of the closed controller
> > from D3cold to D0.
>
> Why?

IIUC it's a regression introduced by commit b97120b15ebd ("nvme-pci:
use simple suspend when a HMB is enabled"). The affected NVMe is using
HMB.

That commit intentionally put the device to D3hot instead of D0 on
suspend, as the root port of the NVMe device has _PR3, the NVMe was
put to D3cold as a result. I believe because the other OS doesn't put
the NVMe to D3cold, so turning off the power resource is untested by
the vendor.

I think the proper fix would be reverting that commit, and
teardown/setup DMA on suspend/resume for HMB NVMes.

Kai-Heng

>
> > For these devices, just avoid to go deeper than d3hot.
>
> What are "these devices"?
>
> > @@ -2958,6 +2959,15 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >
> >       dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
> >
> > +     if (pm_suspend_via_firmware() || !dev->ctrl.npss ||
> > +         !pcie_aspm_enabled(pdev) ||
> > +         dev->nr_host_mem_descs ||
> > +         (dev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND)) {
>
> Before we start open coding this in even more places we really want a
> little helper function for these checks, which should be accomodated with
> the comment near the existing copy of the checks.
>
> > +             pdev->d3cold_allowed = false;
> > +             pci_d3cold_disable(pdev);
> > +             pm_runtime_resume(&pdev->dev);
>
> Why do we need to both set d3cold_allowed and call pci_d3cold_disable?
>
> What is the pm_runtime_resume doing here?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-25  7:44 ` [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss Christoph Hellwig
  2021-05-25 16:49   ` Kai-Heng Feng
@ 2021-05-25 20:14   ` Bjorn Helgaas
  2021-05-26  2:02   ` Koba Ko
  2 siblings, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2021-05-25 20:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Koba Ko, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-kernel, Henrik Juul Hansen, Kai-Heng Feng, Bjorn Helgaas,
	linux-pci

On Tue, May 25, 2021 at 09:44:26AM +0200, Christoph Hellwig wrote:
> On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:

> > @@ -2958,6 +2959,15 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  
> >  	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
> >  
> > +	if (pm_suspend_via_firmware() || !dev->ctrl.npss ||
> > +	    !pcie_aspm_enabled(pdev) ||
> > +	    dev->nr_host_mem_descs ||
> > +	    (dev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND)) {
> 
> Before we start open coding this in even more places we really want a
> little helper function for these checks, which should be accomodated with
> the comment near the existing copy of the checks.
> 
> > +		pdev->d3cold_allowed = false;
> > +		pci_d3cold_disable(pdev);
> > +		pm_runtime_resume(&pdev->dev);
> 
> Why do we need to both set d3cold_allowed and call pci_d3cold_disable?

Ugh, this looks pretty hard to maintain.

I don't see why setting d3cold_allowed=false is useful.

pci_d3cold_disable() already sets dev->no_d3cold=true, and the only place
we look at d3cold_allowed is pci_dev_check_d3cold():

  if (dev->no_d3cold || !dev->d3cold_allowed || ...)

so we won't even look at d3cold_allowed when no_d3cold is set.

I don't know why we need both no_d3cold and d3cold_allowed in the
first place.  448bd857d48e ("PCI/PM: add PCIe runtime D3cold support")
added them, but without explanation for that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-25  7:44 ` [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss Christoph Hellwig
  2021-05-25 16:49   ` Kai-Heng Feng
  2021-05-25 20:14   ` Bjorn Helgaas
@ 2021-05-26  2:02   ` Koba Ko
  2021-05-26  2:49     ` Keith Busch
  2 siblings, 1 reply; 14+ messages in thread
From: Koba Ko @ 2021-05-26  2:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	Linux Kernel Mailing List, Henrik Juul Hansen, Kai-Heng Feng,
	Bjorn Helgaas, linux-pci

On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > After resume, host can't change power state of the closed controller
> > from D3cold to D0.
>
> Why?
As per Kai-Heng said, it's a regression introduced by commit
b97120b15ebd ("nvme-pci:
use simple suspend when a HMB is enabled"). The affected NVMe is using HMB.
the target nvme ssd uses HMB and the target machine would put nvme to d3cold.
During suspend, nvme driver would shutdown the nvme controller caused by
commit b97120b15ebd ("nvme-pci: use simple suspend when a HMB is enabled").
During resuming, the nvme controller can't change the power state from
d3cold to d0.
    # nvme 0000:58:00.0: can't change power state from D3cold to D0
(config space inaccessible)
Tried some machines, they only put nvme to d3hot so even if nvme is
forced to shutdown,
it could be resumed correctly.

As per commit b97120b15ebd , the TP spec would allow nvme to access
the host memory in any power state in S3.
but the Host would fail to manage. I agree with Kai-Heng's suggestion
but this TP would be broken.

>
> > For these devices, just avoid to go deeper than d3hot.
>
> What are "these devices"?

It's a Samsung ssd using HMB.

> > @@ -2958,6 +2959,15 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >
> >       dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
> >
> > +     if (pm_suspend_via_firmware() || !dev->ctrl.npss ||
> > +         !pcie_aspm_enabled(pdev) ||
> > +         dev->nr_host_mem_descs ||
> > +         (dev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND)) {
>
> Before we start open coding this in even more places we really want a
> little helper function for these checks, which should be accomodated with
> the comment near the existing copy of the checks.

Thanks, I will refine this.

>
> > +             pdev->d3cold_allowed = false;
> > +             pci_d3cold_disable(pdev);
> > +             pm_runtime_resume(&pdev->dev);
>
> Why do we need to both set d3cold_allowed and call pci_d3cold_disable?
>
> What is the pm_runtime_resume doing here?
I referenced the codes of d3cold_allowed_store@d3cold_allowed_store fun,
As per Bjorn and search in multiple drivers, only pci_d3cold_disable is enough.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26  2:02   ` Koba Ko
@ 2021-05-26  2:49     ` Keith Busch
  2021-05-26 12:11       ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2021-05-26  2:49 UTC (permalink / raw)
  To: Koba Ko
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme,
	Linux Kernel Mailing List, Henrik Juul Hansen, Kai-Heng Feng,
	Bjorn Helgaas, linux-pci

On Wed, May 26, 2021 at 10:02:27AM +0800, Koba Ko wrote:
> On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > > After resume, host can't change power state of the closed controller
> > > from D3cold to D0.
> >
> > Why?
> As per Kai-Heng said, it's a regression introduced by commit
> b97120b15ebd ("nvme-pci:
> use simple suspend when a HMB is enabled"). The affected NVMe is using HMB.

That really doesn't add up. The mentioned commit restores the driver
behavior for HMB drives that existed prior to d916b1be94b6d from kernel
5.3. Is that NVMe device broken in pre-5.3 kernels, too?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26  2:49     ` Keith Busch
@ 2021-05-26 12:11       ` Kai-Heng Feng
  2021-05-26 12:59         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-26 12:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: Koba Ko, Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	linux-nvme, Linux Kernel Mailing List, Henrik Juul Hansen,
	Bjorn Helgaas, Linux PCI

On Wed, May 26, 2021 at 10:49 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, May 26, 2021 at 10:02:27AM +0800, Koba Ko wrote:
> > On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > > > After resume, host can't change power state of the closed controller
> > > > from D3cold to D0.
> > >
> > > Why?
> > As per Kai-Heng said, it's a regression introduced by commit
> > b97120b15ebd ("nvme-pci:
> > use simple suspend when a HMB is enabled"). The affected NVMe is using HMB.
>
> That really doesn't add up. The mentioned commit restores the driver
> behavior for HMB drives that existed prior to d916b1be94b6d from kernel
> 5.3. Is that NVMe device broken in pre-5.3 kernels, too?

Quite likely. The system in question is a late 2020 Ice Lake laptop,
so it was released after 5.3 kernel.

Kai-Heng

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 12:11       ` Kai-Heng Feng
@ 2021-05-26 12:59         ` Christoph Hellwig
  2021-05-26 14:21           ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-05-26 12:59 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Keith Busch, Koba Ko, Christoph Hellwig, Jens Axboe,
	Sagi Grimberg, linux-nvme, Linux Kernel Mailing List,
	Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Wed, May 26, 2021 at 08:11:41PM +0800, Kai-Heng Feng wrote:
> On Wed, May 26, 2021 at 10:49 AM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Wed, May 26, 2021 at 10:02:27AM +0800, Koba Ko wrote:
> > > On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
> > > >
> > > > On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > > > > After resume, host can't change power state of the closed controller
> > > > > from D3cold to D0.
> > > >
> > > > Why?
> > > As per Kai-Heng said, it's a regression introduced by commit
> > > b97120b15ebd ("nvme-pci:
> > > use simple suspend when a HMB is enabled"). The affected NVMe is using HMB.
> >
> > That really doesn't add up. The mentioned commit restores the driver
> > behavior for HMB drives that existed prior to d916b1be94b6d from kernel
> > 5.3. Is that NVMe device broken in pre-5.3 kernels, too?
> 
> Quite likely. The system in question is a late 2020 Ice Lake laptop,
> so it was released after 5.3 kernel.

This is just a mess.  We had to disable the sensible power state based
suspend on these systems because Intel broke it by just cutting the power
off.  And now the shutdown based one doesn't work either because it can't
handle d3cold.  Someone we need to stop Intel and the integrators from
doing stupid things, and I'm not sure how.

But degrading all systems even more is just a bad idea, so I fear we'll
need a quirk again.  Can you figure out by switching the cards if this
is the fault of the platform or the nvme device?

> 
> Kai-Heng
---end quoted text---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 12:59         ` Christoph Hellwig
@ 2021-05-26 14:21           ` Kai-Heng Feng
  2021-05-26 14:28             ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-26 14:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Koba Ko, Jens Axboe, Sagi Grimberg, linux-nvme,
	Linux Kernel Mailing List, Henrik Juul Hansen, Bjorn Helgaas,
	Linux PCI

On Wed, May 26, 2021 at 8:59 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, May 26, 2021 at 08:11:41PM +0800, Kai-Heng Feng wrote:
> > On Wed, May 26, 2021 at 10:49 AM Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > On Wed, May 26, 2021 at 10:02:27AM +0800, Koba Ko wrote:
> > > > On Tue, May 25, 2021 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
> > > > >
> > > > > On Thu, May 20, 2021 at 11:33:15AM +0800, Koba Ko wrote:
> > > > > > After resume, host can't change power state of the closed controller
> > > > > > from D3cold to D0.
> > > > >
> > > > > Why?
> > > > As per Kai-Heng said, it's a regression introduced by commit
> > > > b97120b15ebd ("nvme-pci:
> > > > use simple suspend when a HMB is enabled"). The affected NVMe is using HMB.
> > >
> > > That really doesn't add up. The mentioned commit restores the driver
> > > behavior for HMB drives that existed prior to d916b1be94b6d from kernel
> > > 5.3. Is that NVMe device broken in pre-5.3 kernels, too?
> >
> > Quite likely. The system in question is a late 2020 Ice Lake laptop,
> > so it was released after 5.3 kernel.
>
> This is just a mess.  We had to disable the sensible power state based
> suspend on these systems because Intel broke it by just cutting the power
> off.  And now the shutdown based one doesn't work either because it can't
> handle d3cold.  Someone we need to stop Intel and the integrators from
> doing stupid things, and I'm not sure how.

To be fair, resuming the NVMe from D3hot is much slower than keep it
at D0, which gives us a faster s2idle resume time. And now AMD also
requires s2idle on their latest laptops.

And it's more like NVMe controllers don't respect PCI D3hot.

>
> But degrading all systems even more is just a bad idea, so I fear we'll
> need a quirk again.  Can you figure out by switching the cards if this
> is the fault of the platform or the nvme device?

Here's the original bug report:
https://bugs.launchpad.net/bugs/1912057

Because the NVMe continues to work after s2idle and the symbol is
rather subtle, so I suspect this is not platform or vendor specific.
Is it possible to disable DMA for HMB NVMe on suspend?

Kai-Heng

>
> >
> > Kai-Heng
> ---end quoted text---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 14:21           ` Kai-Heng Feng
@ 2021-05-26 14:28             ` Christoph Hellwig
  2021-05-26 14:47               ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-05-26 14:28 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Christoph Hellwig, Keith Busch, Koba Ko, Jens Axboe,
	Sagi Grimberg, linux-nvme, Linux Kernel Mailing List,
	Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Wed, May 26, 2021 at 10:21:59PM +0800, Kai-Heng Feng wrote:
> To be fair, resuming the NVMe from D3hot is much slower than keep it
> at D0, which gives us a faster s2idle resume time. And now AMD also
> requires s2idle on their latest laptops.

We'd much prefer to use it, but due to the broken platforms we can't
unfortunately.

> And it's more like NVMe controllers don't respect PCI D3hot.

What do you mean with that?

> Because the NVMe continues to work after s2idle and the symbol is
> rather subtle, so I suspect this is not platform or vendor specific.
> Is it possible to disable DMA for HMB NVMe on suspend?

Not in shipping products.  The NVMe technical working group is working
on a way to do that, but it will take a while until that shows up in
products.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 14:28             ` Christoph Hellwig
@ 2021-05-26 14:47               ` Kai-Heng Feng
  2021-05-26 15:06                 ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-26 14:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Koba Ko, Jens Axboe, Sagi Grimberg, linux-nvme,
	Linux Kernel Mailing List, Henrik Juul Hansen, Bjorn Helgaas,
	Linux PCI

On Wed, May 26, 2021 at 10:28 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, May 26, 2021 at 10:21:59PM +0800, Kai-Heng Feng wrote:
> > To be fair, resuming the NVMe from D3hot is much slower than keep it
> > at D0, which gives us a faster s2idle resume time. And now AMD also
> > requires s2idle on their latest laptops.
>
> We'd much prefer to use it, but due to the broken platforms we can't
> unfortunately.
>
> > And it's more like NVMe controllers don't respect PCI D3hot.
>
> What do you mean with that?

Originally, we found that under s2idle, most NVMe controllers caused
substantially more power if D3hot was used.
We were told by all the major NVMe vendors that D3hot is not
supported. It may also disable APST.
And that's the reason why we have the host-managed power control for s2idle.

IIRC only Samsung NVMes respect D3hot and keeps the power consumption low.

>
> > Because the NVMe continues to work after s2idle and the symbol is
> > rather subtle, so I suspect this is not platform or vendor specific.
> > Is it possible to disable DMA for HMB NVMe on suspend?
>
> Not in shipping products.  The NVMe technical working group is working
> on a way to do that, but it will take a while until that shows up in
> products.

Hmm, then what else can we do? Because D3hot isn't support by the
vendor, does it really stop HMB?

Kai-Heng

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 14:47               ` Kai-Heng Feng
@ 2021-05-26 15:06                 ` Bjorn Helgaas
  2021-05-26 16:24                   ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2021-05-26 15:06 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Christoph Hellwig, Keith Busch, Koba Ko, Jens Axboe,
	Sagi Grimberg, linux-nvme, Linux Kernel Mailing List,
	Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Wed, May 26, 2021 at 10:47:13PM +0800, Kai-Heng Feng wrote:
> On Wed, May 26, 2021 at 10:28 PM Christoph Hellwig <hch@lst.de> wrote:
> > On Wed, May 26, 2021 at 10:21:59PM +0800, Kai-Heng Feng wrote:
> > > To be fair, resuming the NVMe from D3hot is much slower than keep it
> > > at D0, which gives us a faster s2idle resume time. And now AMD also
> > > requires s2idle on their latest laptops.
> >
> > We'd much prefer to use it, but due to the broken platforms we can't
> > unfortunately.
> >
> > > And it's more like NVMe controllers don't respect PCI D3hot.
> >
> > What do you mean with that?
> 
> Originally, we found that under s2idle, most NVMe controllers caused
> substantially more power if D3hot was used.
> We were told by all the major NVMe vendors that D3hot is not
> supported. 

What is this supposed to mean?  PCIe r5.0, sec 5.3.1, says

  All Functions must support the D0 and D3 states (both D3Hot and D3Cold).

Since D3hot is required for all functions, I don't think there is a
standard way to discover whether D3hot is supported.  The PM
Capability (sec 7.5.2.1) has D1_Support and D2_Support bits, but no
D3_Support bit.

Are the vendors just saying "sorry, our devices don't conform to the
spec"?

If what you really mean is "D3hot is supported and it works, but it
consumes more power than expected, or the D3hot->D0 transition takes
longer than expected," that's a totally different thing, and you should
say *that*.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 15:06                 ` Bjorn Helgaas
@ 2021-05-26 16:24                   ` Kai-Heng Feng
  2021-05-27 11:40                     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-26 16:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, Keith Busch, Koba Ko, Jens Axboe,
	Sagi Grimberg, linux-nvme, Linux Kernel Mailing List,
	Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Wed, May 26, 2021 at 11:06 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, May 26, 2021 at 10:47:13PM +0800, Kai-Heng Feng wrote:
> > On Wed, May 26, 2021 at 10:28 PM Christoph Hellwig <hch@lst.de> wrote:
> > > On Wed, May 26, 2021 at 10:21:59PM +0800, Kai-Heng Feng wrote:
> > > > To be fair, resuming the NVMe from D3hot is much slower than keep it
> > > > at D0, which gives us a faster s2idle resume time. And now AMD also
> > > > requires s2idle on their latest laptops.
> > >
> > > We'd much prefer to use it, but due to the broken platforms we can't
> > > unfortunately.
> > >
> > > > And it's more like NVMe controllers don't respect PCI D3hot.
> > >
> > > What do you mean with that?
> >
> > Originally, we found that under s2idle, most NVMe controllers caused
> > substantially more power if D3hot was used.
> > We were told by all the major NVMe vendors that D3hot is not
> > supported.
>
> What is this supposed to mean?  PCIe r5.0, sec 5.3.1, says
>
>   All Functions must support the D0 and D3 states (both D3Hot and D3Cold).
>
> Since D3hot is required for all functions, I don't think there is a
> standard way to discover whether D3hot is supported.  The PM
> Capability (sec 7.5.2.1) has D1_Support and D2_Support bits, but no
> D3_Support bit.
>
> Are the vendors just saying "sorry, our devices don't conform to the
> spec"?

Yes, that's exactly what they said. Because Windows Modern Standby
always keep the NVMe at D0, so D3hot is untested by the vendors.

NVMe vendors explicitly asked us to keep the NVMe controllers at D0 for s2idle.

>
> If what you really mean is "D3hot is supported and it works, but it
> consumes more power than expected,

If D3hot consumes more power than D0, is it really supported?

> or the D3hot->D0 transition takes
> longer than expected," that's a totally different thing, and you should
> say *that*.

The D3hot->D0 transition doesn't take longer than expected. It's just
relatively slower than keeping the NVMe at D0.

Kai-Heng

>
> Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-26 16:24                   ` Kai-Heng Feng
@ 2021-05-27 11:40                     ` Christoph Hellwig
  2021-05-27 12:08                       ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-05-27 11:40 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Bjorn Helgaas, Christoph Hellwig, Keith Busch, Koba Ko,
	Jens Axboe, Sagi Grimberg, linux-nvme, Linux Kernel Mailing List,
	Henrik Juul Hansen, Bjorn Helgaas, Linux PCI

On Thu, May 27, 2021 at 12:24:06AM +0800, Kai-Heng Feng wrote:
> Yes, that's exactly what they said. Because Windows Modern Standby
> always keep the NVMe at D0, so D3hot is untested by the vendors.

So all the problems we've deal with latetly are that platforms cut the
power off.  So it is not kept always at D0 which would be the right thing,
but moved into D3cold.

Which makes this patch to disable D3cold rather counterintuitive.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss.
  2021-05-27 11:40                     ` Christoph Hellwig
@ 2021-05-27 12:08                       ` Kai-Heng Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-05-27 12:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Keith Busch, Koba Ko, Jens Axboe, Sagi Grimberg,
	linux-nvme, Linux Kernel Mailing List, Henrik Juul Hansen,
	Bjorn Helgaas, Linux PCI

On Thu, May 27, 2021 at 7:40 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, May 27, 2021 at 12:24:06AM +0800, Kai-Heng Feng wrote:
> > Yes, that's exactly what they said. Because Windows Modern Standby
> > always keep the NVMe at D0, so D3hot is untested by the vendors.
>
> So all the problems we've deal with latetly are that platforms cut the
> power off.  So it is not kept always at D0 which would be the right thing,
> but moved into D3cold.

It's still okay to use D3cold under s2idle if nvme_acpi_storage_d3()
check passes.

Otherwise D0 should be used. I am unsure how to proceed with HMB NVMes
if we can't setup DMA after resume...

>
> Which makes this patch to disable D3cold rather counterintuitive.

The intention of this patch is indeed very unclear.

Kai-Heng

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-05-27 12:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210520033315.490584-1-koba.ko@canonical.com>
2021-05-25  7:44 ` [PATCH] nvme-pci: Avoid to go into d3cold if device can't use npss Christoph Hellwig
2021-05-25 16:49   ` Kai-Heng Feng
2021-05-25 20:14   ` Bjorn Helgaas
2021-05-26  2:02   ` Koba Ko
2021-05-26  2:49     ` Keith Busch
2021-05-26 12:11       ` Kai-Heng Feng
2021-05-26 12:59         ` Christoph Hellwig
2021-05-26 14:21           ` Kai-Heng Feng
2021-05-26 14:28             ` Christoph Hellwig
2021-05-26 14:47               ` Kai-Heng Feng
2021-05-26 15:06                 ` Bjorn Helgaas
2021-05-26 16:24                   ` Kai-Heng Feng
2021-05-27 11:40                     ` Christoph Hellwig
2021-05-27 12:08                       ` Kai-Heng Feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).