* [Bug] nvme blocks PC10 since v5.15 - bisected
@ 2022-01-21 19:00 Rafael J. Wysocki
  2022-01-21 21:09 ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2022-01-21 19:00 UTC
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme

Hi Keith,

It is reported that the following commit

commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
Author: Keith Busch <kbusch@kernel.org>
Date:   Tue Jul 27 09:40:44 2021 -0700

   nvme-pci: disable hmb on idle suspend

   An idle suspend may or may not disable host memory access from devices
   placed in low power mode. Either way, it should always be safe to
   disable the host memory buffer prior to entering the low power mode, and
   this should also always be faster than a full device shutdown.

   Signed-off-by: Keith Busch <kbusch@kernel.org>
   Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
   Signed-off-by: Christoph Hellwig <hch@lst.de>

is the source of a serious power regression occurring since 5.15
(please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).

After this commit, the SoC on the affected system cannot enter
C-states deeper than PC2 while suspended to idle which basically
defeats the purpose of suspending.

What may be happening is that nvme_disable_prepare_reset() that is not
called any more in the ndev->nr_host_mem_descs case somehow causes the
LTR of the device to change to "no requirement" which allows deeper
C-states to be entered.

Can you have a look at this, please?

Cheers,
Rafael

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-01-21 19:00 [Bug] nvme blocks PC10 since v5.15 - bisected Rafael J. Wysocki
@ 2022-01-21 21:09 ` Keith Busch
  2022-01-27 19:02   ` Rafael J. Wysocki
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2022-01-21 21:09 UTC
  To: Rafael J. Wysocki
  Cc: Sagi Grimberg, Christoph Hellwig, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme

On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> Hi Keith,
> 
> It is reported that the following commit
> 
> commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> Author: Keith Busch <kbusch@kernel.org>
> Date:   Tue Jul 27 09:40:44 2021 -0700
> 
>    nvme-pci: disable hmb on idle suspend
> 
>    An idle suspend may or may not disable host memory access from devices
>    placed in low power mode. Either way, it should always be safe to
>    disable the host memory buffer prior to entering the low power mode, and
>    this should also always be faster than a full device shutdown.
> 
>    Signed-off-by: Keith Busch <kbusch@kernel.org>
>    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>    Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> is the source of a serious power regression occurring since 5.15
> (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
> 
> After this commit, the SoC on the affected system cannot enter
> C-states deeper than PC2 while suspended to idle which basically
> defeats the purpose of suspending.
> 
> What may be happening is that nvme_disable_prepare_reset() that is not
> called any more in the ndev->nr_host_mem_descs case somehow causes the
> LTR of the device to change to "no requirement" which allows deeper
> C-states to be entered.
> 
> Can you have a look at this, please?

I thought platforms that wanted full device shutdown behaviour would
always set acpi_storage_d3. Is that not happening here?

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-01-21 21:09 ` Keith Busch
@ 2022-01-27 19:02   ` Rafael J. Wysocki
  2022-01-27 19:30     ` Keith Busch
  2022-02-10 14:56     ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2022-01-27 19:02 UTC
  To: Keith Busch
  Cc: Rafael J. Wysocki, Sagi Grimberg, Christoph Hellwig, Len Brown,
	Linux PM, Linux Kernel Mailing List, m.heingbecker, linux-nvme

On Fri, Jan 21, 2022 at 10:09 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> > Hi Keith,
> >
> > It is reported that the following commit
> >
> > commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> > Author: Keith Busch <kbusch@kernel.org>
> > Date:   Tue Jul 27 09:40:44 2021 -0700
> >
> >    nvme-pci: disable hmb on idle suspend
> >
> >    An idle suspend may or may not disable host memory access from devices
> >    placed in low power mode. Either way, it should always be safe to
> >    disable the host memory buffer prior to entering the low power mode, and
> >    this should also always be faster than a full device shutdown.
> >
> >    Signed-off-by: Keith Busch <kbusch@kernel.org>
> >    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> >    Signed-off-by: Christoph Hellwig <hch@lst.de>
> >
> > is the source of a serious power regression occurring since 5.15
> > (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
> >
> > After this commit, the SoC on the affected system cannot enter
> > C-states deeper than PC2 while suspended to idle which basically
> > defeats the purpose of suspending.
> >
> > What may be happening is that nvme_disable_prepare_reset() that is not
> > called any more in the ndev->nr_host_mem_descs case somehow causes the
> > LTR of the device to change to "no requirement" which allows deeper
> > C-states to be entered.
> >
> > Can you have a look at this, please?
>
> I thought platforms that wanted full device shutdown behaviour would
> always set acpi_storage_d3. Is that not happening here?

Evidently, it isn't.

Also that flag is about putting the device into D3, which need not be
necessary as long as the LTR is set to "don't care".
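
A toy model of this point, assuming (as stated above) that a device in D3, or one whose LTR reports "no requirement", does not hold the package out of deep C-states. The struct, the function and the latency numbers used with it are hypothetical; the real package C-state gating is done by platform hardware and firmware and is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device view used only for this illustration. */
struct dev_pm_view {
	bool in_d3;               /* device placed in D3 */
	bool ltr_no_requirement;  /* LTR reports "no requirement" */
	unsigned long ltr_tol_ns; /* reported latency tolerance otherwise */
};

/* A deep package state with the given exit latency is allowed only if no
 * device that still constrains the package tolerates less than that latency. */
bool deep_pkg_state_allowed(const struct dev_pm_view *devs, size_t n,
			    unsigned long exit_latency_ns)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].in_d3 || devs[i].ltr_no_requirement)
			continue; /* this device does not constrain the package */
		if (devs[i].ltr_tol_ns < exit_latency_ns)
			return false;
	}
	return true;
}
```

With the NVMe device in D0 and a 100 ns tolerance, a hypothetical 1000 ns exit latency is blocked; either "no requirement" or D3 unblocks it, which is why D3 is not the only way out.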

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-01-27 19:02   ` Rafael J. Wysocki
@ 2022-01-27 19:30     ` Keith Busch
  2022-01-27 19:41       ` Rafael J. Wysocki
  2022-02-10 14:56     ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Busch @ 2022-01-27 19:30 UTC
  To: Rafael J. Wysocki
  Cc: Sagi Grimberg, Christoph Hellwig, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme

On Thu, Jan 27, 2022 at 08:02:07PM +0100, Rafael J. Wysocki wrote:
> On Fri, Jan 21, 2022 at 10:09 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> > > Hi Keith,
> > >
> > > It is reported that the following commit
> > >
> > > commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> > > Author: Keith Busch <kbusch@kernel.org>
> > > Date:   Tue Jul 27 09:40:44 2021 -0700
> > >
> > >    nvme-pci: disable hmb on idle suspend
> > >
> > >    An idle suspend may or may not disable host memory access from devices
> > >    placed in low power mode. Either way, it should always be safe to
> > >    disable the host memory buffer prior to entering the low power mode, and
> > >    this should also always be faster than a full device shutdown.
> > >
> > >    Signed-off-by: Keith Busch <kbusch@kernel.org>
> > >    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> > >    Signed-off-by: Christoph Hellwig <hch@lst.de>
> > >
> > > is the source of a serious power regression occurring since 5.15
> > > (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
> > >
> > > After this commit, the SoC on the affected system cannot enter
> > > C-states deeper than PC2 while suspended to idle which basically
> > > defeats the purpose of suspending.
> > >
> > > What may be happening is that nvme_disable_prepare_reset() that is not
> > > called any more in the ndev->nr_host_mem_descs case somehow causes the
> > > LTR of the device to change to "no requirement" which allows deeper
> > > C-states to be entered.
> > >
> > > Can you have a look at this, please?
> >
> > I thought platforms that wanted full device shutdown behaviour would
> > always set acpi_storage_d3. Is that not happening here?
> 
> Evidently, it isn't.
> 
> Also that flag is about putting the device into D3, which need not be
> necessary as long as the LTR is set to "don't care".

The only NVMe spec guidance for a driver to initiate a controller
shutdown is to prepare for D3 transition. If this platform wants a full
device shutdown without D3, then I think we may need a quirk.

We did a shutdown before because we didn't know any better and it's the
safest thing to do. That caused complaints about excessive resume
latency, so now we have a platform indicator to tell us if we should,
and we rely on that. Are you suggesting we instead consult the PCIe LTR
in addition to ACPI storage properties?
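
One way to picture consulting the PCIe LTR in addition to the ACPI storage property is sketched below. The bit layout follows the LTR latency encoding in the PCIe specification (bits 9:0 value, bits 12:10 scale where each step multiplies by 32, bit 15 Requirement). The policy function, its exit-latency threshold and the choice to treat a reserved scale as the tightest case are hypothetical, and how the driver would obtain the device's reported LTR value is left open, as it is in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit LTR latency field layout (snoop or no-snoop half). */
#define LTR_VALUE_MASK  0x03ffu /* bits 9:0: latency value */
#define LTR_SCALE_SHIFT 10      /* bits 12:10: latency scale */
#define LTR_SCALE_MASK  0x7u
#define LTR_REQUIREMENT 0x8000u /* bit 15: clear means "no requirement" */

/* Tolerance in ns, i.e. value * 32^scale. Returns UINT64_MAX for
 * "no requirement"; reserved scales (6, 7) are treated as 0 ns, the most
 * restrictive reading. */
uint64_t ltr_tolerance_ns(uint16_t field)
{
	if (!(field & LTR_REQUIREMENT))
		return UINT64_MAX;
	unsigned int scale = (field >> LTR_SCALE_SHIFT) & LTR_SCALE_MASK;
	if (scale > 5)
		return 0;
	return (uint64_t)(field & LTR_VALUE_MASK) << (5u * scale);
}

/* Hypothetical policy: full shutdown if the platform asks for it, or if the
 * device's LTR would keep the package out of a state with this exit latency. */
bool want_full_shutdown(bool storage_d3, uint16_t ltr_field,
			uint64_t exit_latency_ns)
{
	return storage_d3 || ltr_tolerance_ns(ltr_field) < exit_latency_ns;
}
```

The design choice here is that the ACPI property still wins outright, and the LTR check only adds shutdowns on platforms that did not ask for one.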

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-01-27 19:30     ` Keith Busch
@ 2022-01-27 19:41       ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2022-01-27 19:41 UTC
  To: Keith Busch
  Cc: Rafael J. Wysocki, Sagi Grimberg, Christoph Hellwig, Len Brown,
	Linux PM, Linux Kernel Mailing List, m.heingbecker, linux-nvme

On Thu, Jan 27, 2022 at 8:30 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Thu, Jan 27, 2022 at 08:02:07PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Jan 21, 2022 at 10:09 PM Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> > > > Hi Keith,
> > > >
> > > > It is reported that the following commit
> > > >
> > > > commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> > > > Author: Keith Busch <kbusch@kernel.org>
> > > > Date:   Tue Jul 27 09:40:44 2021 -0700
> > > >
> > > >    nvme-pci: disable hmb on idle suspend
> > > >
> > > >    An idle suspend may or may not disable host memory access from devices
> > > >    placed in low power mode. Either way, it should always be safe to
> > > >    disable the host memory buffer prior to entering the low power mode, and
> > > >    this should also always be faster than a full device shutdown.
> > > >
> > > >    Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > >    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> > > >    Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > >
> > > > is the source of a serious power regression occurring since 5.15
> > > > (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
> > > >
> > > > After this commit, the SoC on the affected system cannot enter
> > > > C-states deeper than PC2 while suspended to idle which basically
> > > > defeats the purpose of suspending.
> > > >
> > > > What may be happening is that nvme_disable_prepare_reset() that is not
> > > > called any more in the ndev->nr_host_mem_descs case somehow causes the
> > > > LTR of the device to change to "no requirement" which allows deeper
> > > > C-states to be entered.
> > > >
> > > > Can you have a look at this, please?
> > >
> > > I thought platforms that wanted full device shutdown behaviour would
> > > always set acpi_storage_d3. Is that not happening here?
> >
> > Evidently, it isn't.
> >
> > Also that flag is about putting the device into D3, which need not be
> > necessary as long as the LTR is set to "don't care".
>
> The only NVMe spec guidance for a driver to initiate a controller
> shutdown is to prepare for D3 transition. If this platform wants a full
> device shutdown without D3, then I think we may need a quirk.
>
> We did a shutdown before because we didn't know any better and it's the
> safest thing to do. That caused complaints about excessive resume
> latency, so now we have a platform indicator to tell us if we should,
> and we rely on that. Are you suggesting we instead consult the PCIe LTR
> in addition to ACPI storage properties?

Possibly.

The point is that there is a regression on this particular system
caused by the above change. It needs to be dealt with one way or
another.  Doing an additional LTR check may be the way to go, but it
needs to be verified.

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-01-27 19:02   ` Rafael J. Wysocki
  2022-01-27 19:30     ` Keith Busch
@ 2022-02-10 14:56     ` Keith Busch
  2022-02-10 18:24       ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Busch @ 2022-02-10 14:56 UTC
  To: Rafael J. Wysocki
  Cc: Sagi Grimberg, Christoph Hellwig, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme

On Thu, Jan 27, 2022 at 08:02:07PM +0100, Rafael J. Wysocki wrote:
> On Fri, Jan 21, 2022 at 10:09 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> > > Hi Keith,
> > >
> > > It is reported that the following commit
> > >
> > > commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> > > Author: Keith Busch <kbusch@kernel.org>
> > > Date:   Tue Jul 27 09:40:44 2021 -0700
> > >
> > >    nvme-pci: disable hmb on idle suspend
> > >
> > >    An idle suspend may or may not disable host memory access from devices
> > >    placed in low power mode. Either way, it should always be safe to
> > >    disable the host memory buffer prior to entering the low power mode, and
> > >    this should also always be faster than a full device shutdown.
> > >
> > >    Signed-off-by: Keith Busch <kbusch@kernel.org>
> > >    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> > >    Signed-off-by: Christoph Hellwig <hch@lst.de>
> > >
> > > is the source of a serious power regression occurring since 5.15
> > > (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
> > >
> > > After this commit, the SoC on the affected system cannot enter
> > > C-states deeper than PC2 while suspended to idle which basically
> > > defeats the purpose of suspending.
> > >
> > > What may be happening is that nvme_disable_prepare_reset() that is not
> > > called any more in the ndev->nr_host_mem_descs case somehow causes the
> > > LTR of the device to change to "no requirement" which allows deeper
> > > C-states to be entered.
> > >
> > > Can you have a look at this, please?
> >
> > I thought platforms that wanted full device shutdown behaviour would
> > always set acpi_storage_d3. Is that not happening here?
> 
> Evidently, it isn't.

Apparently it works fine when you disable VMD, so sounds like the
acpi_storage_d3 is set, but we fail to find the correct acpi companion
device when it's in a VMD domain.

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-02-10 14:56     ` Keith Busch
@ 2022-02-10 18:24       ` Christoph Hellwig
  2022-02-10 18:54         ` Jonathan Derrick
  2022-02-10 18:54         ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2022-02-10 18:24 UTC
  To: Keith Busch
  Cc: Rafael J. Wysocki, Sagi Grimberg, Christoph Hellwig, Len Brown,
	Linux PM, Linux Kernel Mailing List, m.heingbecker, linux-nvme,
	Nirmal Patel, Jonathan Derrick

On Thu, Feb 10, 2022 at 06:56:35AM -0800, Keith Busch wrote:
> Apparently it works fine when you disable VMD, so sounds like the
> acpi_storage_d3 is set, but we fail to find the correct acpi companion
> device when it's in a VMD domain.

I guess the acpi_storage_d3 is set on the VMD device and we need
to propagate that down the entire bus hanging off it.

Which kinda makes sense in the twisted world where vmd was invented,
given that vmd is Intel's evil plot so that only their Windows driver
can bind to these devices, so the property also needs to be set on
the vmd device.

Nirmal and Jonathan, can you help to sort this mess out?
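
The propagation described here can be modelled as a lookup that walks up from the NVMe device to the nearest ancestor that has an ACPI companion. This is a hypothetical sketch: struct node and both lookup functions are illustrative stand-ins, not the kernel's struct device or the real acpi_storage_d3() implementation, and it assumes, as guessed above, that only the VMD endpoint carries the property while devices behind it have no companion of their own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node; not the kernel's struct device. */
struct node {
	const struct node *parent;
	bool has_acpi_companion; /* false for devices enumerated behind VMD */
	bool storage_d3;         /* D3 property on that companion, if any */
};

/* Lookup that only consults the device's own companion: under VMD the
 * NVMe device has none, so the property is never seen. */
bool storage_d3_direct(const struct node *dev)
{
	assert(dev != NULL);
	return dev->has_acpi_companion && dev->storage_d3;
}

/* Propagating variant: use the nearest ancestor (or self) that has a
 * companion, e.g. the VMD endpoint itself. */
bool storage_d3_inherited(const struct node *dev)
{
	for (const struct node *n = dev; n != NULL; n = n->parent)
		if (n->has_acpi_companion)
			return n->storage_d3;
	return false;
}
```

Stopping at the nearest companion means an intermediate device that does have one would override the VMD endpoint; whether that is the right precedence is an open design question.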

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-02-10 18:24       ` Christoph Hellwig
@ 2022-02-10 18:54         ` Jonathan Derrick
  2022-02-10 18:54         ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Derrick @ 2022-02-10 18:54 UTC
  To: Christoph Hellwig, Keith Busch
  Cc: Rafael J. Wysocki, Sagi Grimberg, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme,
	Nirmal Patel

On 2/10/2022 11:24 AM, Christoph Hellwig wrote:
> On Thu, Feb 10, 2022 at 06:56:35AM -0800, Keith Busch wrote:
>> Apparently it works fine when you disable VMD, so sounds like the
>> acpi_storage_d3 is set, but we fail to find the correct acpi companion
>> device when it's in a VMD domain.
> 
> I guess the acpi_storage_d3 is set on the VMD device and we need
> to propagate that down the entire bus hanging off it.
This is all you needed to say

> 
> Which kinda makes sense in the twisted world where vmd was invented,
> given that vmd is Intel's evil plot so that only their Windows driver
> can bind to these devices,
Is this relevant or necessary?

> so the property also needs to be set on
> the vmd device.
> 
> Nirmal and Jonathan, can you help to sort this mess out?

* Re: [Bug] nvme blocks PC10 since v5.15 - bisected
  2022-02-10 18:24       ` Christoph Hellwig
  2022-02-10 18:54         ` Jonathan Derrick
@ 2022-02-10 18:54         ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Keith Busch @ 2022-02-10 18:54 UTC
  To: Christoph Hellwig
  Cc: Rafael J. Wysocki, Sagi Grimberg, Len Brown, Linux PM,
	Linux Kernel Mailing List, m.heingbecker, linux-nvme,
	Nirmal Patel, Jonathan Derrick

On Thu, Feb 10, 2022 at 07:24:23PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 10, 2022 at 06:56:35AM -0800, Keith Busch wrote:
> > Apparently it works fine when you disable VMD, so sounds like the
> > acpi_storage_d3 is set, but we fail to find the correct acpi companion
> > device when it's in a VMD domain.
> 
> I guess the acpi_storage_d3 is set on the VMD device and we need
> to propagate that down the entire bus hanging off it.
> 
> Which kinda makes sense in the twisted world where vmd was invented,
> given that vmd is Intel's evil plot so that only their Windows driver
> can bind to these devices, so the property also needs to be set on
> the vmd device.
> 
> Nirmal and Jonathan, can you help to sort this mess out?

Just FYI, I'm not sure now whether my previous comment is entirely accurate.
I'll get some more info from the reporter to confirm.

end of thread (last message 2022-02-10 18:54 UTC)

Thread overview: 9+ messages
2022-01-21 19:00 [Bug] nvme blocks PC10 since v5.15 - bisected Rafael J. Wysocki
2022-01-21 21:09 ` Keith Busch
2022-01-27 19:02   ` Rafael J. Wysocki
2022-01-27 19:30     ` Keith Busch
2022-01-27 19:41       ` Rafael J. Wysocki
2022-02-10 14:56     ` Keith Busch
2022-02-10 18:24       ` Christoph Hellwig
2022-02-10 18:54         ` Jonathan Derrick
2022-02-10 18:54         ` Keith Busch
