* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
@ 2019-07-30 20:09 Charles.Hyde
  2019-07-30 20:48 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Charles.Hyde @ 2019-07-30 20:09 UTC (permalink / raw)


LiteOn CL1 devices allocate host memory buffer at initialization.
This patch saves and restores the host memory buffer allocation
for any NVMe device which has HMB.  Devices which have on-board
memory buffers are not impacted.  This patch has been tested locally
with the following devices: LiteOn CL1 and CA3, Hynix BC511 and
PC601, WDC SN520 and SN720.  This patch has also been tested by
our partners at Wistron and Compal.

Signed-off-by: Charles Hyde <charles_hyde at dellteam.com>
Cc: Mario Limonciello <mario.limonciello at dell.com>
Cc: Rajat Jain <rajatja at google.com>
---
 drivers/nvme/host/pci.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..7c07d6ddbbac 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2847,8 +2847,20 @@ static int nvme_resume(struct device *dev)
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
 	if (pm_resume_via_firmware() || !ctrl->npss ||
-	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
-		nvme_reset_ctrl(ctrl);
+	    nvme_set_power_state(ctrl, ndev->last_ps) != 0) {
+		goto reset;
+	}
+
+	/*
+	 * If the device has allocated system memory for host memory buffer,
+	 * such as LiteOn CL1 devices, deal with it.
+	 */
+	if (ndev->host_mem_descs) {
+		if (nvme_setup_host_mem(ndev) != 0) {
+reset:
+			nvme_reset_ctrl(ctrl);
+		}
+	}
 	return 0;
 }
 
@@ -2880,6 +2892,16 @@ static int nvme_suspend(struct device *dev)
 	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
 		goto unfreeze;
 
+	/*
+	 * If the device has allocated system memory for host memory buffer,
+	 * such as LiteOn CL1 devices, we need to deal with that.
+	 */
+	if (ndev->host_mem_descs) {
+		ret = nvme_set_host_mem(ndev, 0);
+		if (ret < 0)
+			goto unfreeze;
+	}
+
 	ndev->last_ps = 0;
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
 	if (ret < 0)
-- 
2.20.1
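
For readers tracing the control flow, here is roughly how nvme_resume()
reads with the hunk above applied.  This is a sketch reassembled from
the diff, not a verbatim copy of the resulting file; the helpers named
here (pm_resume_via_firmware, nvme_set_power_state, nvme_setup_host_mem,
nvme_reset_ctrl) are the existing ones in drivers/nvme/host/pci.c.

	static int nvme_resume(struct device *dev)
	{
		struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
		struct nvme_ctrl *ctrl = &ndev->ctrl;

		if (pm_resume_via_firmware() || !ctrl->npss ||
		    nvme_set_power_state(ctrl, ndev->last_ps) != 0) {
			goto reset;
		}

		/*
		 * The device placed its host memory buffer in system RAM
		 * (e.g. LiteOn CL1): re-issue the HMB setup, and fall back
		 * to a full controller reset if that fails.
		 */
		if (ndev->host_mem_descs) {
			if (nvme_setup_host_mem(ndev) != 0) {
	reset:
				nvme_reset_ctrl(ctrl);
			}
		}
		return 0;
	}

Note that the "reset:" label ends up inside the nested conditional, so
the early goto jumps into that block; this compiles, but it is the
densest part of the hunk to follow.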


* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
  2019-07-30 20:09 [PATCH] drivers/nvme: save/restore HMB on suspend/resume Charles.Hyde
@ 2019-07-30 20:48 ` Keith Busch
  2019-07-30 20:59   ` Charles.Hyde
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2019-07-30 20:48 UTC (permalink / raw)


On Tue, Jul 30, 2019 at 08:09:15PM +0000, Charles.Hyde@dellteam.com wrote:
> LiteOn CL1 devices allocate host memory buffer at initialization.
> This patch saves and restores the host memory buffer allocation
> for any NVMe device which has HMB.  Devices which have on-board
> memory buffers are not impacted.  This patch has been tested locally
> with the following devices: LiteOn CL1 and CA3, Hynix BC511 and
> PC601, WDC SN520 and SN720.  This patch has also been tested by
> our partners at Wistron and Compal.

Please explain why you're doing this rather than what you're doing. We
previously concluded that NVMe power states have no spec defined impact
on HMB, so changing driver behavior requires justification.


* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
  2019-07-30 20:48 ` Keith Busch
@ 2019-07-30 20:59   ` Charles.Hyde
  2019-07-30 21:10     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Charles.Hyde @ 2019-07-30 20:59 UTC (permalink / raw)


> > LiteOn CL1 devices allocate host memory buffer at initialization.
> > This patch saves and restores the host memory buffer allocation for
> > any NVMe device which has HMB.  Devices which have on-board memory
> > buffers are not impacted.  This patch has been tested locally with the
> > following devices: LiteOn CL1 and CA3, Hynix BC511 and PC601, WDC
> > SN520 and SN720.  This patch has also been tested by our partners at
> > Wistron and Compal.
> 
> Please explain why you're doing this rather than what you're doing. We
> previously concluded that NVMe power states have no spec defined impact
> on HMB, so changing driver behavior requires justification.

This change is necessary because LiteOn CL1 devices would effectively
freeze when coming out of S0ix.  From the user's perspective, although
the Linux kernel is still running in RAM, no commands could be executed
because the CL1 device had no memory buffer, or the memory buffer was
no longer in the same location; the memory buffer address was not
discernible to me because I had no ability to run commands in this
condition.  After implementing this change, these same CL1 devices
function properly and command access is restored.


* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
  2019-07-30 20:59   ` Charles.Hyde
@ 2019-07-30 21:10     ` Keith Busch
  2019-07-30 21:32       ` Charles.Hyde
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2019-07-30 21:10 UTC (permalink / raw)


On Tue, Jul 30, 2019 at 08:59:38PM +0000, Charles.Hyde@dellteam.com wrote:
> > > LiteOn CL1 devices allocate host memory buffer at initialization.
> > > This patch saves and restores the host memory buffer allocation for
> > > any NVMe device which has HMB.  Devices which have on-board memory
> > > buffers are not impacted.  This patch has been tested locally with the
> > > following devices: LiteOn CL1 and CA3, Hynix BC511 and PC601, WDC
> > > SN520 and SN720.  This patch has also been tested by our partners at
> > > Wistron and Compal.
> > 
> > Please explain why you're doing this rather than what you're doing. We
> > previously concluded that NVMe power states have no spec defined impact
> > on HMB, so changing driver behavior requires justification.
> 
> This change is necessary because LiteOn CL1 devices would effectively
> freeze when coming out of S0ix.  From the user's perspective, although
> the Linux kernel is still running in RAM, no commands could be executed
> because the CL1 device had no memory buffer, or the memory buffer was
> no longer in the same location; the memory buffer address was not
> discernible to me because I had no ability to run commands in this
> condition.  After implementing this change, these same CL1 devices
> function properly and command access is restored.

But why does the device need this? The kernel has not relocated the HMB,
and it hasn't removed control of the HMB from the device either. We made
a conscious choice to not disable HMB in order to get the fastest resume
latency, and additional HMB management is not supposed to be necessary
in the first place. So what's going on with this device that it needs
the driver to disable/enable HMB? Is the nvme power state breaking its
ability to use it or something else?
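
(For context, "disable/enable HMB" at the driver level means issuing a
Set Features command for the Host Memory Buffer feature.  The sketch
below is a simplified paraphrase of nvme_set_host_mem() from
drivers/nvme/host/pci.c of this era, not a verbatim quote: passing
bits = 0, as the suspend hunk does, tells the device to stop using the
buffer, while passing NVME_HOST_MEM_ENABLE along with the descriptor
list address and size hands it back.)

	static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
	{
		u64 dma_addr = dev->host_mem_descs_dma;
		struct nvme_command c;

		memset(&c, 0, sizeof(c));
		c.features.opcode	= nvme_admin_set_features;
		c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
		c.features.dword11	= cpu_to_le32(bits);
		/* HSIZE: buffer size in units of the controller page size */
		c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
						      ilog2(dev->ctrl.page_size));
		/* HMDLLA/HMDLUA: address of the HMB descriptor list */
		c.features.dword13	= cpu_to_le32(lower_32_bits(dma_addr));
		c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
		/* HMDLEC: number of descriptor list entries */
		c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);

		return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
	}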


* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
  2019-07-30 21:10     ` Keith Busch
@ 2019-07-30 21:32       ` Charles.Hyde
  2019-07-30 21:34         ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Charles.Hyde @ 2019-07-30 21:32 UTC (permalink / raw)


> > > > LiteOn CL1 devices allocate host memory buffer at initialization.
> > > > This patch saves and restores the host memory buffer allocation
> > > > for any NVMe device which has HMB.  Devices which have on-board
> > > > memory buffers are not impacted.  This patch has been tested
> > > > locally with the following devices: LiteOn CL1 and CA3, Hynix
> > > > BC511 and PC601, WDC
> > > > SN520 and SN720.  This patch has also been tested by our partners
> > > > at Wistron and Compal.
> > >
> > > Please explain why you're doing this rather than what you're doing.
> > > We previously concluded that NVMe power states have no spec defined
> > > impact on HMB, so changing driver behavior requires justification.
> >
> > This change is necessary because LiteOn CL1 devices would effectively
> > freeze when coming out of S0ix.  From the user's perspective, although
> > the Linux kernel is still running in RAM, no commands could be
> > executed because the CL1 device had no memory buffer, or the memory
> > buffer was no longer in the same location; the memory buffer address
> > was not discernible to me because I had no ability to run commands in
> > this condition.  After implementing this change, these same CL1
> > devices function properly and command access is restored.
> 
> But why does the device need this? The kernel has not relocated the HMB,
> and it hasn't removed control of the HMB from the device either. We made a
> conscious choice to not disable HMB in order to get the fastest resume
> latency, and additional HMB management is not supposed to be necessary in
> the first place. So what's going on with this device that it needs the driver to
> disable/enable HMB? Is the nvme power state breaking its ability to use it or
> something else?

Without this change, I have observed a 100% failure rate with LiteOn
CL1 devices.  I was informed yesterday that the vendor is investigating
the root cause of S0ix failures with their device, but I have not heard
anything further.  Like you, I wish this patch were not needed, and
maybe it will not be once the vendor finishes their failure analysis.


* [PATCH] drivers/nvme: save/restore HMB on suspend/resume
  2019-07-30 21:32       ` Charles.Hyde
@ 2019-07-30 21:34         ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2019-07-30 21:34 UTC (permalink / raw)


On Tue, Jul 30, 2019 at 09:32:13PM +0000, Charles.Hyde@dellteam.com wrote:
> Without this change, I have observed a 100% failure rate with LiteOn
> CL1 devices.  I was informed yesterday that the vendor is investigating
> the root cause of S0ix failures with their device, but I have not heard
> anything further.  Like you, I wish this patch were not needed, and
> maybe it will not be once the vendor finishes their failure analysis.

Okay, let's wait to hear root cause before considering work-arounds.


