linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme: fix APST error for power latency tolerance
@ 2021-03-23  7:31 pngliu
  2021-03-23 16:23 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: pngliu @ 2021-03-23  7:31 UTC (permalink / raw)
  To: kbusch; +Cc: linux-nvme, Peng Liu

From: Peng Liu <liupeng17@lenovo.com>

Clear apsta so that nvme_configure_apst() does not execute
nvme_set_features(), which will fail because admin_q is either not set up
yet or no longer available at the time of nvme_uninit_ctrl() being called,
and this leads to the error message "nvme nvme0: failed to set APST feature
(-19)".

Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")

Signed-off-by: Peng Liu <liupeng17@lenovo.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e68a8c4ac5a6..413a33c67655 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4475,7 +4475,11 @@ void nvme_uninit_ctrl(struct nvme_ctrl *ctrl)
 {
 	nvme_hwmon_exit(ctrl);
 	nvme_fault_inject_fini(&ctrl->fault_inject);
+
+	/* clear APSTA since admin_q is unavailable for feature setting */
+	ctrl->apsta = 0;
 	dev_pm_qos_hide_latency_tolerance(ctrl->device);
+
 	cdev_device_del(&ctrl->cdev, ctrl->device);
 	nvme_put_ctrl(ctrl);
 }
-- 
2.25.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme: fix APST error for power latency tolerance
  2021-03-23  7:31 [PATCH] nvme: fix APST error for power latency tolerance pngliu
@ 2021-03-23 16:23 ` Christoph Hellwig
  2021-03-24  2:38   ` Peng Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-03-23 16:23 UTC (permalink / raw)
  To: pngliu; +Cc: kbusch, linux-nvme, Peng Liu

On Tue, Mar 23, 2021 at 03:31:33PM +0800, pngliu@hotmail.com wrote:
> From: Peng Liu <liupeng17@lenovo.com>
> 
> Clear apsta so that nvme_configure_apst() does not execute
> nvme_set_features(), which will fail because admin_q is either not set up
> yet or no longer available at the time of nvme_uninit_ctrl() being called,
> and this leads to the error message "nvme nvme0: failed to set APST feature
> (-19)".
> 
> Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")

How did you get into this situation?  For PCIe nvme_uninit_ctrl is
only called at the end of ->remove and ->delete_ctrl, so how do we end
up in nvme_configure_apst after that?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme: fix APST error for power latency tolerance
  2021-03-23 16:23 ` Christoph Hellwig
@ 2021-03-24  2:38   ` Peng Liu
  2021-03-24  7:58     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Peng Liu @ 2021-03-24  2:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, linux-nvme, Peng Liu

On Tue, Mar 23, 2021 at 04:23:21PM +0000, Christoph Hellwig wrote:
> On Tue, Mar 23, 2021 at 03:31:33PM +0800, pngliu@hotmail.com wrote:
> > From: Peng Liu <liupeng17@lenovo.com>
> > 
> > Clear apsta so that nvme_configure_apst() does not execute
> > nvme_set_features(), which will fail because admin_q is either not set up
> > yet or no longer available at the time of nvme_uninit_ctrl() being called,
> > and this leads to the error message "nvme nvme0: failed to set APST feature
> > (-19)".
> > 
> > Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")
> 
> How did you get into this situation?  For PCIe nvme_uninit_ctrl is
> only called at the end of ->remove and ->delete_ctrl, so how do we end
> up in nvme_configure_apst after that?

I got into it with nvme surprise and non-surprise hot-removal tests.
Below is the stack ftrace result for nvme_configure_apst under the
surprise hot-removal, and it is similar for the non-surprise hot-removal.

  irq/165-pciehp-939   [039] ....  1529.443824: nvme_configure_apst
                                          <-nvme_set_latency_tolerance
  irq/165-pciehp-939   [039] ....  1529.443836: <stack trace>
 => nvme_configure_apst
 => nvme_set_latency_tolerance
 => apply_constraint
 => __dev_pm_qos_remove_request
 => __dev_pm_qos_drop_user_request.isra.7
 => dev_pm_qos_update_user_latency_tolerance
 => dev_pm_qos_hide_latency_tolerance
 => nvme_uninit_ctrl
 => nvme_remove
 => pci_device_remove
 => device_release_driver_internal
 => device_release_driver
 => pci_stop_bus_device
 => pci_stop_and_remove_bus_device
 => pciehp_unconfigure_device
 => pciehp_disable_slot
 => pciehp_handle_disable_request
 => pciehp_ist
 => irq_thread_fn
 => irq_thread
 => kthread
 => ret_from_fork

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme: fix APST error for power latency tolerance
  2021-03-24  2:38   ` Peng Liu
@ 2021-03-24  7:58     ` Christoph Hellwig
  2021-04-09  7:19       ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-03-24  7:58 UTC (permalink / raw)
  To: Peng Liu; +Cc: Christoph Hellwig, kbusch, linux-nvme, Peng Liu

On Wed, Mar 24, 2021 at 10:38:12AM +0800, Peng Liu wrote:
> On Tue, Mar 23, 2021 at 04:23:21PM +0000, Christoph Hellwig wrote:
> > On Tue, Mar 23, 2021 at 03:31:33PM +0800, pngliu@hotmail.com wrote:
> > > From: Peng Liu <liupeng17@lenovo.com>
> > > 
> > > Clear apsta so that nvme_configure_apst() does not execute
> > > nvme_set_features(), which will fail because admin_q is either not set up
> > > yet or no longer available at the time of nvme_uninit_ctrl() being called,
> > > and this leads to the error message "nvme nvme0: failed to set APST feature
> > > (-19)".
> > > 
> > > Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")
> > 
> > How did you get into this situation?  For PCIe nvme_uninit_ctrl is
> > only called at the end of ->remove and ->delete_ctrl, so how do we end
> > up in nvme_configure_apst after that?
> 
> I got into it with nvme surprise and non-surprise hot-removal tests.
> Below is the stack ftrace result for nvme_configure_apst under the
> surprise hot-removal, and it is similar for the non-surprise hot-removal.

Ok, looks like dev_pm_qos_hide_latency_tolerance calls back into
nvme_set_latency_tolerance, which is a little .. unexpected.

Does this patch work for you?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642beba..d5d7e0cdd78d80 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2681,7 +2681,8 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
 
 	if (ctrl->ps_max_latency_us != latency) {
 		ctrl->ps_max_latency_us = latency;
-		nvme_configure_apst(ctrl);
+		if (ctrl->state == NVME_CTRL_LIVE)
+			nvme_configure_apst(ctrl);
 	}
 }
 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme: fix APST error for power latency tolerance
  2021-03-24  7:58     ` Christoph Hellwig
@ 2021-04-09  7:19       ` Christoph Hellwig
  2021-04-09  8:12         ` Peng Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-04-09  7:19 UTC (permalink / raw)
  To: Peng Liu; +Cc: Christoph Hellwig, kbusch, linux-nvme, Peng Liu

On Wed, Mar 24, 2021 at 07:58:24AM +0000, Christoph Hellwig wrote:
> On Wed, Mar 24, 2021 at 10:38:12AM +0800, Peng Liu wrote:
> > On Tue, Mar 23, 2021 at 04:23:21PM +0000, Christoph Hellwig wrote:
> > > On Tue, Mar 23, 2021 at 03:31:33PM +0800, pngliu@hotmail.com wrote:
> > > > From: Peng Liu <liupeng17@lenovo.com>
> > > > 
> > > > Clear apsta so that nvme_configure_apst() does not execute
> > > > nvme_set_features(), which will fail because admin_q is either not set up
> > > > yet or no longer available at the time of nvme_uninit_ctrl() being called,
> > > > and this leads to the error message "nvme nvme0: failed to set APST feature
> > > > (-19)".
> > > > 
> > > > Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")
> > > 
> > > How did you get into this situation?  For PCIe nvme_uninit_ctrl is
> > > only called at the end of ->remove and ->delete_ctrl, so how do we end
> > > up in nvme_configure_apst after that?
> > 
> > I got into it with nvme surprise and non-surprise hot-removal tests.
> > Below is the stack ftrace result for nvme_configure_apst under the
> > surprise hot-removal, and it is similar for the non-surprise hot-removal.
> 
> Ok, looks like dev_pm_qos_hide_latency_tolerance calls back into
> nvme_set_latency_tolerance, which is a little .. unexpected.
> 
> Does this patch work for you?

ping?

> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0896e21642beba..d5d7e0cdd78d80 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2681,7 +2681,8 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
>  
>  	if (ctrl->ps_max_latency_us != latency) {
>  		ctrl->ps_max_latency_us = latency;
> -		nvme_configure_apst(ctrl);
> +		if (ctrl->state == NVME_CTRL_LIVE)
> +			nvme_configure_apst(ctrl);
>  	}
>  }
>  
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
---end quoted text---

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme: fix APST error for power latency tolerance
  2021-04-09  7:19       ` Christoph Hellwig
@ 2021-04-09  8:12         ` Peng Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Peng Liu @ 2021-04-09  8:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, linux-nvme, Peng Liu

On Fri, Apr 09, 2021 at 08:19:21AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 24, 2021 at 07:58:24AM +0000, Christoph Hellwig wrote:
> > On Wed, Mar 24, 2021 at 10:38:12AM +0800, Peng Liu wrote:
> > > On Tue, Mar 23, 2021 at 04:23:21PM +0000, Christoph Hellwig wrote:
> > > > On Tue, Mar 23, 2021 at 03:31:33PM +0800, pngliu@hotmail.com wrote:
> > > > > From: Peng Liu <liupeng17@lenovo.com>
> > > > > 
> > > > > Clear apsta so that nvme_configure_apst() does not execute
> > > > > nvme_set_features(), which will fail because admin_q is either not set up
> > > > > yet or no longer available at the time of nvme_uninit_ctrl() being called,
> > > > > and this leads to the error message "nvme nvme0: failed to set APST feature
> > > > > (-19)".
> > > > > 
> > > > > Fixes: 510a405d945b("nvme: fix memory leak for power latency tolerance")
> > > > 
> > > > How did you get into this situation?  For PCIe nvme_uninit_ctrl is
> > > > only called at the end of ->remove and ->delete_ctrl, so how do we end
> > > > up in nvme_configure_apst after that?
> > > 
> > > I got into it with nvme surprise and non-surprise hot-removal tests.
> > > Below is the stack ftrace result for nvme_configure_apst under the
> > > surprise hot-removal, and it is similar for the non-surprise hot-removal.
> > 
> > Ok, looks like dev_pm_qos_hide_latency_tolerance calls back into
> > nvme_set_latency_tolerance, which is a little .. unexpected.
> > 
> > Does this patch work for you?
> 
> ping?
> 

Sorry for late reply. The nvme disk with APST feature is not available
till now. Yes, it works. I tested it with nvme hot-removal surprise and
non-surprise.

> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 0896e21642beba..d5d7e0cdd78d80 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2681,7 +2681,8 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
> >  
> >  	if (ctrl->ps_max_latency_us != latency) {
> >  		ctrl->ps_max_latency_us = latency;
> > -		nvme_configure_apst(ctrl);
> > +		if (ctrl->state == NVME_CTRL_LIVE)
> > +			nvme_configure_apst(ctrl);
> >  	}
> >  }
> >  
> > 
> > _______________________________________________
> > Linux-nvme mailing list
> > Linux-nvme@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-nvme
> ---end quoted text---

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-04-09  8:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-23  7:31 [PATCH] nvme: fix APST error for power latency tolerance pngliu
2021-03-23 16:23 ` Christoph Hellwig
2021-03-24  2:38   ` Peng Liu
2021-03-24  7:58     ` Christoph Hellwig
2021-04-09  7:19       ` Christoph Hellwig
2021-04-09  8:12         ` Peng Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).