* [PATCH] nvme: fix hang in remove path
@ 2017-06-02  8:32 ` Ming Lei
  0 siblings, 0 replies; 22+ messages in thread
From: Ming Lei @ 2017-06-02  8:32 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Johannes Thumshirn
  Cc: linux-nvme, linux-block, Ming Lei

We need to start the admin queue too in nvme_kill_queues(),
to avoid a hang in the remove path [1].

This patch is very similar to 806f026f9b901eaf ("nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()").

[1] hang stack trace
[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Fixes: c5552fde102fc ("nvme: Enable autonomous power state transitions")
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a60926410438..0f9cc0c55e15 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2438,6 +2438,10 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
+
+	/* Forcibly start all queues to avoid having stuck requests */
+	blk_mq_start_hw_queues(ctrl->admin_q);
+
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		/*
 		 * Revalidating a dead namespace sets capacity to 0. This will
-- 
2.9.4

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-02  8:32 ` Ming Lei
@ 2017-06-02 13:36   ` Sagi Grimberg
  -1 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2017-06-02 13:36 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Keith Busch, Christoph Hellwig, Johannes Thumshirn
  Cc: linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-02  8:32 ` Ming Lei
@ 2017-06-02 18:04   ` Rakesh Pandit
  -1 siblings, 0 replies; 22+ messages in thread
From: Rakesh Pandit @ 2017-06-02 18:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Johannes Thumshirn, linux-nvme, linux-block

Hi Ming,

On Fri, Jun 02, 2017 at 04:32:08PM +0800, Ming Lei wrote:
> We need to start the admin queue too in nvme_kill_queues(),
> to avoid a hang in the remove path [1].
> 
> This patch is very similar to 806f026f9b901eaf ("nvme: use
> blk_mq_start_hw_queues() in nvme_kill_queues()").

It would make sense to still add:

if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
	return

inside nvme_configure_apst at the top irrespective of this change.

It would make sure that no further instructions are executed (and
possibly that no commands of any sort are ever sent) if we already
know there is no point in sending a latency tolerance update to a
controller that is already dead or in the process of being deleted.

> 
> [1] hang stack trace
> [<ffffffff813c9716>] blk_execute_rq+0x56/0x80
> [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
> [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
> [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
> [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
> [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
> [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
> [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
> [<ffffffff8156d951>] device_del+0x101/0x320
> [<ffffffff8156db8a>] device_unregister+0x1a/0x60
> [<ffffffff8156dc4c>] device_destroy+0x3c/0x50
> [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
> [<ffffffff815d4858>] nvme_remove+0x78/0x110
> [<ffffffff81452b69>] pci_device_remove+0x39/0xb0
> [<ffffffff81572935>] device_release_driver_internal+0x155/0x210
> [<ffffffff81572a02>] device_release_driver+0x12/0x20
> [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
> [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
> [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
> [<ffffffff810c5ac9>] kthread+0x109/0x140
> [<ffffffff8185800c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
> Reported-by: Rakesh Pandit <rakesh@tuxera.com>
> Tested-by: Rakesh Pandit <rakesh@tuxera.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/nvme/host/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a60926410438..0f9cc0c55e15 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2438,6 +2438,10 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> +
> +	/* Forcibly start all queues to avoid having stuck requests */
> +	blk_mq_start_hw_queues(ctrl->admin_q);
> +
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		/*
>  		 * Revalidating a dead namespace sets capacity to 0. This will
> -- 
> 2.9.4
> 

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-02 18:04   ` Rakesh Pandit
@ 2017-06-04 15:24     ` Sagi Grimberg
  -1 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2017-06-04 15:24 UTC (permalink / raw)
  To: Rakesh Pandit, Ming Lei
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Johannes Thumshirn,
	linux-nvme, linux-block


> It would make sense to still add:
> 
> if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> 	return
> 
> inside nvme_configure_apst at the top irrespective of this change.

I'm not sure what is the value given that it is taken care of in
.queue_rq?

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-02  8:32 ` Ming Lei
@ 2017-06-05  8:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-06-05  8:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Johannes Thumshirn, linux-nvme, linux-block

Thanks Ming, I'll add this to the nvme tree for 4.12-rc.

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-04 15:24     ` Sagi Grimberg
@ 2017-06-05  8:44       ` Rakesh Pandit
  -1 siblings, 0 replies; 22+ messages in thread
From: Rakesh Pandit @ 2017-06-05  8:44 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Ming Lei, Jens Axboe, Keith Busch, Christoph Hellwig,
	Johannes Thumshirn, linux-nvme, linux-block

On Sun, Jun 04, 2017 at 06:24:09PM +0300, Sagi Grimberg wrote:
> 
> > It would make sense to still add:
> > 
> > if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > 	return
> > 
> > inside nvme_configure_apst at the top irrespective of this change.
> 
> I'm not sure what is the value given that it is taken care of in
> .queue_rq?

We would avoid getting the error message which says: "failed to set APST
feature 7".  Why print an error if the controller is already under reset?

Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
all power states inside the nvme_configure_apst as
nvme_set_latency_tolerance was called with value
PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
which of course fails with error message.

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-05  8:44       ` Rakesh Pandit
@ 2017-06-05  8:47         ` Rakesh Pandit
  -1 siblings, 0 replies; 22+ messages in thread
From: Rakesh Pandit @ 2017-06-05  8:47 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Ming Lei, Jens Axboe, Keith Busch, Christoph Hellwig,
	Johannes Thumshirn, linux-nvme, linux-block

On Mon, Jun 05, 2017 at 11:44:34AM +0300, Rakesh Pandit wrote:
> On Sun, Jun 04, 2017 at 06:24:09PM +0300, Sagi Grimberg wrote:
> > 
> > > It would make sense to still add:
> > > 
> > > if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > > 	return
> > > 
> > > inside nvme_configure_apst at the top irrespective of this change.
> > 
> > I'm not sure what is the value given that it is taken care of in
> > .queue_rq?
> 
> We would avoid getting the error message which says: "failed to set APST
> feature 7".  Why print an error if the controller is already under reset?

I meant deletion.  Of course not a huge value but worth a fix IMHO
while we are at it.

> 
> Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
> all power states inside the nvme_configure_apst as
> nvme_set_latency_tolerance was called with value
> PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
> ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
> which of course fails with error message.

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-05  8:47         ` Rakesh Pandit
@ 2017-06-05 19:45           ` Rakesh Pandit
  -1 siblings, 0 replies; 22+ messages in thread
From: Rakesh Pandit @ 2017-06-05 19:45 UTC (permalink / raw)
  To: Sagi Grimberg, Ming Lei, Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Johannes Thumshirn, linux-nvme, linux-block

On Mon, Jun 05, 2017 at 11:47:33AM +0300, Rakesh Pandit wrote:
> On Mon, Jun 05, 2017 at 11:44:34AM +0300, Rakesh Pandit wrote:
> > On Sun, Jun 04, 2017 at 06:24:09PM +0300, Sagi Grimberg wrote:
> > > 
> > > > It would make sense to still add:
> > > > 
> > > > if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > > > 	return
> > > > 
> > > > inside nvme_configure_apst at the top irrespective of this change.
> > > 
> > > I'm not sure what is the value given that it is taken care of in
> > > .queue_rq?
> > 
> > We would avoid getting the error message which says: "failed to set APST
> > feature 7".  Why print an error if the controller is already under reset?
> 
> I meant deletion.  Of course not a huge value but worth a fix IMHO
> while we are at it.
> 
> > 
> > Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
> > all power states inside the nvme_configure_apst as
> > nvme_set_latency_tolerance was called with value
> > PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
> > ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
> > which of course fails with error message.

Even though this change from this patch does fix the hang, just tested
again and I can see above error message "failed to set APST feature 7"
while nvme_remove PID is getting executed.

So, sync requests (while nvme_remove is executing) are going through
and not everything is handled well in .queue_rq while controller is
under deleting state or dead state.

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-05 19:45           ` Rakesh Pandit
@ 2017-06-06  7:10             ` Sagi Grimberg
  -1 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2017-06-06  7:10 UTC (permalink / raw)
  To: Rakesh Pandit, Ming Lei, Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Johannes Thumshirn, linux-nvme, linux-block


>>> Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
>>> all power states inside the nvme_configure_apst as
>>> nvme_set_latency_tolerance was called with value
>>> PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
>>> ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
>>> which of course fails with error message.
> 
> Even though this change from this patch does fix the hang, just tested
> again and I can see above error message "failed to set APST feature 7"
> while nvme_remove PID is getting executed.
> 
> So, sync requests (while nvme_remove is executing) are going through
> and not everything is handled well in .queue_rq while controller is
> under deleting state or dead state.

The error message is expected because queue_rq is failing the I/O, did
you see it actually hang?

Personally I'm not too bothered with the error log, we can suppress it
conditional on the ctrl state if it really annoys people, but if we're
conditioning on the ctrl state, we can do it like you suggested anyway...

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-06  7:10             ` Sagi Grimberg
@ 2017-06-06  7:12               ` Christoph Hellwig
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-06-06  7:12 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Rakesh Pandit, Ming Lei, Christoph Hellwig, Jens Axboe,
	Keith Busch, Johannes Thumshirn, linux-nvme, linux-block

On Tue, Jun 06, 2017 at 10:10:45AM +0300, Sagi Grimberg wrote:
>
>>>> Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
>>>> all power states inside the nvme_configure_apst as
>>>> nvme_set_latency_tolerance was called with value
>>>> PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
>>>> ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
>>>> which of course fails with error message.
>>
>> Even though this change from this patch does fix the hang, just tested
>> again and I can see above error message "failed to set APST feature 7"
>> while nvme_remove PID is getting executed.
>>
>> So, sync requests (while nvme_remove is executing) are going through
>> and not everything is handled well in .queue_rq while controller is
>> under deleting state or dead state.
>
> The error message is expected because queue_rq is failing the I/O, did
> you see it actually hang?
>
> Personally I'm not too bothered with the error log, we can suppress it
> conditional on the ctrl state if it really annoys people, but if we're
> conditioning on the ctrl state, we can do it like you suggested anyway...

Yeah.  Or just remove the error printk entirely..

* Re: [PATCH] nvme: fix hang in remove path
  2017-06-06  7:12               ` Christoph Hellwig
@ 2017-06-06  7:30                 ` Rakesh Pandit
  -1 siblings, 0 replies; 22+ messages in thread
From: Rakesh Pandit @ 2017-06-06  7:30 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: Ming Lei, Jens Axboe, Keith Busch, Johannes Thumshirn,
	linux-nvme, linux-block

On Tue, Jun 06, 2017 at 09:12:23AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 10:10:45AM +0300, Sagi Grimberg wrote:
> >
> >>>> Note 7 here is NVME_SC_ABORT_REQ.  Also we would avoid walking through
> >>>> all power states inside the nvme_configure_apst as
> >>>> nvme_set_latency_tolerance was called with value
> >>>> PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) which sets
> >>>> ctrl->ps_max_latency_us to U64_MAX and tries to send a sync command
> >>>> which of course fails with error message.
> >>
> >> Even though this change from this patch does fix the hang, just tested
> >> again and I can see above error message "failed to set APST feature 7"
> >> while nvme_remove PID is getting executed.
> >>
> >> So, sync requests (while nvme_remove is executing) are going through
> >> and not everything is handled well in .queue_rq while controller is
> >> under deleting state or dead state.
> >
> > The error message is expected because queue_rq is failing the I/O, did
> > you see it actually hang?

No hang.

> >
> > Personally I'm not too bothered with the error log, we can suppress it
> > conditional on the ctrl state if it really annoys people, but if we're
> > conditioning on the ctrl state, we can do it like you suggested anyway...
> 
> Yeah.  Or just remove the error printk entirely..

Thanks for the input, let's ignore it.

Based on the earlier message I guess this is already queued.  If not,
it seems worth adding for 4.12-rc.


Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-02  8:32 [PATCH] nvme: fix hang in remove path Ming Lei
2017-06-02 13:36 ` Sagi Grimberg
2017-06-02 18:04 ` Rakesh Pandit
2017-06-04 15:24   ` Sagi Grimberg
2017-06-05  8:44     ` Rakesh Pandit
2017-06-05  8:47       ` Rakesh Pandit
2017-06-05 19:45         ` Rakesh Pandit
2017-06-06  7:10           ` Sagi Grimberg
2017-06-06  7:12             ` Christoph Hellwig
2017-06-06  7:30               ` Rakesh Pandit
2017-06-05  8:20 ` Christoph Hellwig
