[PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-05-30  7:16 ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-05-30  7:16 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Andy Lutomirski,
	Sagi Grimberg

Once controller is in DEAD or DELETING state a call to delete_destroy
from nvme_uninit_ctrl results in setting the latency tolerance via
nvme_set_latency_tolerance callback even though queues have already
been killed.  This in turn leads the PID to go into uninterruptible
sleep and prevents removal of nvme controller from completion.  The
stack trace is:

[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

and PID is in 'D' state.  Attached patch returns from
nvme_configure_apst to avoids configuration and syncing commands when
controller is either in DELETING state or DEAD state which can only
happen once we are in nvme_remove.  This allows removal to complete
and release remaining resources after nvme_uninit_ctrl.

V2: Move the check to nvme_configure_apst instead of callback
(suggested by Christoph)
Fixes: c5552fde102fc ("nvme: Enable autonomous power state transitions")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
---
 drivers/nvme/host/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a609264..8dfe854 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1357,6 +1357,13 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	int ret;
 
 	/*
+	 * Avoid configuration and syncing commands if controller is already
+	 * being removed and queues have been killed.
+	 */
+	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
+		return;
+
+	/*
 	 * If APST isn't supported or if we haven't been initialized yet,
 	 * then don't do anything.
 	 */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-05-30  7:16 ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-05-30  7:16 UTC (permalink / raw)


Once controller is in DEAD or DELETING state a call to delete_destroy
from nvme_uninit_ctrl results in setting the latency tolerance via
nvme_set_latency_tolerance callback even though queues have already
been killed.  This in turn leads the PID to go into uninterruptible
sleep and prevents removal of nvme controller from completion.  The
stack trace is:

[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

and PID is in 'D' state.  Attached patch returns from
nvme_configure_apst to avoids configuration and syncing commands when
controller is either in DELETING state or DEAD state which can only
happen once we are in nvme_remove.  This allows removal to complete
and release remaining resources after nvme_uninit_ctrl.

V2: Move the check to nvme_configure_apst instead of callback
(suggested by Christoph)
Fixes: c5552fde102fc ("nvme: Enable autonomous power state transitions")
Signed-off-by: Rakesh Pandit <rakesh at tuxera.com>
---
 drivers/nvme/host/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a609264..8dfe854 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1357,6 +1357,13 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	int ret;
 
 	/*
+	 * Avoid configuration and syncing commands if controller is already
+	 * being removed and queues have been killed.
+	 */
+	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
+		return;
+
+	/*
 	 * If APST isn't supported or if we haven't been initialized yet,
 	 * then don't do anything.
 	 */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-05-30  7:16 ` Rakesh Pandit
@ 2017-05-30  9:36   ` Christoph Hellwig
  -1 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-05-30  9:36 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch,
	Christoph Hellwig, Andy Lutomirski, Sagi Grimberg

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-05-30  9:36   ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-05-30  9:36 UTC (permalink / raw)


Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-05-30  7:16 ` Rakesh Pandit
@ 2017-05-30 10:18   ` Sagi Grimberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-05-30 10:18 UTC (permalink / raw)
  To: Rakesh Pandit, linux-nvme, linux-kernel
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Andy Lutomirski

>  	/*
> +	 * Avoid configuration and syncing commands if controller is already
> +	 * being removed and queues have been killed.
> +	 */
> +	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> +		return;
> +

Hey Rakesh, Christoph,

Given that the issue is for sync command submission during controller
removal, I'm wandering if we should perhaps move this check to
__nvme_submit_sync_cmd?

AFAICT user-space can just as easily trigger set_features in the same
condition which will trigger the hang couldn't it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-05-30 10:18   ` Sagi Grimberg
  0 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-05-30 10:18 UTC (permalink / raw)



>  	/*
> +	 * Avoid configuration and syncing commands if controller is already
> +	 * being removed and queues have been killed.
> +	 */
> +	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> +		return;
> +

Hey Rakesh, Christoph,

Given that the issue is for sync command submission during controller
removal, I'm wandering if we should perhaps move this check to
__nvme_submit_sync_cmd?

AFAICT user-space can just as easily trigger set_features in the same
condition which will trigger the hang couldn't it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-05-30 10:18   ` Sagi Grimberg
@ 2017-05-30 14:23     ` Rakesh Pandit
  -1 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-05-30 14:23 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch,
	Christoph Hellwig, Andy Lutomirski

On Tue, May 30, 2017 at 01:18:55PM +0300, Sagi Grimberg wrote:
> 
> >  	/*
> > +	 * Avoid configuration and syncing commands if controller is already
> > +	 * being removed and queues have been killed.
> > +	 */
> > +	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > +		return;
> > +
> 
> Hey Rakesh, Christoph,
> 
> Given that the issue is for sync command submission during controller
> removal, I'm wandering if we should perhaps move this check to
> __nvme_submit_sync_cmd?
> 
> AFAICT user-space can just as easily trigger set_features in the same
> condition which will trigger the hang couldn't it?

Seems possible.  But it seems worth keeping this check as it avoids
the instructions between start of nvme_configure_apst and
__nvme_submit_sync_cmd.  This check seems to solve more severe hang as
PID which started off from nvme_remove eventually hangs itself on
blk_execute_rq..

We can fix user-space triggered set_features higger up e.g. in
nvme_ioctl by putting same check.  Introduction of a separate state
NVME_CTRL_SCHED_RESET (being discussed in another thread) has
additional advantage of making sure that only one thread is going
through resetting and eventually through removal (if required) and
solves lot of problems.

It makes sense to push this separately because of above reasons and we
can fix user space trigger of deadlock once discussion on another
thread has moved forward on introducing of new state.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-05-30 14:23     ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-05-30 14:23 UTC (permalink / raw)

On Tue, May 30, 2017@01:18:55PM +0300, Sagi Grimberg wrote:
> 
> >  	/*
> > +	 * Avoid configuration and syncing commands if controller is already
> > +	 * being removed and queues have been killed.
> > +	 */
> > +	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > +		return;
> > +
> 
> Hey Rakesh, Christoph,
> 
> Given that the issue is for sync command submission during controller
> removal, I'm wandering if we should perhaps move this check to
> __nvme_submit_sync_cmd?
> 
> AFAICT user-space can just as easily trigger set_features in the same
> condition which will trigger the hang couldn't it?

Seems possible.  But it seems worth keeping this check as it avoids
the instructions between start of nvme_configure_apst and
__nvme_submit_sync_cmd.  This check seems to solve more severe hang as
PID which started off from nvme_remove eventually hangs itself on
blk_execute_rq..

We can fix user-space triggered set_features higger up e.g. in
nvme_ioctl by putting same check.  Introduction of a separate state
NVME_CTRL_SCHED_RESET (being discussed in another thread) has
additional advantage of making sure that only one thread is going
through resetting and eventually through removal (if required) and
solves lot of problems.

It makes sense to push this separately because of above reasons and we
can fix user space trigger of deadlock once discussion on another
thread has moved forward on introducing of new state.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-05-30 14:23     ` Rakesh Pandit
@ 2017-06-01 11:43       ` Christoph Hellwig
  -1 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:43 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: Sagi Grimberg, linux-nvme, linux-kernel, Jens Axboe, Keith Busch,
	Christoph Hellwig, Andy Lutomirski

On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> We can fix user-space triggered set_features higger up e.g. in
> nvme_ioctl by putting same check.  Introduction of a separate state
> NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> additional advantage of making sure that only one thread is going
> through resetting and eventually through removal (if required) and
> solves lot of problems.

I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
sure some time in the past I already had it in a local tree as a
generalization of rdma and loop already use NVME_CTRL_RESETTING
(they set it before queueing the reset work).

But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
related to this patch?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 11:43       ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:43 UTC (permalink / raw)

On Tue, May 30, 2017@05:23:46PM +0300, Rakesh Pandit wrote:
> We can fix user-space triggered set_features higger up e.g. in
> nvme_ioctl by putting same check.  Introduction of a separate state
> NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> additional advantage of making sure that only one thread is going
> through resetting and eventually through removal (if required) and
> solves lot of problems.

I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
sure some time in the past I already had it in a local tree as a
generalization of rdma and loop already use NVME_CTRL_RESETTING
(they set it before queueing the reset work).

But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
related to this patch?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 11:43       ` Christoph Hellwig
@ 2017-06-01 12:28         ` Rakesh Pandit
  -1 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 12:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-nvme, linux-kernel, Jens Axboe, Keith Busch,
	Andy Lutomirski

On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > We can fix user-space triggered set_features higger up e.g. in
> > nvme_ioctl by putting same check.  Introduction of a separate state
> > NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > additional advantage of making sure that only one thread is going
> > through resetting and eventually through removal (if required) and
> > solves lot of problems.
> 
> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).
>
> But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> related to this patch?

They aren't related.  Sorry for confusion.  I intended to answer
another thread but instead wrote it here.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 12:28         ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 12:28 UTC (permalink / raw)


On Thu, Jun 01, 2017@01:43:38PM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2017@05:23:46PM +0300, Rakesh Pandit wrote:
> > We can fix user-space triggered set_features higger up e.g. in
> > nvme_ioctl by putting same check.  Introduction of a separate state
> > NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > additional advantage of making sure that only one thread is going
> > through resetting and eventually through removal (if required) and
> > solves lot of problems.
> 
> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).
>
> But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> related to this patch?

They aren't related.  Sorry for confusion.  I intended to answer
another thread but instead wrote it here.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 12:28         ` Rakesh Pandit
@ 2017-06-01 12:35           ` Christoph Hellwig
  -1 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 12:35 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-kernel,
	Jens Axboe, Keith Busch, Andy Lutomirski

On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> > I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > generalization of rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
> 
> They aren't related.  Sorry for confusion.  I intended to answer
> another thread but instead wrote it here.

Ok, thanks.  But I think the Point from Sagi that we should move
the state check all the way down to submit_sync_command still makes
sense to me.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 12:35           ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 12:35 UTC (permalink / raw)


On Thu, Jun 01, 2017@03:28:18PM +0300, Rakesh Pandit wrote:
> > I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > generalization of rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
> 
> They aren't related.  Sorry for confusion.  I intended to answer
> another thread but instead wrote it here.

Ok, thanks.  But I think the Point from Sagi that we should move
the state check all the way down to submit_sync_command still makes
sense to me.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 12:28         ` Rakesh Pandit
@ 2017-06-01 12:36           ` Rakesh Pandit
  -1 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 12:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Sagi Grimberg, linux-kernel, linux-nvme, Jens Axboe,
	Andy Lutomirski

On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> > On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > > We can fix user-space triggered set_features higger up e.g. in
> > > nvme_ioctl by putting same check.  Introduction of a separate state
> > > NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > > additional advantage of making sure that only one thread is going
> > > through resetting and eventually through removal (if required) and
> > > solves lot of problems.
> > 
> > I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > generalization of rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
> 
> They aren't related.  Sorry for confusion.  I intended to answer
> another thread but instead wrote it here.
> 

Also Sagi pointed out that user space set_features ioctl if fired up
in a window after nvme_removal it can also result in this issue seems
to be correct.  I would prefer to keep this as it is and introduce
similar check higher up in nvme_ioctrl instead so that we don't send
sync commands if queues are killed already.

Would you prefer a patch ? Thanks,

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 12:36           ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 12:36 UTC (permalink / raw)


On Thu, Jun 01, 2017@03:28:18PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017@01:43:38PM +0200, Christoph Hellwig wrote:
> > On Tue, May 30, 2017@05:23:46PM +0300, Rakesh Pandit wrote:
> > > We can fix user-space triggered set_features higger up e.g. in
> > > nvme_ioctl by putting same check.  Introduction of a separate state
> > > NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > > additional advantage of making sure that only one thread is going
> > > through resetting and eventually through removal (if required) and
> > > solves lot of problems.
> > 
> > I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > generalization of rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
> 
> They aren't related.  Sorry for confusion.  I intended to answer
> another thread but instead wrote it here.
> 

Also Sagi pointed out that user space set_features ioctl if fired up
in a window after nvme_removal it can also result in this issue seems
to be correct.  I would prefer to keep this as it is and introduce
similar check higher up in nvme_ioctrl instead so that we don't send
sync commands if queues are killed already.

Would you prefer a patch ? Thanks,

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 12:36           ` Rakesh Pandit
@ 2017-06-01 12:46             ` Christoph Hellwig
  -1 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 12:46 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-kernel,
	linux-nvme, Jens Axboe, Andy Lutomirski

On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> Also Sagi pointed out that user space set_features ioctl if fired up
> in a window after nvme_removal it can also result in this issue seems
> to be correct.  I would prefer to keep this as it is and introduce
> similar check higher up in nvme_ioctrl instead so that we don't send
> sync commands if queues are killed already.
> 
> Would you prefer a patch ? Thanks,

If we want to kill everyone we probably should do it in ->queue_rq.
Or is the block layer blocking you somewhere else?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 12:46             ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-01 12:46 UTC (permalink / raw)


On Thu, Jun 01, 2017@03:36:50PM +0300, Rakesh Pandit wrote:
> Also Sagi pointed out that user space set_features ioctl if fired up
> in a window after nvme_removal it can also result in this issue seems
> to be correct.  I would prefer to keep this as it is and introduce
> similar check higher up in nvme_ioctrl instead so that we don't send
> sync commands if queues are killed already.
> 
> Would you prefer a patch ? Thanks,

If we want to kill everyone we probably should do it in ->queue_rq.
Or is the block layer blocking you somewhere else?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 12:46             ` Christoph Hellwig
@ 2017-06-01 14:56               ` Ming Lei
  -1 siblings, 0 replies; 32+ messages in thread
From: Ming Lei @ 2017-06-01 14:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rakesh Pandit, Jens Axboe, Sagi Grimberg, linux-kernel,
	linux-nvme, Keith Busch, Andy Lutomirski

On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > Also Sagi pointed out that user space set_features ioctl if fired up
> > in a window after nvme_removal it can also result in this issue seems
> > to be correct.  I would prefer to keep this as it is and introduce
> > similar check higher up in nvme_ioctrl instead so that we don't send
> > sync commands if queues are killed already.
> > 
> > Would you prefer a patch ? Thanks,
> 
> If we want to kill everyone we probably should do it in ->queue_rq.

Looks ->queue_rq has done it already via checking nvmeq->cq_vector

> Or is the block layer blocking you somewhere else?

blk-mq doesn't handle dying in the I/O path.

Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
nvme_kill_queues()), seems we need to do it for admin_q too.

Can the following change fix the issue?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e44326d5cf19..360758488124 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
+	blk_mq_start_hw_queues(ctrl->admin_q);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		/*
 		 * Revalidating a dead namespace sets capacity to 0. This will


Thanks,
Ming

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 14:56               ` Ming Lei
  0 siblings, 0 replies; 32+ messages in thread
From: Ming Lei @ 2017-06-01 14:56 UTC (permalink / raw)


On Thu, Jun 01, 2017@02:46:32PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 01, 2017@03:36:50PM +0300, Rakesh Pandit wrote:
> > Also Sagi pointed out that user space set_features ioctl if fired up
> > in a window after nvme_removal it can also result in this issue seems
> > to be correct.  I would prefer to keep this as it is and introduce
> > similar check higher up in nvme_ioctrl instead so that we don't send
> > sync commands if queues are killed already.
> > 
> > Would you prefer a patch ? Thanks,
> 
> If we want to kill everyone we probably should do it in ->queue_rq.

Looks ->queue_rq has done it already via checking nvmeq->cq_vector

> Or is the block layer blocking you somewhere else?

blk-mq doesn't handle dying in the I/O path.

Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
nvme_kill_queues()), seems we need to do it for admin_q too.

Can the following change fix the issue?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e44326d5cf19..360758488124 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
+	blk_mq_start_hw_queues(ctrl->admin_q);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		/*
 		 * Revalidating a dead namespace sets capacity to 0. This will


Thanks,
Ming

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 14:56               ` Ming Lei
@ 2017-06-01 19:33                 ` Rakesh Pandit
  -1 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 19:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-kernel,
	linux-nvme, Keith Busch, Andy Lutomirski

On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > Also Sagi pointed out that user space set_features ioctl if fired up
> > > in a window after nvme_removal it can also result in this issue seems
> > > to be correct.  I would prefer to keep this as it is and introduce
> > > similar check higher up in nvme_ioctrl instead so that we don't send
> > > sync commands if queues are killed already.
> > > 
> > > Would you prefer a patch ? Thanks,
> > 
> > If we want to kill everyone we probably should do it in ->queue_rq.
> 
> Looks ->queue_rq has done it already via checking nvmeq->cq_vector
> 
> > Or is the block layer blocking you somewhere else?
> 
> blk-mq doesn't handle dying in the I/O path.
> 
> Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
> nvme_kill_queues()), seems we need to do it for admin_q too.
> 
> Can the following change fix the issue?
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e44326d5cf19..360758488124 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> +	blk_mq_start_hw_queues(ctrl->admin_q);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		/*
>  		 * Revalidating a dead namespace sets capacity to 0. This will
> 
> 

Yes change fixes the issue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-01 19:33                 ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-01 19:33 UTC (permalink / raw)


On Thu, Jun 01, 2017@10:56:10PM +0800, Ming Lei wrote:
> On Thu, Jun 01, 2017@02:46:32PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 01, 2017@03:36:50PM +0300, Rakesh Pandit wrote:
> > > Also Sagi pointed out that user space set_features ioctl if fired up
> > > in a window after nvme_removal it can also result in this issue seems
> > > to be correct.  I would prefer to keep this as it is and introduce
> > > similar check higher up in nvme_ioctrl instead so that we don't send
> > > sync commands if queues are killed already.
> > > 
> > > Would you prefer a patch ? Thanks,
> > 
> > If we want to kill everyone we probably should do it in ->queue_rq.
> 
> Looks ->queue_rq has done it already via checking nvmeq->cq_vector
> 
> > Or is the block layer blocking you somewhere else?
> 
> blk-mq doesn't handle dying in the I/O path.
> 
> Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
> nvme_kill_queues()), seems we need to do it for admin_q too.
> 
> Can the following change fix the issue?
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e44326d5cf19..360758488124 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> +	blk_mq_start_hw_queues(ctrl->admin_q);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		/*
>  		 * Revalidating a dead namespace sets capacity to 0. This will
> 
> 

Yes change fixes the issue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 19:33                 ` Rakesh Pandit
@ 2017-06-02  1:42                   ` Ming Lei
  -1 siblings, 0 replies; 32+ messages in thread
From: Ming Lei @ 2017-06-02  1:42 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-kernel,
	linux-nvme, Keith Busch, Andy Lutomirski

On Thu, Jun 01, 2017 at 10:33:04PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> > On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > > Also Sagi pointed out that user space set_features ioctl if fired up
> > > > in a window after nvme_removal it can also result in this issue seems
> > > > to be correct.  I would prefer to keep this as it is and introduce
> > > > similar check higher up in nvme_ioctrl instead so that we don't send
> > > > sync commands if queues are killed already.
> > > > 
> > > > Would you prefer a patch ? Thanks,
> > > 
> > > If we want to kill everyone we probably should do it in ->queue_rq.
> > 
> > Looks ->queue_rq has done it already via checking nvmeq->cq_vector
> > 
> > > Or is the block layer blocking you somewhere else?
> > 
> > blk-mq doesn't handle dying in the I/O path.
> > 
> > Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
> > nvme_kill_queues()), seems we need to do it for admin_q too.
> > 
> > Can the following change fix the issue?
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index e44326d5cf19..360758488124 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	mutex_lock(&ctrl->namespaces_mutex);
> > +	blk_mq_start_hw_queues(ctrl->admin_q);
> >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> >  		/*
> >  		 * Revalidating a dead namespace sets capacity to 0. This will
> > 
> > 
> 
> Yes change fixes the issue.

Rakesh, thanks for your test, and I will prepare a formal one for
merging.

thanks,
Ming

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-02  1:42                   ` Ming Lei
  0 siblings, 0 replies; 32+ messages in thread
From: Ming Lei @ 2017-06-02  1:42 UTC (permalink / raw)


On Thu, Jun 01, 2017@10:33:04PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017@10:56:10PM +0800, Ming Lei wrote:
> > On Thu, Jun 01, 2017@02:46:32PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 01, 2017@03:36:50PM +0300, Rakesh Pandit wrote:
> > > > Also Sagi pointed out that user space set_features ioctl if fired up
> > > > in a window after nvme_removal it can also result in this issue seems
> > > > to be correct.  I would prefer to keep this as it is and introduce
> > > > similar check higher up in nvme_ioctrl instead so that we don't send
> > > > sync commands if queues are killed already.
> > > > 
> > > > Would you prefer a patch ? Thanks,
> > > 
> > > If we want to kill everyone we probably should do it in ->queue_rq.
> > 
> > Looks ->queue_rq has done it already via checking nvmeq->cq_vector
> > 
> > > Or is the block layer blocking you somewhere else?
> > 
> > blk-mq doesn't handle dying in the I/O path.
> > 
> > Maybe it is similar with 806f026f9b901eaf1a(nvme: use blk_mq_start_hw_queues() in
> > nvme_kill_queues()), seems we need to do it for admin_q too.
> > 
> > Can the following change fix the issue?
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index e44326d5cf19..360758488124 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	mutex_lock(&ctrl->namespaces_mutex);
> > +	blk_mq_start_hw_queues(ctrl->admin_q);
> >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> >  		/*
> >  		 * Revalidating a dead namespace sets capacity to 0. This will
> > 
> > 
> 
> Yes change fixes the issue.

Rakesh, thanks for your test, and I will prepare a formal one for
merging.

thanks,
Ming

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-01 11:43       ` Christoph Hellwig
@ 2017-06-04 15:28         ` Sagi Grimberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-06-04 15:28 UTC (permalink / raw)
  To: Christoph Hellwig, Rakesh Pandit
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Andy Lutomirski


> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).

I don't remember having it, but I might be wrong...

Can you explain again why you think we need it? Sorry for being
difficult, but I'm not exactly sure why it makes things better
or simpler.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-04 15:28         ` Sagi Grimberg
  0 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-06-04 15:28 UTC (permalink / raw)



> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).

I don't remember having it, but I might be wrong...

Can you explain again why you think we need it? Sorry for being
difficult, but I'm not exactly sure why it makes things better
or simpler.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-04 15:28         ` Sagi Grimberg
@ 2017-06-05  8:18           ` Christoph Hellwig
  -1 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-05  8:18 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Rakesh Pandit, linux-nvme, linux-kernel,
	Jens Axboe, Keith Busch, Andy Lutomirski

On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
>
>> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
>> sure some time in the past I already had it in a local tree as a
>> generalization of rdma and loop already use NVME_CTRL_RESETTING
>> (they set it before queueing the reset work).
>
> I don't remember having it, but I might be wrong...
>
> Can you explain again why you think we need it? Sorry for being
> difficult, but I'm not exactly sure why it makes things better
> or simpler.

Motly that we can treat a controller as under reset before scheduling
the reset work, both to prevent multiple schedules, and to make
checks like the one in nvme_should_reset robus.

But I think something along the lines of the earlier patch from
Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
suggested by you should also work for that purpose.  So maybe that's
the way to go after all.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-05  8:18           ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2017-06-05  8:18 UTC (permalink / raw)

On Sun, Jun 04, 2017@06:28:15PM +0300, Sagi Grimberg wrote:
>
>> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
>> sure some time in the past I already had it in a local tree as a
>> generalization of rdma and loop already use NVME_CTRL_RESETTING
>> (they set it before queueing the reset work).
>
> I don't remember having it, but I might be wrong...
>
> Can you explain again why you think we need it? Sorry for being
> difficult, but I'm not exactly sure why it makes things better
> or simpler.

Motly that we can treat a controller as under reset before scheduling
the reset work, both to prevent multiple schedules, and to make
checks like the one in nvme_should_reset robus.

But I think something along the lines of the earlier patch from
Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
suggested by you should also work for that purpose.  So maybe that's
the way to go after all.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-05  8:18           ` Christoph Hellwig
@ 2017-06-05 10:52             ` Rakesh Pandit
  -1 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-05 10:52 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch

On Mon, Jun 05, 2017 at 10:18:17AM +0200, Christoph Hellwig wrote:
> On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
> >
> >> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> >> sure some time in the past I already had it in a local tree as a
> >> generalization of rdma and loop already use NVME_CTRL_RESETTING
> >> (they set it before queueing the reset work).
> >
> > I don't remember having it, but I might be wrong...
> >
> > Can you explain again why you think we need it? Sorry for being
> > difficult, but I'm not exactly sure why it makes things better
> > or simpler.
> 
> Motly that we can treat a controller as under reset before scheduling
> the reset work, both to prevent multiple schedules, and to make
> checks like the one in nvme_should_reset robus.
> 
> But I think something along the lines of the earlier patch from
> Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
> suggested by you should also work for that purpose.  So maybe that's
> the way to go after all.

I would post a new patch which includes my RESETTING state earlier
patch + the cancel_work_sync which Sagi suggested after testing.

Sagi: Because my RESETTING patch earlier is subset of your untested
patch with cancel_work_sync, it would be logical to take a signed off
from you as well.  May you review/ack/nack the patch?  Feel free to
let me know if you want me to change it further or instead you want to
post as author.

I am okay with either as long as we fix the issue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-05 10:52             ` Rakesh Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Rakesh Pandit @ 2017-06-05 10:52 UTC (permalink / raw)


On Mon, Jun 05, 2017@10:18:17AM +0200, Christoph Hellwig wrote:
> On Sun, Jun 04, 2017@06:28:15PM +0300, Sagi Grimberg wrote:
> >
> >> I think we need the NVME_CTRL_SCHED_RESET state.  In fact I'm pretty
> >> sure some time in the past I already had it in a local tree as a
> >> generalization of rdma and loop already use NVME_CTRL_RESETTING
> >> (they set it before queueing the reset work).
> >
> > I don't remember having it, but I might be wrong...
> >
> > Can you explain again why you think we need it? Sorry for being
> > difficult, but I'm not exactly sure why it makes things better
> > or simpler.
> 
> Motly that we can treat a controller as under reset before scheduling
> the reset work, both to prevent multiple schedules, and to make
> checks like the one in nvme_should_reset robus.
> 
> But I think something along the lines of the earlier patch from
> Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
> suggested by you should also work for that purpose.  So maybe that's
> the way to go after all.

I would post a new patch which includes my RESETTING state earlier
patch + the cancel_work_sync which Sagi suggested after testing.

Sagi: Because my RESETTING patch earlier is subset of your untested
patch with cancel_work_sync, it would be logical to take a signed off
from you as well.  May you review/ack/nack the patch?  Feel free to
let me know if you want me to change it further or instead you want to
post as author.

I am okay with either as long as we fix the issue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
  2017-06-05 10:52             ` Rakesh Pandit
@ 2017-06-05 11:09               ` Sagi Grimberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-06-05 11:09 UTC (permalink / raw)
  To: Rakesh Pandit, Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch

Rakesh,

> I would post a new patch which includes my RESETTING state earlier
> patch + the cancel_work_sync which Sagi suggested after testing.
> 
> Sagi: Because my RESETTING patch earlier is subset of your untested
> patch with cancel_work_sync, it would be logical to take a signed off
> from you as well.  May you review/ack/nack the patch?  Feel free to
> let me know if you want me to change it further or instead you want to
> post as author.

Its all yours :)

send a patch and we'll review it, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever
@ 2017-06-05 11:09               ` Sagi Grimberg
  0 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-06-05 11:09 UTC (permalink / raw)


Rakesh,

> I would post a new patch which includes my RESETTING state earlier
> patch + the cancel_work_sync which Sagi suggested after testing.
> 
> Sagi: Because my RESETTING patch earlier is subset of your untested
> patch with cancel_work_sync, it would be logical to take a signed off
> from you as well.  May you review/ack/nack the patch?  Feel free to
> let me know if you want me to change it further or instead you want to
> post as author.

Its all yours :)

send a patch and we'll review it, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-06-05 11:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-30  7:16 [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever Rakesh Pandit
2017-05-30  7:16 ` Rakesh Pandit
2017-05-30  9:36 ` Christoph Hellwig
2017-05-30  9:36   ` Christoph Hellwig
2017-05-30 10:18 ` Sagi Grimberg
2017-05-30 10:18   ` Sagi Grimberg
2017-05-30 14:23   ` Rakesh Pandit
2017-05-30 14:23     ` Rakesh Pandit
2017-06-01 11:43     ` Christoph Hellwig
2017-06-01 11:43       ` Christoph Hellwig
2017-06-01 12:28       ` Rakesh Pandit
2017-06-01 12:28         ` Rakesh Pandit
2017-06-01 12:35         ` Christoph Hellwig
2017-06-01 12:35           ` Christoph Hellwig
2017-06-01 12:36         ` Rakesh Pandit
2017-06-01 12:36           ` Rakesh Pandit
2017-06-01 12:46           ` Christoph Hellwig
2017-06-01 12:46             ` Christoph Hellwig
2017-06-01 14:56             ` Ming Lei
2017-06-01 14:56               ` Ming Lei
2017-06-01 19:33               ` Rakesh Pandit
2017-06-01 19:33                 ` Rakesh Pandit
2017-06-02  1:42                 ` Ming Lei
2017-06-02  1:42                   ` Ming Lei
2017-06-04 15:28       ` Sagi Grimberg
2017-06-04 15:28         ` Sagi Grimberg
2017-06-05  8:18         ` Christoph Hellwig
2017-06-05  8:18           ` Christoph Hellwig
2017-06-05 10:52           ` Rakesh Pandit
2017-06-05 10:52             ` Rakesh Pandit
2017-06-05 11:09             ` Sagi Grimberg
2017-06-05 11:09               ` Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.