* [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
@ 2017-05-24 14:26 ` Rakesh Pandit
  0 siblings, 0 replies; 14+ messages in thread
From: Rakesh Pandit @ 2017-05-24 14:26 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg

Commit c5f6ce97c1210 tries to address multiple resets but fails because
work_busy doesn't involve any synchronization and can race.  This is
easily reproducible, as shown by the WARNING below, which is triggered
by the line:

WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)

Allowing multiple resets can also result in multiple controller
removals if different conditions inside nvme_reset_work fail, which
might deadlock on device_release_driver.

This patch makes sure the work queue item (reset_work) is queued only
if the controller state is not NVME_CTRL_RESETTING.  This is achieved
by moving the state change out of nvme_reset_work into nvme_reset and
removing the old work_busy call.  The state change is always
synchronized using the controller spinlock.

[  480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0
[  480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast...
[  480.327044]  btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart..
[  480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13
[  480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016
[  480.327066] Workqueue: nvme nvme_reset_work
[  480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000
[  480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0
[  480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246
[  480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200
[  480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128
[  480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000
[  480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000
[  480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128
[  480.327072] FS:  0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000
[  480.327073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0
[  480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  480.327075] Call Trace:
[  480.327079]  ? __switch_to+0x227/0x400
[  480.327081]  process_one_work+0x18c/0x3a0
[  480.327082]  worker_thread+0x4e/0x3b0
[  480.327084]  kthread+0x109/0x140
[  480.327085]  ? process_one_work+0x3a0/0x3a0
[  480.327087]  ? kthread_park+0x60/0x60
[  480.327102]  ret_from_fork+0x2c/0x40
[  480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f.....

V2: Use controller state to solve the problem (suggested by Christoph Hellwig)
Fixes: c5f6ce97c1210 ("nvme: don't schedule multiple resets")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
---
 drivers/nvme/host/pci.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4c2ff2b..ba54e2a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1903,9 +1903,6 @@ static void nvme_reset_work(struct work_struct *work)
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result = -ENODEV;
 
-	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
-		goto out;
-
 	/*
 	 * If we're called to reset a live controller first shut it down before
 	 * moving on.
@@ -1913,9 +1910,6 @@ static void nvme_reset_work(struct work_struct *work)
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
 		nvme_dev_disable(dev, false);
 
-	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
-		goto out;
-
 	result = nvme_pci_enable(dev);
 	if (result)
 		goto out;
@@ -2009,8 +2003,8 @@ static int nvme_reset(struct nvme_dev *dev)
 {
 	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
 		return -ENODEV;
-	if (work_busy(&dev->reset_work))
-		return -ENODEV;
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
+		return -EBUSY;
 	if (!queue_work(nvme_workq, &dev->reset_work))
 		return -EBUSY;
 	return 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-24 14:26 ` Rakesh Pandit
@ 2017-05-25  8:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-25  8:30 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4c2ff2b..ba54e2a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1903,9 +1903,6 @@ static void nvme_reset_work(struct work_struct *work)
>  	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
>  	int result = -ENODEV;
>  
> -	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
> -		goto out;

Can we keep a

	WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING);

here?

>  		goto out;
> @@ -2009,8 +2003,8 @@ static int nvme_reset(struct nvme_dev *dev)
>  {
>  	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
>  		return -ENODEV;
> -	if (work_busy(&dev->reset_work))
> -		return -ENODEV;
> +	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
> +		return -EBUSY;
>  	if (!queue_work(nvme_workq, &dev->reset_work))
>  		return -EBUSY;

nvme_probe will also have to set the state to NVME_CTRL_RESETTING to
keep the old behavior, which had some error handling implications.

Also we can replace the work_busy(&dev->reset_work) check in
nvme_should_reset with a check for the NVME_CTRL_RESETTING state now.


* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-24 14:26 ` Rakesh Pandit
@ 2017-05-26 10:06   ` Keith Busch
  -1 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2017-05-26 10:06 UTC (permalink / raw)
  To: Rakesh Pandit
  Cc: linux-nvme, linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg

On Wed, May 24, 2017 at 05:26:25PM +0300, Rakesh Pandit wrote:
> Commit c5f6ce97c1210 tries to address multiple resets but fails as
> work_busy doesn't involve any synchronization and can fail.  This is
> reproducible easily as can be seen by WARNING below which is triggered
> with line:
> 
> WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)
> 
> Allowing multiple resets can result in multiple controller removal as
> well if different conditions inside nvme_reset_work fail and which
> might deadlock on device_release_driver.
> 
> This patch makes sure that work queue item (reset_work) is added only
> if controller state != NVME_CTRL_RESETTING and that is achieved by
> moving state change outside nvme_reset_work into nvme_reset and
> removing old work_busy call.  State change is always synchronized
> using controller spinlock.

So, the reason the state is changed when the work is running rather
than at queueing time is to cover the window when the state may be set
to NVME_CTRL_DELETING, and we don't want the reset work to proceed in
that case.

What do you think about adding a new state, like NVME_CTRL_SCHED_RESET,
then leaving the NVME_CTRL_RESETTING state change as-is?


* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-26 10:06   ` Keith Busch
@ 2017-05-26 15:28     ` Rakesh Pandit
  -1 siblings, 0 replies; 14+ messages in thread
From: Rakesh Pandit @ 2017-05-26 15:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Andy Lutomirski

Added Andy Lutomirski to CC (APST related issue)

On Fri, May 26, 2017 at 06:06:14AM -0400, Keith Busch wrote:
> On Wed, May 24, 2017 at 05:26:25PM +0300, Rakesh Pandit wrote:
> > Commit c5f6ce97c1210 tries to address multiple resets but fails as
> > work_busy doesn't involve any synchronization and can fail.  This is
> > reproducible easily as can be seen by WARNING below which is triggered
> > with line:
> > 
> > WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)
> > 
> > Allowing multiple resets can result in multiple controller removal as
> > well if different conditions inside nvme_reset_work fail and which
> > might deadlock on device_release_driver.
> > 
> > This patch makes sure that work queue item (reset_work) is added only
> > if controller state != NVME_CTRL_RESETTING and that is achieved by
> > moving state change outside nvme_reset_work into nvme_reset and
> > removing old work_busy call.  State change is always synchronized
> > using controller spinlock.
> 
> So, the reason the state is changed when the work is running rather than
> queueing is for the window when the state may be set to NVME_CTRL_DELETING,
> and we don't want the reset work to proceed in that case.
>
> What do you think about adding a new state, like NVME_CTRL_SCHED_RESET,
> then leaving the NVME_CTRL_RESETTING state change as-is?

Thanks.  I will give it a go as soon as I have hardware available
(I have limited access, and yesterday was a holiday here) and address
the issues Christoph pointed out earlier.

There is also a related (because I can reproduce it easily on the same
device with nvme_remove) but separate issue with the APST
implementation.  The PID (undergoing nvme_uninit_ctrl) waits forever
in blk_execute_rq.  The controller is in DEAD state, and
nvme_remove_namespaces, just before the device_destroy call, has
killed all queues, which seems to eventually make blk_execute_rq sleep
forever as it tries to sync the updated latency (most likely 0).

[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff


* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-25  8:30   ` Christoph Hellwig
@ 2017-05-26 20:12     ` Rakesh Pandit
  -1 siblings, 0 replies; 14+ messages in thread
From: Rakesh Pandit @ 2017-05-26 20:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe, Sagi Grimberg

On Thu, May 25, 2017 at 10:30:23AM +0200, Christoph Hellwig wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 4c2ff2b..ba54e2a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -1903,9 +1903,6 @@ static void nvme_reset_work(struct work_struct *work)
> >  	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
> >  	int result = -ENODEV;
> >  
> > -	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
> > -		goto out;
> 
> Can we keep a
> 
> 	WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
> 
> here?

Yes. Will do in V3.

Taking into consideration Keith's suggestion of using a new state
makes things better, so V3 won't touch the RESETTING state.

> 
> >  		goto out;
> > @@ -2009,8 +2003,8 @@ static int nvme_reset(struct nvme_dev *dev)
> >  {
> >  	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
> >  		return -ENODEV;
> > -	if (work_busy(&dev->reset_work))
> > -		return -ENODEV;
> > +	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
> > +		return -EBUSY;
> >  	if (!queue_work(nvme_workq, &dev->reset_work))
> >  		return -EBUSY;
> 
> nvme_probe will also have to set the state to NVME_CTRL_RESETTING to
> keep the old behavior, which had some error handling implications.
>

Will keep.

> Also we can replace the work_busy(&dev->reset_work) check in
> nvme_should_reset with a check for the NVME_CTRL_RESETTING state now.

Not replacing it seems better: nvme_reset is always called when
nvme_should_reset returns true, and it takes care of the
synchronization.  Replacing the check in nvme_should_reset wouldn't
add anything, since nvme_reset is always called afterwards and applies
the same logic.

Thanks,


* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-26 10:06   ` Keith Busch
@ 2017-05-30 11:58     ` Sagi Grimberg
  -1 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 11:58 UTC (permalink / raw)
  To: Keith Busch, Rakesh Pandit
  Cc: linux-nvme, linux-kernel, Jens Axboe, Christoph Hellwig


>> Allowing multiple resets can result in multiple controller removal as
>> well if different conditions inside nvme_reset_work fail and which
>> might deadlock on device_release_driver.
>>
>> This patch makes sure that work queue item (reset_work) is added only
>> if controller state != NVME_CTRL_RESETTING and that is achieved by
>> moving state change outside nvme_reset_work into nvme_reset and
>> removing old work_busy call.  State change is always synchronized
>> using controller spinlock.
>
> So, the reason the state is changed when the work is running rather than
> queueing is for the window when the state may be set to NVME_CTRL_DELETING,
> and we don't want the reset work to proceed in that case.
>
> What do you think about adding a new state, like NVME_CTRL_SCHED_RESET,
> then leaving the NVME_CTRL_RESETTING state change as-is?

OK, just got to this one.

Instead of adding yet another state, how about making controller delete
cancel the reset_work (cancel_work_sync)?


* Re: [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling
  2017-05-30 11:58     ` Sagi Grimberg
@ 2017-05-31 16:17       ` Keith Busch
  -1 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2017-05-31 16:17 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Rakesh Pandit, linux-nvme, linux-kernel, Jens Axboe, Christoph Hellwig

On Tue, May 30, 2017 at 02:58:06PM +0300, Sagi Grimberg wrote:
> > So, the reason the state is changed when the work is running rather than
> > queueing is for the window when the state may be set to NVME_CTRL_DELETING,
> > and we don't want the reset work to proceed in that case.
> > 
> > What do you think about adding a new state, like NVME_CTRL_SCHED_RESET,
> > then leaving the NVME_CTRL_RESETTING state change as-is?
> 
> OK, just got to this one.
> 
> Instead of adding yet another state, how about making controller delete
> cancel the reset_work (cancel_work_sync)?

Yes, that should also work.


end of thread, other threads:[~2017-05-31 16:17 UTC | newest]

Thread overview: 7 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-24 14:26 [PATCH V2 1/1] nvme: fix multiple ctrl removal scheduling Rakesh Pandit
2017-05-25  8:30 ` Christoph Hellwig
2017-05-26 20:12   ` Rakesh Pandit
2017-05-26 10:06 ` Keith Busch
2017-05-26 15:28   ` Rakesh Pandit
2017-05-30 11:58   ` Sagi Grimberg
2017-05-31 16:17     ` Keith Busch
