* [PATCH v2 0/2] nvme_ns_remove() deadlock fix
@ 2016-04-25 21:20 Ming Lin
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Ming Lin @ 2016-04-25 21:20 UTC (permalink / raw)


From: Ming Lin <ming.l@ssi.samsung.com>

Hi Sunad,

I don't have a device that can do namespace attribute changes.
Could you help verify that these 2 patches fix the ns remove deadlock?

Thanks.

v2:
  - switch to RCU freeing the namespace

Ming Lin (2):
  nvme: switch to RCU freeing the namespace
  nvme: fix nvme_ns_remove() deadlock

 drivers/nvme/host/core.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-04-25 21:20 [PATCH v2 0/2] nvme_ns_remove() deadlock fix Ming Lin
@ 2016-04-25 21:20 ` Ming Lin
  2016-04-26  8:41   ` Christoph Hellwig
                     ` (2 more replies)
  2016-04-25 21:20 ` [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock Ming Lin
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 24+ messages in thread
From: Ming Lin @ 2016-04-25 21:20 UTC (permalink / raw)


From: Ming Lin <ming.l@ssi.samsung.com>

Switch to RCU freeing of the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues can
get away with only an RCU read-side critical section.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 drivers/nvme/host/core.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4eb5759..914d336 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1380,7 +1380,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_revalidate_disk(ns->disk))
 		goto out_free_disk;
 
-	list_add_tail(&ns->list, &ctrl->namespaces);
+	list_add_tail_rcu(&ns->list, &ctrl->namespaces);
 	kref_get(&ctrl->kref);
 	if (ns->type == NVME_NS_LIGHTNVM)
 		return;
@@ -1418,6 +1418,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
 	mutex_unlock(&ns->ctrl->namespaces_mutex);
+	synchronize_rcu();
 	nvme_put_ns(ns);
 }
 
@@ -1628,8 +1629,8 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
 		if (!kref_get_unless_zero(&ns->kref))
 			continue;
 
@@ -1646,7 +1647,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 		nvme_put_ns(ns);
 	}
-	mutex_unlock(&ctrl->namespaces_mutex);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
@@ -1654,8 +1655,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
 		spin_lock_irq(ns->queue->queue_lock);
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
@@ -1663,7 +1664,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
-	mutex_unlock(&ctrl->namespaces_mutex);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
@@ -1671,13 +1672,13 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-	mutex_unlock(&ctrl->namespaces_mutex);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
-- 
1.9.1

* [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock
  2016-04-25 21:20 [PATCH v2 0/2] nvme_ns_remove() deadlock fix Ming Lin
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
@ 2016-04-25 21:20 ` Ming Lin
  2016-04-26  8:42   ` Christoph Hellwig
  2016-04-26 21:17   ` Sagi Grimberg
  2016-04-26 15:39 ` [PATCH v2 0/2] nvme_ns_remove() deadlock fix Keith Busch
  2016-05-02 15:16 ` Jens Axboe
  3 siblings, 2 replies; 24+ messages in thread
From: Ming Lin @ 2016-04-25 21:20 UTC (permalink / raw)


From: Ming Lin <ming.l@ssi.samsung.com>

On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In the case of a namespace detach/delete command,
nvme_ns_remove() deadlocks trying to acquire the already-held lock.

All callers of nvme_ns_remove() except nvme_remove_namespaces()
already hold namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and instead acquiring it
in nvme_remove_namespaces().

Reported-by: Sunad Bhandary S <sunad.s@samsung.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 drivers/nvme/host/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 914d336..4fff373 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1403,6 +1403,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
+	lockdep_assert_held(&ns->ctrl->namespaces_mutex);
+
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -1415,9 +1417,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_mq_abort_requeue_list(ns->queue);
 		blk_cleanup_queue(ns->queue);
 	}
-	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
-	mutex_unlock(&ns->ctrl->namespaces_mutex);
 	synchronize_rcu();
 	nvme_put_ns(ns);
 }
@@ -1513,8 +1513,10 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns, *next;
 
+	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
 		nvme_ns_remove(ns);
+	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
 
-- 
1.9.1

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
@ 2016-04-26  8:41   ` Christoph Hellwig
  2016-04-26 21:17   ` Sagi Grimberg
  2016-05-15  6:58   ` Ming Lin
  2 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2016-04-26  8:41 UTC (permalink / raw)


On Mon, Apr 25, 2016 at 02:20:18PM -0700, Ming Lin wrote:
> From: Ming Lin <ming.l@ssi.samsung.com>
> 
> Switch to RCU freeing of the namespace structure so that
> nvme_start_queues, nvme_stop_queues and nvme_kill_queues can
> get away with only an RCU read-side critical section.
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock
  2016-04-25 21:20 ` [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock Ming Lin
@ 2016-04-26  8:42   ` Christoph Hellwig
  2016-04-26 21:17   ` Sagi Grimberg
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2016-04-26  8:42 UTC (permalink / raw)


Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* [PATCH v2 0/2] nvme_ns_remove() deadlock fix
  2016-04-25 21:20 [PATCH v2 0/2] nvme_ns_remove() deadlock fix Ming Lin
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
  2016-04-25 21:20 ` [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock Ming Lin
@ 2016-04-26 15:39 ` Keith Busch
  2016-05-02 15:16 ` Jens Axboe
  3 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2016-04-26 15:39 UTC (permalink / raw)


On Mon, Apr 25, 2016 at 02:20:17PM -0700, Ming Lin wrote:
> From: Ming Lin <ming.l@ssi.samsung.com>
> 
> Hi Sunad,
> 
> I don't have a device that can do namespace attribute changes.
> Could you help verify that these 2 patches fix the ns remove deadlock?
> 
> Thanks.
> 
> v2:
>   - switch to RCU freeing the namespace
> 
> Ming Lin (2):
>   nvme: switch to RCU freeing the namespace
>   nvme: fix nvme_ns_remove() deadlock
> 
>  drivers/nvme/host/core.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)

This looks good, and it was successfully tested on a namespace-management-capable
controller. These tests will be run more often from now on. Thanks for
the patches.

Reviewed-by: Keith Busch <keith.busch@intel.com>

* [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock
  2016-04-25 21:20 ` [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock Ming Lin
  2016-04-26  8:42   ` Christoph Hellwig
@ 2016-04-26 21:17   ` Sagi Grimberg
  1 sibling, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2016-04-26 21:17 UTC (permalink / raw)


Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
  2016-04-26  8:41   ` Christoph Hellwig
@ 2016-04-26 21:17   ` Sagi Grimberg
  2016-05-15  6:58   ` Ming Lin
  2 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2016-04-26 21:17 UTC (permalink / raw)


Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* [PATCH v2 0/2] nvme_ns_remove() deadlock fix
  2016-04-25 21:20 [PATCH v2 0/2] nvme_ns_remove() deadlock fix Ming Lin
                   ` (2 preceding siblings ...)
  2016-04-26 15:39 ` [PATCH v2 0/2] nvme_ns_remove() deadlock fix Keith Busch
@ 2016-05-02 15:16 ` Jens Axboe
  3 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2016-05-02 15:16 UTC (permalink / raw)


On 04/25/2016 03:20 PM, Ming Lin wrote:
> From: Ming Lin <ming.l@ssi.samsung.com>
>
> Hi Sunad,
>
> I don't have a device that can do namespace attribute changes.
> Could you help verify that these 2 patches fix the ns remove deadlock?

Applied for 4.7, thanks.

-- 
Jens Axboe

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
  2016-04-26  8:41   ` Christoph Hellwig
  2016-04-26 21:17   ` Sagi Grimberg
@ 2016-05-15  6:58   ` Ming Lin
  2016-05-16 22:38     ` Ming Lin
  2 siblings, 1 reply; 24+ messages in thread
From: Ming Lin @ 2016-05-15  6:58 UTC (permalink / raw)


On Mon, 2016-04-25 at 14:20 -0700, Ming Lin wrote:
> 
> @@ -1654,8 +1655,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
>  
> -	mutex_lock(&ctrl->namespaces_mutex);
> -	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
>  		spin_lock_irq(ns->queue->queue_lock);
>  		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
>  		spin_unlock_irq(ns->queue->queue_lock);
> @@ -1663,7 +1664,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  		blk_mq_cancel_requeue_work(ns->queue);

Blame myself.

We hold the RCU lock, but blk_mq_cancel_requeue_work() may sleep.

So "echo 1 > /sys/class/nvme/nvme0/reset_controller" triggers below
BUG.

Thinking on the fix ...

[ 2348.050146] BUG: sleeping function called from invalid context at /home/mlin/linux/kernel/workqueue.c:2783
[ 2348.062044] in_atomic(): 0, irqs_disabled(): 0, pid: 1696, name: kworker/u16:0
[ 2348.070810] 4 locks held by kworker/u16:0/1696:
[ 2348.076900]  #0:  ("nvme"){++++.+}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
[ 2348.086626]  #1:  ((&dev->reset_work)){+.+.+.}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
[ 2348.097326]  #2:  (&dev->shutdown_lock){+.+...}, at: [<ffffffffc08cef2a>] nvme_dev_disable+0x4a/0x350 [nvme]
[ 2348.108577]  #3:  (rcu_read_lock){......}, at: [<ffffffffc0813980>] nvme_stop_queues+0x0/0x1a0 [nvme_core]
[ 2348.119620] CPU: 3 PID: 1696 Comm: kworker/u16:0 Tainted: G           OE   4.6.0-rc3+ #197
[ 2348.129220] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
[ 2348.137827] Workqueue: nvme nvme_reset_work [nvme]
[ 2348.144012]  0000000000000000 ffff8800d94d3a48 ffffffff81379e4c ffff88011a639640
[ 2348.152867]  ffffffff81a12688 ffff8800d94d3a70 ffffffff81094814 ffffffff81a12688
[ 2348.161728]  0000000000000adf 0000000000000000 ffff8800d94d3a98 ffffffff81094904
[ 2348.170584] Call Trace:
[ 2348.174441]  [<ffffffff81379e4c>] dump_stack+0x85/0xc9
[ 2348.181004]  [<ffffffff81094814>] ___might_sleep+0x144/0x1f0
[ 2348.188065]  [<ffffffff81094904>] __might_sleep+0x44/0x80
[ 2348.194863]  [<ffffffff81087b5e>] flush_work+0x6e/0x290
[ 2348.201492]  [<ffffffff81087af0>] ? __queue_delayed_work+0x150/0x150
[ 2348.209266]  [<ffffffff81126cf5>] ? irq_work_queue+0x75/0x90
[ 2348.216335]  [<ffffffff810ca136>] ? wake_up_klogd+0x36/0x50
[ 2348.223330]  [<ffffffff810b7fa6>] ? mark_held_locks+0x66/0x90
[ 2348.230495]  [<ffffffff81088898>] ? __cancel_work_timer+0xf8/0x1c0
[ 2348.238088]  [<ffffffff8108883b>] __cancel_work_timer+0x9b/0x1c0
[ 2348.245496]  [<ffffffff810cadaa>] ? vprintk_default+0x1a/0x20
[ 2348.252629]  [<ffffffff81142558>] ? printk+0x48/0x4a
[ 2348.258984]  [<ffffffff8108896b>] cancel_work_sync+0xb/0x10
[ 2348.265951]  [<ffffffff81350fb0>] blk_mq_cancel_requeue_work+0x10/0x20
[ 2348.273868]  [<ffffffffc0813ae7>] nvme_stop_queues+0x167/0x1a0 [nvme_core]
[ 2348.282132]  [<ffffffffc0813980>] ? nvme_kill_queues+0x190/0x190 [nvme_core]
[ 2348.290568]  [<ffffffffc08cef51>] nvme_dev_disable+0x71/0x350 [nvme]
[ 2348.298308]  [<ffffffff810b8f40>] ? __lock_acquire+0xa80/0x1ad0
[ 2348.305614]  [<ffffffff810944b6>] ? finish_task_switch+0xa6/0x2c0
[ 2348.313099]  [<ffffffffc08cffd4>] nvme_reset_work+0x214/0xd40 [nvme]
[ 2348.320841]  [<ffffffff8176df17>] ? _raw_spin_unlock_irq+0x27/0x50
[ 2348.328410]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
[ 2348.335633]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
[ 2348.343030]  [<ffffffff810891d6>] worker_thread+0x266/0x4a0
[ 2348.349986]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
[ 2348.356852]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
[ 2348.364238]  [<ffffffff8108f529>] kthread+0xf9/0x110
[ 2348.370581]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
[ 2348.377344]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-15  6:58   ` Ming Lin
@ 2016-05-16 22:38     ` Ming Lin
  2016-05-17 15:05       ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lin @ 2016-05-16 22:38 UTC (permalink / raw)


On Sat, May 14, 2016 at 11:58 PM, Ming Lin <mlin@kernel.org> wrote:
> On Mon, 2016-04-25 at 14:20 -0700, Ming Lin wrote:
>>
>> @@ -1654,8 +1655,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>  {
>>       struct nvme_ns *ns;
>>
>> -     mutex_lock(&ctrl->namespaces_mutex);
>> -     list_for_each_entry(ns, &ctrl->namespaces, list) {
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
>>               spin_lock_irq(ns->queue->queue_lock);
>>               queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
>>               spin_unlock_irq(ns->queue->queue_lock);
>> @@ -1663,7 +1664,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>               blk_mq_cancel_requeue_work(ns->queue);

Hi Keith,

I haven't found a way to fix the bug below.
Could you help me understand why blk_mq_cancel_requeue_work() is called here?

I know blk_mq_cancel_requeue_work() was introduced in:

commit c68ed59f534c318716c6189050af3c5ea03b8071
Author: Keith Busch <keith.busch@intel.com>
Date:   Wed Jan 7 18:55:44 2015 -0700

    blk-mq: Let drivers cancel requeue_work

    Kicking requeued requests will start h/w queues in a work_queue, which
    may alter the driver's requested state to temporarily stop them. This
    patch exports a method to cancel the q->requeue_work so a driver can be
    assured stopped h/w queues won't be started up before it is ready.

    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>


Thanks,
Ming


>
> Blame myself.
>
> We hold the RCU lock, but blk_mq_cancel_requeue_work() may sleep.
>
> So "echo 1 > /sys/class/nvme/nvme0/reset_controller" triggers below
> BUG.
>
> Thinking on the fix ...
>
> [ 2348.050146] BUG: sleeping function called from invalid context at /home/mlin/linux/kernel/workqueue.c:2783
> [ 2348.062044] in_atomic(): 0, irqs_disabled(): 0, pid: 1696, name: kworker/u16:0
> [ 2348.070810] 4 locks held by kworker/u16:0/1696:
> [ 2348.076900]  #0:  ("nvme"){++++.+}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
> [ 2348.086626]  #1:  ((&dev->reset_work)){+.+.+.}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
> [ 2348.097326]  #2:  (&dev->shutdown_lock){+.+...}, at: [<ffffffffc08cef2a>] nvme_dev_disable+0x4a/0x350 [nvme]
> [ 2348.108577]  #3:  (rcu_read_lock){......}, at: [<ffffffffc0813980>] nvme_stop_queues+0x0/0x1a0 [nvme_core]
> [ 2348.119620] CPU: 3 PID: 1696 Comm: kworker/u16:0 Tainted: G           OE   4.6.0-rc3+ #197
> [ 2348.129220] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
> [ 2348.137827] Workqueue: nvme nvme_reset_work [nvme]
> [ 2348.144012]  0000000000000000 ffff8800d94d3a48 ffffffff81379e4c ffff88011a639640
> [ 2348.152867]  ffffffff81a12688 ffff8800d94d3a70 ffffffff81094814 ffffffff81a12688
> [ 2348.161728]  0000000000000adf 0000000000000000 ffff8800d94d3a98 ffffffff81094904
> [ 2348.170584] Call Trace:
> [ 2348.174441]  [<ffffffff81379e4c>] dump_stack+0x85/0xc9
> [ 2348.181004]  [<ffffffff81094814>] ___might_sleep+0x144/0x1f0
> [ 2348.188065]  [<ffffffff81094904>] __might_sleep+0x44/0x80
> [ 2348.194863]  [<ffffffff81087b5e>] flush_work+0x6e/0x290
> [ 2348.201492]  [<ffffffff81087af0>] ? __queue_delayed_work+0x150/0x150
> [ 2348.209266]  [<ffffffff81126cf5>] ? irq_work_queue+0x75/0x90
> [ 2348.216335]  [<ffffffff810ca136>] ? wake_up_klogd+0x36/0x50
> [ 2348.223330]  [<ffffffff810b7fa6>] ? mark_held_locks+0x66/0x90
> [ 2348.230495]  [<ffffffff81088898>] ? __cancel_work_timer+0xf8/0x1c0
> [ 2348.238088]  [<ffffffff8108883b>] __cancel_work_timer+0x9b/0x1c0
> [ 2348.245496]  [<ffffffff810cadaa>] ? vprintk_default+0x1a/0x20
> [ 2348.252629]  [<ffffffff81142558>] ? printk+0x48/0x4a
> [ 2348.258984]  [<ffffffff8108896b>] cancel_work_sync+0xb/0x10
> [ 2348.265951]  [<ffffffff81350fb0>] blk_mq_cancel_requeue_work+0x10/0x20
> [ 2348.273868]  [<ffffffffc0813ae7>] nvme_stop_queues+0x167/0x1a0 [nvme_core]
> [ 2348.282132]  [<ffffffffc0813980>] ? nvme_kill_queues+0x190/0x190 [nvme_core]
> [ 2348.290568]  [<ffffffffc08cef51>] nvme_dev_disable+0x71/0x350 [nvme]
> [ 2348.298308]  [<ffffffff810b8f40>] ? __lock_acquire+0xa80/0x1ad0
> [ 2348.305614]  [<ffffffff810944b6>] ? finish_task_switch+0xa6/0x2c0
> [ 2348.313099]  [<ffffffffc08cffd4>] nvme_reset_work+0x214/0xd40 [nvme]
> [ 2348.320841]  [<ffffffff8176df17>] ? _raw_spin_unlock_irq+0x27/0x50
> [ 2348.328410]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
> [ 2348.335633]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
> [ 2348.343030]  [<ffffffff810891d6>] worker_thread+0x266/0x4a0
> [ 2348.349986]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
> [ 2348.356852]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
> [ 2348.364238]  [<ffffffff8108f529>] kthread+0xf9/0x110
> [ 2348.370581]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
> [ 2348.377344]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-16 22:38     ` Ming Lin
@ 2016-05-17 15:05       ` Keith Busch
  2016-05-17 15:23         ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-17 15:05 UTC (permalink / raw)


On Mon, May 16, 2016 at 03:38:38PM -0700, Ming Lin wrote:
> On Sat, May 14, 2016 at 11:58 PM, Ming Lin <mlin@kernel.org> wrote:
> > On Mon, 2016-04-25 at 14:20 -0700, Ming Lin wrote:
> >>
> >> @@ -1654,8 +1655,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
> >>  {
> >>       struct nvme_ns *ns;
> >>
> >> -     mutex_lock(&ctrl->namespaces_mutex);
> >> -     list_for_each_entry(ns, &ctrl->namespaces, list) {
> >> +     rcu_read_lock();
> >> +     list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
> >>               spin_lock_irq(ns->queue->queue_lock);
> >>               queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
> >>               spin_unlock_irq(ns->queue->queue_lock);
> >> @@ -1663,7 +1664,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
> >>               blk_mq_cancel_requeue_work(ns->queue);
> 
> Hi Keith,
> 
> I haven't found a way to fix the bug below.
> Could you help me understand why blk_mq_cancel_requeue_work() is called here?

We cancel the work because blk_mq_requeue_work starts all the h/w queues,
negating the whole point of the driver requesting that blk-mq stop sending
more commands during a reset.

Today, though, the driver has newer checks so it doesn't crash when
blk-mq submits a command the driver can't handle. If we change the nvme
pci driver's nvme_queue_rq to stop hw queues before returning
BLK_MQ_RQ_QUEUE_BUSY (scsi_queue_rq() in scsi_lib.c might be a good
example), we could skip cancelling requeue work in "nvme_stop_queues()"
if we're sure it won't race with the reset work's nvme_start_queues().

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 15:05       ` Keith Busch
@ 2016-05-17 15:23         ` Keith Busch
  2016-05-17 15:30           ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-17 15:23 UTC (permalink / raw)


On Tue, May 17, 2016 at 11:05:11AM -0400, Keith Busch wrote:
> Today, though, the driver has newer checks so it doesn't crash when
> blk-mq submits a command the driver can't handle. If we change the nvme
> pci driver's nvme_queue_rq to stop hw queues before returning
> BLK_MQ_RQ_QUEUE_BUSY (scsi_queue_rq() in scsi_lib.c might be a good
> example), we could skip cancelling requeue work in "nvme_stop_queues()"
> if we're sure it won't race with the reset work's nvme_start_queues().

Untested patch below:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 643f457..13d3cf9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1552,7 +1552,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
 
-		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
@@ -1565,7 +1564,10 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
+		spin_lock_irq(ns->queue->queue_lock);
+		queue_flag_clear(QUEUE_FLAG_STOPPED, ns->queue);
+		spin_unlock_irq(ns->queue->queue_lock);
+
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4fd733f..b65d7d6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -690,6 +690,12 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 out:
+	if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
+		spin_lock_irq(ns->queue->queue_lock);
+		if (!blk_queue_stopped(req->q))
+			blk_mq_stop_hw_queues(ns->queue);
+		spin_unlock_irq(ns->queue->queue_lock);
+	}
 	nvme_free_iod(dev, req);
 	return ret;
 }
--

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 15:23         ` Keith Busch
@ 2016-05-17 15:30           ` Keith Busch
  2016-05-17 20:48             ` Ming Lin
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-17 15:30 UTC (permalink / raw)


On Tue, May 17, 2016 at 11:23:59AM -0400, Keith Busch wrote:
>  out:
> +	if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
> +		spin_lock_irq(ns->queue->queue_lock);
> +		if (!blk_queue_stopped(req->q))


Err ... rather, the above line should be:

+		if (blk_queue_stopped(req->q))


> +			blk_mq_stop_hw_queues(ns->queue);
> +		spin_unlock_irq(ns->queue->queue_lock);
> +	}
>  	nvme_free_iod(dev, req);
>  	return ret;
>  }

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 15:30           ` Keith Busch
@ 2016-05-17 20:48             ` Ming Lin
  2016-05-17 21:07               ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lin @ 2016-05-17 20:48 UTC (permalink / raw)


On Tue, May 17, 2016 at 8:30 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, May 17, 2016 at 11:23:59AM -0400, Keith Busch wrote:
>>  out:
>> +     if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
>> +             spin_lock_irq(ns->queue->queue_lock);
>> +             if (!blk_queue_stopped(req->q))
>
>
> Err ... rather, the above line should be:
>
> +               if (blk_queue_stopped(req->q))
>
>
>> +                     blk_mq_stop_hw_queues(ns->queue);
>> +             spin_unlock_irq(ns->queue->queue_lock);
>> +     }
>>       nvme_free_iod(dev, req);
>>       return ret;
>>  }

I applied the changes below and they seem to work.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 10c8006..ac950d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1982,7 +1982,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
                queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
                spin_unlock_irq(ns->queue->queue_lock);

-               blk_mq_cancel_requeue_work(ns->queue);
                blk_mq_stop_hw_queues(ns->queue);
        }
        rcu_read_unlock();
@@ -1995,7 +1994,9 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)

        rcu_read_lock();
        list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
-               queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
+               spin_lock_irq(ns->queue->queue_lock);
+               queue_flag_clear(QUEUE_FLAG_STOPPED, ns->queue);
+               spin_unlock_irq(ns->queue->queue_lock);
                blk_mq_start_stopped_hw_queues(ns->queue, true);
                blk_mq_kick_requeue_list(ns->queue);
        }
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9f64e40..a62c9c5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -609,6 +609,12 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
        spin_unlock_irq(&nvmeq->q_lock);
        return BLK_MQ_RQ_QUEUE_OK;
 out:
+       if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
+               spin_lock_irq(ns->queue->queue_lock);
+               if (blk_queue_stopped(req->q))
+                       blk_mq_stop_hw_queues(ns->queue);
+               spin_unlock_irq(ns->queue->queue_lock);
+       }
        nvme_free_iod(dev, req);
        return ret;
 }

But here is a crash when I do a stress reset test with live IO.

while [ 1 ] ; do
        echo > /sys/class/nvme/nvme0/reset_controller
done

I think this crash is not related to your patch, because I can
reproduce it without your patch.

[   44.985454] block (null): nvme_revalidate_disk: Identify failure
[   45.089224] BUG: unable to handle kernel paging request at 000000006fc81ab0
[   45.096949] IP: [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
[   45.103705] PGD 0
[   45.106470] Oops: 0000 [#1] PREEMPT SMP

[   45.229716] CPU: 0 PID: 72 Comm: kworker/0:1 Tainted: G           OE   4.6.0-rc3+ #197
[   45.238557] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
[   45.246709] Workqueue: events nvme_scan_work [nvme_core]
[   45.252977] task: ffff8800da071640 ti: ffff8800da3c4000 task.ti: ffff8800da3c4000
[   45.261403] RIP: 0010:[<ffffffff811a0baf>]  [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
[   45.270804] RSP: 0018:ffff8800da3c7b50  EFLAGS: 00010286
[   45.277075] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 0000000000f5cc00
[   45.285167] RDX: 0000000000f5cb00 RSI: 0000000000f5cb00 RDI: 0000000000000246
[   45.293252] RBP: ffff8800da3c7b70 R08: 00000000000199f0 R09: 0000000000000000
[   45.301342] R10: 0000000000000001 R11: 0000000000000000 R12: 000000006fc81ab0
[   45.309441] R13: ffff88011b402f00 R14: 00000000024000c0 R15: ffff88011b402f00
[   45.317554] FS:  0000000000000000(0000) GS:ffff880120200000(0000) knlGS:0000000000000000
[   45.326624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.333372] CR2: 000000006fc81ab0 CR3: 0000000001c06000 CR4: 00000000001406f0
[   45.341512] Stack:
[   45.344548]  00000000024000c0 0000000000000002 ffffffff81147810 ffff8800d043d600
[   45.353020]  ffff8800da3c7b80 ffffffff81147820 ffff8800da3c7bc8 ffffffff81147d9f
[   45.361503]  ffffffff00000001 ffffffff81147830 ffff88011b403400 0000000000000002
[   45.369979] Call Trace:
[   45.373455]  [<ffffffff81147810>] ? mempool_kfree+0x10/0x10
[   45.380043]  [<ffffffff81147820>] mempool_alloc_slab+0x10/0x20
[   45.386888]  [<ffffffff81147d9f>] mempool_create_node+0xcf/0x130
[   45.393896]  [<ffffffff81147830>] ? mempool_alloc_slab+0x20/0x20
[   45.400910]  [<ffffffff81147e15>] mempool_create+0x15/0x20
[   45.407453]  [<ffffffff8134073e>] __bioset_create+0x1ee/0x2d0
[   45.414171]  [<ffffffff8137bec5>] ? ida_simple_get+0x85/0xe0
[   45.420799]  [<ffffffff8134082e>] bioset_create+0xe/0x10
[   45.427077]  [<ffffffff81344d0f>] blk_alloc_queue_node+0x5f/0x2e0
[   45.434124]  [<ffffffff8135390b>] blk_mq_init_queue+0x1b/0x60
[   45.440830]  [<ffffffffc081d272>] nvme_validate_ns+0xb2/0x290 [nvme_core]
[   45.448570]  [<ffffffffc081d665>] nvme_scan_work+0x215/0x330 [nvme_core]
[   45.456215]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
[   45.462994]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
[   45.469947]  [<ffffffff81089096>] worker_thread+0x126/0x4a0
[   45.476468]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
[   45.482902]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
[   45.489855]  [<ffffffff8108f529>] kthread+0xf9/0x110
[   45.495770]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
[   45.502109]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 20:48             ` Ming Lin
@ 2016-05-17 21:07               ` Keith Busch
  2016-05-17 21:09                 ` Ming Lin
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-17 21:07 UTC (permalink / raw)


On Tue, May 17, 2016 at 01:48:18PM -0700, Ming Lin wrote:
> I applied the changes below and they seem to work.

Great, thanks. I was getting good results with this as well.
 
> But here is a crash when I do a stress reset test with live IO.
> 
> while [ 1 ] ; do
>         echo > /sys/class/nvme/nvme0/reset_controller
> done
> 
> I think this crash is not related to your patch, because I can
> reproduce it without your patch.

Bummer. Is your controller using sparse namespaces? The kernel message
before the bug appears to indicate that.


> [   44.985454] block (null): nvme_revalidate_disk: Identify failure
> [   45.089224] BUG: unable to handle kernel paging request at 000000006fc81ab0
> [   45.096949] IP: [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
> [   45.103705] PGD 0
> [   45.106470] Oops: 0000 [#1] PREEMPT SMP
> [   45.229716] CPU: 0 PID: 72 Comm: kworker/0:1 Tainted: GOE   4.6.0-rc3+ #197
> [   45.238557] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
> [   45.246709] Workqueue: events nvme_scan_work [nvme_core]
> [   45.252977] task: ffff8800da071640 ti: ffff8800da3c4000 task.ti: ffff8800da3c4000
> [   45.261403] RIP: 0010:[<ffffffff811a0baf>]  [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
> [   45.270804] RSP: 0018:ffff8800da3c7b50  EFLAGS: 00010286
> [   45.277075] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 0000000000f5cc00
> [   45.285167] RDX: 0000000000f5cb00 RSI: 0000000000f5cb00 RDI: 0000000000000246
> [   45.293252] RBP: ffff8800da3c7b70 R08: 00000000000199f0 R09: 0000000000000000
> [   45.301342] R10: 0000000000000001 R11: 0000000000000000 R12: 000000006fc81ab0
> [   45.309441] R13: ffff88011b402f00 R14: 00000000024000c0 R15: ffff88011b402f00
> [   45.317554] FS:  0000000000000000(0000) GS:ffff880120200000(0000) knlGS:0000000000000000
> [   45.326624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   45.333372] CR2: 000000006fc81ab0 CR3: 0000000001c06000 CR4: 00000000001406f0
> [   45.341512] Stack:
> [   45.344548]  00000000024000c0 0000000000000002 ffffffff81147810 ffff8800d043d600
> [   45.353020]  ffff8800da3c7b80 ffffffff81147820 ffff8800da3c7bc8 ffffffff81147d9f
> [   45.361503]  ffffffff00000001 ffffffff81147830 ffff88011b403400 0000000000000002
> [   45.369979] Call Trace:
> [   45.373455]  [<ffffffff81147810>] ? mempool_kfree+0x10/0x10
> [   45.380043]  [<ffffffff81147820>] mempool_alloc_slab+0x10/0x20
> [   45.386888]  [<ffffffff81147d9f>] mempool_create_node+0xcf/0x130
> [   45.393896]  [<ffffffff81147830>] ? mempool_alloc_slab+0x20/0x20
> [   45.400910]  [<ffffffff81147e15>] mempool_create+0x15/0x20
> [   45.407453]  [<ffffffff8134073e>] __bioset_create+0x1ee/0x2d0
> [   45.414171]  [<ffffffff8137bec5>] ? ida_simple_get+0x85/0xe0
> [   45.420799]  [<ffffffff8134082e>] bioset_create+0xe/0x10
> [   45.427077]  [<ffffffff81344d0f>] blk_alloc_queue_node+0x5f/0x2e0
> [   45.434124]  [<ffffffff8135390b>] blk_mq_init_queue+0x1b/0x60
> [   45.440830]  [<ffffffffc081d272>] nvme_validate_ns+0xb2/0x290 [nvme_core]
> [   45.448570]  [<ffffffffc081d665>] nvme_scan_work+0x215/0x330 [nvme_core]
> [   45.456215]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
> [   45.462994]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
> [   45.469947]  [<ffffffff81089096>] worker_thread+0x126/0x4a0
> [   45.476468]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
> [   45.482902]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
> [   45.489855]  [<ffffffff8108f529>] kthread+0xf9/0x110
> [   45.495770]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
> [   45.502109]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 21:07               ` Keith Busch
@ 2016-05-17 21:09                 ` Ming Lin
  2016-05-17 21:25                   ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lin @ 2016-05-17 21:09 UTC (permalink / raw)


On Tue, May 17, 2016@2:07 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, May 17, 2016@01:48:18PM -0700, Ming Lin wrote:
>> I applied the below changes and it seems to work.
>
> Great, thanks. I was getting good results with this as well.

Thanks for the fix. Could you send the patch formally?

>
>> But here is a crash when I do stress reset test with live IO.
>>
>> while [ 1 ] ; do
>>         echo > /sys/class/nvme/nvme0/reset_controller
>> done
>>
>> I think this crash is not related to your patch, because I can
>> reproduce it without your patch.
>
> Bummer. Is your controller using sparse namespaces? The kernel message
> before the bug appears to indicate that.

No. Only 1 namespace.

Thanks.

>
>
>> [   44.985454] block (null): nvme_revalidate_disk: Identify failure
>> [   45.089224] BUG: unable to handle kernel paging request at 000000006fc81ab0
>> [   45.096949] IP: [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
>> [   45.103705] PGD 0
>> [   45.106470] Oops: 0000 [#1] PREEMPT SMP
>> [   45.229716] CPU: 0 PID: 72 Comm: kworker/0:1 Tainted: GOE   4.6.0-rc3+ #197
>> [   45.238557] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
>> [   45.246709] Workqueue: events nvme_scan_work [nvme_core]
>> [   45.252977] task: ffff8800da071640 ti: ffff8800da3c4000 task.ti: ffff8800da3c4000
>> [   45.261403] RIP: 0010:[<ffffffff811a0baf>]  [<ffffffff811a0baf>] kmem_cache_alloc+0x7f/0x170
>> [   45.270804] RSP: 0018:ffff8800da3c7b50  EFLAGS: 00010286
>> [   45.277075] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 0000000000f5cc00
>> [   45.285167] RDX: 0000000000f5cb00 RSI: 0000000000f5cb00 RDI: 0000000000000246
>> [   45.293252] RBP: ffff8800da3c7b70 R08: 00000000000199f0 R09: 0000000000000000
>> [   45.301342] R10: 0000000000000001 R11: 0000000000000000 R12: 000000006fc81ab0
>> [   45.309441] R13: ffff88011b402f00 R14: 00000000024000c0 R15: ffff88011b402f00
>> [   45.317554] FS:  0000000000000000(0000) GS:ffff880120200000(0000) knlGS:0000000000000000
>> [   45.326624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   45.333372] CR2: 000000006fc81ab0 CR3: 0000000001c06000 CR4: 00000000001406f0
>> [   45.341512] Stack:
>> [   45.344548]  00000000024000c0 0000000000000002 ffffffff81147810 ffff8800d043d600
>> [   45.353020]  ffff8800da3c7b80 ffffffff81147820 ffff8800da3c7bc8 ffffffff81147d9f
>> [   45.361503]  ffffffff00000001 ffffffff81147830 ffff88011b403400 0000000000000002
>> [   45.369979] Call Trace:
>> [   45.373455]  [<ffffffff81147810>] ? mempool_kfree+0x10/0x10
>> [   45.380043]  [<ffffffff81147820>] mempool_alloc_slab+0x10/0x20
>> [   45.386888]  [<ffffffff81147d9f>] mempool_create_node+0xcf/0x130
>> [   45.393896]  [<ffffffff81147830>] ? mempool_alloc_slab+0x20/0x20
>> [   45.400910]  [<ffffffff81147e15>] mempool_create+0x15/0x20
>> [   45.407453]  [<ffffffff8134073e>] __bioset_create+0x1ee/0x2d0
>> [   45.414171]  [<ffffffff8137bec5>] ? ida_simple_get+0x85/0xe0
>> [   45.420799]  [<ffffffff8134082e>] bioset_create+0xe/0x10
>> [   45.427077]  [<ffffffff81344d0f>] blk_alloc_queue_node+0x5f/0x2e0
>> [   45.434124]  [<ffffffff8135390b>] blk_mq_init_queue+0x1b/0x60
>> [   45.440830]  [<ffffffffc081d272>] nvme_validate_ns+0xb2/0x290 [nvme_core]
>> [   45.448570]  [<ffffffffc081d665>] nvme_scan_work+0x215/0x330 [nvme_core]
>> [   45.456215]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
>> [   45.462994]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
>> [   45.469947]  [<ffffffff81089096>] worker_thread+0x126/0x4a0
>> [   45.476468]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
>> [   45.482902]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
>> [   45.489855]  [<ffffffff8108f529>] kthread+0xf9/0x110
>> [   45.495770]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
>> [   45.502109]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 21:09                 ` Ming Lin
@ 2016-05-17 21:25                   ` Keith Busch
  2016-05-19  5:52                     ` Ming Lin
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-17 21:25 UTC (permalink / raw)


On Tue, May 17, 2016@02:09:34PM -0700, Ming Lin wrote:
> On Tue, May 17, 2016@2:07 PM, Keith Busch <keith.busch@intel.com> wrote:
> > Great, thanks. I was getting good results with this as well.
> 
> Thanks for the fix. Could you send the patch formally?

Will do. Sending shortly.

> > Bummer. Is your controller using sparse namespaces? The kernel message
> > before the bug appears to indicate that.
> 
> No. Only 1 namespace.

Something must be corrupt then. The line below shows that an unallocated
namespace is failing to identify itself, but if you only report 1 ns,
we shouldn't have been able to get here from a simple nvme reset.

I think your resets are occurring faster than we anticipated and you've
uncovered another bug. It looks like these may cause trouble if reset
occurs during active scan work.

> >> [   44.985454] block (null): nvme_revalidate_disk: Identify failure

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-17 21:25                   ` Keith Busch
@ 2016-05-19  5:52                     ` Ming Lin
  2016-05-19 20:48                       ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lin @ 2016-05-19  5:52 UTC (permalink / raw)


On Tue, 2016-05-17@17:25 -0400, Keith Busch wrote:
> On Tue, May 17, 2016@02:09:34PM -0700, Ming Lin wrote:
> > On Tue, May 17, 2016 at 2:07 PM, Keith Busch <keith.busch at intel.com
> > > wrote:
> > > Great, thanks. I was getting good results with this as well.
> > 
> > Thanks for the fix. Could you send the patch formally?
> 
> Will do. Sending shortly.
> 
> > > Bummer. Is your controller using sparse namespaces? The kernel
> > > message
> > > before the bug appears to indicate that.
> > 
> > No. Only 1 namespace.
> 
> Something must be corrupt then. The below line shows an unallocated
> namespace is failing to identify itself, but if you only report 1 ns,
> we shouldn't have been able to get here from a simple nvme reset.
> 
> I think your resets are occurring faster than we anticipated and
> you've
> uncovered another bug. It looks like these may cause trouble if reset
> occurs during active scan work.

I have not found the root cause yet.
Below patch makes reset not occur during active scan work.
And I didn't see the crash any more with this patch.

So it seems there is a race somewhere between reset work and scan work.

 drivers/nvme/host/core.c | 13 ++++++++++++-
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a57ccd3..8560774 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -89,6 +89,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_RESETTING:
 		case NVME_CTRL_RECONNECTING:
+		case NVME_CTRL_SCANING:
 			changed = true;
 			/* FALLTHRU */
 		default:
@@ -126,6 +127,14 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 			break;
 		}
 		break;
+	case NVME_CTRL_SCANING:
+		switch (old_state) {
+		case NVME_CTRL_LIVE:
+			changed = true;
+			/* FALLTHRU */
+		default:
+			break;
+		}
 	default:
 		break;
 	}
@@ -1755,7 +1764,7 @@ static void nvme_scan_work(struct work_struct *work)
 	struct nvme_id_ctrl *id;
 	unsigned nn;
 
-	if (ctrl->state != NVME_CTRL_LIVE)
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_SCANING))
 		return;
 
 	if (nvme_identify_ctrl(ctrl, &id))
@@ -1776,6 +1785,8 @@ static void nvme_scan_work(struct work_struct *work)
 
 	if (ctrl->ops->post_scan)
 		ctrl->ops->post_scan(ctrl);
+
+	nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
 }
 
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3f3945a..2827825 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -76,6 +76,7 @@ enum nvme_ctrl_state {
 	NVME_CTRL_RESETTING,
 	NVME_CTRL_RECONNECTING,
 	NVME_CTRL_DELETING,
+	NVME_CTRL_SCANING,
 };
 
 struct nvme_ctrl {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 02105da..71260c8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1761,6 +1761,9 @@ static void nvme_reset_work(struct work_struct *work)
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
 	int result = -ENODEV;
 
+	if (dev->ctrl.state == NVME_CTRL_SCANING)
+		return;
+
 	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
 		goto out;
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-19  5:52                     ` Ming Lin
@ 2016-05-19 20:48                       ` Keith Busch
  2016-05-20 14:16                         ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2016-05-19 20:48 UTC (permalink / raw)


On Wed, May 18, 2016@10:52:26PM -0700, Ming Lin wrote:
> I have not found the root cause yet.
> Below patch makes reset not occur during active scan work.
> And I didn't see the crash any more with this patch.
> 
> So it seems there is a race somewhere between reset work and scan work.

I don't know about this. I think we should be able to reset during a
scan. A controller CSTS.CFS or IO timeout occurring during a scan should
be able to recover, but this patch could leave everything stuck if
that happens: the watchdog timer that kicked the reset work won't get
restarted since the reset work returns immediately before it disables
the controller to reclaim IO from the failed controller.

Alternatively, a reset scheduled from the timeout handler races with
the scan work changing the controller state, and may not proceed.


>  drivers/nvme/host/core.c | 13 ++++++++++++-
>  drivers/nvme/host/nvme.h |  1 +
>  drivers/nvme/host/pci.c  |  3 +++
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a57ccd3..8560774 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -89,6 +89,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  		case NVME_CTRL_NEW:
>  		case NVME_CTRL_RESETTING:
>  		case NVME_CTRL_RECONNECTING:
> +		case NVME_CTRL_SCANING:

spelling: NVME_CTRL_SCANNING

>  			changed = true;
>  			/* FALLTHRU */
>  		default:
> @@ -126,6 +127,14 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  			break;
>  		}
>  		break;
> +	case NVME_CTRL_SCANING:
> +		switch (old_state) {
> +		case NVME_CTRL_LIVE:
> +			changed = true;
> +			/* FALLTHRU */
> +		default:
> +			break;
> +		}
>  	default:
>  		break;
>  	}
> @@ -1755,7 +1764,7 @@ static void nvme_scan_work(struct work_struct *work)
>  	struct nvme_id_ctrl *id;
>  	unsigned nn;
>  
> -	if (ctrl->state != NVME_CTRL_LIVE)
> +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_SCANING))
>  		return;
>  
>  	if (nvme_identify_ctrl(ctrl, &id))
> @@ -1776,6 +1785,8 @@ static void nvme_scan_work(struct work_struct *work)
>  
>  	if (ctrl->ops->post_scan)
>  		ctrl->ops->post_scan(ctrl);
> +
> +	nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
>  }
>  
>  void nvme_queue_scan(struct nvme_ctrl *ctrl)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 3f3945a..2827825 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -76,6 +76,7 @@ enum nvme_ctrl_state {
>  	NVME_CTRL_RESETTING,
>  	NVME_CTRL_RECONNECTING,
>  	NVME_CTRL_DELETING,
> +	NVME_CTRL_SCANING,
>  };
>  
>  struct nvme_ctrl {
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 02105da..71260c8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1761,6 +1761,9 @@ static void nvme_reset_work(struct work_struct *work)
>  	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
>  	int result = -ENODEV;
>  
> +	if (dev->ctrl.state == NVME_CTRL_SCANING)
> +		return;
> +
>  	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
>  		goto out;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-19 20:48                       ` Keith Busch
@ 2016-05-20 14:16                         ` Keith Busch
  2016-05-20 17:57                           ` Ming Lin
  2016-05-23 10:38                           ` Christoph Hellwig
  0 siblings, 2 replies; 24+ messages in thread
From: Keith Busch @ 2016-05-20 14:16 UTC (permalink / raw)


On Thu, May 19, 2016@04:48:29PM -0400, Keith Busch wrote:
> On Wed, May 18, 2016@10:52:26PM -0700, Ming Lin wrote:
> > I have not found the root cause yet.
> > Below patch makes reset not occur during active scan work.
> > And I didn't see the crash any more with this patch.
> > 
> > So it seems there is a race somewhere between reset work and scan work.

I haven't been able to reproduce the same failure you're getting. What I
see is namespaces failing validation because the user interrupted the
process by disabling the controller in the middle of discovery. That's
not good either.

We can't just fence off resets during discovery since we need to let
recovery occur if a controller fails.

There's no good reason to let a user interrupt the process, though.

How about, for user-initiated resets, we synchronize with the scan
work and set a controller state that allows resets but not scanning:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1a51584..f7e15df 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1209,6 +1209,13 @@ out_unlock:
 	return ret;
 }
 
+static int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
+{
+	ctrl->state = NVME_CTRL_NEW;
+	flush_work(&ctrl->scan_work);
+	return ctrl->ops->reset_ctrl(ctrl);
+}
+
 static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
@@ -1222,7 +1229,7 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		return nvme_dev_user_cmd(ctrl, argp);
 	case NVME_IOCTL_RESET:
 		dev_warn(ctrl->device, "resetting controller\n");
-		return ctrl->ops->reset_ctrl(ctrl);
+		return nvme_reset_ctrl(ctrl);
 	case NVME_IOCTL_SUBSYS_RESET:
 		return nvme_reset_subsystem(ctrl);
 	case NVME_IOCTL_RESCAN:
@@ -1248,7 +1255,7 @@ static ssize_t nvme_sysfs_reset(struct device *dev,
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 	int ret;
 
-	ret = ctrl->ops->reset_ctrl(ctrl);
+	ret = nvme_reset_ctrl(ctrl);
 	if (ret < 0)
 		return ret;
 	return count;
--

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-20 14:16                         ` Keith Busch
@ 2016-05-20 17:57                           ` Ming Lin
  2016-05-23 10:38                           ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Ming Lin @ 2016-05-20 17:57 UTC (permalink / raw)


On Fri, May 20, 2016@7:16 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Thu, May 19, 2016@04:48:29PM -0400, Keith Busch wrote:
>> On Wed, May 18, 2016@10:52:26PM -0700, Ming Lin wrote:
>> > I have not found the root cause yet.
>> > Below patch makes reset not occur during active scan work.
>> > And I didn't see the crash any more with this patch.
>> >
>> > So it seems there is a race somewhere between reset work and scan work.
>
> I haven't been able to reproduce the same failure you're getting. What I
> see is namespaces being validated fail because the user interrupted the
> process by disabling the controller in the middle of discovery. That's
> not good either.

Sorry, I should have mentioned that I was testing the nvme-over-fabrics tree.
I can't reproduce it with Linus' tree either.

>
> We can't just fence off resets during discovery since we need to let
> recovery occur if a controller fails.
>
> There's no good reason to let a user interrupt the process, though.
>
> How about, for user-initiated resets, we synchronize with the scan
> work and set a controller state that allows resets but not scanning:

I didn't see the crash any more with the below patch applied to the
nvme-over-fabrics tree.
I'll still try to find out what the root cause is.

Thanks.

>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1a51584..f7e15df 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1209,6 +1209,13 @@ out_unlock:
>         return ret;
>  }
>
> +static int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
> +{
> +       ctrl->state = NVME_CTRL_NEW;
> +       flush_work(&ctrl->scan_work);
> +       return ctrl->ops->reset_ctrl(ctrl);
> +}
> +
>  static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
>                 unsigned long arg)
>  {
> @@ -1222,7 +1229,7 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
>                 return nvme_dev_user_cmd(ctrl, argp);
>         case NVME_IOCTL_RESET:
>                 dev_warn(ctrl->device, "resetting controller\n");
> -               return ctrl->ops->reset_ctrl(ctrl);
> +               return nvme_reset_ctrl(ctrl);
>         case NVME_IOCTL_SUBSYS_RESET:
>                 return nvme_reset_subsystem(ctrl);
>         case NVME_IOCTL_RESCAN:
> @@ -1248,7 +1255,7 @@ static ssize_t nvme_sysfs_reset(struct device *dev,
>         struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
>         int ret;
>
> -       ret = ctrl->ops->reset_ctrl(ctrl);
> +       ret = nvme_reset_ctrl(ctrl);
>         if (ret < 0)
>                 return ret;
>         return count;
> --

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-20 14:16                         ` Keith Busch
  2016-05-20 17:57                           ` Ming Lin
@ 2016-05-23 10:38                           ` Christoph Hellwig
  2016-05-23 15:22                             ` Keith Busch
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-05-23 10:38 UTC (permalink / raw)


On Fri, May 20, 2016@10:16:50AM -0400, Keith Busch wrote:
> +static int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
> +{
> +	ctrl->state = NVME_CTRL_NEW;

Please always go through nvme_change_ctrl_state for state machine
changes - this function documents the possible state transitions.

Also I don't think returning to a state called _new is a good idea,
I'd rather have a different name for a state like that.  That being
said I think we absolutely need a state for the time between
scheduling a reset and changing the state to NVME_CTRL_RESETTING
in nvme_reset_work.  I just can't think of a really good name for it.
NVME_CTRL_PRE_RESET?  NVME_CTRL_RESET_PENDING?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] nvme: switch to RCU freeing the namespace
  2016-05-23 10:38                           ` Christoph Hellwig
@ 2016-05-23 15:22                             ` Keith Busch
  0 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2016-05-23 15:22 UTC (permalink / raw)


On Mon, May 23, 2016@12:38:58PM +0200, Christoph Hellwig wrote:
> On Fri, May 20, 2016@10:16:50AM -0400, Keith Busch wrote:
> > +static int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
> > +{
> > +	ctrl->state = NVME_CTRL_NEW;
> 
> Please always go through nvme_change_ctrl_state for state machine
> changes - this function documents the possible state transitions.
> 
> Also I don't think returning to a state called _new is a good idea,
> I'd rather have a different name for a state like that.  That being
> said I think we absolutely need a state for the time between
> scheduling a reset and changing the state to NVME_CTRL_RESETTING
> in nvme_reset_work.  I just can't think of a really good name for it.
> NVME_CTRL_PRE_RESET?  NVME_CTRL_RESET_PENDING?

Thanks for the suggestions. I'll fix it up as a proper patch and send
for consideration.

NVME_CTRL_SCHED_RESET? Naming is hard ...

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-05-23 15:22 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-25 21:20 [PATCH v2 0/2] nvme_ns_remove() deadlock fix Ming Lin
2016-04-25 21:20 ` [PATCH v2 1/2] nvme: switch to RCU freeing the namespace Ming Lin
2016-04-26  8:41   ` Christoph Hellwig
2016-04-26 21:17   ` Sagi Grimberg
2016-05-15  6:58   ` Ming Lin
2016-05-16 22:38     ` Ming Lin
2016-05-17 15:05       ` Keith Busch
2016-05-17 15:23         ` Keith Busch
2016-05-17 15:30           ` Keith Busch
2016-05-17 20:48             ` Ming Lin
2016-05-17 21:07               ` Keith Busch
2016-05-17 21:09                 ` Ming Lin
2016-05-17 21:25                   ` Keith Busch
2016-05-19  5:52                     ` Ming Lin
2016-05-19 20:48                       ` Keith Busch
2016-05-20 14:16                         ` Keith Busch
2016-05-20 17:57                           ` Ming Lin
2016-05-23 10:38                           ` Christoph Hellwig
2016-05-23 15:22                             ` Keith Busch
2016-04-25 21:20 ` [PATCH v2 2/2] nvme: fix nvme_ns_remove() deadlock Ming Lin
2016-04-26  8:42   ` Christoph Hellwig
2016-04-26 21:17   ` Sagi Grimberg
2016-04-26 15:39 ` [PATCH v2 0/2] nvme_ns_remove() deadlock fix Keith Busch
2016-05-02 15:16 ` Jens Axboe
