* [PATCH 0/3] Namespace iteration fixes
@ 2016-06-23 17:29 Keith Busch
  2016-06-23 17:29 ` [PATCHv2 1/3] nvme: Remove RCU namespace protection Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Keith Busch @ 2016-06-23 17:29 UTC (permalink / raw)


Patch 1/3 cleanly applies to linux-block/for-linus and for-next. The
other two only apply to for-next.

The first patch fixes a bug introduced in 4.7, and I'm hoping it can be
applied to 4.7. It's version 2 of this patch. The rest of the series is
targeted to 4.8.

The second patch fixes issues where buffered writers block removal
from proceeding.

The third locks the namespace mutex when searching for invalid
namespaces. While it doesn't appear necessary in the existing paths that
use it, it makes this function safe for potential future use.

Keith Busch (3):
  nvme: Remove RCU namespace protection
  nvme: Kill detached namespaces prior to removal
  nvme: Put invalid namespaces on removal list

 drivers/nvme/host/core.c | 112 ++++++++++++++++++++++++++++-------------------
 1 file changed, 67 insertions(+), 45 deletions(-)

-- 
2.7.2

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-23 17:29 [PATCH 0/3] Namespace iteration fixes Keith Busch
@ 2016-06-23 17:29 ` Keith Busch
  2016-06-28  8:31   ` Christoph Hellwig
  2016-06-23 17:29 ` [PATCH 2/3] nvme: Kill detached namespaces prior to removal Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-06-23 17:29 UTC (permalink / raw)


We can't block with the RCU read lock held, but we need to perform
potentially blocking operations on namespaces while traversing the list.

This patch removes the RCU read locking to make this work, and leverages
how namespace list manipulation currently occurs in order to be safe. Only
scan work, reset work, or device removal can manipulate the namespace
list, and none of those can execute at the same time. So, this patch
locks the namespace list mutex only when the list is being changed,
or when the list is being iterated by a non-IO task, like controller
reset. We can't hold the lock for IO because we won't be able to clean
up IO if it fails, so all those paths rely on the state machine to
prevent two tasks from corrupting the list.

Since the scanning occurs unlocked, nvme_find_ns is updated to take
a reference on found namespaces, and the function name is updated to
reflect the new action.
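
As a minimal illustration (a sketch, not the driver code itself):
sleeping calls like revalidate_disk() are a bug inside an RCU read-side
critical section, but are fine under the mutex:

	/* Broken: the loop body may sleep under rcu_read_lock() */
	rcu_read_lock();
	list_for_each_entry_rcu(ns, &ctrl->namespaces, list)
		revalidate_disk(ns->disk);	/* may sleep -> BUG */
	rcu_read_unlock();

	/* Safe: mutex-protected traversal is allowed to sleep */
	mutex_lock(&ctrl->namespaces_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		revalidate_disk(ns->disk);
	mutex_unlock(&ctrl->namespaces_mutex);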

This fixes these two BUGs:

 BUG: sleeping function called from invalid context at include/linux/writeback.h:185
 in_atomic(): 1, irqs_disabled(): 0, pid: 757, name: kworker/97:1
 CPU: 97 PID: 757 Comm: kworker/97:1 Tainted: G            E   4.6.0-2016-06-14+ #1
 Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0087.D08.1605241555 05/24/2016
 Workqueue: pciehp-2 pciehp_power_thread
  0000000000000000 ffff880462377b38 ffffffff81310c61 ffff8804623704c0
  00000000000000b9 ffff880462377b50 ffffffff8108fe14 ffffffff81809e2a
  ffff880462377b78 ffffffff8108fea9 ffff880469eb4800 ffffc900011824a0
 Call Trace:
  [<ffffffff81310c61>] dump_stack+0x63/0x82
  [<ffffffff8108fe14>] ___might_sleep+0xd4/0x120
  [<ffffffff8108fea9>] __might_sleep+0x49/0x80
  [<ffffffff8120d060>] iget5_locked+0xa0/0x210
  [<ffffffff8122ac80>] ? bdev_test+0x20/0x20
  [<ffffffff8122b5ce>] bdget+0x3e/0x130
  [<ffffffff812f2b24>] bdget_disk+0x24/0x40
  [<ffffffff8122bd3d>] revalidate_disk+0x3d/0x90
  [<ffffffffa0483208>] nvme_kill_queues+0x38/0xc0 [nvme_core]
  [<ffffffffa04833da>] nvme_remove_namespaces+0x5a/0x60 [nvme_core]
  [<ffffffffa048340d>] nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
  [<ffffffffa085c9eb>] nvme_remove+0x5b/0x100 [nvme]
  [<ffffffff8135bfc9>] pci_device_remove+0x39/0xc0

And:

 BUG: sleeping function called from invalid context at kernel/workqueue.c:2783
 in_atomic(): 0, irqs_disabled(): 0, pid: 1696, name: kworker/u16:0
 CPU: 3 PID: 1696 Comm: kworker/u16:0 Tainted: G           OE   4.6.0-rc3+ #197
 Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
 Workqueue: nvme nvme_reset_work [nvme]
  0000000000000000 ffff8800d94d3a48 ffffffff81379e4c ffff88011a639640
  ffffffff81a12688 ffff8800d94d3a70 ffffffff81094814 ffffffff81a12688
  0000000000000adf 0000000000000000 ffff8800d94d3a98 ffffffff81094904
 Call Trace:
  [<ffffffff81379e4c>] dump_stack+0x85/0xc9
  [<ffffffff81094814>] ___might_sleep+0x144/0x1f0
  [<ffffffff81094904>] __might_sleep+0x44/0x80
  [<ffffffff81087b5e>] flush_work+0x6e/0x290
  [<ffffffff81087af0>] ? __queue_delayed_work+0x150/0x150
  [<ffffffff81126cf5>] ? irq_work_queue+0x75/0x90
  [<ffffffff810ca136>] ? wake_up_klogd+0x36/0x50
  [<ffffffff810b7fa6>] ? mark_held_locks+0x66/0x90
  [<ffffffff81088898>] ? __cancel_work_timer+0xf8/0x1c0
  [<ffffffff8108883b>] __cancel_work_timer+0x9b/0x1c0
  [<ffffffff810cadaa>] ? vprintk_default+0x1a/0x20
  [<ffffffff81142558>] ? printk+0x48/0x4a
  [<ffffffff8108896b>] cancel_work_sync+0xb/0x10
  [<ffffffff81350fb0>] blk_mq_cancel_requeue_work+0x10/0x20
  [<ffffffffc0813ae7>] nvme_stop_queues+0x167/0x1a0 [nvme_core]
  [<ffffffffc0813980>] ? nvme_kill_queues+0x190/0x190 [nvme_core]
  [<ffffffffc08cef51>] nvme_dev_disable+0x71/0x350 [nvme]
  [<ffffffff810b8f40>] ? __lock_acquire+0xa80/0x1ad0
  [<ffffffff810944b6>] ? finish_task_switch+0xa6/0x2c0
  [<ffffffffc08cffd4>] nvme_reset_work+0x214/0xd40 [nvme]

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
v1 -> v2:
  Take a reference on the namespace if we find it. Not necessary in
  existing namespace scanning usage, but is safe for potential future
  changes.

 drivers/nvme/host/core.c | 64 +++++++++++++++++++++++++-----------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9d7cee4..67aba46 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1424,19 +1424,22 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return nsa->ns_id - nsb->ns_id;
 }
 
-static struct nvme_ns *nvme_find_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+static struct nvme_ns *nvme_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
-	struct nvme_ns *ns;
-
-	lockdep_assert_held(&ctrl->namespaces_mutex);
+	struct nvme_ns *ns, *ret = NULL;
 
+	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->ns_id == nsid)
-			return ns;
+		if (ns->ns_id == nsid) {
+			kref_get(&ns->kref);
+			ret = ns;
+			break;
+		}
 		if (ns->ns_id > nsid)
 			break;
 	}
-	return NULL;
+	mutex_unlock(&ctrl->namespaces_mutex);
+	return ret;
 }
 
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -1445,8 +1448,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	struct gendisk *disk;
 	int node = dev_to_node(ctrl->dev);
 
-	lockdep_assert_held(&ctrl->namespaces_mutex);
-
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
 		return;
@@ -1487,7 +1488,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_revalidate_disk(ns->disk))
 		goto out_free_disk;
 
-	list_add_tail_rcu(&ns->list, &ctrl->namespaces);
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_add_tail(&ns->list, &ctrl->namespaces);
+	mutex_unlock(&ctrl->namespaces_mutex);
+
 	kref_get(&ctrl->kref);
 	if (ns->type == NVME_NS_LIGHTNVM)
 		return;
@@ -1510,8 +1514,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
-	lockdep_assert_held(&ns->ctrl->namespaces_mutex);
-
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -1524,8 +1526,11 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_mq_abort_requeue_list(ns->queue);
 		blk_cleanup_queue(ns->queue);
 	}
+
+	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
-	synchronize_rcu();
+	mutex_unlock(&ns->ctrl->namespaces_mutex);
+
 	nvme_put_ns(ns);
 }
 
@@ -1533,10 +1538,11 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
 
-	ns = nvme_find_ns(ctrl, nsid);
+	ns = nvme_get_ns(ctrl, nsid);
 	if (ns) {
 		if (revalidate_disk(ns->disk))
 			nvme_ns_remove(ns);
+		nvme_put_ns(ns);
 	} else
 		nvme_alloc_ns(ctrl, nsid);
 }
@@ -1576,9 +1582,11 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
 			nvme_validate_ns(ctrl, nsid);
 
 			while (++prev < nsid) {
-				ns = nvme_find_ns(ctrl, prev);
-				if (ns)
+				ns = nvme_get_ns(ctrl, prev);
+				if (ns) {
 					nvme_ns_remove(ns);
+					nvme_put_ns(ns);
+				}
 			}
 		}
 		nn -= j;
@@ -1594,8 +1602,6 @@ static void nvme_scan_ns_sequential(struct nvme_ctrl *ctrl, unsigned nn)
 {
 	unsigned i;
 
-	lockdep_assert_held(&ctrl->namespaces_mutex);
-
 	for (i = 1; i <= nn; i++)
 		nvme_validate_ns(ctrl, i);
 
@@ -1615,7 +1621,6 @@ static void nvme_scan_work(struct work_struct *work)
 	if (nvme_identify_ctrl(ctrl, &id))
 		return;
 
-	mutex_lock(&ctrl->namespaces_mutex);
 	nn = le32_to_cpu(id->nn);
 	if (ctrl->vs >= NVME_VS(1, 1) &&
 	    !(ctrl->quirks & NVME_QUIRK_IDENTIFY_CNS)) {
@@ -1624,6 +1629,7 @@ static void nvme_scan_work(struct work_struct *work)
 	}
 	nvme_scan_ns_sequential(ctrl, nn);
  done:
+	mutex_lock(&ctrl->namespaces_mutex);
 	list_sort(NULL, &ctrl->namespaces, ns_cmp);
 	mutex_unlock(&ctrl->namespaces_mutex);
 	kfree(id);
@@ -1656,10 +1662,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_kill_queues(ctrl);
 
-	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
 		nvme_ns_remove(ns);
-	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
 
@@ -1830,8 +1834,8 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		if (!kref_get_unless_zero(&ns->kref))
 			continue;
 
@@ -1848,7 +1852,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 		nvme_put_ns(ns);
 	}
-	rcu_read_unlock();
+	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
@@ -1856,8 +1860,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		spin_lock_irq(ns->queue->queue_lock);
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
@@ -1865,7 +1869,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
-	rcu_read_unlock();
+	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
@@ -1873,13 +1877,13 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-	rcu_read_unlock();
+	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
-- 
2.7.2

* [PATCH 2/3] nvme: Kill detached namespaces prior to removal
  2016-06-23 17:29 [PATCH 0/3] Namespace iteration fixes Keith Busch
  2016-06-23 17:29 ` [PATCHv2 1/3] nvme: Remove RCU namespace protection Keith Busch
@ 2016-06-23 17:29 ` Keith Busch
  2016-06-28  8:32   ` Christoph Hellwig
  2016-06-23 17:29 ` [PATCH 3/3] nvme: Put invalid namespaces on removal list Keith Busch
  2016-06-23 17:44 ` [PATCH 0/3] Namespace iteration fixes Keith Busch
  3 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-06-23 17:29 UTC (permalink / raw)


If a previously allocated namespace is detached or deleted from the
controller, the driver needs to make sure nothing can send new IO to
the namespace prior to removing it. This prevents buffered writers from
potentially stalling device removal indefinitely.

The patch pulls the request queue shutdown into its own function and
provides a function to kill and remove the namespace. Any invalid
namespace being removed should go through this path.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 67aba46..43d6947 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1534,6 +1534,26 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	nvme_put_ns(ns);
 }
 
+static void nvme_ns_kill(struct nvme_ns *ns)
+{
+	/*
+	 * Revalidating a dead namespace sets capacity to 0. This will
+	 * end buffered writers dirtying pages that can't be synced.
+	 */
+	if (!test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+		revalidate_disk(ns->disk);
+
+	blk_set_queue_dying(ns->queue);
+	blk_mq_abort_requeue_list(ns->queue);
+	blk_mq_start_stopped_hw_queues(ns->queue, true);
+}
+
+static void nvme_remove_dead_ns(struct nvme_ns *ns)
+{
+	nvme_ns_kill(ns);
+	nvme_ns_remove(ns);
+}
+
 static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
@@ -1554,7 +1574,7 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
 		if (ns->ns_id > nsid)
-			nvme_ns_remove(ns);
+			nvme_remove_dead_ns(ns);
 	}
 }
 
@@ -1584,7 +1604,7 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
 			while (++prev < nsid) {
 				ns = nvme_get_ns(ctrl, prev);
 				if (ns) {
-					nvme_ns_remove(ns);
+					nvme_remove_dead_ns(ns);
 					nvme_put_ns(ns);
 				}
 			}
@@ -1837,19 +1857,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		if (!kref_get_unless_zero(&ns->kref))
-			continue;
-
-		/*
-		 * Revalidating a dead namespace sets capacity to 0. This will
-		 * end buffered writers dirtying pages that can't be synced.
-		 */
-		if (!test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-			revalidate_disk(ns->disk);
-
-		blk_set_queue_dying(ns->queue);
-		blk_mq_abort_requeue_list(ns->queue);
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+			continue;
 
+		nvme_ns_kill(ns);
 		nvme_put_ns(ns);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
-- 
2.7.2

* [PATCH 3/3] nvme: Put invalid namespaces on removal list
  2016-06-23 17:29 [PATCH 0/3] Namespace iteration fixes Keith Busch
  2016-06-23 17:29 ` [PATCHv2 1/3] nvme: Remove RCU namespace protection Keith Busch
  2016-06-23 17:29 ` [PATCH 2/3] nvme: Kill detached namespaces prior to removal Keith Busch
@ 2016-06-23 17:29 ` Keith Busch
  2016-06-28  8:32   ` Christoph Hellwig
  2016-06-23 17:44 ` [PATCH 0/3] Namespace iteration fixes Keith Busch
  3 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-06-23 17:29 UTC (permalink / raw)


This patch puts invalid namespaces on a temporary removal list so they
are not part of the controller's namespace list during their removal,
allowing locked list traversal when scanning for invalid namespaces. This
addresses one of the last unlocked namespace list iterations.

The actual namespace removal runs unlocked on the local removal list. The
invalid namespaces do not need to be on the controller list since removing
them disables their queues, preventing further IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 43d6947..a56b394 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1571,11 +1571,19 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					unsigned nsid)
 {
 	struct nvme_ns *ns, *next;
+	LIST_HEAD(rm_list);
 
+	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->ns_id > nsid)
-			nvme_remove_dead_ns(ns);
+		if (ns->ns_id > nsid) {
+			list_del_init(&ns->list);
+			list_add_tail(&ns->list, &rm_list);
+		}
 	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+
+	list_for_each_entry_safe(ns, next, &rm_list, list)
+		nvme_remove_dead_ns(ns);
 }
 
 static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
-- 
2.7.2

* [PATCH 0/3] Namespace iteration fixes
  2016-06-23 17:29 [PATCH 0/3] Namespace iteration fixes Keith Busch
                   ` (2 preceding siblings ...)
  2016-06-23 17:29 ` [PATCH 3/3] nvme: Put invalid namespaces on removal list Keith Busch
@ 2016-06-23 17:44 ` Keith Busch
  3 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-06-23 17:44 UTC (permalink / raw)


Just correcting Sagi's email address in the CC. I copy/pasted it from a
review in a different message(*), but it was missing some letters there
too. :)

  * https://lkml.org/lkml/2016/6/23/529

On Thu, Jun 23, 2016 at 11:29:03AM -0600, Keith Busch wrote:
> Patch 1/3 cleanly applies to linux-block/for-linus and for-next. The
> other two only apply to for-next.
> 
> The first patch fixes a bug introduced in 4.7, and I'm hoping it can be
> applied to 4.7. It's version 2 of this patch. The rest of the series is
> targeted to 4.8.
> 
> The second patch fixes issues where buffered writers block removal
> from proceeding.
> 
> The third locks the namespace mutex when searching for invalid
> namespaces. While it doesn't appear necessary in the existing paths that
> use it, it makes this function safe for potential future use.
> 
> Keith Busch (3):
>   nvme: Remove RCU namespace protection
>   nvme: Kill detached namespaces prior to removal
>   nvme: Put invalid namespaces on removal list
> 
>  drivers/nvme/host/core.c | 112 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 67 insertions(+), 45 deletions(-)
> 
> -- 
> 2.7.2

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-23 17:29 ` [PATCHv2 1/3] nvme: Remove RCU namespace protection Keith Busch
@ 2016-06-28  8:31   ` Christoph Hellwig
  2016-06-28 16:35     ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-28  8:31 UTC (permalink / raw)


> +static struct nvme_ns *nvme_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)

Can you call this nvme_find_get_ns?  A plain get is usually just
a wrapper around kref_get.

>   done:
> +	mutex_lock(&ctrl->namespaces_mutex);
>  	list_sort(NULL, &ctrl->namespaces, ns_cmp);
>  	mutex_unlock(&ctrl->namespaces_mutex);

This now leaves the list unordered between lock drops (actually we
already had that issue with the RCU conversion).  I think we just need
to do a sorted insert instead, which shouldn't be too hard.
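
Something like this untested sketch (nvme_ns_add_sorted is a made-up
name) would keep the list ordered by ns_id at insertion time:

	static void nvme_ns_add_sorted(struct nvme_ctrl *ctrl,
			struct nvme_ns *new)
	{
		struct nvme_ns *ns;

		mutex_lock(&ctrl->namespaces_mutex);
		list_for_each_entry(ns, &ctrl->namespaces, list) {
			if (ns->ns_id > new->ns_id) {
				/* insert before the first larger ns_id */
				list_add_tail(&new->list, &ns->list);
				goto unlock;
			}
		}
		list_add_tail(&new->list, &ctrl->namespaces);
	unlock:
		mutex_unlock(&ctrl->namespaces_mutex);
	}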

> @@ -1656,10 +1662,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	if (ctrl->state == NVME_CTRL_DEAD)
>  		nvme_kill_queues(ctrl);
>  
> -	mutex_lock(&ctrl->namespaces_mutex);
>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
>  		nvme_ns_remove(ns);
> -	mutex_unlock(&ctrl->namespaces_mutex);

And this is the scary one - it does an unprotected
list_for_each_entry_safe, and nvme_remove_namespaces isn't even called
from the scan workqueue.

I think this needs to be something like:

	mutex_lock(&ctrl->namespaces_mutex);
	list_splice_init(&ctrl->namespaces, &tmp);
	mutex_unlock(&ctrl->namespaces_mutex);

	list_for_each_entry_safe(ns, next, &tmp, list) {
		..

		nvme_ns_remove(ns);

> +	mutex_lock(&ctrl->namespaces_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		if (!kref_get_unless_zero(&ns->kref))
>  			continue;

> @@ -1848,7 +1852,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  
>  		nvme_put_ns(ns);

The get/put pair here can go away now as there will always be at
least one reference to the ns if it is on the ->namespaces list.
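
With that assumption the loop could operate on each entry directly,
e.g. (sketch, reusing the calls from the patch):

	mutex_lock(&ctrl->namespaces_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list) {
		if (!test_and_set_bit(NVME_NS_DEAD, &ns->flags))
			revalidate_disk(ns->disk);
		blk_set_queue_dying(ns->queue);
		blk_mq_abort_requeue_list(ns->queue);
		blk_mq_start_stopped_hw_queues(ns->queue, true);
	}
	mutex_unlock(&ctrl->namespaces_mutex);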

* [PATCH 2/3] nvme: Kill detached namespaces prior to removal
  2016-06-23 17:29 ` [PATCH 2/3] nvme: Kill detached namespaces prior to removal Keith Busch
@ 2016-06-28  8:32   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-28  8:32 UTC (permalink / raw)


> +static void nvme_remove_dead_ns(struct nvme_ns *ns)
> +{
> +	nvme_ns_kill(ns);
> +	nvme_ns_remove(ns);
> +}

I would prefer not to add this helper; it just makes the code harder
to read.

* [PATCH 3/3] nvme: Put invalid namespaces on removal list
  2016-06-23 17:29 ` [PATCH 3/3] nvme: Put invalid namespaces on removal list Keith Busch
@ 2016-06-28  8:32   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-28  8:32 UTC (permalink / raw)


> +		if (ns->ns_id > nsid) {
> +			list_del_init(&ns->list);
> +			list_add_tail(&ns->list, &rm_list);
> +		}

This should use list_move_tail instead of the list_del_init /
list_add_tail combination.
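
That is, something like:

	if (ns->ns_id > nsid)
		list_move_tail(&ns->list, &rm_list);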

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-28  8:31   ` Christoph Hellwig
@ 2016-06-28 16:35     ` Keith Busch
  2016-06-30  6:48       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-06-28 16:35 UTC (permalink / raw)


On Tue, Jun 28, 2016 at 01:31:37AM -0700, Christoph Hellwig wrote:
> > @@ -1656,10 +1662,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> >  	if (ctrl->state == NVME_CTRL_DEAD)
> >  		nvme_kill_queues(ctrl);
> >  
> > -	mutex_lock(&ctrl->namespaces_mutex);
> >  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
> >  		nvme_ns_remove(ns);
> > -	mutex_unlock(&ctrl->namespaces_mutex);
> 
> And this is the scary one - it does an unprotected
> list_for_each_entry_safe, and nvme_remove_namespaces isn't even called
> from the scan workqueue.
> 
> I think this needs to be something like:
> 
> 	mutex_lock(&ctrl->namespaces_mutex);
> 	list_splice_init(&ctrl->namespaces, &tmp);
> 	mutex_unlock(&ctrl->namespaces_mutex);
> 
> 	list_for_each_entry_safe(ns, next, &tmp, list) {
> 		..
> 
> 		nvme_ns_remove(ns);

We actually can't do that. The namespace needs to be on ctrl->namespaces
during nvme_ns_remove because it does IO, and the controller can fail
during that IO. Every namespace needs to be on the ctrl's namespace
list until after del_gendisk completes so we can recover from potential
failures.

It does look concerning, but it's safe if the caller ensures no scan
work is or ever will be active. If that sounds okay, or if no alternative
exists, I can document this usage requirement in the code.

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-28 16:35     ` Keith Busch
@ 2016-06-30  6:48       ` Christoph Hellwig
  2016-06-30 14:57         ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-30  6:48 UTC (permalink / raw)


On Tue, Jun 28, 2016 at 12:35:13PM -0400, Keith Busch wrote:
> > And this is the scary one - it does an unprotected
> > list_for_each_entry_safe, and nvme_remove_namespaces isn't even called
> > from the scan workqueue.
> > 
> > I think this needs to be something like:
> > 
> > 	mutex_lock(&ctrl->namespaces_mutex);
> > 	list_splice_init(&ctrl->namespaces, &tmp);
> > 	mutex_unlock(&ctrl->namespaces_mutex);
> > 
> > 	list_for_each_entry_safe(ns, next, &tmp, list) {
> > 		..
> > 
> > 		nvme_ns_remove(ns);
> 
> We actually can't do that. The namespace needs to be on ctrl->namespaces
> during nvme_ns_remove because it does IO, and the controller can fail
> during that IO. Every namespace needs to be on the ctrl's namespace
> list until after del_gendisk completes so we can recover from potential
> failures.

But we remove it from the list before del_gendisk in
nvme_remove_invalid_namespaces and nvme_scan_ns_list already.  I guess
that's fine because we're not going to do I/O on them at this point,
but what prevents us from doing this two-step removal in
nvme_remove_namespaces?

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-30  6:48       ` Christoph Hellwig
@ 2016-06-30 14:57         ` Keith Busch
  2016-06-30 22:59           ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-06-30 14:57 UTC (permalink / raw)


On Wed, Jun 29, 2016 at 11:48:38PM -0700, Christoph Hellwig wrote:
> But we remove it from the list before del_gendisk in
> nvme_remove_invalid_namespaces and nvme_scan_ns_list already.  I guess
> that's fine because we're not going to do I/O on them at this point,

Right, we can safely remove them from the list for those other cases
precisely because they can't do IO anymore.

> but what prevents us from doing this two-step removal in
> nvme_remove_namespaces?

If the controller fails during del_gendisk, we have to kill all
the request queues and set capacities to 0 to force del_gendisk to
complete. If the namespace isn't in the ctrl->namespace list, the driver
can't find the namespace to kill.

* [PATCHv2 1/3] nvme: Remove RCU namespace protection
  2016-06-30 14:57         ` Keith Busch
@ 2016-06-30 22:59           ` Keith Busch
  0 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-06-30 22:59 UTC (permalink / raw)


On Thu, Jun 30, 2016 at 10:57:51AM -0400, Keith Busch wrote:
> On Wed, Jun 29, 2016 at 11:48:38PM -0700, Christoph Hellwig wrote:
> > But we remove it from the list before del_gendisk in
> > nvme_remove_invalid_namespaces and nvme_scan_ns_list already.  I guess
> > that's fine because we're not going to do I/O on them at this point,
> 
> Right, we can safely remove them from the list for those other cases
> precisely because they can't do IO anymore.
> 
> > but what prevents us from doing this two-step removal in
> > nvme_remove_namespaces?
> 
> If the controller fails during del_gendisk, we have to kill all
> the request queues and set capacities to 0 to force del_gendisk to
> complete. If the namespace isn't in the ctrl->namespace list, the driver
> can't find the namespace to kill.

BTW, I have addressed your other concerns and am ready to post another
version, but I'm holding off on resending until we all understand why it
has to be this way. I'm also open to an alternative, but this is the best
I've come up with. I'd really like to see this fixed in the current rc
cycle; the bugs this is fixing are pretty bad.
