* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
@ 2019-04-03 23:12 Hannes Reinecke
  2019-04-13  8:29 ` Ming Lei
  2019-04-24 16:30 ` Sagi Grimberg
  0 siblings, 2 replies; 7+ messages in thread
From: Hannes Reinecke @ 2019-04-03 23:12 UTC


From: Hannes Reinecke <hare@suse.com>

There is a race condition between namespace rescanning and controller
reset; during controller reset all namespaces are quiesced via
nvme_stop_ctrl(), and after reset all namespaces are unquiesced
again.
If namespace scanning is active at the time a controller reset is
triggered, the rescan code will call nvme_ns_remove(), which will then
cause a kernel crash in nvme_start_ctrl() as it trips over
uninitialized namespaces.

This patch calls nvme_set_queue_dying() during namespace rescan,
which already unquiesces the queue. Hence we can skip all namespaces
with the NVME_NS_DEAD flag set during unquiesce in nvme_start_ctrl() and
eliminate this issue.
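
For reference, nvme_set_queue_dying() at this point does roughly the
following (a paraphrased sketch of the helper in
drivers/nvme/host/core.c, not the verbatim source):

	static void nvme_set_queue_dying(struct nvme_ns *ns)
	{
		if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
			return;
		/* revalidating a dead namespace sets capacity to 0 */
		revalidate_disk(ns->disk);
		blk_set_queue_dying(ns->queue);
		/* forcibly unquiesce so pending dispatch cannot block */
		blk_mq_unquiesce_queue(ns->queue);
	}

Since the helper both sets NVME_NS_DEAD and unquiesces the queue, the
stop/start loops below can safely skip any namespace carrying that flag.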

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/nvme/host/core.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 23000a368e1f..21aa5c516d2a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3359,7 +3359,7 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
 		if (ns->disk && revalidate_disk(ns->disk))
-			nvme_ns_remove(ns);
+			nvme_set_queue_dying(ns);
 		nvme_put_ns(ns);
 	} else
 		nvme_alloc_ns(ctrl, nsid);
@@ -3409,7 +3409,7 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
 			while (++prev < nsid) {
 				ns = nvme_find_get_ns(ctrl, prev);
 				if (ns) {
-					nvme_ns_remove(ns);
+					nvme_set_queue_dying(ns);
 					nvme_put_ns(ns);
 				}
 			}
@@ -3871,8 +3871,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (test_bit(NVME_NS_DEAD, &ns->flags))
+			continue;
 		blk_mq_quiesce_queue(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -3882,8 +3885,11 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (test_bit(NVME_NS_DEAD, &ns->flags))
+			continue;
 		blk_mq_unquiesce_queue(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
-- 
2.16.4


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-03 23:12 [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning Hannes Reinecke
@ 2019-04-13  8:29 ` Ming Lei
  2019-04-13  9:13   ` Ming Lei
  2019-04-17 11:32   ` Hannes Reinecke
  2019-04-24 16:30 ` Sagi Grimberg
  1 sibling, 2 replies; 7+ messages in thread
From: Ming Lei @ 2019-04-13  8:29 UTC


On Thu, Apr 04, 2019 at 01:12:21AM +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> There is a race condition between namespace rescanning and controller
> reset; during controller reset all namespaces are quiesced via
> nvme_stop_ctrl(), and after reset all namespaces are unquiesced
> again.
> If namespace scanning is active at the time a controller reset is
> triggered, the rescan code will call nvme_ns_remove(), which will then
> cause a kernel crash in nvme_start_ctrl() as it trips over
> uninitialized namespaces.
> 
> This patch calls nvme_set_queue_dying() during namespace rescan,
> which already unquiesces the queue. Hence we can skip all namespaces
> with the NVME_NS_DEAD flag set during unquiesce in nvme_start_ctrl() and
> eliminate this issue.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/nvme/host/core.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 23000a368e1f..21aa5c516d2a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3359,7 +3359,7 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>  	ns = nvme_find_get_ns(ctrl, nsid);
>  	if (ns) {
>  		if (ns->disk && revalidate_disk(ns->disk))
> -			nvme_ns_remove(ns);
> +			nvme_set_queue_dying(ns);
>  		nvme_put_ns(ns);
>  	} else
>  		nvme_alloc_ns(ctrl, nsid);
> @@ -3409,7 +3409,7 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
>  			while (++prev < nsid) {
>  				ns = nvme_find_get_ns(ctrl, prev);
>  				if (ns) {
> -					nvme_ns_remove(ns);
> +					nvme_set_queue_dying(ns);
>  					nvme_put_ns(ns);
>  				}
>  			}
> @@ -3871,8 +3871,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> +			continue;
>  		blk_mq_quiesce_queue(ns->queue);
> +	}
>  	up_read(&ctrl->namespaces_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(nvme_stop_queues);
> @@ -3882,8 +3885,11 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> +			continue;
>  		blk_mq_unquiesce_queue(ns->queue);
> +	}
>  	up_read(&ctrl->namespaces_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(nvme_start_queues);
> -- 
> 2.16.4

Another candidate is to hold the ns's refcount and switch to removing
the ns from 'ctrl->namespaces' in nvme_free_ns(), via the following
patch [1] together with the patch "blk-mq: free hw queue's resource in
hctx's release handler" in the following link:

https://lore.kernel.org/linux-block/20190413071829.GB9108@ming.t460p/T/#m41c04517a37cbc1b4c61357f8cb52cd3cbf31f1b

[1] fix race between nvme scan and reset

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ddb943395118..12507b223584 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -402,6 +402,10 @@ static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
 
+	down_write(&ns->ctrl->namespaces_rwsem);
+	list_del_init(&ns->list);
+	up_write(&ns->ctrl->namespaces_rwsem);
+
 	if (ns->ndev)
 		nvme_nvm_unregister(ns);
 
@@ -3166,6 +3170,29 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return nsa->head->ns_id - nsb->head->ns_id;
 }
 
+void nvme_get_all_ns(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		if (kref_get_unless_zero(&ns->kref))
+			continue;
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_get_all_ns);
+
+void nvme_put_all_ns(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		nvme_put_ns(ns);
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_put_all_ns);
+
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns, *ret = NULL;
@@ -3329,10 +3356,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	nvme_mpath_clear_current_path(ns);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
-	down_write(&ns->ctrl->namespaces_rwsem);
-	list_del_init(&ns->list);
-	up_write(&ns->ctrl->namespaces_rwsem);
-
 	synchronize_srcu(&ns->head->srcu);
 	nvme_mpath_check_last_path(ns);
 	nvme_put_ns(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 527d64545023..f8f2a012c3ba 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -430,6 +430,9 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
 void nvme_put_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_identify(struct nvme_ctrl *ctrl);
 
+void nvme_get_all_ns(struct nvme_ctrl *ctrl);
+void nvme_put_all_ns(struct nvme_ctrl *ctrl);
+
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
 int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c1eecde6b853..0d4fea14ccdc 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2496,6 +2496,8 @@ static void nvme_reset_work(struct work_struct *work)
 	int result = -ENODEV;
 	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
+	nvme_get_all_ns(&dev->ctrl);
+
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
 		goto out;
 
@@ -2603,6 +2605,7 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
+	nvme_put_all_ns(&dev->ctrl);
 	nvme_remove_dead_ctrl(dev, result);
 }
 
Thanks,
Ming


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-13  8:29 ` Ming Lei
@ 2019-04-13  9:13   ` Ming Lei
  2019-04-17 11:32   ` Hannes Reinecke
  1 sibling, 0 replies; 7+ messages in thread
From: Ming Lei @ 2019-04-13  9:13 UTC


On Sat, Apr 13, 2019 at 04:29:55PM +0800, Ming Lei wrote:
> On Thu, Apr 04, 2019 at 01:12:21AM +0200, Hannes Reinecke wrote:
> > From: Hannes Reinecke <hare@suse.com>
> > 
> > There is a race condition between namespace rescanning and controller
> > reset; during controller reset all namespaces are quiesced via
> > nvme_stop_ctrl(), and after reset all namespaces are unquiesced
> > again.
> > If namespace scanning is active at the time a controller reset is
> > triggered, the rescan code will call nvme_ns_remove(), which will then
> > cause a kernel crash in nvme_start_ctrl() as it trips over
> > uninitialized namespaces.
> > 
> > This patch calls nvme_set_queue_dying() during namespace rescan,
> > which already unquiesces the queue. Hence we can skip all namespaces
> > with the NVME_NS_DEAD flag set during unquiesce in nvme_start_ctrl() and
> > eliminate this issue.
> > 
> > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > ---
> >  drivers/nvme/host/core.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 23000a368e1f..21aa5c516d2a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3359,7 +3359,7 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> >  	ns = nvme_find_get_ns(ctrl, nsid);
> >  	if (ns) {
> >  		if (ns->disk && revalidate_disk(ns->disk))
> > -			nvme_ns_remove(ns);
> > +			nvme_set_queue_dying(ns);
> >  		nvme_put_ns(ns);
> >  	} else
> >  		nvme_alloc_ns(ctrl, nsid);
> > @@ -3409,7 +3409,7 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
> >  			while (++prev < nsid) {
> >  				ns = nvme_find_get_ns(ctrl, prev);
> >  				if (ns) {
> > -					nvme_ns_remove(ns);
> > +					nvme_set_queue_dying(ns);
> >  					nvme_put_ns(ns);
> >  				}
> >  			}
> > @@ -3871,8 +3871,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	down_read(&ctrl->namespaces_rwsem);
> > -	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> > +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> > +			continue;
> >  		blk_mq_quiesce_queue(ns->queue);
> > +	}
> >  	up_read(&ctrl->namespaces_rwsem);
> >  }
> >  EXPORT_SYMBOL_GPL(nvme_stop_queues);
> > @@ -3882,8 +3885,11 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	down_read(&ctrl->namespaces_rwsem);
> > -	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> > +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> > +			continue;
> >  		blk_mq_unquiesce_queue(ns->queue);
> > +	}
> >  	up_read(&ctrl->namespaces_rwsem);
> >  }
> >  EXPORT_SYMBOL_GPL(nvme_start_queues);
> > -- 
> > 2.16.4
> 
> Another candidate is to hold the ns's refcount and switch to removing
> the ns from 'ctrl->namespaces' in nvme_free_ns(), via the following
> patch [1] together with the patch "blk-mq: free hw queue's resource in
> hctx's release handler" in the following link:
> 
> https://lore.kernel.org/linux-block/20190413071829.GB9108@ming.t460p/T/#m41c04517a37cbc1b4c61357f8cb52cd3cbf31f1b
> 
> [1] fix race between nvme scan and reset
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ddb943395118..12507b223584 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -402,6 +402,10 @@ static void nvme_free_ns(struct kref *kref)
>  {
>  	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
>  
> +	down_write(&ns->ctrl->namespaces_rwsem);
> +	list_del_init(&ns->list);
> +	up_write(&ns->ctrl->namespaces_rwsem);
> +
>  	if (ns->ndev)
>  		nvme_nvm_unregister(ns);
>  
> @@ -3166,6 +3170,29 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
>  	return nsa->head->ns_id - nsb->head->ns_id;
>  }
>  
> +void nvme_get_all_ns(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		if (kref_get_unless_zero(&ns->kref))

kref_get() is enough.

> +			continue;
> +	up_read(&ctrl->namespaces_rwsem);
> +}
> +EXPORT_SYMBOL_GPL(nvme_get_all_ns);
> +
> +void nvme_put_all_ns(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		nvme_put_ns(ns);
> +	up_read(&ctrl->namespaces_rwsem);
> +}

The above will cause a deadlock: nvme_put_ns() may drop the last
reference while namespaces_rwsem is still held for read, and
nvme_free_ns() then tries to take the same rwsem for write.

Follows V2:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ddb943395118..fbaf019f7c60 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -402,6 +402,10 @@ static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
 
+	down_write(&ns->ctrl->namespaces_rwsem);
+	list_del_init(&ns->list);
+	up_write(&ns->ctrl->namespaces_rwsem);
+
 	if (ns->ndev)
 		nvme_nvm_unregister(ns);
 
@@ -3166,6 +3170,42 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return nsa->head->ns_id - nsb->head->ns_id;
 }
 
+void nvme_get_all_ns(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		kref_get(&ns->kref);
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_get_all_ns);
+
+static void nvme_free_ns_nop(struct kref *kref)
+{
+}
+
+void nvme_put_all_ns(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+	LIST_HEAD(ns_list);
+
+	down_write(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (kref_put(&ns->kref, nvme_free_ns_nop)) {
+			list_del_init(&ns->list);
+			list_add(&ns->list, &ns_list);
+		}
+	}
+	up_write(&ctrl->namespaces_rwsem);
+
+	while (!list_empty(&ns_list)) {
+		ns = list_first_entry(&ns_list, typeof(*ns), list);
+		nvme_free_ns(&ns->kref);
+	}
+}
+EXPORT_SYMBOL_GPL(nvme_put_all_ns);
+
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns, *ret = NULL;
@@ -3329,10 +3369,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	nvme_mpath_clear_current_path(ns);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
-	down_write(&ns->ctrl->namespaces_rwsem);
-	list_del_init(&ns->list);
-	up_write(&ns->ctrl->namespaces_rwsem);
-
 	synchronize_srcu(&ns->head->srcu);
 	nvme_mpath_check_last_path(ns);
 	nvme_put_ns(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 527d64545023..f8f2a012c3ba 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -430,6 +430,9 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
 void nvme_put_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_identify(struct nvme_ctrl *ctrl);
 
+void nvme_get_all_ns(struct nvme_ctrl *ctrl);
+void nvme_put_all_ns(struct nvme_ctrl *ctrl);
+
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
 int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c1eecde6b853..0d4fea14ccdc 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2496,6 +2496,8 @@ static void nvme_reset_work(struct work_struct *work)
 	int result = -ENODEV;
 	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
+	nvme_get_all_ns(&dev->ctrl);
+
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
 		goto out;
 
@@ -2603,6 +2605,7 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
+	nvme_put_all_ns(&dev->ctrl);
 	nvme_remove_dead_ctrl(dev, result);
 }
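
This works because kref_put() returns 1 exactly when the refcount drops
to zero, at which point the (empty) release has already run and the
caller owns the object. A minimal sketch of the contract being relied on
here (assuming standard include/linux/kref.h semantics):

	if (kref_put(&ns->kref, nvme_free_ns_nop)) {
		/*
		 * Refcount reached zero; nvme_free_ns_nop() did nothing,
		 * so we now own the teardown and can run nvme_free_ns()
		 * after dropping namespaces_rwsem.
		 */
	}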
 

Thanks,
Ming


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-13  8:29 ` Ming Lei
  2019-04-13  9:13   ` Ming Lei
@ 2019-04-17 11:32   ` Hannes Reinecke
  2019-04-17 12:49     ` Ming Lei
  1 sibling, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2019-04-17 11:32 UTC


On 4/13/19 10:29 AM, Ming Lei wrote:
[ .. ]
> Another candidate is to hold the ns's refcount and switch to removing
> the ns from 'ctrl->namespaces' in nvme_free_ns(), via the following
> patch [1] together with the patch "blk-mq: free hw queue's resource in
> hctx's release handler" in the following link:
> 
> https://lore.kernel.org/linux-block/20190413071829.GB9108@ming.t460p/T/#m41c04517a37cbc1b4c61357f8cb52cd3cbf31f1b
> 
> [1] fix race between nvme scan and reset
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ddb943395118..12507b223584 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -402,6 +402,10 @@ static void nvme_free_ns(struct kref *kref)
>   {
>   	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
>   
> +	down_write(&ns->ctrl->namespaces_rwsem);
> +	list_del_init(&ns->list);
> +	up_write(&ns->ctrl->namespaces_rwsem);
> +
>   	if (ns->ndev)
>   		nvme_nvm_unregister(ns);
>   
> @@ -3166,6 +3170,29 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
>   	return nsa->head->ns_id - nsb->head->ns_id;
>   }
>   
> +void nvme_get_all_ns(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		if (kref_get_unless_zero(&ns->kref))
> +			continue;
> +	up_read(&ctrl->namespaces_rwsem);
> +}
> +EXPORT_SYMBOL_GPL(nvme_get_all_ns);
> +
> +void nvme_put_all_ns(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		nvme_put_ns(ns);
> +	up_read(&ctrl->namespaces_rwsem);
> +}
> +EXPORT_SYMBOL_GPL(nvme_put_all_ns);
> +
>   static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   {
>   	struct nvme_ns *ns, *ret = NULL;
> @@ -3329,10 +3356,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>   	nvme_mpath_clear_current_path(ns);
>   	mutex_unlock(&ns->ctrl->subsys->lock);
>   
> -	down_write(&ns->ctrl->namespaces_rwsem);
> -	list_del_init(&ns->list);
> -	up_write(&ns->ctrl->namespaces_rwsem);
> -
>   	synchronize_srcu(&ns->head->srcu);
>   	nvme_mpath_check_last_path(ns);
>   	nvme_put_ns(ns);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 527d64545023..f8f2a012c3ba 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -430,6 +430,9 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
>   void nvme_put_ctrl(struct nvme_ctrl *ctrl);
>   int nvme_init_identify(struct nvme_ctrl *ctrl);
>   
> +void nvme_get_all_ns(struct nvme_ctrl *ctrl);
> +void nvme_put_all_ns(struct nvme_ctrl *ctrl);
> +
>   void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
>   
>   int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c1eecde6b853..0d4fea14ccdc 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2496,6 +2496,8 @@ static void nvme_reset_work(struct work_struct *work)
>   	int result = -ENODEV;
>   	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
>   
> +	nvme_get_all_ns(&dev->ctrl);
> +
>   	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>   		goto out;
>   
> @@ -2603,6 +2605,7 @@ static void nvme_reset_work(struct work_struct *work)
>    out_unlock:
>   	mutex_unlock(&dev->shutdown_lock);
>    out:
> +	nvme_put_all_ns(&dev->ctrl);
>   	nvme_remove_dead_ctrl(dev, result);
>   }
>   
Hmm.
Might; I'll have to check.
The entire condition under which this particular error is triggered is 
very convoluted, and this issue isn't the only one contributing to it.
Will be posting my findings once I have confirmation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                              +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-17 11:32   ` Hannes Reinecke
@ 2019-04-17 12:49     ` Ming Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2019-04-17 12:49 UTC


On Wed, Apr 17, 2019 at 01:32:57PM +0200, Hannes Reinecke wrote:
> On 4/13/19 10:29 AM, Ming Lei wrote:
> [ .. ]
> > Another candidate is to hold the ns's refcount and switch to removing
> > the ns from 'ctrl->namespaces' in nvme_free_ns(), via the following
> > patch [1] together with the patch "blk-mq: free hw queue's resource in
> > hctx's release handler" in the following link:
> > 
> > https://lore.kernel.org/linux-block/20190413071829.GB9108@ming.t460p/T/#m41c04517a37cbc1b4c61357f8cb52cd3cbf31f1b
> > 
> > [1] fix race between nvme scan and reset
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ddb943395118..12507b223584 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -402,6 +402,10 @@ static void nvme_free_ns(struct kref *kref)
> >   {
> >   	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
> > +	down_write(&ns->ctrl->namespaces_rwsem);
> > +	list_del_init(&ns->list);
> > +	up_write(&ns->ctrl->namespaces_rwsem);
> > +
> >   	if (ns->ndev)
> >   		nvme_nvm_unregister(ns);
> > @@ -3166,6 +3170,29 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
> >   	return nsa->head->ns_id - nsb->head->ns_id;
> >   }
> > +void nvme_get_all_ns(struct nvme_ctrl *ctrl)
> > +{
> > +	struct nvme_ns *ns;
> > +
> > +	down_read(&ctrl->namespaces_rwsem);
> > +	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +		if (kref_get_unless_zero(&ns->kref))
> > +			continue;
> > +	up_read(&ctrl->namespaces_rwsem);
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_get_all_ns);
> > +
> > +void nvme_put_all_ns(struct nvme_ctrl *ctrl)
> > +{
> > +	struct nvme_ns *ns;
> > +
> > +	down_read(&ctrl->namespaces_rwsem);
> > +	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +		nvme_put_ns(ns);
> > +	up_read(&ctrl->namespaces_rwsem);
> > +}
> > +EXPORT_SYMBOL_GPL(nvme_put_all_ns);
> > +
> >   static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> >   {
> >   	struct nvme_ns *ns, *ret = NULL;
> > @@ -3329,10 +3356,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> >   	nvme_mpath_clear_current_path(ns);
> >   	mutex_unlock(&ns->ctrl->subsys->lock);
> > -	down_write(&ns->ctrl->namespaces_rwsem);
> > -	list_del_init(&ns->list);
> > -	up_write(&ns->ctrl->namespaces_rwsem);
> > -
> >   	synchronize_srcu(&ns->head->srcu);
> >   	nvme_mpath_check_last_path(ns);
> >   	nvme_put_ns(ns);
> > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > index 527d64545023..f8f2a012c3ba 100644
> > --- a/drivers/nvme/host/nvme.h
> > +++ b/drivers/nvme/host/nvme.h
> > @@ -430,6 +430,9 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
> >   void nvme_put_ctrl(struct nvme_ctrl *ctrl);
> >   int nvme_init_identify(struct nvme_ctrl *ctrl);
> > +void nvme_get_all_ns(struct nvme_ctrl *ctrl);
> > +void nvme_put_all_ns(struct nvme_ctrl *ctrl);
> > +
> >   void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
> >   int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index c1eecde6b853..0d4fea14ccdc 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2496,6 +2496,8 @@ static void nvme_reset_work(struct work_struct *work)
> >   	int result = -ENODEV;
> >   	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
> > +	nvme_get_all_ns(&dev->ctrl);
> > +
> >   	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
> >   		goto out;
> > @@ -2603,6 +2605,7 @@ static void nvme_reset_work(struct work_struct *work)
> >    out_unlock:
> >   	mutex_unlock(&dev->shutdown_lock);
> >    out:
> > +	nvme_put_all_ns(&dev->ctrl);
> >   	nvme_remove_dead_ctrl(dev, result);
> >   }
> Hmm.
> Might; I'll have to check.

The patch actually isn't correct, especially in the case where a new
ns is added during the reset.

> The entire condition under which this particular error is triggered is very
> convoluted, and this issue isn't the only one contributing to it.
> Will be posting my findings once I have confirmation.

I have posted the queue freeing patchset V6, which should cover this
issue, in particular via the following two:

https://lore.kernel.org/linux-block/ec6aed0d-a6dc-f829-46f4-10140c7c37df at suse.de/T/#u
https://lore.kernel.org/linux-block/bc4b8607-9fe7-aece-97f7-9d7d1c2c4e0b at suse.de/T/#u


Thanks,
Ming


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-03 23:12 [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning Hannes Reinecke
  2019-04-13  8:29 ` Ming Lei
@ 2019-04-24 16:30 ` Sagi Grimberg
  2019-04-25  7:10   ` Hannes Reinecke
  1 sibling, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2019-04-24 16:30 UTC



> From: Hannes Reinecke <hare@suse.com>
> 
> There is a race condition between namespace rescanning and controller
> reset; during controller reset all namespaces are quiesced via
> nvme_stop_ctrl(), and after reset all namespaces are unquiesced
> again.
> If namespace scanning is active at the time a controller reset is
> triggered, the rescan code will call nvme_ns_remove(), which will then
> cause a kernel crash in nvme_start_ctrl() as it trips over
> uninitialized namespaces.
> 
> This patch calls nvme_set_queue_dying() during namespace rescan,
> which already unquiesces the queue. Hence we can skip all namespaces
> with the NVME_NS_DEAD flag set during unquiesce in nvme_start_ctrl() and
> eliminate this issue.

Hannes, won't that make the namespace enumeration inconsistent?

> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>   drivers/nvme/host/core.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 23000a368e1f..21aa5c516d2a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3359,7 +3359,7 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	ns = nvme_find_get_ns(ctrl, nsid);
>   	if (ns) {
>   		if (ns->disk && revalidate_disk(ns->disk))
> -			nvme_ns_remove(ns);
> +			nvme_set_queue_dying(ns);
>   		nvme_put_ns(ns);
>   	} else
>   		nvme_alloc_ns(ctrl, nsid);
> @@ -3409,7 +3409,7 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl, unsigned nn)
>   			while (++prev < nsid) {
>   				ns = nvme_find_get_ns(ctrl, prev);
>   				if (ns) {
> -					nvme_ns_remove(ns);
> +					nvme_set_queue_dying(ns);
>   					nvme_put_ns(ns);
>   				}
>   			}
> @@ -3871,8 +3871,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> +			continue;
>   		blk_mq_quiesce_queue(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_stop_queues);
> @@ -3882,8 +3885,11 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (test_bit(NVME_NS_DEAD, &ns->flags))
> +			continue;
>   		blk_mq_unquiesce_queue(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_queues);
> 


* [PATCHv2 RFC] nvme: use nvme_set_queue_dying() during namespace rescanning
  2019-04-24 16:30 ` Sagi Grimberg
@ 2019-04-25  7:10   ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2019-04-25  7:10 UTC


On 4/24/19 6:30 PM, Sagi Grimberg wrote:
> 
>> From: Hannes Reinecke <hare@suse.com>
>>
>> There is a race condition between namespace rescanning and controller
>> reset; during controller reset all namespaces are quiesced via
>> nvme_stop_ctrl(), and after reset all namespaces are unquiesced
>> again.
>> If namespace scanning is active at the time a controller reset is
>> triggered, the rescan code will call nvme_ns_remove(), which will then
>> cause a kernel crash in nvme_start_ctrl() as it trips over
>> uninitialized namespaces.
>>
>> This patch calls nvme_set_queue_dying() during namespace rescan,
>> which already unquiesces the queue. Hence we can skip all namespaces
>> with the NVME_NS_DEAD flag set during unquiesce in nvme_start_ctrl() and
>> eliminate this issue.
> 
> Hannes, won't that make the namespace enumeration inconsistent?
> 
How so?
The basic problem is that nvme_validate_ns() will more-or-less 
automatically _remove_ all namespaces if it's called concurrently with 
a reset (trivially, as no I/O is possible then, revalidate_disk() fails).
However, after the reset we'll be rescanning all namespaces _anyway_, 
re-establishing all previously deleted namespaces.

With this patch we're just deferring the deletion (which we would've 
done anyway); see the sketch below.
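
The interleaving being deferred looks roughly like this (a paraphrased 
sketch of the two contexts; the exact call chains depend on the 
transport driver):

	/*
	 * reset context                scan context
	 * -------------                ------------
	 * nvme_stop_queues()
	 *   blk_mq_quiesce_queue(ns)
	 *                              nvme_validate_ns()
	 *                                revalidate_disk() fails
	 *                                (no I/O during reset)
	 *                                nvme_ns_remove()  <- tears ns down
	 * nvme_start_queues()
	 *   blk_mq_unquiesce_queue(ns) <- may hit the half-removed ns
	 */

With the patch the scan side only marks the namespace DEAD, and the 
actual removal happens later, once the reset has completed.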

However, this patch has been folded into Ming Lei's patchset (as the last 
patch in the series), so I guess we can discard it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

