All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: Block ns inventory changes during admin
@ 2019-01-25 19:48 Keith Busch
       [not found] ` <OF5147C249.BC64CC03-ON0025838D.007F28EE-8625838D.007F6785@notes.na.collabserv.com>
  2019-01-28  7:39 ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Keith Busch @ 2019-01-25 19:48 UTC (permalink / raw)


The driver quiesces IO when the admin command effects say it may alter
namespaces.

A device that supports nvme namespace management may trigger a namespace
scan automatically. If this changes the namespace inventory by adding
more namespace, the driver's unquiesce sequence at the end of sending
the admin command may try to unfreeze a queue that isn't frozen,
breaking the freeze depth for future use.

Fix that by preventing scan work from altering the namespace list while
we are pairing freeze and unfreeze operations.

Reported-by: Wen Xiong <wenxiong at us.ibm.com>
Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c | 8 +++++++-
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 150e49723c15..6a9dd68c0f4f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1253,6 +1253,7 @@ static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	 * effects say only one namespace is affected.
 	 */
 	if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
+		mutex_lock(&ctrl->scan_lock);
 		nvme_start_freeze(ctrl);
 		nvme_wait_freeze(ctrl);
 	}
@@ -1281,8 +1282,10 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
 	 */
 	if (effects & NVME_CMD_EFFECTS_LBCC)
 		nvme_update_formats(ctrl);
-	if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK))
+	if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
 		nvme_unfreeze(ctrl);
+		mutex_unlock(&ctrl->scan_lock);
+	}
 	if (effects & NVME_CMD_EFFECTS_CCC)
 		nvme_init_identify(ctrl);
 	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
@@ -3401,6 +3404,7 @@ static void nvme_scan_work(struct work_struct *work)
 	if (nvme_identify_ctrl(ctrl, &id))
 		return;
 
+	mutex_lock(&ctrl->scan_lock);
 	nn = le32_to_cpu(id->nn);
 	if (ctrl->vs >= NVME_VS(1, 1, 0) &&
 	    !(ctrl->quirks & NVME_QUIRK_IDENTIFY_CNS)) {
@@ -3409,6 +3413,7 @@ static void nvme_scan_work(struct work_struct *work)
 	}
 	nvme_scan_ns_sequential(ctrl, nn);
 out_free_id:
+	mutex_unlock(&ctrl->scan_lock);
 	kfree(id);
 	down_write(&ctrl->namespaces_rwsem);
 	list_sort(NULL, &ctrl->namespaces, ns_cmp);
@@ -3652,6 +3657,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 
 	ctrl->state = NVME_CTRL_NEW;
 	spin_lock_init(&ctrl->lock);
+	mutex_init(&ctrl->scan_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
 	init_rwsem(&ctrl->namespaces_rwsem);
 	ctrl->dev = dev;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab961bdeea89..c4a1bb41abf0 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -154,6 +154,7 @@ struct nvme_ctrl {
 	enum nvme_ctrl_state state;
 	bool identified;
 	spinlock_t lock;
+	struct mutex scan_lock;
 	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
 	struct request_queue *connect_q;
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
       [not found] ` <OF5147C249.BC64CC03-ON0025838D.007F28EE-8625838D.007F6785@notes.na.collabserv.com>
@ 2019-01-25 23:37   ` Keith Busch
       [not found]     ` <BYAPR04MB45021C05936FF7EF982D579686940@BYAPR04MB4502.namprd04.prod.outlook.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2019-01-25 23:37 UTC (permalink / raw)


[replying in plain-text for mailing list posterity]
On Fri, Jan 25, 2019@05:11:35PM -0600, Wen Xiong wrote:
> Hi Keith,
> 
> I have tested the patch and fixed the two hung issues on our system.
> 
> >>Reported-by: Wen Xiong <wenxiong at us.ibm.com>
> >>Signed-off-by: Keith Busch <keith.busch at intel.com>
> 
> Tested-by: Wen Xiong<wenxiong at us.ibm.com>
> 
> Thanks,
> Wendy

Thank you for confirming this fixes your issue.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
  2019-01-25 19:48 [PATCH] nvme: Block ns inventory changes during admin Keith Busch
       [not found] ` <OF5147C249.BC64CC03-ON0025838D.007F28EE-8625838D.007F6785@notes.na.collabserv.com>
@ 2019-01-28  7:39 ` Christoph Hellwig
  2019-01-28 15:33   ` Keith Busch
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-01-28  7:39 UTC (permalink / raw)


On Fri, Jan 25, 2019@12:48:19PM -0700, Keith Busch wrote:
> The driver quiesces IO when the admin command effects say it may alter
> namespaces.
> 
> A device that supports nvme namespace management may trigger a namespace
> scan automatically. If this changes the namespace inventory by adding
> more namespace, the driver's unquiesce sequence at the end of sending
> the admin command may try to unfreeze a queue that isn't frozen,
> breaking the freeze depth for future use.
> 
> Fix that by preventing scan work from altering the namespace list while
> we are pairing freeze and unfreeze operations.

Do you have a reference for the bug report?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
  2019-01-28  7:39 ` Christoph Hellwig
@ 2019-01-28 15:33   ` Keith Busch
  2019-01-28 15:58     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2019-01-28 15:33 UTC (permalink / raw)


On Sun, Jan 27, 2019@11:39:24PM -0800, Christoph Hellwig wrote:
> On Fri, Jan 25, 2019@12:48:19PM -0700, Keith Busch wrote:
> > The driver quiesces IO when the admin command effects say it may alter
> > namespaces.
> > 
> > A device that supports nvme namespace management may trigger a namespace
> > scan automatically. If this changes the namespace inventory by adding
> > more namespace, the driver's unquiesce sequence at the end of sending
> > the admin command may try to unfreeze a queue that isn't frozen,
> > breaking the freeze depth for future use.
> > 
> > Fix that by preventing scan work from altering the namespace list while
> > we are pairing freeze and unfreeze operations.
> 
> Do you have a reference for the bug report?

This was reported internally from Wen. If you prefer, we can open a
public bz sighting and append the kernel logs for a changelog Link:.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
       [not found]     ` <BYAPR04MB45021C05936FF7EF982D579686940@BYAPR04MB4502.namprd04.prod.outlook.com>
@ 2019-01-28 15:36       ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2019-01-28 15:36 UTC (permalink / raw)


On Sat, Jan 26, 2019@09:55:52AM -0800, Chaitanya Kulkarni wrote:
> Looks good to me. Just thinking out loud we can only unfreeze the namespace if
> that is frozen in nvme_unfreeze(ctrl)? We can mark it freeze in the
> nvme_wait_freeze(). 

Adding a freeze state to namespaces may work too if there are no callers
making use of a layer freeze depth.
 
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
> 
> ???????????????????????????????????????????????????????????????????????????????
> From: Linux-nvme <linux-nvme-bounces at lists.infradead.org> on behalf of Keith
> Busch <keith.busch at intel.com>
> Sent: Friday, January 25, 2019 3:37:43 PM
> To: Wen Xiong
> Cc: Christoph Hellwig; linux-nvme at lists.infradead.org; Sagi Grimberg
> Subject: Re: [PATCH] nvme: Block ns inventory changes during admin
>  
> [replying in plain-text for mailing list posterity]
> On Fri, Jan 25, 2019@05:11:35PM -0600, Wen Xiong wrote:
> > Hi Keith,
> >
> > I have tested the patch and fixed the two hung issues on our system.
> >
> > >>Reported-by: Wen Xiong <wenxiong at us.ibm.com>
> > >>Signed-off-by: Keith Busch <keith.busch at intel.com>
> >
> > Tested-by: Wen Xiong<wenxiong at us.ibm.com>
> >
> > Thanks,
> > Wendy
> 
> Thank you for confirming this fixes your issue.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
  2019-01-28 15:33   ` Keith Busch
@ 2019-01-28 15:58     ` Christoph Hellwig
  2019-01-28 16:07       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-01-28 15:58 UTC (permalink / raw)


On Mon, Jan 28, 2019@08:33:45AM -0700, Keith Busch wrote:
> This was reported internally from Wen. If you prefer, we can open a
> public bz sighting and append the kernel logs for a changelog Link:.

I don't really need a bugzilla, but I'd like to understand the reported
problem a little better.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
  2019-01-28 15:58     ` Christoph Hellwig
@ 2019-01-28 16:07       ` Keith Busch
  2019-01-28 16:30         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2019-01-28 16:07 UTC (permalink / raw)


On Mon, Jan 28, 2019@04:58:34PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 28, 2019@08:33:45AM -0700, Keith Busch wrote:
> > This was reported internally from Wen. If you prefer, we can open a
> > public bz sighting and append the kernel logs for a changelog Link:.
> 
> I don't really need a bugzilla, but I'd like to understand the reported
> problem a little better.

Sure thing. The reporter is attaching namespaces in a loop. The controller
does support the NS Change Notification, so that automatically triggers
the namespace scan_work.

Occasionally the namespace scan_work appends the new namespace to the
list before the admin command effects handling is completed. The effects
handling attempts to unfreeze the namespace, but if it unfreezes the
new one, that request_queue freeze depth will be off and we'll hit the
warning in blk_mq_unfreeze_queue().

On the next namespace add, we fail to freeze that queue due to the
previous bad accounting and deadlock waiting for frozen.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] nvme: Block ns inventory changes during admin
  2019-01-28 16:07       ` Keith Busch
@ 2019-01-28 16:30         ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2019-01-28 16:30 UTC (permalink / raw)


On Mon, Jan 28, 2019@09:07:13AM -0700, Keith Busch wrote:
> Sure thing. The reporter is attaching namespaces in a loop. The controller
> does support the NS Change Notification, so that automatically triggers
> the namespace scan_work.
> 
> Occasionally the namespace scan_work appends the new namespace to the
> list before the admin command effects handling is completed. The effects
> handling attempts to unfreeze the namespace, but if it unfreezes the
> new one, that request_queue freeze depth will be off and we'll hit the
> warning in blk_mq_unfreeze_queue().
> 
> On the next namespace add, we fail to freeze that queue due to the
> previous bad accounting and deadlock waiting for frozen.

Ok, makes sense.  Can you fold this into the patch description?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-01-28 16:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25 19:48 [PATCH] nvme: Block ns inventory changes during admin Keith Busch
     [not found] ` <OF5147C249.BC64CC03-ON0025838D.007F28EE-8625838D.007F6785@notes.na.collabserv.com>
2019-01-25 23:37   ` Keith Busch
     [not found]     ` <BYAPR04MB45021C05936FF7EF982D579686940@BYAPR04MB4502.namprd04.prod.outlook.com>
2019-01-28 15:36       ` Keith Busch
2019-01-28  7:39 ` Christoph Hellwig
2019-01-28 15:33   ` Keith Busch
2019-01-28 15:58     ` Christoph Hellwig
2019-01-28 16:07       ` Keith Busch
2019-01-28 16:30         ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.