All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: don't flush scan work with non-idle request
@ 2022-08-12 18:21 Keith Busch
  2022-08-12 19:12 ` Jonathan Derrick
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Keith Busch @ 2022-08-12 18:21 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, Keith Busch, Jonathan Derrick

From: Keith Busch <kbusch@kernel.org>

If a reset occurs after the scan work attempts to issue a command, the
reset may quisce the admin queue, which blocks the scan work's command
from dispatching. The scan work will not be able to complete while the
queue is quiesced.

Meanwhile, the reset work will cancel all outstanding admin tags and
wait until all requests have transitioned to idle, which includes the
passthrough request. But the passthrough request won't be set to idle
until after the scan_work flushes, so we're deadlocked.

Fix this by moving the flush_work after the request has been freed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c  |  5 ++---
 drivers/nvme/host/ioctl.c | 12 ++++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index af367b22871b..1143f625e195 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1121,12 +1121,11 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
 		nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
 		mutex_unlock(&ctrl->scan_lock);
 	}
+
 	if (effects & NVME_CMD_EFFECTS_CCC)
 		nvme_init_ctrl_finish(ctrl);
-	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
+	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
 		nvme_queue_scan(ctrl);
-		flush_work(&ctrl->scan_work);
-	}
 
 	switch (cmd->common.opcode) {
 	case nvme_admin_set_features:
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 27614bee7380..97febd5f41a3 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -136,9 +136,11 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
 {
+	struct nvme_ctrl *ctrl;
 	struct request *req;
 	void *meta = NULL;
 	struct bio *bio;
+	u32 effects;
 	int ret;
 
 	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
@@ -147,6 +149,8 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 		return PTR_ERR(req);
 
 	bio = req->bio;
+	ctrl = nvme_req(req)->ctrl;
+	effects = nvme_command_effects(ctrl, q->queuedata, cmd->common.opcode);
 
 	ret = nvme_execute_passthru_rq(req);
 
@@ -158,6 +162,14 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	if (bio)
 		blk_rq_unmap_user(bio);
 	blk_mq_free_request(req);
+
+	/*
+	 * Ensure the namespace inventory is up-to-date before returning if
+	 * this command can change it.
+	 */
+	if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
+		flush_work(&ctrl->scan_work);
+
 	return ret;
 }
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-12 18:21 [PATCH] nvme: don't flush scan work with non-idle request Keith Busch
@ 2022-08-12 19:12 ` Jonathan Derrick
  2022-08-16  8:53 ` Chao Leng
  2022-08-21 14:40 ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Derrick @ 2022-08-12 19:12 UTC (permalink / raw)
  To: linux-nvme



On 8/12/2022 12:21 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> If a reset occurs after the scan work attempts to issue a command, the
> reset may quisce the admin queue, which blocks the scan work's command
> from dispatching. The scan work will not be able to complete while the
> queue is quiesced.
> 
> Meanwhile, the reset work will cancel all outstanding admin tags and
> wait until all requests have transitioned to idle, which includes the
> passthrough request. But the passthrough request won't be set to idle
> until after the scan_work flushes, so we're deadlocked.
> 
> Fix this by moving the flush_work after the request has been freed.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
> Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Seems to fix the repro case. Thanks Keith.

> ---
>   drivers/nvme/host/core.c  |  5 ++---
>   drivers/nvme/host/ioctl.c | 12 ++++++++++++
>   2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..1143f625e195 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1121,12 +1121,11 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
>   		nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
>   		mutex_unlock(&ctrl->scan_lock);
>   	}
> +
>   	if (effects & NVME_CMD_EFFECTS_CCC)
>   		nvme_init_ctrl_finish(ctrl);
> -	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
> +	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
>   		nvme_queue_scan(ctrl);
> -		flush_work(&ctrl->scan_work);
> -	}
>   
>   	switch (cmd->common.opcode) {
>   	case nvme_admin_set_features:
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 27614bee7380..97febd5f41a3 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -136,9 +136,11 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
>   		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
>   {
> +	struct nvme_ctrl *ctrl;
>   	struct request *req;
>   	void *meta = NULL;
>   	struct bio *bio;
> +	u32 effects;
>   	int ret;
>   
>   	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
> @@ -147,6 +149,8 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   		return PTR_ERR(req);
>   
>   	bio = req->bio;
> +	ctrl = nvme_req(req)->ctrl;
> +	effects = nvme_command_effects(ctrl, q->queuedata, cmd->common.opcode);
>   
>   	ret = nvme_execute_passthru_rq(req);
>   
> @@ -158,6 +162,14 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   	if (bio)
>   		blk_rq_unmap_user(bio);
>   	blk_mq_free_request(req);
> +
> +	/*
> +	 * Ensure the namespace inventory is up-to-date before returning if
> +	 * this command can change it.
> +	 */
> +	if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> +		flush_work(&ctrl->scan_work);
> +
>   	return ret;
>   }
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-12 18:21 [PATCH] nvme: don't flush scan work with non-idle request Keith Busch
  2022-08-12 19:12 ` Jonathan Derrick
@ 2022-08-16  8:53 ` Chao Leng
  2022-08-17 10:29   ` Sagi Grimberg
  2022-08-21 14:40 ` Christoph Hellwig
  2 siblings, 1 reply; 9+ messages in thread
From: Chao Leng @ 2022-08-16  8:53 UTC (permalink / raw)
  To: Keith Busch, linux-nvme; +Cc: hch, Keith Busch, Jonathan Derrick

Looks good to me.
Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/8/13 2:21, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> If a reset occurs after the scan work attempts to issue a command, the
> reset may quisce the admin queue, which blocks the scan work's command
> from dispatching. The scan work will not be able to complete while the
> queue is quiesced.
> 
> Meanwhile, the reset work will cancel all outstanding admin tags and
> wait until all requests have transitioned to idle, which includes the
> passthrough request. But the passthrough request won't be set to idle
> until after the scan_work flushes, so we're deadlocked.
> 
> Fix this by moving the flush_work after the request has been freed.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
> Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c  |  5 ++---
>   drivers/nvme/host/ioctl.c | 12 ++++++++++++
>   2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..1143f625e195 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1121,12 +1121,11 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
>   		nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
>   		mutex_unlock(&ctrl->scan_lock);
>   	}
> +
>   	if (effects & NVME_CMD_EFFECTS_CCC)
>   		nvme_init_ctrl_finish(ctrl);
> -	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
> +	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
>   		nvme_queue_scan(ctrl);
> -		flush_work(&ctrl->scan_work);
> -	}
>   
>   	switch (cmd->common.opcode) {
>   	case nvme_admin_set_features:
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 27614bee7380..97febd5f41a3 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -136,9 +136,11 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
>   		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
>   {
> +	struct nvme_ctrl *ctrl;
>   	struct request *req;
>   	void *meta = NULL;
>   	struct bio *bio;
> +	u32 effects;
>   	int ret;
>   
>   	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
> @@ -147,6 +149,8 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   		return PTR_ERR(req);
>   
>   	bio = req->bio;
> +	ctrl = nvme_req(req)->ctrl;
> +	effects = nvme_command_effects(ctrl, q->queuedata, cmd->common.opcode);
>   
>   	ret = nvme_execute_passthru_rq(req);
>   
> @@ -158,6 +162,14 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>   	if (bio)
>   		blk_rq_unmap_user(bio);
>   	blk_mq_free_request(req);
> +
> +	/*
> +	 * Ensure the namespace inventory is up-to-date before returning if
> +	 * this command can change it.
> +	 */
> +	if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> +		flush_work(&ctrl->scan_work);
Maybe it is better to move nvme_queue_scan together here.
> +
>   	return ret;
>   }
>   
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-16  8:53 ` Chao Leng
@ 2022-08-17 10:29   ` Sagi Grimberg
  2022-08-17 15:07     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2022-08-17 10:29 UTC (permalink / raw)
  To: Chao Leng, Keith Busch, linux-nvme; +Cc: hch, Keith Busch, Jonathan Derrick


>> +
>> +    /*
>> +     * Ensure the namespace inventory is up-to-date before returning if
>> +     * this command can change it.
>> +     */
>> +    if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC | 
>> NVME_CMD_EFFECTS_NCC))
>> +        flush_work(&ctrl->scan_work);
> Maybe it is better to move nvme_queue_scan together here.

I agree, why not move the scan here as well?

>> +
>>       return ret;
>>   }
>>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-17 10:29   ` Sagi Grimberg
@ 2022-08-17 15:07     ` Keith Busch
  2022-08-17 15:39       ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2022-08-17 15:07 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Chao Leng, Keith Busch, linux-nvme, hch, Jonathan Derrick

On Wed, Aug 17, 2022 at 01:29:37PM +0300, Sagi Grimberg wrote:
> 
> > > +
> > > +    /*
> > > +     * Ensure the namespace inventory is up-to-date before returning if
> > > +     * this command can change it.
> > > +     */
> > > +    if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC |
> > > NVME_CMD_EFFECTS_NCC))
> > > +        flush_work(&ctrl->scan_work);
> > Maybe it is better to move nvme_queue_scan together here.
> 
> I agree, why not move the scan here as well?

Then I'd need to duplicate this to nvmet_passthru_execute_cmd_work(), and I
didn't want to do that. :)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-17 15:07     ` Keith Busch
@ 2022-08-17 15:39       ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2022-08-17 15:39 UTC (permalink / raw)
  To: Keith Busch; +Cc: Chao Leng, Keith Busch, linux-nvme, hch, Jonathan Derrick


>>>> +    /*
>>>> +     * Ensure the namespace inventory is up-to-date before returning if
>>>> +     * this command can change it.
>>>> +     */
>>>> +    if (ret >= 0 && effects & (NVME_CMD_EFFECTS_NIC |
>>>> NVME_CMD_EFFECTS_NCC))
>>>> +        flush_work(&ctrl->scan_work);
>>> Maybe it is better to move nvme_queue_scan together here.
>>
>> I agree, why not move the scan here as well?
> 
> Then I'd need to duplicate this to nvmet_passthru_execute_cmd_work(), and I
> didn't want to do that. :)

Got it,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-12 18:21 [PATCH] nvme: don't flush scan work with non-idle request Keith Busch
  2022-08-12 19:12 ` Jonathan Derrick
  2022-08-16  8:53 ` Chao Leng
@ 2022-08-21 14:40 ` Christoph Hellwig
  2022-08-22 15:09   ` Keith Busch
  2 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2022-08-21 14:40 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch, Jonathan Derrick

With this the flush_work is lost for the target passthrough case.

I also don't really like the double call to nvme_command_effects().

What about just removing nvme_execute_passthru_rq() and open coding
it in the two calles, with the nvme_passthru_end call moved until
after freeing the request?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-21 14:40 ` Christoph Hellwig
@ 2022-08-22 15:09   ` Keith Busch
  2022-08-23 16:14     ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2022-08-22 15:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Jonathan Derrick

On Sun, Aug 21, 2022 at 04:40:39PM +0200, Christoph Hellwig wrote:
> With this the flush_work is lost for the target passthrough case.
> 
> I also don't really like the double call to nvme_command_effects().
> 
> What about just removing nvme_execute_passthru_rq() and open coding
> it in the two calles, with the nvme_passthru_end call moved until
> after freeing the request?

I didn't think it mattered for passthrough because the host needs to flush the
scan work on its side before safely using the changed namespace. It doesn't
matter if the target side has flushed out the new namespace format because it's
not the one creating LBA command parameters.

Never-the-less, if you do prefer to have each caller open code the sequence, I
have that patch ready to send.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't flush scan work with non-idle request
  2022-08-22 15:09   ` Keith Busch
@ 2022-08-23 16:14     ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2022-08-23 16:14 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, Jonathan Derrick

On Mon, Aug 22, 2022 at 09:09:10AM -0600, Keith Busch wrote:
> On Sun, Aug 21, 2022 at 04:40:39PM +0200, Christoph Hellwig wrote:
> > With this the flush_work is lost for the target passthrough case.
> > 
> > I also don't really like the double call to nvme_command_effects().
> > 
> > What about just removing nvme_execute_passthru_rq() and open coding
> > it in the two calles, with the nvme_passthru_end call moved until
> > after freeing the request?
> 
> I didn't think it mattered for passthrough because the host needs to flush the
> scan work on its side before safely using the changed namespace. It doesn't
> matter if the target side has flushed out the new namespace format because it's
> not the one creating LBA command parameters.

Well, a passthrough controller is still also available locally, so I don't
think we can just skip the scan.

> Never-the-less, if you do prefer to have each caller open code the sequence, I
> have that patch ready to send.

It's not that I really like the open coding, but it was the first
thing that came to mind on how to fix the passthrough case.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-23 16:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-12 18:21 [PATCH] nvme: don't flush scan work with non-idle request Keith Busch
2022-08-12 19:12 ` Jonathan Derrick
2022-08-16  8:53 ` Chao Leng
2022-08-17 10:29   ` Sagi Grimberg
2022-08-17 15:07     ` Keith Busch
2022-08-17 15:39       ` Sagi Grimberg
2022-08-21 14:40 ` Christoph Hellwig
2022-08-22 15:09   ` Keith Busch
2022-08-23 16:14     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.