All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme-pci: Make sure to ring doorbell when last request is short-circuited
@ 2022-09-18  5:48 Mohamed Khalfella
  2022-09-19 17:35 ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Mohamed Khalfella @ 2022-09-18  5:48 UTC (permalink / raw)
  To: mkhalfella
  Cc: stable, Eric Badger, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Tao Chiu, Leon Chien, Cody Wong,
	open list:NVM EXPRESS DRIVER, open list

When processing a batch of requests, it is possible that nvme_queue_rq()
misses to ring nvme queue doorbell if the last request fails because the
controller is not ready. As a result of that, previously queued requests
will timeout because the device had not chance to know about the commands
existence. This failure can cause nvme controller reset to timeout if
there was another App using adminq while nvme reset was taking place.

Consider this case:
- App is hammering adminq with NVME_ADMIN_IDENTIFY commands
- Controller reset triggered by "echo 1 > /sys/.../nvme0/reset_controller"

nvme_reset_ctrl() will change controller state to NVME_CTRL_RESETTING.
From that point on all requests from App will be forced to fail because
the controller is no longer ready. More importantly these requests will
not make it to adminq and will be short-circuited in nvme_queue_rq().
Unlike App requests, requests issued by reset code path will be allowed
to go through adminq in order to carry out the reset process. The problem
happens when blk-mq decides to mix requests from reset code path and App
in one batch, in particular when the last request in such batch happens
to be from App.

In this case the last request will have bd->last set to true telling the
driver to ring doorbell after queuing this request. However, since the
controller is not ready, this App request will be completed without going
through adminq, and nvme_queue_rq() will miss the opportunity to ring
adminq doorbell leaving earlier queued requests unknown to the device.

Fixes: d4060d2be1132 ("nvme-pci: fix controller reset hang when racing with nvme_timeout")
Cc: stable@vger.kernel.org
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Eric Badger <ebadger@purestorage.com>
---
 drivers/nvme/host/pci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 98864b853eef..f6b1ae593e8e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -946,8 +946,12 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
 		return BLK_STS_IOERR;
 
-	if (unlikely(!nvme_check_ready(&dev->ctrl, req, true)))
-		return nvme_fail_nonready_command(&dev->ctrl, req);
+	if (unlikely(!nvme_check_ready(&dev->ctrl, req, true))) {
+		ret = nvme_fail_nonready_command(&dev->ctrl, req);
+		if (ret == BLK_STS_OK && bd->last)
+			nvme_commit_rqs(hctx);
+		return ret;
+	}
 
 	ret = nvme_prep_rq(dev, req);
 	if (unlikely(ret))
@@ -1724,6 +1728,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 static const struct blk_mq_ops nvme_mq_admin_ops = {
 	.queue_rq	= nvme_queue_rq,
 	.complete	= nvme_pci_complete_rq,
+	.commit_rqs	= nvme_commit_rqs,
 	.init_hctx	= nvme_admin_init_hctx,
 	.init_request	= nvme_pci_init_request,
 	.timeout	= nvme_timeout,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] nvme-pci: Make sure to ring doorbell when last request is short-circuited
  2022-09-18  5:48 [PATCH] nvme-pci: Make sure to ring doorbell when last request is short-circuited Mohamed Khalfella
@ 2022-09-19 17:35 ` Keith Busch
  2022-09-20 16:06   ` Mohamed Khalfella
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2022-09-19 17:35 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: stable, Eric Badger, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Tao Chiu, Leon Chien, Cody Wong,
	open list:NVM EXPRESS DRIVER, open list

On Sun, Sep 18, 2022 at 05:48:16AM +0000, Mohamed Khalfella wrote:
> When processing a batch of requests, it is possible that nvme_queue_rq()
> misses to ring nvme queue doorbell if the last request fails because the
> controller is not ready. As a result of that, previously queued requests
> will timeout because the device had not chance to know about the commands
> existence. This failure can cause nvme controller reset to timeout if
> there was another App using adminq while nvme reset was taking place.
> 
> Consider this case:
> - App is hammering adminq with NVME_ADMIN_IDENTIFY commands
> - Controller reset triggered by "echo 1 > /sys/.../nvme0/reset_controller"
> 
> nvme_reset_ctrl() will change controller state to NVME_CTRL_RESETTING.
> From that point on all requests from App will be forced to fail because
> the controller is no longer ready. More importantly these requests will
> not make it to adminq and will be short-circuited in nvme_queue_rq().
> Unlike App requests, requests issued by reset code path will be allowed
> to go through adminq in order to carry out the reset process. The problem
> happens when blk-mq decides to mix requests from reset code path and App
> in one batch, in particular when the last request in such batch happens
> to be from App.
> 
> In this case the last request will have bd->last set to true telling the
> driver to ring doorbell after queuing this request. However, since the
> controller is not ready, this App request will be completed without going
> through adminq, and nvme_queue_rq() will miss the opportunity to ring
> adminq doorbell leaving earlier queued requests unknown to the device.
> 
> Fixes: d4060d2be1132 ("nvme-pci: fix controller reset hang when racing with nvme_timeout")

I revisted that commit, and it doesn't sound correct. Specifically this part:

    5) reset_work() continues to setup_io_queues() as it observes no error
       in init_identify(). However, the admin queue has already been
       quiesced in dev_disable(). Thus, any following commands would be
       blocked forever in blk_execute_rq().

When a timeout occurs in the CONNECTING state, the timeout handler unquiesces
the queue specifically to flush out any blocked requests. Is that commit really
necessary? I'd rather just revert it to save the extra per-IO checks if not.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] nvme-pci: Make sure to ring doorbell when last request is short-circuited
  2022-09-19 17:35 ` Keith Busch
@ 2022-09-20 16:06   ` Mohamed Khalfella
  0 siblings, 0 replies; 3+ messages in thread
From: Mohamed Khalfella @ 2022-09-20 16:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: stable, Eric Badger, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Tao Chiu, Leon Chien, Cody Wong,
	open list:NVM EXPRESS DRIVER, open list

On 2022-09-19 11:35:09 -0600, Keith Busch wrote:
> > Fixes: d4060d2be1132 ("nvme-pci: fix controller reset hang when racing with nvme_timeout")
> 
> I revisted that commit, and it doesn't sound correct. Specifically this part:
> 
>     5) reset_work() continues to setup_io_queues() as it observes no error
>        in init_identify(). However, the admin queue has already been
>        quiesced in dev_disable(). Thus, any following commands would be
>        blocked forever in blk_execute_rq().
> 
> When a timeout occurs in the CONNECTING state, the timeout handler unquiesces
> the queue specifically to flush out any blocked requests. Is that commit really
> necessary? I'd rather just revert it to save the extra per-IO checks if not.

I can not speak with certainty whether 4060d2be1132 need to be reverted or not.
I will need to carefully inspect reset code path and do more experiments. If this
commit gets reverted we still need to add `nvme_commit_rqs` to `nvme_mq_admin_ops`.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-09-20 16:06 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-18  5:48 [PATCH] nvme-pci: Make sure to ring doorbell when last request is short-circuited Mohamed Khalfella
2022-09-19 17:35 ` Keith Busch
2022-09-20 16:06   ` Mohamed Khalfella

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.