* [PATCH] nvme-rdma: fix possible hang when issuing commands during ctrl removal
From: Sagi Grimberg @ 2017-10-22 18:42 UTC


nvme_rdma_queue_is_ready() fails requests when a queue is not
LIVE. If the controller is in RECONNECTING state, we might stay
in that state for a long time (until we successfully reconnect),
so we are better off failing the request fast. Otherwise, we
fail with BLK_STS_RESOURCE to have the block layer retry it
soon.

If we are removing the controller while the admin queue is
not LIVE, we will terminate the request with BLK_STS_RESOURCE,
but this happens before we call blk_mq_start_request(), so the
request timeout never expires and the queue will never get
back to LIVE (because we are removing the controller). This
causes the removal operation to block indefinitely [1].

Thus, if we are removing (state DELETING), and the queue is
not LIVE, we need to fail the request permanently as there is
no chance for it to ever complete successfully.
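
To make the resulting disposition concrete, here is a condensed
sketch of the not-LIVE path (illustration only, not the exact
driver code; queue_is_live() and is_connect_cmd() are placeholder
helpers standing in for the checks nvme_rdma_queue_is_ready()
already does):

	static blk_status_t not_live_disposition(struct nvme_rdma_queue *queue,
						 struct request *rq)
	{
		if (queue_is_live(queue) || is_connect_cmd(rq))
			return BLK_STS_OK;	/* let the command through */

		switch (queue->ctrl->ctrl.state) {
		case NVME_CTRL_DELETING:
			/* ctrl is going away, the command can never complete */
			nvme_req(rq)->status = NVME_SC_ABORT_REQ;
			return BLK_STS_IOERR;
		case NVME_CTRL_RECONNECTING:
			/* transport disruption, don't requeue forever */
			return BLK_STS_IOERR;
		default:
			/* queue should become LIVE soon, let blk-mq retry */
			return BLK_STS_RESOURCE;
		}
	}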

[1]
--
sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/u66:2   D    0   440      2 0x80000000
Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x221/0x580
 io_schedule_timeout+0x1e/0x50
 wait_for_completion_io_timeout+0x118/0x180
 blk_execute_rq+0x86/0xc0
 __nvme_submit_sync_cmd+0x89/0xf0
 nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
 nvme_shutdown_ctrl+0x41/0xe0
 nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
 nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
 nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
 process_one_work+0x1fd/0x630
 worker_thread+0x1db/0x3b0
 kthread+0x11e/0x150
 ret_from_fork+0x27/0x40
01              D    0  2868   2862 0x00000000
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x260/0x580
 wait_for_completion+0x108/0x170
 flush_work+0x1e0/0x270
 nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
 nvme_sysfs_delete+0x2a/0x40
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x45/0x60
 kernfs_fop_write+0x124/0x1c0
 __vfs_write+0x28/0x150
 vfs_write+0xc7/0x1b0
 SyS_write+0x49/0xa0
 entry_SYSCALL_64_fastpath+0x18/0xad
--

Reported-by: Bart Van Assche <bart.vanassche at wdc.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5b5458012c2c..57f1c9f49a54 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1393,6 +1393,14 @@ nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue, struct request *rq)
 		    cmd->common.opcode != nvme_fabrics_command ||
 		    cmd->fabrics.fctype != nvme_fabrics_type_connect) {
 			/*
+			 * deleting state means that the ctrl will never accept
+			 * commands again, fail it permanently.
+			 */
+			if (queue->ctrl->ctrl.state == NVME_CTRL_DELETING) {
+				nvme_req(rq)->status = NVME_SC_ABORT_REQ;
+				return BLK_STS_IOERR;
+			}
+			/*
 			 * reconnecting state means transport disruption, which
 			 * can take a long time and even might fail permanently,
 			 * so we can't let incoming I/O be requeued forever.
-- 
2.7.4


* [PATCH] nvme-rdma: fix possible hang when issuing commands during ctrl removal
From: Sagi Grimberg @ 2017-10-22 18:54 UTC


Replaced Bart's old email address (now updated in .gitaliases)

On 22/10/17 21:42, Sagi Grimberg wrote:
> nvme_rdma_queue_is_ready() fails requests when a queue is not
> LIVE. If the controller is in RECONNECTING state, we might stay
> in that state for a long time (until we successfully reconnect),
> so we are better off failing the request fast. Otherwise, we
> fail with BLK_STS_RESOURCE to have the block layer retry it
> soon.
> 
> If we are removing the controller while the admin queue is
> not LIVE, we will terminate the request with BLK_STS_RESOURCE,
> but this happens before we call blk_mq_start_request(), so the
> request timeout never expires and the queue will never get
> back to LIVE (because we are removing the controller). This
> causes the removal operation to block indefinitely [1].
> 
> Thus, if we are removing (state DELETING), and the queue is
> not LIVE, we need to fail the request permanently as there is
> no chance for it to ever complete successfully.
> 
> [1]
> --
> sysrq: SysRq : Show Blocked State
>    task                        PC stack   pid father
> kworker/u66:2   D    0   440      2 0x80000000
> Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
> Call Trace:
>   __schedule+0x3e9/0xb00
>   schedule+0x40/0x90
>   schedule_timeout+0x221/0x580
>   io_schedule_timeout+0x1e/0x50
>   wait_for_completion_io_timeout+0x118/0x180
>   blk_execute_rq+0x86/0xc0
>   __nvme_submit_sync_cmd+0x89/0xf0
>   nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
>   nvme_shutdown_ctrl+0x41/0xe0
>   nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
>   nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
>   nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
>   process_one_work+0x1fd/0x630
>   worker_thread+0x1db/0x3b0
>   kthread+0x11e/0x150
>   ret_from_fork+0x27/0x40
> 01              D    0  2868   2862 0x00000000
> Call Trace:
>   __schedule+0x3e9/0xb00
>   schedule+0x40/0x90
>   schedule_timeout+0x260/0x580
>   wait_for_completion+0x108/0x170
>   flush_work+0x1e0/0x270
>   nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
>   nvme_sysfs_delete+0x2a/0x40
>   dev_attr_store+0x18/0x30
>   sysfs_kf_write+0x45/0x60
>   kernfs_fop_write+0x124/0x1c0
>   __vfs_write+0x28/0x150
>   vfs_write+0xc7/0x1b0
>   SyS_write+0x49/0xa0
>   entry_SYSCALL_64_fastpath+0x18/0xad
> --
> 
> Reported-by: Bart Van Assche <bart.vanassche at wdc.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>   drivers/nvme/host/rdma.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 5b5458012c2c..57f1c9f49a54 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1393,6 +1393,14 @@ nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue, struct request *rq)
>   		    cmd->common.opcode != nvme_fabrics_command ||
>   		    cmd->fabrics.fctype != nvme_fabrics_type_connect) {
>   			/*
> +			 * deleting state means that the ctrl will never accept
> +			 * commands again, fail it permanently.
> +			 */
> +			if (queue->ctrl->ctrl.state == NVME_CTRL_DELETING) {
> +				nvme_req(rq)->status = NVME_SC_ABORT_REQ;
> +				return BLK_STS_IOERR;
> +			}
> +			/*
>   			 * reconnecting state means transport disruption, which
>   			 * can take a long time and even might fail permanently,
>   			 * so we can't let incoming I/O be requeued forever.
> 


* [PATCH] nvme-rdma: fix possible hang when issuing commands during ctrl removal
From: Christoph Hellwig @ 2017-10-23  8:23 UTC


On Sun, Oct 22, 2017 at 09:42:51PM +0300, Sagi Grimberg wrote:
>  			/*
> +			 * deleting state means that the ctrl will never accept
> +			 * commands again, fail it permanently.
> +			 */
> +			if (queue->ctrl->ctrl.state == NVME_CTRL_DELETING) {
> +				nvme_req(rq)->status = NVME_SC_ABORT_REQ;
> +				return BLK_STS_IOERR;

What does the NVME_SC_ABORT_REQ buy us here compared to just returning
BLK_STS_IOERR as in the reconnecting case?  Why can't we just merge the
two cases?


* [PATCH] nvme-rdma: fix possible hang when issuing commands during ctrl removal
From: Sagi Grimberg @ 2017-10-23  9:04 UTC



>>   			/*
>> +			 * deleting state means that the ctrl will never accept
>> +			 * commands again, fail it permanently.
>> +			 */
>> +			if (queue->ctrl->ctrl.state == NVME_CTRL_DELETING) {
>> +				nvme_req(rq)->status = NVME_SC_ABORT_REQ;
>> +				return BLK_STS_IOERR;
> 
> What does the NVME_SC_ABORT_REQ buy us here compared to just returning
> BLK_STS_IOERR as in the reconnecting case?

reg_write32() issues a sync command, which returns
nvme_req(req)->status (error detection after blk_execute_rq() is
caller specific). Without it, nvme_shutdown_ctrl() thinks
reg_write32() succeeded and polls with reg_read32() for no good
reason.

And it's needed for the reconnecting case too, we just never
noticed it there.
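
In other words (a condensed, hypothetical sketch of the sync path,
not the actual kernel source; sync_reg_write32_sketch() stands in
for the nvmf_reg_write32() -> __nvme_submit_sync_cmd() chain):

	static int sync_reg_write32_sketch(struct request *rq)
	{
		/*
		 * blk_execute_rq() waits for the request to complete; the
		 * BLK_STS_* value returned from ->queue_rq() is consumed
		 * by the block layer and is not handed back here.
		 */

		/* the only error signal the fabrics caller gets to see */
		return nvme_req(rq)->status;
	}

So if ->queue_rq() fails the command with BLK_STS_IOERR but leaves
nvme_req(rq)->status at 0, nvme_shutdown_ctrl() believes the
register write succeeded and keeps polling via reg_read32().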

> Why can't we just merge the two cases?

Because we fail for slightly different reasons, I thought two
distinct conditionals look cleaner, but we can merge the two.
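
A merged version could look roughly like this (hypothetical, just
to illustrate the alternative being discussed, with the status set
in both cases as noted above):

	if (queue->ctrl->ctrl.state == NVME_CTRL_DELETING ||
	    queue->ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
		/* fail permanently, with an explicit NVMe status code */
		nvme_req(rq)->status = NVME_SC_ABORT_REQ;
		return BLK_STS_IOERR;
	}
	/* any other non-LIVE state: let the block layer retry */
	return BLK_STS_RESOURCE;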

