* [PATCH 0/3] avoid race between time out and tear down
@ 2020-10-20  9:08 Chao Leng
  2020-10-20 18:43 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-20  9:08 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, lengchao, sagi

Avoid race between time out and tear down for rdma and tcp.

Chao Leng (3):
  nvme-core: introduce sync io queues
  nvme-rdma: avoid race between time out and tear down
  nvme-tcp: avoid race between time out and tear down

 drivers/nvme/host/core.c |  8 ++++++--
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/rdma.c | 12 ++----------
 drivers/nvme/host/tcp.c  | 14 +++-----------
 4 files changed, 12 insertions(+), 23 deletions(-)

--
2.16.4

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-20  9:08 [PATCH 0/3] avoid race between time out and tear down Chao Leng
@ 2020-10-20 18:43 ` Sagi Grimberg
  2020-10-21  2:16   ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-20 18:43 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch

> Avoid race between time out and tear down for rdma and tcp.

This patchset overall looks good, but we still need the patch that
avoids double completion:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 629b025685d1..46428ff0b0fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
--

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-20 18:43 ` Sagi Grimberg
@ 2020-10-21  2:16   ` Chao Leng
  2020-10-21  4:59     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-21  2:16 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch

On 2020/10/21 2:43, Sagi Grimberg wrote:
>
>> Avoid race between time out and tear down for rdma and tcp.
>
> This patchset overall looks good, but we still need the patch that
> avoids double completion:
>
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 629b025685d1..46428ff0b0fc 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>  	/* fence other contexts that may complete the command */
>  	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>  	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
> -	if (!blk_mq_request_completed(rq)) {
> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
Yes, this patch is needed, and the same is true for nvme_cancel_request.
This will fix the race with asynchronous completion.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e85f6304efd7..1e838d952096 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 				"Cancelling I/O %d", req->tag);
 
 	/* don't abort one completed request */
-	if (blk_mq_request_completed(req))
+	if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
 		return true;
 
 	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
--
>  		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
>  		blk_mq_complete_request_sync(rq);
>  	}
> --
> .

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  2:16 ` Chao Leng
@ 2020-10-21  4:59   ` Sagi Grimberg
  2020-10-21  6:51     ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-21  4:59 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch

>>> Avoid race between time out and tear down for rdma and tcp.
>>
>> This patchset overall looks good, but we still need the patch that
>> avoids double completion:
>>
>> --
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 629b025685d1..46428ff0b0fc 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>  	/* fence other contexts that may complete the command */
>>  	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>  	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>> -	if (!blk_mq_request_completed(rq)) {
>> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
> Yes, this patch is needed, and the same is true for nvme_cancel_request.
> This will fix the race with asynchronous completion.
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e85f6304efd7..1e838d952096 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>  				"Cancelling I/O %d", req->tag);
>
>  	/* don't abort one completed request */
> -	if (blk_mq_request_completed(req))
> +	if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>  		return true;
>
>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;

This one is unneeded because blk_mq_tagset_busy_iter checks that the
request has started...

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  4:59 ` Sagi Grimberg
@ 2020-10-21  6:51   ` Chao Leng
  2020-10-21  8:53     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-21  6:51 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch

On 2020/10/21 12:59, Sagi Grimberg wrote:
>
>>>> Avoid race between time out and tear down for rdma and tcp.
>>>
>>> This patchset overall looks good, but we still need the patch that
>>> avoids double completion:
>>>
>>> --
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 629b025685d1..46428ff0b0fc 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>  	/* fence other contexts that may complete the command */
>>>  	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>  	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>> -	if (!blk_mq_request_completed(rq)) {
>>> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>> Yes, this patch is needed, and the same is true for nvme_cancel_request.
>> This will fix the race with asynchronous completion.
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e85f6304efd7..1e838d952096 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>  				"Cancelling I/O %d", req->tag);
>>
>>  	/* don't abort one completed request */
>> -	if (blk_mq_request_completed(req))
>> +	if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>  		return true;
>>
>>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>
> This one is unneeded because blk_mq_tagset_busy_iter checks that the
> request has started...
Yes, it is already checked.
> .
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  6:51 ` Chao Leng
@ 2020-10-21  8:53   ` Sagi Grimberg
  2020-10-22  1:58     ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-21  8:53 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch

[-- Attachment #1: Type: text/plain, Size: 1930 bytes --]

>>>>> Avoid race between time out and tear down for rdma and tcp.
>>>>
>>>> This patchset overall looks good, but we still need the patch that
>>>> avoids double completion:
>>>>
>>>> --
>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>> index 629b025685d1..46428ff0b0fc 100644
>>>> --- a/drivers/nvme/host/tcp.c
>>>> +++ b/drivers/nvme/host/tcp.c
>>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>>  	/* fence other contexts that may complete the command */
>>>>  	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>>  	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>>> -	if (!blk_mq_request_completed(rq)) {
>>>> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>>> Yes, this patch is needed, and the same is true for nvme_cancel_request.
>>> This will fix the race with asynchronous completion.
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index e85f6304efd7..1e838d952096 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>  				"Cancelling I/O %d", req->tag);
>>>
>>>  	/* don't abort one completed request */
>>> -	if (blk_mq_request_completed(req))
>>> +	if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>>  		return true;
>>>
>>>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>>
>> This one is unneeded because blk_mq_tagset_busy_iter checks that the
>> request has started...
> Yes, it is already checked.

Can you add the attached patches and resend?

You can also add my:
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

[-- Attachment #2: 0002-nvme-rdma-fix-possible-double-completion-for-timed-o.patch --]
[-- Type: text/x-patch, Size: 2506 bytes --]

From 53c3b144d71f7ab96619f917571fe63dbd3a023c Mon Sep 17 00:00:00 2001
From: Sagi Grimberg <sagi@grimberg.me>
Date: Wed, 21 Oct 2020 01:51:29 -0700
Subject: [PATCH 2/2] nvme-rdma: fix possible double completion for timed out
 requests

If error recovery ran before the timeout handler did and fully
completed the timed out request, it will appear as not started (rq
state is MQ_RQ_IDLE); however, the timeout handler only verifies that
the request was not completed (rq state is MQ_RQ_COMPLETE). Check and
make sure that the request both started and did not yet complete.

Example trace:

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 45 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:refcount_warn_saturate+0xa6/0xf0
RSP: 0018:ffffb71b80837dc8 EFLAGS: 00010292
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffff93f37dcd8d08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93f37dcd8d00
RBP: ffff93f37a812400 R08: 00000203c5221fce R09: ffffffffa7fffbc4
R10: 0000000000000477 R11: 000000000002835c R12: ffff93f37a8124e8
R13: ffff93f37a2b0000 R14: ffffb71b80837e70 R15: ffff93f37a2b0000
FS:  0000000000000000(0000) GS:ffff93f37dcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005637b4137028 CR3: 000000007c1be000 CR4: 00000000000006e0
Call Trace:
 blk_mq_check_expired+0x109/0x1b0
 blk_mq_queue_tag_busy_iter+0x1b8/0x330
 ? blk_poll+0x300/0x300
 blk_mq_timeout_work+0x44/0xe0
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
---[ end trace 7d137e36e23c0d19 ]---

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 45fc605349da..ebd6b1080c9d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1973,7 +1973,7 @@ static void nvme_rdma_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&ctrl->teardown_lock);
 	nvme_rdma_stop_queue(queue);
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
--
2.25.1

[-- Attachment #3: 0001-nvme-tcp-fix-possible-double-completion-for-timed-ou.patch --]
[-- Type: text/x-patch, Size: 2541 bytes --]

From d9ccd0baf15779d3dcdd488577eda8077bb7cc21 Mon Sep 17 00:00:00 2001
From: Sagi Grimberg <sagi@grimberg.me>
Date: Wed, 21 Oct 2020 01:45:34 -0700
Subject: [PATCH 1/2] nvme-tcp: fix possible double completion for timed out
 requests

If error recovery ran before the timeout handler did and fully
completed the timed out request, it will appear as not started (rq
state is MQ_RQ_IDLE); however, the timeout handler only verifies that
the request was not completed (rq state is MQ_RQ_COMPLETE). Check and
make sure that the request both started and did not yet complete.

Example trace:

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 45 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:refcount_warn_saturate+0xa6/0xf0
RSP: 0018:ffffb71b80837dc8 EFLAGS: 00010292
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffff93f37dcd8d08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93f37dcd8d00
RBP: ffff93f37a812400 R08: 00000203c5221fce R09: ffffffffa7fffbc4
R10: 0000000000000477 R11: 000000000002835c R12: ffff93f37a8124e8
R13: ffff93f37a2b0000 R14: ffffb71b80837e70 R15: ffff93f37a2b0000
FS:  0000000000000000(0000) GS:ffff93f37dcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005637b4137028 CR3: 000000007c1be000 CR4: 00000000000006e0
Call Trace:
 blk_mq_check_expired+0x109/0x1b0
 blk_mq_queue_tag_busy_iter+0x1b8/0x330
 ? blk_poll+0x300/0x300
 blk_mq_timeout_work+0x44/0xe0
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
---[ end trace 7d137e36e23c0d19 ]---

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 629b025685d1..46428ff0b0fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
--
2.25.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  8:53 ` Sagi Grimberg
@ 2020-10-22  1:58   ` Chao Leng
  0 siblings, 0 replies; 7+ messages in thread
From: Chao Leng @ 2020-10-22  1:58 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch

On 2020/10/21 16:53, Sagi Grimberg wrote:
>
>>>>>> Avoid race between time out and tear down for rdma and tcp.
>>>>>
>>>>> This patchset overall looks good, but we still need the patch that
>>>>> avoids double completion:
>>>>>
>>>>> --
>>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>>> index 629b025685d1..46428ff0b0fc 100644
>>>>> --- a/drivers/nvme/host/tcp.c
>>>>> +++ b/drivers/nvme/host/tcp.c
>>>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>>>  	/* fence other contexts that may complete the command */
>>>>>  	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>>>  	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>>>> -	if (!blk_mq_request_completed(rq)) {
>>>>> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>>>> Yes, this patch is needed, and the same is true for nvme_cancel_request.
>>>> This will fix the race with asynchronous completion.
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index e85f6304efd7..1e838d952096 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>>  				"Cancelling I/O %d", req->tag);
>>>>
>>>>  	/* don't abort one completed request */
>>>> -	if (blk_mq_request_completed(req))
>>>> +	if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>>>  		return true;
>>>>
>>>>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>>>
>>> This one is unneeded because blk_mq_tagset_busy_iter checks that the
>>> request has started...
>> Yes, it is already checked.
>
> Can you add the attached patches and resend?
>
> You can also add my:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
ok, I will do it later.

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-10-22  1:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-20  9:08 [PATCH 0/3] avoid race between time out and tear down Chao Leng
2020-10-20 18:43 ` Sagi Grimberg
2020-10-21  2:16   ` Chao Leng
2020-10-21  4:59     ` Sagi Grimberg
2020-10-21  6:51       ` Chao Leng
2020-10-21  8:53         ` Sagi Grimberg
2020-10-22  1:58           ` Chao Leng