linux-nvme.lists.infradead.org archive mirror
* [PATCH 0/3] avoid race between time out and tear down
@ 2020-10-20  9:08 Chao Leng
  2020-10-20 18:43 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-20  9:08 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, lengchao, sagi

Avoid race between time out and tear down for rdma and tcp.

Chao Leng (3):
  nvme-core: introduce sync io queues
  nvme-rdma: avoid race between time out and tear down
  nvme-tcp: avoid race between time out and tear down

 drivers/nvme/host/core.c |  8 ++++++--
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/rdma.c | 12 ++----------
 drivers/nvme/host/tcp.c  | 14 +++-----------
 4 files changed, 12 insertions(+), 23 deletions(-)

-- 
2.16.4


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-20  9:08 [PATCH 0/3] avoid race between time out and tear down Chao Leng
@ 2020-10-20 18:43 ` Sagi Grimberg
  2020-10-21  2:16   ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-20 18:43 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch


> Avoid race between time out and tear down for rdma and tcp.

This patchset overall looks good, but we still need the patch that
avoids double completion:

--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 629b025685d1..46428ff0b0fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
         /* fence other contexts that may complete the command */
         mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
         nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-       if (!blk_mq_request_completed(rq)) {
+       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
                 nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
                 blk_mq_complete_request_sync(rq);
         }
--
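To illustrate why the bare !blk_mq_request_completed() check is insufficient: a request that error recovery has already completed and recycled is back in the MQ_RQ_IDLE state, so completing it again underflows its refcount. Below is a minimal user-space model of the two guard conditions (simplified types and hypothetical helper names; the real state tracking lives inside blk-mq):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the blk-mq request states involved in this race. */
enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request { enum mq_rq_state state; };

/* Models blk_mq_request_started(): anything past MQ_RQ_IDLE has started. */
static bool request_started(const struct request *rq)
{
	return rq->state != MQ_RQ_IDLE;
}

/* Models blk_mq_request_completed(). */
static bool request_completed(const struct request *rq)
{
	return rq->state == MQ_RQ_COMPLETE;
}

/* Old guard: would re-complete a request already torn down to IDLE. */
static bool old_guard_completes(const struct request *rq)
{
	return !request_completed(rq);
}

/* Fixed guard: only completes requests that are actually in flight. */
static bool fixed_guard_completes(const struct request *rq)
{
	return request_started(rq) && !request_completed(rq);
}
```

A request fully completed by error recovery sits in MQ_RQ_IDLE: the old guard would complete it a second time, while the fixed guard skips it and only acts on in-flight requests.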



* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-20 18:43 ` Sagi Grimberg
@ 2020-10-21  2:16   ` Chao Leng
  2020-10-21  4:59     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-21  2:16 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch



On 2020/10/21 2:43, Sagi Grimberg wrote:
> 
>> Avoid race between time out and tear down for rdma and tcp.
> 
> This patchset overall looks good, but we still need the patch that
> avoids double completion:
> 
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 629b025685d1..46428ff0b0fc 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>          /* fence other contexts that may complete the command */
>          mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>          nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
> -       if (!blk_mq_request_completed(rq)) {
> +       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
Yes, this patch is needed, and the same applies to nvme_cancel_request.
This will fix the race with asynchronous completion.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e85f6304efd7..1e838d952096 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
                                 "Cancelling I/O %d", req->tag);

         /* don't abort one completed request */
-       if (blk_mq_request_completed(req))
+       if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
                 return true;

         nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
-- 
>                  nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
>                  blk_mq_complete_request_sync(rq);
>          }
> -- 
> .



* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  2:16   ` Chao Leng
@ 2020-10-21  4:59     ` Sagi Grimberg
  2020-10-21  6:51       ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-21  4:59 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch


>>> Avoid race between time out and tear down for rdma and tcp.
>>
>> This patchset overall looks good, but we still need the patch that
>> avoids double completion:
>>
>> -- 
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 629b025685d1..46428ff0b0fc 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>          /* fence other contexts that may complete the command */
>>          mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>          nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>> -       if (!blk_mq_request_completed(rq)) {
>> +       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
> Yes, this patch is needed, and the same applies to nvme_cancel_request.
> This will fix the race with asynchronous completion.
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e85f6304efd7..1e838d952096 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>                                  "Cancelling I/O %d", req->tag);
> 
>          /* don't abort one completed request */
> -       if (blk_mq_request_completed(req))
> +       if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>                  return true;
> 
>          nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;

This one is unneeded because blk_mq_tagset_busy_iter checks that the
request has started...
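The argument above — that the nvme_cancel_request hunk is redundant because blk_mq_tagset_busy_iter only invokes its callback for started requests — can be sketched with a rough user-space model (simplified, hypothetical signatures; the real iterator walks the tag bitmaps):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request { enum mq_rq_state state; };

typedef bool (*busy_iter_fn)(struct request *rq, void *data);

/* Model of blk_mq_tagset_busy_iter(): only started requests reach fn,
 * so a callback like nvme_cancel_request never sees MQ_RQ_IDLE. */
static void tagset_busy_iter(struct request **rqs, size_t nr,
			     busy_iter_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		if (rqs[i]->state == MQ_RQ_IDLE)	/* not started: skip */
			continue;
		fn(rqs[i], data);
	}
}

/* Callback in the style of nvme_cancel_request: cancels in-flight I/O
 * and counts how many requests it cancelled. */
static bool count_cancel(struct request *rq, void *data)
{
	if (rq->state == MQ_RQ_COMPLETE)	/* don't abort completed I/O */
		return true;
	++*(int *)data;
	return true;
}
```

Since the iterator filters out idle requests itself, adding a started check inside the callback would never change the outcome.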



* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  4:59     ` Sagi Grimberg
@ 2020-10-21  6:51       ` Chao Leng
  2020-10-21  8:53         ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Leng @ 2020-10-21  6:51 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch



On 2020/10/21 12:59, Sagi Grimberg wrote:
> 
>>>> Avoid race between time out and tear down for rdma and tcp.
>>>
>>> This patchset overall looks good, but we still need the patch that
>>> avoids double completion:
>>>
>>> -- 
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 629b025685d1..46428ff0b0fc 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>          /* fence other contexts that may complete the command */
>>>          mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>          nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>> -       if (!blk_mq_request_completed(rq)) {
>>> +       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>> Yes, this patch is needed, and the same applies to nvme_cancel_request.
>> This will fix the race with asynchronous completion.
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e85f6304efd7..1e838d952096 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>                                  "Cancelling I/O %d", req->tag);
>>
>>          /* don't abort one completed request */
>> -       if (blk_mq_request_completed(req))
>> +       if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>                  return true;
>>
>>          nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> 
> This one is unneeded because blk_mq_tagset_busy_iter checks that the
> request has started...
Yes, it is already checked.
> .



* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  6:51       ` Chao Leng
@ 2020-10-21  8:53         ` Sagi Grimberg
  2020-10-22  1:58           ` Chao Leng
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2020-10-21  8:53 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch

[-- Attachment #1: Type: text/plain, Size: 1930 bytes --]


>>>>> Avoid race between time out and tear down for rdma and tcp.
>>>>
>>>> This patchset overall looks good, but we still need the patch that
>>>> avoids double completion:
>>>>
>>>> -- 
>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>> index 629b025685d1..46428ff0b0fc 100644
>>>> --- a/drivers/nvme/host/tcp.c
>>>> +++ b/drivers/nvme/host/tcp.c
>>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>>          /* fence other contexts that may complete the command */
>>>>          mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>>          nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>>> -       if (!blk_mq_request_completed(rq)) {
>>>> +       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>>> Yes, this patch is needed, and the same applies to nvme_cancel_request.
>>> This will fix the race with asynchronous completion.
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index e85f6304efd7..1e838d952096 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>                                  "Cancelling I/O %d", req->tag);
>>>
>>>          /* don't abort one completed request */
>>> -       if (blk_mq_request_completed(req))
>>> +       if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>>                  return true;
>>>
>>>          nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>>
>> This one is unneeded because blk_mq_tagset_busy_iter checks that the
>> request has started...
> Yes, it is already checked.

Can you add the attached patches and resend?

You can also add my:
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

[-- Attachment #2: 0002-nvme-rdma-fix-possible-double-completion-for-timed-o.patch --]
[-- Type: text/x-patch, Size: 2506 bytes --]

From 53c3b144d71f7ab96619f917571fe63dbd3a023c Mon Sep 17 00:00:00 2001
From: Sagi Grimberg <sagi@grimberg.me>
Date: Wed, 21 Oct 2020 01:51:29 -0700
Subject: [PATCH 2/2] nvme-rdma: fix possible double completion for timed out
 requests

If error recovery ran before the timeout handler did and fully completed
the timed out request, it will appear as not started (rq state is
MQ_RQ_IDLE), however the timeout handler only verifies that the request
was not completed (rq state is MQ_RQ_COMPLETE).

Check and make sure that the request is both started and did not yet
complete.

Example trace:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 45 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:refcount_warn_saturate+0xa6/0xf0
RSP: 0018:ffffb71b80837dc8 EFLAGS: 00010292
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffff93f37dcd8d08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93f37dcd8d00
RBP: ffff93f37a812400 R08: 00000203c5221fce R09: ffffffffa7fffbc4
R10: 0000000000000477 R11: 000000000002835c R12: ffff93f37a8124e8
R13: ffff93f37a2b0000 R14: ffffb71b80837e70 R15: ffff93f37a2b0000
FS:  0000000000000000(0000) GS:ffff93f37dcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005637b4137028 CR3: 000000007c1be000 CR4: 00000000000006e0
Call Trace:
 blk_mq_check_expired+0x109/0x1b0
 blk_mq_queue_tag_busy_iter+0x1b8/0x330
 ? blk_poll+0x300/0x300
 blk_mq_timeout_work+0x44/0xe0
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
---[ end trace 7d137e36e23c0d19 ]---

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 45fc605349da..ebd6b1080c9d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1973,7 +1973,7 @@ static void nvme_rdma_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&ctrl->teardown_lock);
 	nvme_rdma_stop_queue(queue);
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
-- 
2.25.1


[-- Attachment #3: 0001-nvme-tcp-fix-possible-double-completion-for-timed-ou.patch --]
[-- Type: text/x-patch, Size: 2541 bytes --]

From d9ccd0baf15779d3dcdd488577eda8077bb7cc21 Mon Sep 17 00:00:00 2001
From: Sagi Grimberg <sagi@grimberg.me>
Date: Wed, 21 Oct 2020 01:45:34 -0700
Subject: [PATCH 1/2] nvme-tcp: fix possible double completion for timed out
 requests

If error recovery ran before the timeout handler did and fully completed
the timed out request, it will appear as not started (rq state is
MQ_RQ_IDLE), however the timeout handler only verifies that the request
was not completed (rq state is MQ_RQ_COMPLETE).

Check and make sure that the request is both started and did not yet
complete.

Example trace:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 45 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:refcount_warn_saturate+0xa6/0xf0
RSP: 0018:ffffb71b80837dc8 EFLAGS: 00010292
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffff93f37dcd8d08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93f37dcd8d00
RBP: ffff93f37a812400 R08: 00000203c5221fce R09: ffffffffa7fffbc4
R10: 0000000000000477 R11: 000000000002835c R12: ffff93f37a8124e8
R13: ffff93f37a2b0000 R14: ffffb71b80837e70 R15: ffff93f37a2b0000
FS:  0000000000000000(0000) GS:ffff93f37dcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005637b4137028 CR3: 000000007c1be000 CR4: 00000000000006e0
Call Trace:
 blk_mq_check_expired+0x109/0x1b0
 blk_mq_queue_tag_busy_iter+0x1b8/0x330
 ? blk_poll+0x300/0x300
 blk_mq_timeout_work+0x44/0xe0
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
---[ end trace 7d137e36e23c0d19 ]---

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 629b025685d1..46428ff0b0fc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-	if (!blk_mq_request_completed(rq)) {
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request_sync(rq);
 	}
-- 
2.25.1




* Re: [PATCH 0/3] avoid race between time out and tear down
  2020-10-21  8:53         ` Sagi Grimberg
@ 2020-10-22  1:58           ` Chao Leng
  0 siblings, 0 replies; 7+ messages in thread
From: Chao Leng @ 2020-10-22  1:58 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch



On 2020/10/21 16:53, Sagi Grimberg wrote:
> 
>>>>>> Avoid race between time out and tear down for rdma and tcp.
>>>>>
>>>>> This patchset overall looks good, but we still need the patch that
>>>>> avoids double completion:
>>>>>
>>>>> -- 
>>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>>> index 629b025685d1..46428ff0b0fc 100644
>>>>> --- a/drivers/nvme/host/tcp.c
>>>>> +++ b/drivers/nvme/host/tcp.c
>>>>> @@ -2175,7 +2175,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
>>>>>          /* fence other contexts that may complete the command */
>>>>>          mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
>>>>>          nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
>>>>> -       if (!blk_mq_request_completed(rq)) {
>>>>> +       if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
>>>> Yes, this patch is needed, and the same applies to nvme_cancel_request.
>>>> This will fix the race with asynchronous completion.
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index e85f6304efd7..1e838d952096 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -338,7 +338,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>>                                  "Cancelling I/O %d", req->tag);
>>>>
>>>>          /* don't abort one completed request */
>>>> -       if (blk_mq_request_completed(req))
>>>> +       if (blk_mq_request_completed(req) || !blk_mq_request_started(req))
>>>>                  return true;
>>>>
>>>>          nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>>>
>>> This one is unneeded because blk_mq_tagset_busy_iter checks that the
>>> request has started...
>> Yes, it is already checked.
> 
> Can you add the attached patches and resend?
> 
> You can also add my:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
OK, I will do it later.


