All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Chao Leng <lengchao@huawei.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	James Smart <james.smart@broadcom.com>
Subject: Re: [PATCH 5/6] nvme-rdma: fix timeout handler
Date: Tue, 4 Aug 2020 08:36:23 -0700	[thread overview]
Message-ID: <938aa34b-b4db-f8ca-2478-0b48954899ea@grimberg.me> (raw)
In-Reply-To: <d0b6e574-eee1-02db-e27d-e9c88e5b0c80@huawei.com>


>>>> +static void nvme_rdma_complete_timed_out(struct request *rq)
>>>> +{
>>>> +    struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
>>>> +    struct nvme_rdma_queue *queue = req->queue;
>>>> +    struct nvme_rdma_ctrl *ctrl = queue->ctrl;
>>>> +
>>>> +    /* fence other contexts that may complete the command */
>>>> +    flush_work(&ctrl->err_work);
>>>> +    nvme_rdma_stop_queue(queue);
>>> There maybe concurrent with error recovery, may cause abnormal because
>>> nvme_rdma_stop_queue will return but the queue is not stoped,
>>> maybe is stopping by the error recovery.
>>
>> err_work flush used to fence, once we did queue stop, it should be safe
>> to complete the command from the timeout handler.
> 
> Flush work just can avoid trigger error recovery by nvme_rdma_timeout or
> reduce concurrent probalibity trigger error recovery by other progress,
> but can not avoid.

The point is that we can complete the command because err_work
was flushed and the queue was stopped, which means we shouldn't have
any context completing the request.

> if nvme_rdma_cm_handler or other progress call
> nvme_rdma_error_recovery, between change state to queue_work may
> interrupt by hard interrupt, and then timeout happen, thus flush work
> can not avoid concurrent.
> Like this:
> 
> static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
> {
>      if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
>          return;
> --------------------------------

If we are in RESETTING/CONNECTING state already, this won't do anything.

> may interrupt by hard interrupt, and then timeout progress flush work
> at this time. Thus error recovery and nvme_rdma_complete_timed_out may
> concurrent to stop queue. will cause: error recovery may cancel request
> or nvme_rdma_complete_timed_out may complete request, but the queue may
> not be stoped. Thus will cause abnormal.

We should be fine and safe to complete the I/O.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2020-08-04 15:36 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-03  6:58 [PATCH 0/6] fix possible controller reset hangs in nvme-tcp/nvme-rdma Sagi Grimberg
2020-08-03  6:58 ` [PATCH 1/6] nvme-fabrics: allow to queue requests for live queues Sagi Grimberg
2020-08-03  6:58 ` [PATCH 2/6] nvme: have nvme_wait_freeze_timeout return if it timed out Sagi Grimberg
2020-08-03  6:58 ` [PATCH 3/6] nvme-tcp: fix timeout handler Sagi Grimberg
2020-08-03  6:58 ` [PATCH 4/6] nvme-tcp: fix reset hang if controller died in the middle of a reset Sagi Grimberg
2020-08-03  6:58 ` [PATCH 5/6] nvme-rdma: fix timeout handler Sagi Grimberg
2020-08-03 10:25   ` Chao Leng
2020-08-03 15:03     ` Sagi Grimberg
2020-08-04  1:49       ` Chao Leng
2020-08-04 15:36         ` Sagi Grimberg [this message]
2020-08-05  1:07           ` Chao Leng
2020-08-05  1:12             ` Sagi Grimberg
2020-08-05  6:27               ` Chao Leng
2020-08-05  7:00                 ` Sagi Grimberg
2020-08-05  7:14                   ` Chao Leng
2020-08-05  7:19                     ` Sagi Grimberg
2020-08-05  7:35                       ` Chao Leng
2020-08-05  8:17                         ` Sagi Grimberg
2020-08-06 19:52   ` David Milburn
2020-08-06 20:11     ` Sagi Grimberg
2020-08-03  6:58 ` [PATCH 6/6] nvme-rdma: fix reset hang if controller died in the middle of a reset Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=938aa34b-b4db-f8ca-2478-0b48954899ea@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=kbusch@kernel.org \
    --cc=lengchao@huawei.com \
    --cc=linux-nvme@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.