From: Sagi Grimberg <sagi@grimberg.me>
To: Victor Gladkov <Victor.Gladkov@kioxia.com>,
	Hannes Reinecke <hare@suse.de>,
	James Smart <james.smart@broadcom.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH] nvme-fabrics: reject I/O to offline device
Date: Tue, 17 Dec 2019 13:46:46 -0800
Message-ID: <73006c25-b6a8-fc36-0789-772e3ea59a02@grimberg.me>
In-Reply-To: <d7953accf06e418a893b9cc6017b981a@kioxia.com>



On 12/9/19 7:30 AM, Victor Gladkov wrote:
> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>>
>> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>>>
>>>>> ---
>>>>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>>>>> index 74b8818..b58abc1 100644
>>>>> --- a/drivers/nvme/host/fabrics.c
>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>>>>>    {
>>>>>           if (ctrl->state != NVME_CTRL_DELETING &&
>>>>>               ctrl->state != NVME_CTRL_DEAD &&
>>>>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>>>>> +            ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
>>>>>               !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>>>>>                   return BLK_STS_RESOURCE;
>>>>>
>>>>
>>>> Did you test this to ensure it's doing what you expect? I'm not sure
>>>> that all the timers are set right at this point. Most I/Os time out
>>>> from a deadline stamped at blk_mq_start_request(). But that routine
>>>> is actually called by the transports after the
>>>> nvmf_check_ready/fail_nonready calls, i.e. the I/O is not yet in
>>>> flight, only queued, and blk-mq's internal queuing doesn't count
>>>> against the I/O timeout. I can't see anything that guarantees
>>>> start_time_ns is set.
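
For reference, the ordering James describes looks roughly like this in a
fabrics transport's .queue_rq handler. This is only a sketch loosely
modeled on nvme_rdma_queue_rq; the xport_* names and the queue/ctrl lookup
are placeholders, not real code:

static blk_status_t xport_queue_rq(struct blk_mq_hw_ctx *hctx,
		const struct blk_mq_queue_data *bd)
{
	struct xport_queue *queue = hctx->driver_data;	/* transport private */
	struct request *rq = bd->rq;
	bool queue_ready = test_bit(XPORT_Q_LIVE, &queue->flags);

	/* the ready/fail-nonready checks run first ... */
	if (!nvmf_check_ready(queue->ctrl, rq, queue_ready))
		return nvmf_fail_nonready_command(queue->ctrl, rq);

	/*
	 * ... and only a started request has its block-layer timeout
	 * (deadline) armed, so a request held back above has never been
	 * started and its timeout is not running yet.
	 */
	blk_mq_start_request(rq);

	/* transport-specific mapping and submission follow */
	return BLK_STS_OK;
}

So per-request arithmetic on rq->start_time_ns inside
nvmf_fail_nonready_command() is looking at a request that was never
started.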
>>>
>>> I'm not sure this behavior of failing I/O is always desired. Some
>>> consumers would actually not want the I/O to fail prematurely if we
>>> are not multipathing...
>>>
>>> I think we need a fail_fast_tmo set when establishing the controller
>>> to get it right.
>>>
>> Agreed. This whole patch looks like someone is trying to reimplement
>> fast_io_fail_tmo / dev_loss_tmo.
>> As we're moving into unreliable fabrics I guess we'll need a similar mechanism.
>>
>> Cheers,
>>
>> Hannes
> 
> 
> Following your suggestions, I added a new session parameter called "fail_fast_tmo".
> The timeout is measured in seconds from the start of controller reconnect; any command issued beyond that timeout is rejected.
> The new parameter value may be passed during 'connect', and its default value is 30 seconds.

The default should be consistent with the existing behavior.

> A value of -1 means no timeout (similar to the current behavior).
> 
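
As an aside, here is a rough sketch of how such a connect option is
typically wired into the fabrics option parsing in fabrics.c. The
NVMF_OPT_FAIL_FAST_TMO token, the opt_tokens entry and the conversion into
opts->fail_fast_tmo_ns are assumptions meant to match the field used in
the patch below, not code from it; the 30-second default would be applied
wherever the other option defaults are initialized:

enum {
	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,	/* reuse a free NVMF_OPT_* bit */
};

static const match_table_t opt_tokens = {
	/* ... existing entries ... */
	{ NVMF_OPT_FAIL_FAST_TMO,	"fail_fast_tmo=%d" },
	{ NVMF_OPT_ERR,			NULL },
};

/* inside the option switch of nvmf_parse_options(): */
	case NVMF_OPT_FAIL_FAST_TMO:
		if (match_int(args, &token)) {
			ret = -EINVAL;
			goto out;
		}
		if (token < 0)	/* -1 (or any negative value): never fail fast */
			opts->fail_fast_tmo_ns = -1;
		else		/* seconds on the connect string, ns internally */
			opts->fail_fast_tmo_ns = (s64)token * NSEC_PER_SEC;
		break;

With a token like the one above, the option would simply be part of the
connect string written to /dev/nvme-fabrics, e.g. "...,fail_fast_tmo=30".
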
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ed6b911 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -406,6 +406,7 @@
>   	}
> 
>   	ctrl->cntlid = le16_to_cpu(res.u16);
> +	ctrl->start_reconnect_ns = ktime_get_ns();
> 
>   out_free_data:
>   	kfree(data);
> @@ -474,8 +475,12 @@
>   bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>   {
>   	if (ctrl->opts->max_reconnects == -1 ||
> -	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
> +	    ctrl->nr_reconnects < ctrl->opts->max_reconnects){
> +		if(ctrl->nr_reconnects == 0)
> +			ctrl->start_reconnect_ns = ktime_get_ns();
> +
>   		return true;
> +	}
> 
>   	return false;
>   }
> @@ -549,6 +554,8 @@
>   {
>   	if (ctrl->state != NVME_CTRL_DELETING &&
>   	    ctrl->state != NVME_CTRL_DEAD &&
> +            !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
> +            ((ktime_get_ns() - ctrl->start_reconnect_ns) >  ctrl->opts->fail_fast_tmo_ns)) &&
>   	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>   		return BLK_STS_RESOURCE;

I cannot comprehend what is going on here...

We should have a dedicated delayed_work that transitions the controller
to a FAIL_FAST state and cancels the inflight requests again. This
work should be triggered when the error is detected.
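
Something along these lines would do (a rough sketch only: the
failfast_work member, the flags word with a FAILFAST_EXPIRED bit and the
helper names are hypothetical, not code from this patch):

static void nvme_failfast_work(struct work_struct *work)
{
	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
			struct nvme_ctrl, failfast_work);

	if (ctrl->state != NVME_CTRL_CONNECTING)
		return;

	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
	dev_info(ctrl->device, "failfast expired, failing queued I/O\n");
	/* wake up requeued requests so they hit the fail path */
	nvme_kick_requeue_lists(ctrl);
}

/* armed when the error is detected and the controller enters CONNECTING */
static void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
{
	if (!ctrl->opts || ctrl->opts->fail_fast_tmo_ns < 0)
		return;

	queue_delayed_work(nvme_wq, &ctrl->failfast_work,
			   nsecs_to_jiffies(ctrl->opts->fail_fast_tmo_ns));
}

The work would be initialized with INIT_DELAYED_WORK() in nvme_init_ctrl(),
cancelled (and the bit cleared) on a successful reconnect or controller
deletion, and nvmf_fail_nonready_command() then only needs
!test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) in its condition
instead of doing time arithmetic per request.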
