From: James Smart <james.smart@broadcom.com>
To: Sagi Grimberg <sagi@grimberg.me>,
Victor Gladkov <Victor.Gladkov@kioxia.com>,
Hannes Reinecke <hare@suse.de>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH] nvme-fabrics: reject I/O to offline device
Date: Wed, 18 Dec 2019 14:20:38 -0800
Message-ID: <bef8f5a3-5dee-ba7d-7423-8ab130b1aa65@broadcom.com>
In-Reply-To: <73006c25-b6a8-fc36-0789-772e3ea59a02@grimberg.me>
On 12/17/2019 1:46 PM, Sagi Grimberg wrote:
>
>
> On 12/9/19 7:30 AM, Victor Gladkov wrote:
>> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>>>
>>> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>>>>
>>>>>> ---
>>>>>> diff --git a/drivers/nvme/host/fabrics.c
>>>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644
>>>>>> --- a/drivers/nvme/host/fabrics.c
>>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct
>>>>>> nvme_ctrl *ctrl,
>>>>>> {
>>>>>> if (ctrl->state != NVME_CTRL_DELETING &&
>>>>>> ctrl->state != NVME_CTRL_DEAD &&
>>>>>> + !(ctrl->state == NVME_CTRL_CONNECTING &&
>>>>>> + ((ktime_get_ns() - rq->start_time_ns) >
>>>>>> jiffies_to_nsecs(rq->timeout))) &&
>>>>>> !blk_noretry_request(rq) && !(rq->cmd_flags &
>>>>>> REQ_NVME_MPATH))
>>>>>> return BLK_STS_RESOURCE;
>>>>>>
>>>>>
>>>>> Did you test this to ensure it's doing what you expect? I'm not sure
>>>>> that all the timers are set right at this point. Most I/O's timeout
>>>>> from a deadline time stamped at blk_mq_start_request(). But that
>>>>> routine is actually called by the transports post the
>>>>> nvmf_check_ready/fail_nonready calls. E.g. the io is not yet in
>>>>> flight, thus queued, and the blk-mq internal queuing doesn't count
>>>>> against the io timeout. I can't see anything that guarantees
>>>>> start_time_ns is set.
>>>>
>>>> I'm not sure this behavior for failing I/O is always desired. Some
>>>> consumers would actually not want the I/O to fail prematurely if we
>>>> are not multipathing...
>>>>
>>>> I think we need a fail_fast_tmo set in when establishing the
>>>> controller to get it right.
>>>>
>>> Agreed. This whole patch looks like someone is trying to reimplement
>>> fast_io_fail_tmo / dev_loss_tmo.
>>> As we're moving into unreliable fabrics I guess we'll need a similar
>>> mechanism.
>>>
>>> Cheers,
>>>
>>> Hannes
>>
>>
>> Following your suggestions, I added a new session parameter called
>> "fast_fail_tmo".
>> The timeout is measured in seconds from the controller reconnect, any
>> command beyond that timeout is rejected.
>> The new parameter value may be passed during ‘connect’, and its
>> default value is 30 seconds.
>
> The default should be consistent with the existing behavior.
>
>> A value of -1 means no timeout (similar to current behavior).
>>
>> ---
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 74b8818..ed6b911 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -406,6 +406,7 @@
>> }
>>
>> ctrl->cntlid = le16_to_cpu(res.u16);
>> + ctrl->start_reconnect_ns = ktime_get_ns();
>>
>> out_free_data:
>> kfree(data);
>> @@ -474,8 +475,12 @@
>> bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>> {
>> if (ctrl->opts->max_reconnects == -1 ||
>> - ctrl->nr_reconnects < ctrl->opts->max_reconnects)
>> + ctrl->nr_reconnects < ctrl->opts->max_reconnects){
>> + if(ctrl->nr_reconnects == 0)
>> + ctrl->start_reconnect_ns = ktime_get_ns();
>> +
>> return true;
>> + }
>>
>> return false;
>> }
>> @@ -549,6 +554,8 @@
>> {
>> if (ctrl->state != NVME_CTRL_DELETING &&
>> ctrl->state != NVME_CTRL_DEAD &&
>> + !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
>> + ((ktime_get_ns() - ctrl->start_reconnect_ns) > ctrl->opts->fail_fast_tmo_ns)) &&
>> !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>> return BLK_STS_RESOURCE;
>
> I cannot comprehend what is going on here...
>
> We should have a dedicated delayed_work that transitions the controller
> to a FAIL_FAST state and cancels the inflight requests again. This
> work should be triggered when the error is detected.
I hope you're not suggesting a FAILFAST state. No new controller state
is needed.
I do agree that managing the time since transitioning to CONNECTING can
be handled better and can address "abort all now" rather than waiting
for retries to kick in.
In other words:
- Add a controller flag, "failfast_expired".
- When entering CONNECTING, schedule a delayed work item based on the
  failfast timeout value.
- On transition out of CONNECTING, cancel the delayed work item and
  ensure failfast_expired is false.
- If the delayed work item expires, set the "failfast_expired" flag to
  true, then run through all inflight ios and cancel them.
- Update nvmf_fail_nonready_command() (above) accordingly, adding a
  check on "!ctrl->failfast_expired".
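
To make the sequence concrete, here is a rough userspace model of that
flag handling. It is only a sketch, not kernel code: the model_* names,
the struct layout, and the simulated clock (standing in for a real
delayed work item) are all made up for illustration, and "cancel
inflight ios" is reduced to zeroing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_LIVE, MODEL_CONNECTING, MODEL_DELETING, MODEL_DEAD };

struct model_ctrl {
	enum model_state state;
	bool failfast_expired;
	bool work_pending;      /* stands in for a scheduled delayed work item */
	uint64_t work_deadline; /* simulated time (seconds) when the work fires */
	int64_t failfast_tmo;   /* seconds; negative means "never expire" */
	int inflight;           /* inflight I/O count, cancelled on expiry */
};

/* State change hook: arm the timer on entry to CONNECTING, disarm on exit. */
void model_set_state(struct model_ctrl *c, enum model_state new_state,
		     uint64_t now)
{
	assert(c != NULL);
	if (new_state == MODEL_CONNECTING && c->state != MODEL_CONNECTING) {
		if (c->failfast_tmo >= 0) {
			c->work_pending = true;
			c->work_deadline = now + (uint64_t)c->failfast_tmo;
		}
	} else if (new_state != MODEL_CONNECTING &&
		   c->state == MODEL_CONNECTING) {
		c->work_pending = false;     /* terminate the delayed work */
		c->failfast_expired = false; /* and make sure the flag is clear */
	}
	c->state = new_state;
}

/* Advance the simulated clock; run the "delayed work" if it is due. */
void model_tick(struct model_ctrl *c, uint64_t now)
{
	if (c->work_pending && now >= c->work_deadline) {
		c->work_pending = false;
		c->failfast_expired = true;
		c->inflight = 0; /* "run through all inflight ios and cancel" */
	}
}

/*
 * Mirrors the shape of the check in nvmf_fail_nonready_command():
 * true means requeue (BLK_STS_RESOURCE), false means fail the command.
 */
bool model_should_requeue(const struct model_ctrl *c, bool noretry, bool mpath)
{
	return c->state != MODEL_DELETING &&
	       c->state != MODEL_DEAD &&
	       !c->failfast_expired &&
	       !noretry && !mpath;
}
```

With a 30 second timeout and CONNECTING entered at t=100, commands in
this model keep being requeued until t=130, are failed from then on, and
are requeued again once the controller leaves CONNECTING.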
-- james
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme