linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme-fabrics: reject I/O to offline device
@ 2019-12-01  7:59 Victor Gladkov
  2019-12-02 22:26 ` Chaitanya Kulkarni
  2019-12-02 22:47 ` James Smart
  0 siblings, 2 replies; 14+ messages in thread
From: Victor Gladkov @ 2019-12-01  7:59 UTC (permalink / raw)
  To: linux-nvme

Issue Description:
Commands get stuck while Host NVMe controller (TCP or RDMA) is in reconnect state. 
NVMe controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until successful reconnection or until reconnect time-out is reached. The default reconnect time out is 10 minutes.
This behavior is different than ISCSI where Commands during reconnect state returns with the following error: "rejecting I/O to offline device"

Fix Description:
Added a kernel module parameter "nvmef_reconnect_failfast" for nvme-fabrics module (default is true).
Interfere in the decision whether to queue IO command or retry IO command. The interface takes into account the controller reconnect state, in a way that during reconnect state, IO commands shall fail immediacy (default) or according to IO command timeout (depends on the module parameter value), and IO retry is prevented. As a result, commands do not get stuck in in reconnect state.

branch nvme-5.5

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..ef89aff 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -13,6 +13,10 @@
 #include "nvme.h"
 #include "fabrics.h"

+static bool nvmef_reconnect_failfast = 1;
+module_param_named(nvmef_reconnect_failfast, nvmef_reconnect_failfast, bool, S_IRUGO);
+MODULE_PARM_DESC(nvmef_reconnect_failfast, "failfast flag for I/O when controler is reconnecting, else use I/O command timeout (default true).");
+
 static LIST_HEAD(nvmf_transports);
 static DECLARE_RWSEM(nvmf_transports_rwsem);

@@ -549,6 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 {
        if (ctrl->state != NVME_CTRL_DELETING &&
            ctrl->state != NVME_CTRL_DEAD &&
+           !(ctrl->state == NVME_CTRL_CONNECTING && (((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)) || nvmef_reconnect_failfast)) &&
            !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
                return BLK_STS_RESOURCE;


Regards,
Victor


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
@ 2019-12-02 22:26 ` Chaitanya Kulkarni
  2019-12-02 22:47 ` James Smart
  1 sibling, 0 replies; 14+ messages in thread
From: Chaitanya Kulkarni @ 2019-12-02 22:26 UTC (permalink / raw)
  To: Victor Gladkov, linux-nvme

On 12/01/2019 12:14 AM, Victor Gladkov wrote:
> Issue Description:
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in reconnect state.
> NVMe controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until successful reconnection or until reconnect time-out is reached. The default reconnect time out is 10 minutes.
> This behavior is different than ISCSI where Commands during reconnect state returns with the following error: "rejecting I/O to offline device"


The values NVMF_DEF_RECONNECT_DELAY and NVMF_OPT_CTRL_LOSS_TMO are
configurable using :-

  602         { NVMF_OPT_RECONNECT_DELAY,     "reconnect_delay=%d"    },
  603         { NVMF_OPT_CTRL_LOSS_TMO,       "ctrl_loss_tmo=%d"      }, 


Is it possible to get the same behavior with proper combination of above
two values set ?




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
  2019-12-02 22:26 ` Chaitanya Kulkarni
@ 2019-12-02 22:47 ` James Smart
  2019-12-03 10:04   ` Victor Gladkov
  1 sibling, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-02 22:47 UTC (permalink / raw)
  To: linux-nvme

On 11/30/2019 11:59 PM, Victor Gladkov wrote:
> Issue Description:
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in reconnect state.
> NVMe controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until successful reconnection or until reconnect time-out is reached. The default reconnect time out is 10 minutes.
> This behavior is different than ISCSI where Commands during reconnect state returns with the following error: "rejecting I/O to offline device"
>
> Fix Description:
> Added a kernel module parameter "nvmef_reconnect_failfast" for nvme-fabrics module (default is true).
> Interfere in the decision whether to queue IO command or retry IO command. The interface takes into account the controller reconnect state, in a way that during reconnect state, IO commands shall fail immediacy (default) or according to IO command timeout (depends on the module parameter value), and IO retry is prevented. As a result, commands do not get stuck in in reconnect state.

This the patch seems incorrect at least as described. Multipathing 
inherently will "fastfail" and send to other paths. So the only way 
something is "stuck" is if it's last path. If last path, we definitely 
don't want to prematurely release i/o before we've given the subsystem 
every opportunity to reconnect.

What I hear you saying is you don't like the kernel default 
controller-loss-timeout of 600s. What was designed, if you didn't like 
the kernel default, was to use the per-connection "--cltr-loss-tmo" 
option for "nvme connect".  The auto-connect scripts or the admin script 
that specifies the connection can set whatever value it likes.

If that seems hard to do, perhaps it's time to implement the options 
that allow for a config file to specify new values to be used on all 
connections, or on connections to specific subsystems, and so on. But I 
don't think the kernel needs to change.

-- james



>
> branch nvme-5.5
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ef89aff 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -13,6 +13,10 @@
>   #include "nvme.h"
>   #include "fabrics.h"
>
> +static bool nvmef_reconnect_failfast = 1;
> +module_param_named(nvmef_reconnect_failfast, nvmef_reconnect_failfast, bool, S_IRUGO);
> +MODULE_PARM_DESC(nvmef_reconnect_failfast, "failfast flag for I/O when controler is reconnecting, else use I/O command timeout (default true).");
> +
>   static LIST_HEAD(nvmf_transports);
>   static DECLARE_RWSEM(nvmf_transports_rwsem);
>
> @@ -549,6 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>   {
>          if (ctrl->state != NVME_CTRL_DELETING &&
>              ctrl->state != NVME_CTRL_DEAD &&
> +           !(ctrl->state == NVME_CTRL_CONNECTING && (((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)) || nvmef_reconnect_failfast)) &&
>              !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>                  return BLK_STS_RESOURCE;
>
>
> Regards,
> Victor
>
>
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-02 22:47 ` James Smart
@ 2019-12-03 10:04   ` Victor Gladkov
  2019-12-03 16:19     ` James Smart
  0 siblings, 1 reply; 14+ messages in thread
From: Victor Gladkov @ 2019-12-03 10:04 UTC (permalink / raw)
  To: James Smart, linux-nvme


On 12/03/2019 00:47 AM, James Smart wrote:
> On 11/30/2019 11:59 PM, Victor Gladkov wrote:
> > Issue Description:
> > Commands get stuck while Host NVMe controller (TCP or RDMA) is in
> reconnect state.
> > NVMe controller enters into reconnect state when it loses connection with
> the target. It tries to reconnect every 10 seconds (default) until successful
> reconnection or until reconnect time-out is reached. The default reconnect
> time out is 10 minutes.
> > This behavior is different than ISCSI where Commands during reconnect
> state returns with the following error: "rejecting I/O to offline device"
> >
> > Fix Description:
> > Added a kernel module parameter "nvmef_reconnect_failfast" for nvme-
> fabrics module (default is true).
> > Interfere in the decision whether to queue IO command or retry IO
> command. The interface takes into account the controller reconnect state, in
> a way that during reconnect state, IO commands shall fail immediacy (default)
> or according to IO command timeout (depends on the module parameter
> value), and IO retry is prevented. As a result, commands do not get stuck in in
> reconnect state.
> 
> This the patch seems incorrect at least as described. Multipathing inherently
> will "fastfail" and send to other paths. So the only way something is "stuck" is
> if it's last path. If last path, we definitely don't want to prematurely release
> i/o before we've given the subsystem every opportunity to reconnect.
> 
> What I hear you saying is you don't like the kernel default controller-loss-
> timeout of 600s. What was designed, if you didn't like the kernel default, was
> to use the per-connection "--cltr-loss-tmo"
> option for "nvme connect".  The auto-connect scripts or the admin script that
> specifies the connection can set whatever value it likes.
> 
> If that seems hard to do, perhaps it's time to implement the options that
> allow for a config file to specify new values to be used on all connections, or
> on connections to specific subsystems, and so on. But I don't think the kernel
> needs to change.
> 
> -- james

James, thank you for the suggestion.
But let me explain it differently.

Applications are expecting commands to complete with success or error
within a certain timeout (30 seconds by default). 
The NVMe host is enforcing that timeout while it is connected, never the less, 
during reconnection, the timeout is not enforced and commands may get stuck for a long period or even forever.

The controller-loss-timeout should not affect IO timeout policy, these are two different policies.

Regards,
Victor
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-03 10:04   ` Victor Gladkov
@ 2019-12-03 16:19     ` James Smart
  2019-12-04  8:28       ` Victor Gladkov
  0 siblings, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-03 16:19 UTC (permalink / raw)
  To: Victor Gladkov, linux-nvme


On 12/3/2019 2:04 AM, Victor Gladkov wrote:
> On 12/03/2019 00:47 AM, James Smart wrote:
>> O
>> James, thank you for the suggestion.
>> But let me explain it differently.
>>
>> Applications are expecting commands to complete with success or error
>> within a certain timeout (30 seconds by default).
>> The NVMe host is enforcing that timeout while it is connected, never the less,
>> during reconnection, the timeout is not enforced and commands may get stuck for a long period or even forever.
>>
>> The controller-loss-timeout should not affect IO timeout policy, these are two different policies.
>>
>> Regards,
>> Victor

Ok - which says what does make sense to add is the portion:

   !(ctrl->state == NVME_CTRL_CONNECTING && ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)))


But I don't think we need the failfast flag.

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-03 16:19     ` James Smart
@ 2019-12-04  8:28       ` Victor Gladkov
  2019-12-06  0:38         ` James Smart
  0 siblings, 1 reply; 14+ messages in thread
From: Victor Gladkov @ 2019-12-04  8:28 UTC (permalink / raw)
  To: James Smart, linux-nvme

On 12/03/2019 06:19 PM, James Smart wrote:
> On 12/3/2019 2:04 AM, Victor Gladkov wrote:
> > On 12/03/2019 00:47 AM, James Smart wrote:
> >> O
> >> The controller-loss-timeout should not affect IO timeout policy, these are
> two different policies.

> 
> Ok - which says what does make sense to add is the portion:
> 
>    !(ctrl->state == NVME_CTRL_CONNECTING && ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)))
> 
> 
> But I don't think we need the failfast flag.
> 
> -- james

OK. I think, it's good enough.

This is updated patch:

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..b58abc1 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 {
        if (ctrl->state != NVME_CTRL_DELETING &&
            ctrl->state != NVME_CTRL_DEAD &&
+           !(ctrl->state == NVME_CTRL_CONNECTING &&
+            ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
            !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
                return BLK_STS_RESOURCE;

---

Regards,
Victor
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-04  8:28       ` Victor Gladkov
@ 2019-12-06  0:38         ` James Smart
  2019-12-06 22:18           ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-06  0:38 UTC (permalink / raw)
  To: Victor Gladkov, linux-nvme



On 12/4/2019 12:28 AM, Victor Gladkov wrote:
> On 12/03/2019 06:19 PM, James Smart wrote:
>> On 12/3/2019 2:04 AM, Victor Gladkov wrote:
>>> On 12/03/2019 00:47 AM, James Smart wrote:
>>>> O
>>>> The controller-loss-timeout should not affect IO timeout policy, these are
>> two different policies.
>> Ok - which says what does make sense to add is the portion:
>>
>>     !(ctrl->state == NVME_CTRL_CONNECTING && ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)))
>>
>>
>> But I don't think we need the failfast flag.
>>
>> -- james
> OK. I think, it's good enough.
>
> This is updated patch:
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..b58abc1 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>   {
>          if (ctrl->state != NVME_CTRL_DELETING &&
>              ctrl->state != NVME_CTRL_DEAD &&
> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
> +            ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
>              !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>                  return BLK_STS_RESOURCE;
>

Did you test this to ensure it's doing what you expect. I'm not sure 
that all the timers are set right at this point. Most I/O's timeout from 
a deadline time stamped at blk_mq_start_request(). But that routine is 
actually called by the transports post the 
nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in flight, 
thus queued, and the blk-mq internal queuing doesn't count against the 
io timeout.  I can't see anything that guarantees start_time_ns is set.

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-06  0:38         ` James Smart
@ 2019-12-06 22:18           ` Sagi Grimberg
  2019-12-08 12:31             ` Hannes Reinecke
  0 siblings, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2019-12-06 22:18 UTC (permalink / raw)
  To: James Smart, Victor Gladkov, linux-nvme


>> ---
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 74b8818..b58abc1 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct 
>> nvme_ctrl *ctrl,
>>   {
>>          if (ctrl->state != NVME_CTRL_DELETING &&
>>              ctrl->state != NVME_CTRL_DEAD &&
>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>> +            ((ktime_get_ns() - rq->start_time_ns) > 
>> jiffies_to_nsecs(rq->timeout))) &&
>>              !blk_noretry_request(rq) && !(rq->cmd_flags & 
>> REQ_NVME_MPATH))
>>                  return BLK_STS_RESOURCE;
>>
> 
> Did you test this to ensure it's doing what you expect. I'm not sure 
> that all the timers are set right at this point. Most I/O's timeout from 
> a deadline time stamped at blk_mq_start_request(). But that routine is 
> actually called by the transports post the 
> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in flight, 
> thus queued, and the blk-mq internal queuing doesn't count against the 
> io timeout.  I can't see anything that guarantees start_time_ns is set.

I'm not sure this behavior for failing I/O always desired? some
consumers would actually not want the I/O to fail prematurely if we
are not multipathing...

I think we need a fail_fast_tmo set in when establishing the controller
to get it right.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-06 22:18           ` Sagi Grimberg
@ 2019-12-08 12:31             ` Hannes Reinecke
  2019-12-09 15:30               ` Victor Gladkov
  2019-12-15 12:33               ` Victor Gladkov
  0 siblings, 2 replies; 14+ messages in thread
From: Hannes Reinecke @ 2019-12-08 12:31 UTC (permalink / raw)
  To: Sagi Grimberg, James Smart, Victor Gladkov, linux-nvme

On 12/6/19 11:18 PM, Sagi Grimberg wrote:
> 
>>> ---
>>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>>> index 74b8818..b58abc1 100644
>>> --- a/drivers/nvme/host/fabrics.c
>>> +++ b/drivers/nvme/host/fabrics.c
>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct 
>>> nvme_ctrl *ctrl,
>>>   {
>>>          if (ctrl->state != NVME_CTRL_DELETING &&
>>>              ctrl->state != NVME_CTRL_DEAD &&
>>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>>> +            ((ktime_get_ns() - rq->start_time_ns) > 
>>> jiffies_to_nsecs(rq->timeout))) &&
>>>              !blk_noretry_request(rq) && !(rq->cmd_flags & 
>>> REQ_NVME_MPATH))
>>>                  return BLK_STS_RESOURCE;
>>>
>>
>> Did you test this to ensure it's doing what you expect. I'm not sure 
>> that all the timers are set right at this point. Most I/O's timeout 
>> from a deadline time stamped at blk_mq_start_request(). But that 
>> routine is actually called by the transports post the 
>> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in 
>> flight, thus queued, and the blk-mq internal queuing doesn't count 
>> against the io timeout.  I can't see anything that guarantees 
>> start_time_ns is set.
> 
> I'm not sure this behavior for failing I/O always desired? some
> consumers would actually not want the I/O to fail prematurely if we
> are not multipathing...
> 
> I think we need a fail_fast_tmo set in when establishing the controller
> to get it right.
> 
Agreed. This whole patch looks like someone is trying to reimplement 
fast_io_fail_tmo / dev_loss_tmo.
As we're moving into unreliable fabrics I guess we'll need a similar 
mechanism.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-08 12:31             ` Hannes Reinecke
@ 2019-12-09 15:30               ` Victor Gladkov
  2019-12-17 18:03                 ` James Smart
  2019-12-17 21:46                 ` Sagi Grimberg
  2019-12-15 12:33               ` Victor Gladkov
  1 sibling, 2 replies; 14+ messages in thread
From: Victor Gladkov @ 2019-12-09 15:30 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, James Smart, linux-nvme

On 12/8/19 14:18 PM, Hannes Reinecke wrote:
> 
> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
> >
> >>> ---
> >>> diff --git a/drivers/nvme/host/fabrics.c
> >>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644
> >>> --- a/drivers/nvme/host/fabrics.c
> >>> +++ b/drivers/nvme/host/fabrics.c
> >>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct
> >>> nvme_ctrl *ctrl,
> >>>   {
> >>>          if (ctrl->state != NVME_CTRL_DELETING &&
> >>>              ctrl->state != NVME_CTRL_DEAD &&
> >>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
> >>> +            ((ktime_get_ns() - rq->start_time_ns) >
> >>> jiffies_to_nsecs(rq->timeout))) &&
> >>>              !blk_noretry_request(rq) && !(rq->cmd_flags &
> >>> REQ_NVME_MPATH))
> >>>                  return BLK_STS_RESOURCE;
> >>>
> >>
> >> Did you test this to ensure it's doing what you expect. I'm not sure
> >> that all the timers are set right at this point. Most I/O's timeout
> >> from a deadline time stamped at blk_mq_start_request(). But that
> >> routine is actually called by the transports post the
> >> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in
> >> flight, thus queued, and the blk-mq internal queuing doesn't count
> >> against the io timeout.  I can't see anything that guarantees
> >> start_time_ns is set.
> >
> > I'm not sure this behavior for failing I/O always desired? some
> > consumers would actually not want the I/O to fail prematurely if we
> > are not multipathing...
> >
> > I think we need a fail_fast_tmo set in when establishing the
> > controller to get it right.
> >
> Agreed. This whole patch looks like someone is trying to reimplement
> fast_io_fail_tmo / dev_loss_tmo.
> As we're moving into unreliable fabrics I guess we'll need a similar mechanism.
> 
> Cheers,
> 
> Hannes


Following your suggestions, I added a new session parameter called "fast_fail_tmo". 
The timeout is measured in seconds from the controller reconnect, any command beyond that timeout is rejected. 
The new parameter value may be passed during ‘connect’, and its default value is 30 seconds. 
A value of -1 means no timeout (in similar to current behavior).

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..ed6b911 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -406,6 +406,7 @@
 	}

 	ctrl->cntlid = le16_to_cpu(res.u16);
+	ctrl->start_reconnect_ns = ktime_get_ns();

 out_free_data:
 	kfree(data);
@@ -474,8 +475,12 @@
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
 {
 	if (ctrl->opts->max_reconnects == -1 ||
-	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
+	    ctrl->nr_reconnects < ctrl->opts->max_reconnects){
+		if(ctrl->nr_reconnects == 0)
+			ctrl->start_reconnect_ns = ktime_get_ns();
+
 		return true;
+	}

 	return false;
 }
@@ -549,6 +554,8 @@
 {
 	if (ctrl->state != NVME_CTRL_DELETING &&
 	    ctrl->state != NVME_CTRL_DEAD &&
+            !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
+            ((ktime_get_ns() - ctrl->start_reconnect_ns) >  ctrl->opts->fail_fast_tmo_ns)) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;

@@ -612,6 +619,7 @@
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_FAIL_FAST_TMO,	"fail_fast_tmo=%d"	},
 	{ NVMF_OPT_ERR,			NULL			}
 };

@@ -623,6 +631,7 @@
 	int token, ret = 0;
 	size_t nqnlen  = 0;
 	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
+	int fail_fast_tmo = NVMF_DEF_FAIL_FAST_TMO;
 	uuid_t hostid;

 	/* Set defaults */
@@ -751,6 +760,16 @@
 				pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
 			ctrl_loss_tmo = token;
 			break;
+		case NVMF_OPT_FAIL_FAST_TMO:
+			if (match_int(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			if (token < 0)
+				pr_warn("fail_fast_tmo < 0, I/O will wait for reconnect/loss controller\n");
+			fail_fast_tmo = token;
+			break;
 		case NVMF_OPT_HOSTNQN:
 			if (opts->host) {
 				pr_err("hostnqn already user-assigned: %s\n",
@@ -887,6 +906,11 @@
 		opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
 						opts->reconnect_delay);

+	if (fail_fast_tmo < 0)
+		opts->fail_fast_tmo_ns = -1;
+	else
+		opts->fail_fast_tmo_ns = fail_fast_tmo * NSEC_PER_SEC;
+
 	if (!opts->host) {
 		kref_get(&nvmf_default_host->ref);
 		opts->host = nvmf_default_host;
@@ -985,7 +1009,7 @@
 #define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
 				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
 				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW)
+				 NVMF_OPT_DISABLE_SQFLOW|NVMF_OPT_FAIL_FAST_TMO)

 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
@@ -1198,7 +1222,6 @@
 	class_destroy(nvmf_class);
 	nvmf_host_put(nvmf_default_host);

-	BUILD_BUG_ON(sizeof(struct nvmf_common_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_connect_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_property_get_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_property_set_command) != 64);

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 93f08d7..235c767 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -15,6 +15,8 @@
 #define NVMF_DEF_RECONNECT_DELAY	10
 /* default to 600 seconds of reconnect attempts before giving up */
 #define NVMF_DEF_CTRL_LOSS_TMO		600
+/* default to 30 seconds of fail fast IO commands  */
+#define NVMF_DEF_FAIL_FAST_TMO		30

 /*
  * Define a host as seen by the target.  We allocate one at boot, but also
@@ -56,6 +58,7 @@
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
 };

 /**
@@ -89,6 +92,7 @@
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @fast_fail_tmo_ns: Fast I/O fail timeout in nanoseconds;
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +115,7 @@
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	u64			fail_fast_tmo_ns;
 };

 /*

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 34ac79c..9b5f6d9 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -282,6 +282,7 @@
 	u16 icdoff;
 	u16 maxcmd;
 	int nr_reconnects;
+	u64 start_reconnect_ns;
 	struct nvmf_ctrl_options *opts;

 	struct page *discard_page;

---
Regards,
Victor
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-08 12:31             ` Hannes Reinecke
  2019-12-09 15:30               ` Victor Gladkov
@ 2019-12-15 12:33               ` Victor Gladkov
  1 sibling, 0 replies; 14+ messages in thread
From: Victor Gladkov @ 2019-12-15 12:33 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, James Smart, linux-nvme, james.smart


On 12/9/19 17:30 PM, Victor Gladkov wrote:
> 
> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
> >
> > On 12/6/19 11:18 PM, Sagi Grimberg wrote:
> > >
...............................................
> > > I think we need a fail_fast_tmo set in when establishing the
> > > controller to get it right.
> > >
> > Agreed. This whole patch looks like someone is trying to reimplement
> > fast_io_fail_tmo / dev_loss_tmo.
> > As we're moving into unreliable fabrics I guess we'll need a similar
> mechanism.
> >
> > Cheers,
> >
> > Hannes
> 
> 
> Following your suggestions, I added a new session parameter called
> "fast_fail_tmo".
> The timeout is measured in seconds from the controller reconnect, any
> command beyond that timeout is rejected.
> The new parameter value may be passed during ‘connect’, and its default
> value is 30 seconds.
> A value of -1 means no timeout (in similar to current behavior).
> 
> ---
> Regards,
> Victor

James, Hannes & Sagi. 
Can you please review last proposed patch?

Regards,
Victor
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-09 15:30               ` Victor Gladkov
@ 2019-12-17 18:03                 ` James Smart
  2019-12-17 21:46                 ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: James Smart @ 2019-12-17 18:03 UTC (permalink / raw)
  To: Victor Gladkov, Hannes Reinecke, Sagi Grimberg, linux-nvme



On 12/9/2019 7:30 AM, Victor Gladkov wrote:
> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>>>>> ---
>>>>> diff --git a/drivers/nvme/host/fabrics.c
>>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644
>>>>> --- a/drivers/nvme/host/fabrics.c
>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct
>>>>> nvme_ctrl *ctrl,
>>>>>    {
>>>>>           if (ctrl->state != NVME_CTRL_DELETING &&
>>>>>               ctrl->state != NVME_CTRL_DEAD &&
>>>>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>>>>> +            ((ktime_get_ns() - rq->start_time_ns) >
>>>>> jiffies_to_nsecs(rq->timeout))) &&
>>>>>               !blk_noretry_request(rq) && !(rq->cmd_flags &
>>>>> REQ_NVME_MPATH))
>>>>>                   return BLK_STS_RESOURCE;
>>>>>
>>>> Did you test this to ensure it's doing what you expect. I'm not sure
>>>> that all the timers are set right at this point. Most I/O's timeout
>>>> from a deadline time stamped at blk_mq_start_request(). But that
>>>> routine is actually called by the transports post the
>>>> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in
>>>> flight, thus queued, and the blk-mq internal queuing doesn't count
>>>> against the io timeout.  I can't see anything that guarantees
>>>> start_time_ns is set.
>>> I'm not sure this behavior for failing I/O always desired? some
>>> consumers would actually not want the I/O to fail prematurely if we
>>> are not multipathing...
>>>
>>> I think we need a fail_fast_tmo set in when establishing the
>>> controller to get it right.
>>>
>> Agreed. This whole patch looks like someone is trying to reimplement
>> fast_io_fail_tmo / dev_loss_tmo.
>> As we're moving into unreliable fabrics I guess we'll need a similar mechanism.
>>
>> Cheers,
>>
>> Hannes
>
> Following your suggestions, I added a new session parameter called "fast_fail_tmo".
> The timeout is measured in seconds from the controller reconnect, any command beyond that timeout is rejected.
> The new parameter value may be passed during ‘connect’, and its default value is 30 seconds.
> A value of -1 means no timeout (in similar to current behavior).
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ed6b911 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -406,6 +406,7 @@
>   	}
>
>   	ctrl->cntlid = le16_to_cpu(res.u16);
> +	ctrl->start_reconnect_ns = ktime_get_ns();

It doesn't seem correct to start the reconnect timer once a admin queue 
connect has succeeded. There may be lots of time prior when there is no 
connectivity and the timer hasn't been set and thus isn't running.

ctrl->start_reconnect_ns should be set when the ctrl transitions to 
NVME_CTRL_CONNECTING. Would have been nice to set the value in , but as 
that's in nvme_change_ctrl_state() in core, and as it's more than just 
changing state, I guess that means it has to be in each of the fabric 
transports as they make the call to change to the state.

note: if posting a new patch. Start it as a new patch thread, not a 
reply on this thread.

>
>   out_free_data:
>   	kfree(data);
> @@ -474,8 +475,12 @@
>   bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>   {
>   	if (ctrl->opts->max_reconnects == -1 ||
> -	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
> +	    ctrl->nr_reconnects < ctrl->opts->max_reconnects){
> +		if(ctrl->nr_reconnects == 0)
> +			ctrl->start_reconnect_ns = ktime_get_ns();
> +
>   		return true;
> +	}

This is making up for what I described above, but means 
start_reconnect_ns won't be set until after the first expiration of the 
reconnect attempt. So there's at least 1 window skipped with no timer.  
If the transports just start the timer after the state transition, there 
is no need for this.

>
>   	return false;
>   }
> @@ -549,6 +554,8 @@
>   {
>   	if (ctrl->state != NVME_CTRL_DELETING &&
>   	    ctrl->state != NVME_CTRL_DEAD &&
> +            !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
> +            ((ktime_get_ns() - ctrl->start_reconnect_ns) >  ctrl->opts->fail_fast_tmo_ns)) &&

This is hard to read - I'd prefer just making a small inline helper that 
is called.
As for the logic: looks good. I'm a little concerned about a race on 
CONNECTING and start_reconnect_ns being set right as there's no locks 
(and we don't want to add them), but it's very very very small.

prefer it's just fail_fast_tmo_ns be unsigned, with 0 being disabled/no 
timeout.

>   	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>   		return BLK_STS_RESOURCE;
>
> @@ -612,6 +619,7 @@
>   	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
>   	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
>   	{ NVMF_OPT_TOS,			"tos=%d"		},
> +	{ NVMF_OPT_FAIL_FAST_TMO,	"fail_fast_tmo=%d"	},
>   	{ NVMF_OPT_ERR,			NULL			}
>   };
>
> @@ -623,6 +631,7 @@
>   	int token, ret = 0;
>   	size_t nqnlen  = 0;
>   	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
> +	int fail_fast_tmo = NVMF_DEF_FAIL_FAST_TMO;
>   	uuid_t hostid;
>
>   	/* Set defaults */
> @@ -751,6 +760,16 @@
>   				pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
>   			ctrl_loss_tmo = token;
>   			break;
> +		case NVMF_OPT_FAIL_FAST_TMO:
> +			if (match_int(args, &token)) {
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +
> +			if (token < 0)
> +				pr_warn("fail_fast_tmo < 0, I/O will wait for reconnect/loss controller\n");
see above. Prefer <=0 be no timeout.
do you want a warning if larger than controller loss tmo ?

> +			fail_fast_tmo = token;
> +			break;
>   		case NVMF_OPT_HOSTNQN:
>   			if (opts->host) {
>   				pr_err("hostnqn already user-assigned: %s\n",
> @@ -887,6 +906,11 @@
>   		opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
>   						opts->reconnect_delay);
>
> +	if (fail_fast_tmo < 0)
> +		opts->fail_fast_tmo_ns = -1;
would prefer it be set to 0 to mean disabled.

> +	else
> +		opts->fail_fast_tmo_ns = fail_fast_tmo * NSEC_PER_SEC;
> +
>   	if (!opts->host) {
>   		kref_get(&nvmf_default_host->ref);
>   		opts->host = nvmf_default_host;
> @@ -985,7 +1009,7 @@
>   #define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
>   				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
>   				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
> -				 NVMF_OPT_DISABLE_SQFLOW)
> +				 NVMF_OPT_DISABLE_SQFLOW|NVMF_OPT_FAIL_FAST_TMO)
>
>   static struct nvme_ctrl *
>   nvmf_create_ctrl(struct device *dev, const char *buf)
> @@ -1198,7 +1222,6 @@
>   	class_destroy(nvmf_class);
>   	nvmf_host_put(nvmf_default_host);
>
> -	BUILD_BUG_ON(sizeof(struct nvmf_common_command) != 64);

Odd line to have in this patch.

>   	BUILD_BUG_ON(sizeof(struct nvmf_connect_command) != 64);
>   	BUILD_BUG_ON(sizeof(struct nvmf_property_get_command) != 64);
>   	BUILD_BUG_ON(sizeof(struct nvmf_property_set_command) != 64);
>
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index 93f08d7..235c767 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -15,6 +15,8 @@
>   #define NVMF_DEF_RECONNECT_DELAY	10
>   /* default to 600 seconds of reconnect attempts before giving up */
>   #define NVMF_DEF_CTRL_LOSS_TMO		600
> +/* default to 30 seconds of fail fast IO commands  */
> +#define NVMF_DEF_FAIL_FAST_TMO		30
>
>   /*
>    * Define a host as seen by the target.  We allocate one at boot, but also
> @@ -56,6 +58,7 @@
>   	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
>   	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
>   	NVMF_OPT_TOS		= 1 << 19,
> +	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
>   };
>
>   /**
> @@ -89,6 +92,7 @@
>    * @nr_write_queues: number of queues for write I/O
>    * @nr_poll_queues: number of queues for polling I/O
>    * @tos: type of service
> + * @fast_fail_tmo_ns: Fast I/O fail timeout in nanoseconds;
>    */
>   struct nvmf_ctrl_options {
>   	unsigned		mask;
> @@ -111,6 +115,7 @@
>   	unsigned int		nr_write_queues;
>   	unsigned int		nr_poll_queues;
>   	int			tos;
> +	u64			fail_fast_tmo_ns;
>   };
>
>   /*
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 34ac79c..9b5f6d9 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -282,6 +282,7 @@
>   	u16 icdoff;
>   	u16 maxcmd;
>   	int nr_reconnects;
> +	u64 start_reconnect_ns;
>   	struct nvmf_ctrl_options *opts;
>
>   	struct page *discard_page;
>
> ---
> Regards,
> Victor


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-09 15:30               ` Victor Gladkov
  2019-12-17 18:03                 ` James Smart
@ 2019-12-17 21:46                 ` Sagi Grimberg
  2019-12-18 22:20                   ` James Smart
  1 sibling, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2019-12-17 21:46 UTC (permalink / raw)
  To: Victor Gladkov, Hannes Reinecke, James Smart, linux-nvme



On 12/9/19 7:30 AM, Victor Gladkov wrote:
> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>>
>> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>>>
>>>>> ---
>>>>> diff --git a/drivers/nvme/host/fabrics.c
>>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644
>>>>> --- a/drivers/nvme/host/fabrics.c
>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct
>>>>> nvme_ctrl *ctrl,
>>>>>    {
>>>>>           if (ctrl->state != NVME_CTRL_DELETING &&
>>>>>               ctrl->state != NVME_CTRL_DEAD &&
>>>>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>>>>> +            ((ktime_get_ns() - rq->start_time_ns) >
>>>>> jiffies_to_nsecs(rq->timeout))) &&
>>>>>               !blk_noretry_request(rq) && !(rq->cmd_flags &
>>>>> REQ_NVME_MPATH))
>>>>>                   return BLK_STS_RESOURCE;
>>>>>
>>>>
>>>> Did you test this to ensure it's doing what you expect. I'm not sure
>>>> that all the timers are set right at this point. Most I/O's timeout
>>>> from a deadline time stamped at blk_mq_start_request(). But that
>>>> routine is actually called by the transports post the
>>>> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in
>>>> flight, thus queued, and the blk-mq internal queuing doesn't count
>>>> against the io timeout.  I can't see anything that guarantees
>>>> start_time_ns is set.
>>>
>>> I'm not sure this behavior for failing I/O always desired? some
>>> consumers would actually not want the I/O to fail prematurely if we
>>> are not multipathing...
>>>
>>> I think we need a fail_fast_tmo set in when establishing the
>>> controller to get it right.
>>>
>> Agreed. This whole patch looks like someone is trying to reimplement
>> fast_io_fail_tmo / dev_loss_tmo.
>> As we're moving into unreliable fabrics I guess we'll need a similar mechanism.
>>
>> Cheers,
>>
>> Hannes
> 
> 
> Following your suggestions, I added a new session parameter called "fast_fail_tmo".
> The timeout is measured in seconds from the controller reconnect, any command beyond that timeout is rejected.
> The new parameter value may be passed during ‘connect’, and its default value is 30 seconds.

The default should be consistent with the existing behavior.

> A value of -1 means no timeout (in similar to current behavior).
> 
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ed6b911 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -406,6 +406,7 @@
>   	}
> 
>   	ctrl->cntlid = le16_to_cpu(res.u16);
> +	ctrl->start_reconnect_ns = ktime_get_ns();
> 
>   out_free_data:
>   	kfree(data);
> @@ -474,8 +475,12 @@
>   bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>   {
>   	if (ctrl->opts->max_reconnects == -1 ||
> -	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
> +	    ctrl->nr_reconnects < ctrl->opts->max_reconnects){
> +		if(ctrl->nr_reconnects == 0)
> +			ctrl->start_reconnect_ns = ktime_get_ns();
> +
>   		return true;
> +	}
> 
>   	return false;
>   }
> @@ -549,6 +554,8 @@
>   {
>   	if (ctrl->state != NVME_CTRL_DELETING &&
>   	    ctrl->state != NVME_CTRL_DEAD &&
> +            !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
> +            ((ktime_get_ns() - ctrl->start_reconnect_ns) >  ctrl->opts->fail_fast_tmo_ns)) &&
>   	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>   		return BLK_STS_RESOURCE;

I cannot comprehend what is going on here...

We should have a dedicated delayed_work that transitions the controller
to a FAIL_FAST state and cancels the inflight requests again. This
work should be triggered when the error is detected.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-17 21:46                 ` Sagi Grimberg
@ 2019-12-18 22:20                   ` James Smart
  0 siblings, 0 replies; 14+ messages in thread
From: James Smart @ 2019-12-18 22:20 UTC (permalink / raw)
  To: Sagi Grimberg, Victor Gladkov, Hannes Reinecke, linux-nvme



On 12/17/2019 1:46 PM, Sagi Grimberg wrote:
>
>
> On 12/9/19 7:30 AM, Victor Gladkov wrote:
>> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>>>
>>> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>>>>
>>>>>> ---
>>>>>> diff --git a/drivers/nvme/host/fabrics.c
>>>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644
>>>>>> --- a/drivers/nvme/host/fabrics.c
>>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct
>>>>>> nvme_ctrl *ctrl,
>>>>>>    {
>>>>>>           if (ctrl->state != NVME_CTRL_DELETING &&
>>>>>>               ctrl->state != NVME_CTRL_DEAD &&
>>>>>> +           !(ctrl->state == NVME_CTRL_CONNECTING &&
>>>>>> +            ((ktime_get_ns() - rq->start_time_ns) >
>>>>>> jiffies_to_nsecs(rq->timeout))) &&
>>>>>>               !blk_noretry_request(rq) && !(rq->cmd_flags &
>>>>>> REQ_NVME_MPATH))
>>>>>>                   return BLK_STS_RESOURCE;
>>>>>>
>>>>>
>>>>> Did you test this to ensure it's doing what you expect. I'm not sure
>>>>> that all the timers are set right at this point. Most I/O's timeout
>>>>> from a deadline time stamped at blk_mq_start_request(). But that
>>>>> routine is actually called by the transports post the
>>>>> nvmf_check_ready/fail_nonready calls.  E.g. the io is not yet in
>>>>> flight, thus queued, and the blk-mq internal queuing doesn't count
>>>>> against the io timeout.  I can't see anything that guarantees
>>>>> start_time_ns is set.
>>>>
>>>> I'm not sure this behavior for failing I/O always desired? some
>>>> consumers would actually not want the I/O to fail prematurely if we
>>>> are not multipathing...
>>>>
>>>> I think we need a fail_fast_tmo set in when establishing the
>>>> controller to get it right.
>>>>
>>> Agreed. This whole patch looks like someone is trying to reimplement
>>> fast_io_fail_tmo / dev_loss_tmo.
>>> As we're moving into unreliable fabrics I guess we'll need a similar 
>>> mechanism.
>>>
>>> Cheers,
>>>
>>> Hannes
>>
>>
>> Following your suggestions, I added a new session parameter called 
>> "fast_fail_tmo".
>> The timeout is measured in seconds from the controller reconnect, any 
>> command beyond that timeout is rejected.
>> The new parameter value may be passed during ‘connect’, and its 
>> default value is 30 seconds.
>
> The default should be consistent with the existing behavior.
>
>> A value of -1 means no timeout (in similar to current behavior).
>>
>> ---
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 74b8818..ed6b911 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -406,6 +406,7 @@
>>       }
>>
>>       ctrl->cntlid = le16_to_cpu(res.u16);
>> +    ctrl->start_reconnect_ns = ktime_get_ns();
>>
>>   out_free_data:
>>       kfree(data);
>> @@ -474,8 +475,12 @@
>>   bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>>   {
>>       if (ctrl->opts->max_reconnects == -1 ||
>> -        ctrl->nr_reconnects < ctrl->opts->max_reconnects)
>> +        ctrl->nr_reconnects < ctrl->opts->max_reconnects){
>> +        if(ctrl->nr_reconnects == 0)
>> +            ctrl->start_reconnect_ns = ktime_get_ns();
>> +
>>           return true;
>> +    }
>>
>>       return false;
>>   }
>> @@ -549,6 +554,8 @@
>>   {
>>       if (ctrl->state != NVME_CTRL_DELETING &&
>>           ctrl->state != NVME_CTRL_DEAD &&
>> +            !(ctrl->state == NVME_CTRL_CONNECTING && 
>> ctrl->opts->fail_fast_tmo_ns >= 0 &&
>> +            ((ktime_get_ns() - ctrl->start_reconnect_ns) >  
>> ctrl->opts->fail_fast_tmo_ns)) &&
>>           !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>>           return BLK_STS_RESOURCE;
>
> I cannot comprehend what is going on here...
>
> We should have a dedicated delayed_work that transitions the controller
> to a FAIL_FAST state and cancels the inflight requests again. This
> work should be triggered when the error is detected.

I hope you're not suggesting a FAILFAST state.  No new controller state 
is needed.

I do agree that managing the time since transitioning to CONNECTING can 
be handled better and can address "abort all now" rather than waiting 
for retries to kick in.

In other words:
Add a controller flag of "failfast_expired"
When entering CONNECTING, schedule a delayed work item based on failfast 
timeout value.
If transition out of CONNECTING, terminate delayed work item and ensure 
failfast_expired is false.
If delayed work item expires: set "failfast_expired" flag to true. Run 
through all inflight ios and cancel them.
Update nvmf_fail_nonready_command() (above) per above, but with check on 
"!ctrl->failfast_expired".

-- james

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-12-18 22:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
2019-12-02 22:26 ` Chaitanya Kulkarni
2019-12-02 22:47 ` James Smart
2019-12-03 10:04   ` Victor Gladkov
2019-12-03 16:19     ` James Smart
2019-12-04  8:28       ` Victor Gladkov
2019-12-06  0:38         ` James Smart
2019-12-06 22:18           ` Sagi Grimberg
2019-12-08 12:31             ` Hannes Reinecke
2019-12-09 15:30               ` Victor Gladkov
2019-12-17 18:03                 ` James Smart
2019-12-17 21:46                 ` Sagi Grimberg
2019-12-18 22:20                   ` James Smart
2019-12-15 12:33               ` Victor Gladkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).