* [PATCH] nvme-fabrics: reject I/O to offline device
@ 2019-12-01  7:59 Victor Gladkov
  2019-12-02 22:26 ` Chaitanya Kulkarni
  2019-12-02 22:47 ` James Smart
  0 siblings, 2 replies; 14+ messages in thread
From: Victor Gladkov @ 2019-12-01 7:59 UTC (permalink / raw)
To: linux-nvme

Issue description:
Commands get stuck while the host NVMe controller (TCP or RDMA) is in the reconnect state. An NVMe controller enters the reconnect state when it loses the connection with the target. It then tries to reconnect every 10 seconds (default) until the reconnection succeeds or the reconnect timeout is reached. The default reconnect timeout is 10 minutes. This behavior differs from iSCSI, where commands issued during the reconnect state return with the error "rejecting I/O to offline device".

Fix description:
Added a kernel module parameter "nvmef_reconnect_failfast" to the nvme-fabrics module (default: true). It intervenes in the decision whether to queue or retry an I/O command, taking the controller's reconnect state into account: during the reconnect state, I/O commands fail immediately (the default) or once the I/O command timeout expires (depending on the module parameter value), and I/O retry is prevented. As a result, commands no longer get stuck in the reconnect state.
branch nvme-5.5

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..ef89aff 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -13,6 +13,10 @@
 #include "nvme.h"
 #include "fabrics.h"
 
+static bool nvmef_reconnect_failfast = true;
+module_param_named(nvmef_reconnect_failfast, nvmef_reconnect_failfast, bool, S_IRUGO);
+MODULE_PARM_DESC(nvmef_reconnect_failfast, "failfast flag for I/O when the controller is reconnecting, else use the I/O command timeout (default true).");
+
 static LIST_HEAD(nvmf_transports);
 static DECLARE_RWSEM(nvmf_transports_rwsem);
 
@@ -549,6 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 {
	if (ctrl->state != NVME_CTRL_DELETING &&
	    ctrl->state != NVME_CTRL_DEAD &&
+	    !(ctrl->state == NVME_CTRL_CONNECTING && (((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)) || nvmef_reconnect_failfast)) &&
	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
		return BLK_STS_RESOURCE;

Regards,
Victor

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
@ 2019-12-02 22:26 ` Chaitanya Kulkarni
  2019-12-02 22:47 ` James Smart
  1 sibling, 0 replies; 14+ messages in thread
From: Chaitanya Kulkarni @ 2019-12-02 22:26 UTC (permalink / raw)
To: Victor Gladkov, linux-nvme

On 12/01/2019 12:14 AM, Victor Gladkov wrote:
> Issue description:
> Commands get stuck while the host NVMe controller (TCP or RDMA) is in the reconnect state. An NVMe controller enters the reconnect state when it loses the connection with the target. It then tries to reconnect every 10 seconds (default) until the reconnection succeeds or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.
> This behavior differs from iSCSI, where commands issued during the reconnect state return with the error "rejecting I/O to offline device".

The values NVMF_DEF_RECONNECT_DELAY and NVMF_OPT_CTRL_LOSS_TMO are configurable via:

	{ NVMF_OPT_RECONNECT_DELAY,	"reconnect_delay=%d"	},
	{ NVMF_OPT_CTRL_LOSS_TMO,	"ctrl_loss_tmo=%d"	},

Is it possible to get the same behavior with a proper combination of these two values?
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
  2019-12-02 22:26 ` Chaitanya Kulkarni
@ 2019-12-02 22:47 ` James Smart
  2019-12-03 10:04 ` Victor Gladkov
  1 sibling, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-02 22:47 UTC (permalink / raw)
To: linux-nvme

On 11/30/2019 11:59 PM, Victor Gladkov wrote:
> Issue description:
> Commands get stuck while the host NVMe controller (TCP or RDMA) is in the reconnect state. An NVMe controller enters the reconnect state when it loses the connection with the target. It then tries to reconnect every 10 seconds (default) until the reconnection succeeds or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.
> This behavior differs from iSCSI, where commands issued during the reconnect state return with the error "rejecting I/O to offline device".
>
> Fix description:
> Added a kernel module parameter "nvmef_reconnect_failfast" to the nvme-fabrics module (default: true). During the reconnect state, I/O commands fail immediately (the default) or once the I/O command timeout expires, and I/O retry is prevented. As a result, commands no longer get stuck in the reconnect state.

This patch seems incorrect, at least as described. Multipathing inherently will "fail fast" and send I/O to other paths. So the only way something is "stuck" is if it is on the last path. If it is the last path, we definitely don't want to prematurely release I/O before we've given the subsystem every opportunity to reconnect.

What I hear you saying is that you don't like the kernel's default controller-loss timeout of 600 seconds. What was designed, if you don't like the kernel default, is the per-connection "--ctrl-loss-tmo" option of "nvme connect".
The auto-connect scripts, or the admin script that specifies the connection, can set whatever value they like. If that seems hard to do, perhaps it's time to implement the options that allow a config file to specify new values to be used on all connections, or on connections to specific subsystems, and so on. But I don't think the kernel needs to change.

-- james

> branch nvme-5.5
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ef89aff 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -13,6 +13,10 @@
>  #include "nvme.h"
>  #include "fabrics.h"
>  
> +static bool nvmef_reconnect_failfast = true;
> +module_param_named(nvmef_reconnect_failfast, nvmef_reconnect_failfast, bool, S_IRUGO);
> +MODULE_PARM_DESC(nvmef_reconnect_failfast, "failfast flag for I/O when the controller is reconnecting, else use the I/O command timeout (default true).");
> +
>  static LIST_HEAD(nvmf_transports);
>  static DECLARE_RWSEM(nvmf_transports_rwsem);
>  
> @@ -549,6 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>  {
> 	if (ctrl->state != NVME_CTRL_DELETING &&
> 	    ctrl->state != NVME_CTRL_DEAD &&
> +	    !(ctrl->state == NVME_CTRL_CONNECTING && (((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)) || nvmef_reconnect_failfast)) &&
> 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
> 		return BLK_STS_RESOURCE;
>
> Regards,
> Victor
* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-02 22:47 ` James Smart
@ 2019-12-03 10:04 ` Victor Gladkov
  2019-12-03 16:19 ` James Smart
  0 siblings, 1 reply; 14+ messages in thread
From: Victor Gladkov @ 2019-12-03 10:04 UTC (permalink / raw)
To: James Smart, linux-nvme

On 12/03/2019 00:47 AM, James Smart wrote:
> On 11/30/2019 11:59 PM, Victor Gladkov wrote:
>> Issue description:
>> Commands get stuck while the host NVMe controller (TCP or RDMA) is in the reconnect state. An NVMe controller enters the reconnect state when it loses the connection with the target. It then tries to reconnect every 10 seconds (default) until the reconnection succeeds or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.
>> This behavior differs from iSCSI, where commands issued during the reconnect state return with the error "rejecting I/O to offline device".
>>
>> Fix description:
>> Added a kernel module parameter "nvmef_reconnect_failfast" to the nvme-fabrics module (default: true). During the reconnect state, I/O commands fail immediately (the default) or once the I/O command timeout expires, and I/O retry is prevented. As a result, commands no longer get stuck in the reconnect state.
>
> This patch seems incorrect, at least as described. Multipathing inherently will "fail fast" and send I/O to other paths. So the only way something is "stuck" is if it is on the last path. If it is the last path, we definitely don't want to prematurely release I/O before we've given the subsystem every opportunity to reconnect.
>
> What I hear you saying is that you don't like the kernel's default controller-loss timeout of 600 seconds. What was designed, if you don't like the kernel default, is the per-connection "--ctrl-loss-tmo" option of "nvme connect".
> The auto-connect scripts, or the admin script that specifies the connection, can set whatever value they like. If that seems hard to do, perhaps it's time to implement the options that allow a config file to specify new values to be used on all connections, or on connections to specific subsystems, and so on. But I don't think the kernel needs to change.
>
> -- james

James, thank you for the suggestion. But let me explain it differently.

Applications expect commands to complete, with success or error, within a certain timeout (30 seconds by default). The NVMe host enforces that timeout while it is connected; nevertheless, during reconnection the timeout is not enforced, and commands may get stuck for a long period or even forever.

The controller-loss timeout should not affect the I/O timeout policy; these are two different policies.

Regards,
Victor
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-03 10:04 ` Victor Gladkov
@ 2019-12-03 16:19 ` James Smart
  2019-12-04  8:28 ` Victor Gladkov
  0 siblings, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-03 16:19 UTC (permalink / raw)
To: Victor Gladkov, linux-nvme

On 12/3/2019 2:04 AM, Victor Gladkov wrote:
> Applications expect commands to complete, with success or error, within a certain timeout (30 seconds by default). The NVMe host enforces that timeout while it is connected; nevertheless, during reconnection the timeout is not enforced, and commands may get stuck for a long period or even forever.
>
> The controller-loss timeout should not affect the I/O timeout policy; these are two different policies.

OK, which says that what does make sense to add is this portion:

    !(ctrl->state == NVME_CTRL_CONNECTING &&
      ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)))

But I don't think we need the failfast flag.

-- james
* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-03 16:19 ` James Smart
@ 2019-12-04  8:28 ` Victor Gladkov
  2019-12-06  0:38 ` James Smart
  0 siblings, 1 reply; 14+ messages in thread
From: Victor Gladkov @ 2019-12-04 8:28 UTC (permalink / raw)
To: James Smart, linux-nvme

On 12/03/2019 06:19 PM, James Smart wrote:
> OK, which says that what does make sense to add is this portion:
>
>     !(ctrl->state == NVME_CTRL_CONNECTING &&
>       ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)))
>
> But I don't think we need the failfast flag.
>
> -- james

OK, I think it's good enough. This is the updated patch:

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..b58abc1 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 {
	if (ctrl->state != NVME_CTRL_DELETING &&
	    ctrl->state != NVME_CTRL_DEAD &&
+	    !(ctrl->state == NVME_CTRL_CONNECTING &&
+	      ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
		return BLK_STS_RESOURCE;
---

Regards,
Victor
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-04  8:28 ` Victor Gladkov
@ 2019-12-06  0:38 ` James Smart
  2019-12-06 22:18 ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: James Smart @ 2019-12-06 0:38 UTC (permalink / raw)
To: Victor Gladkov, linux-nvme

On 12/4/2019 12:28 AM, Victor Gladkov wrote:
> OK, I think it's good enough. This is the updated patch:
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..b58abc1 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>  {
> 	if (ctrl->state != NVME_CTRL_DELETING &&
> 	    ctrl->state != NVME_CTRL_DEAD &&
> +	    !(ctrl->state == NVME_CTRL_CONNECTING &&
> +	      ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
> 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
> 		return BLK_STS_RESOURCE;

Did you test this to ensure it's doing what you expect? I'm not sure that all the timers are set right at this point. Most I/Os time out from a deadline time-stamped at blk_mq_start_request(). But that routine is actually called by the transports after the nvmf_check_ready/fail_nonready calls; i.e. the I/O is not yet in flight, thus queued, and the blk-mq internal queuing doesn't count against the I/O timeout. I can't see anything that guarantees start_time_ns is set.
-- james
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-06  0:38 ` James Smart
@ 2019-12-06 22:18 ` Sagi Grimberg
  2019-12-08 12:31 ` Hannes Reinecke
  0 siblings, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2019-12-06 22:18 UTC (permalink / raw)
To: James Smart, Victor Gladkov, linux-nvme

>> ---
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 74b8818..b58abc1 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>>  {
>> 	if (ctrl->state != NVME_CTRL_DELETING &&
>> 	    ctrl->state != NVME_CTRL_DEAD &&
>> +	    !(ctrl->state == NVME_CTRL_CONNECTING &&
>> +	      ((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout))) &&
>> 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>> 		return BLK_STS_RESOURCE;
>
> Did you test this to ensure it's doing what you expect? I'm not sure that all the timers are set right at this point. Most I/Os time out from a deadline time-stamped at blk_mq_start_request(). But that routine is actually called by the transports after the nvmf_check_ready/fail_nonready calls; i.e. the I/O is not yet in flight, thus queued, and the blk-mq internal queuing doesn't count against the I/O timeout. I can't see anything that guarantees start_time_ns is set.

I'm not sure this behavior of failing I/O is always desired. Some consumers would actually not want the I/O to fail prematurely if we are not multipathing...

I think we need a fail_fast_tmo, set when establishing the controller, to get it right.
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-06 22:18 ` Sagi Grimberg
@ 2019-12-08 12:31 ` Hannes Reinecke
  2019-12-09 15:30 ` Victor Gladkov
  2019-12-15 12:33 ` Victor Gladkov
  0 siblings, 2 replies; 14+ messages in thread
From: Hannes Reinecke @ 2019-12-08 12:31 UTC (permalink / raw)
To: Sagi Grimberg, James Smart, Victor Gladkov, linux-nvme

On 12/6/19 11:18 PM, Sagi Grimberg wrote:
> I'm not sure this behavior of failing I/O is always desired. Some consumers would actually not want the I/O to fail prematurely if we are not multipathing...
>
> I think we need a fail_fast_tmo, set when establishing the controller, to get it right.

Agreed. This whole patch looks like someone is trying to reimplement fast_io_fail_tmo / dev_loss_tmo. As we're moving into unreliable fabrics, I guess we'll need a similar mechanism.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		Teamlead Storage & Networking
hare@suse.de			+49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* RE: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-08 12:31 ` Hannes Reinecke
@ 2019-12-09 15:30 ` Victor Gladkov
  2019-12-17 18:03 ` James Smart
  2019-12-17 21:46 ` Sagi Grimberg
  1 sibling, 2 replies; 14+ messages in thread
From: Victor Gladkov @ 2019-12-09 15:30 UTC (permalink / raw)
To: Hannes Reinecke, Sagi Grimberg, James Smart, linux-nvme

On 12/8/19 14:18 PM, Hannes Reinecke wrote:
> On 12/6/19 11:18 PM, Sagi Grimberg wrote:
>> I'm not sure this behavior of failing I/O is always desired. Some consumers would actually not want the I/O to fail prematurely if we are not multipathing...
>>
>> I think we need a fail_fast_tmo, set when establishing the controller, to get it right.
>
> Agreed. This whole patch looks like someone is trying to reimplement fast_io_fail_tmo / dev_loss_tmo. As we're moving into unreliable fabrics, I guess we'll need a similar mechanism.
>
> Cheers,
>
> Hannes

Following your suggestions, I added a new session parameter called "fail_fast_tmo". The timeout is measured in seconds from the start of the controller reconnect; any command beyond that timeout is rejected. The new parameter value may be passed during 'connect', and its default value is 30 seconds. A value of -1 means no timeout (similar to the current behavior).

---
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..ed6b911 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -406,6 +406,7 @@
 	}
 
 	ctrl->cntlid = le16_to_cpu(res.u16);
+	ctrl->start_reconnect_ns = ktime_get_ns();
 
 out_free_data:
 	kfree(data);
@@ -474,8 +475,12 @@
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
 {
 	if (ctrl->opts->max_reconnects == -1 ||
-	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
+	    ctrl->nr_reconnects < ctrl->opts->max_reconnects) {
+		if (ctrl->nr_reconnects == 0)
+			ctrl->start_reconnect_ns = ktime_get_ns();
+
 		return true;
+	}
 
 	return false;
 }
@@ -549,6 +554,8 @@
 {
 	if (ctrl->state != NVME_CTRL_DELETING &&
 	    ctrl->state != NVME_CTRL_DEAD &&
+	    !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
+	      ((ktime_get_ns() - ctrl->start_reconnect_ns) > ctrl->opts->fail_fast_tmo_ns)) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
@@ -612,6 +619,7 @@
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_FAIL_FAST_TMO,	"fail_fast_tmo=%d"	},
 	{ NVMF_OPT_ERR,			NULL			}
 };
@@ -623,6 +631,7 @@
 	int token, ret = 0;
 	size_t nqnlen = 0;
 	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
+	int fail_fast_tmo = NVMF_DEF_FAIL_FAST_TMO;
 	uuid_t hostid;
 
 	/* Set defaults */
@@ -751,6 +760,16 @@
 			pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
 			ctrl_loss_tmo = token;
 			break;
+		case NVMF_OPT_FAIL_FAST_TMO:
+			if (match_int(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			if (token < 0)
+				pr_warn("fail_fast_tmo < 0, I/O will wait for reconnect/loss controller\n");
+			fail_fast_tmo = token;
+			break;
 		case NVMF_OPT_HOSTNQN:
 			if (opts->host) {
 				pr_err("hostnqn already user-assigned: %s\n",
@@ -887,6 +906,11 @@
 		opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
 					opts->reconnect_delay);
 
+	if (fail_fast_tmo < 0)
+		opts->fail_fast_tmo_ns = -1;
+	else
+		opts->fail_fast_tmo_ns = fail_fast_tmo * NSEC_PER_SEC;
+
 	if (!opts->host) {
 		kref_get(&nvmf_default_host->ref);
 		opts->host = nvmf_default_host;
@@ -985,7 +1009,7 @@
 #define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
 				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
 				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW)
+				 NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_FAIL_FAST_TMO)
 
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
@@ -1198,7 +1222,6 @@
 	class_destroy(nvmf_class);
 	nvmf_host_put(nvmf_default_host);
 
-	BUILD_BUG_ON(sizeof(struct nvmf_common_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_connect_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_property_get_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvmf_property_set_command) != 64);

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 93f08d7..235c767 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -15,6 +15,8 @@
 #define NVMF_DEF_RECONNECT_DELAY	10
 /* default to 600 seconds of reconnect attempts before giving up */
 #define NVMF_DEF_CTRL_LOSS_TMO		600
+/* default to 30 seconds of fail-fast I/O commands */
+#define NVMF_DEF_FAIL_FAST_TMO		30
 
 /*
  * Define a host as seen by the target.  We allocate one at boot, but also
@@ -56,6 +58,7 @@
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
 };
 
 /**
@@ -89,6 +92,7 @@
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @fail_fast_tmo_ns: fast I/O fail timeout in nanoseconds
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +115,7 @@
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	u64			fail_fast_tmo_ns;
 };
 
 /*

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 34ac79c..9b5f6d9 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -282,6 +282,7 @@
 	u16 icdoff;
 	u16 maxcmd;
 	int nr_reconnects;
+	u64 start_reconnect_ns;
 	struct nvmf_ctrl_options *opts;
 
 	struct page *discard_page;
---

Regards,
Victor
* Re: [PATCH] nvme-fabrics: reject I/O to offline device
  2019-12-09 15:30 ` Victor Gladkov
@ 2019-12-17 18:03 ` James Smart
  2019-12-17 21:46 ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: James Smart @ 2019-12-17 18:03 UTC (permalink / raw)
To: Victor Gladkov, Hannes Reinecke, Sagi Grimberg, linux-nvme

On 12/9/2019 7:30 AM, Victor Gladkov wrote:
> On 12/8/19 14:18 PM, Hannes Reinecke wrote:
>> Agreed. This whole patch looks like someone is trying to reimplement fast_io_fail_tmo / dev_loss_tmo. As we're moving into unreliable fabrics, I guess we'll need a similar mechanism.
>
> Following your suggestions, I added a new session parameter called "fail_fast_tmo". The timeout is measured in seconds from the start of the controller reconnect; any command beyond that timeout is rejected. The new parameter value may be passed during 'connect', and its default value is 30 seconds. A value of -1 means no timeout (similar to the current behavior).
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ed6b911 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -406,6 +406,7 @@
>  	}
>  
>  	ctrl->cntlid = le16_to_cpu(res.u16);
> +	ctrl->start_reconnect_ns = ktime_get_ns();

It doesn't seem correct to start the reconnect timer once an admin queue connect has succeeded. There may be lots of time prior, when there is no connectivity, during which the timer hasn't been set and thus isn't running. ctrl->start_reconnect_ns should be set when the ctrl transitions to NVME_CTRL_CONNECTING.

It would have been nice to set the value at the state change itself, but as that's nvme_change_ctrl_state() in core, and as it's more than just changing state, I guess that means it has to be done in each of the fabric transports as they make the call to change to the state.

Note: if posting a new patch, start it as a new patch thread, not as a reply on this thread.

>
>  out_free_data:
>  	kfree(data);
> @@ -474,8 +475,12 @@
>  bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
>  {
>  	if (ctrl->opts->max_reconnects == -1 ||
> -	    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
> +	    ctrl->nr_reconnects < ctrl->opts->max_reconnects) {
> +		if (ctrl->nr_reconnects == 0)
> +			ctrl->start_reconnect_ns = ktime_get_ns();
> +
> 		return true;
> +	}

This is making up for what I described above, but it means start_reconnect_ns won't be set until after the first expiration of the reconnect attempt. So there's at least one window skipped with no timer.
If the transports just start the timer after the state transition, there is no need for this.

>
> 	return false;
>  }
> @@ -549,6 +554,8 @@
>  {
> 	if (ctrl->state != NVME_CTRL_DELETING &&
> 	    ctrl->state != NVME_CTRL_DEAD &&
> +	    !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 &&
> +	      ((ktime_get_ns() - ctrl->start_reconnect_ns) > ctrl->opts->fail_fast_tmo_ns)) &&

This is hard to read; I'd prefer just making a small inline helper that is called here.

As for the logic: it looks good. I'm a little concerned about a race between CONNECTING and start_reconnect_ns being set, as there are no locks (and we don't want to add them), but the window is very, very small.

I'd prefer that fail_fast_tmo_ns be unsigned, with 0 meaning disabled/no timeout.

> 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
> 		return BLK_STS_RESOURCE;
>
> @@ -612,6 +619,7 @@
>  	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
>  	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
>  	{ NVMF_OPT_TOS,			"tos=%d"		},
> +	{ NVMF_OPT_FAIL_FAST_TMO,	"fail_fast_tmo=%d"	},
>  	{ NVMF_OPT_ERR,			NULL			}
>  };
>
> @@ -623,6 +631,7 @@
>  	int token, ret = 0;
>  	size_t nqnlen = 0;
>  	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
> +	int fail_fast_tmo = NVMF_DEF_FAIL_FAST_TMO;
>  	uuid_t hostid;
>
>  	/* Set defaults */
> @@ -751,6 +760,16 @@
> 			pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
> 			ctrl_loss_tmo = token;
> 			break;
> +		case NVMF_OPT_FAIL_FAST_TMO:
> +			if (match_int(args, &token)) {
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +
> +			if (token < 0)
> +				pr_warn("fail_fast_tmo < 0, I/O will wait for reconnect/loss controller\n");

See above; I'd prefer <= 0 to mean no timeout. Do you want a warning if the value is larger than the controller-loss tmo?
> + fail_fast_tmo = token; > + break; > case NVMF_OPT_HOSTNQN: > if (opts->host) { > pr_err("hostnqn already user-assigned: %s\n", > @@ -887,6 +906,11 @@ > opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo, > opts->reconnect_delay); > > + if (fail_fast_tmo < 0) > + opts->fail_fast_tmo_ns = -1; would prefer it be set to 0 to mean disabled. > + else > + opts->fail_fast_tmo_ns = fail_fast_tmo * NSEC_PER_SEC; > + > if (!opts->host) { > kref_get(&nvmf_default_host->ref); > opts->host = nvmf_default_host; > @@ -985,7 +1009,7 @@ > #define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \ > NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \ > NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\ > - NVMF_OPT_DISABLE_SQFLOW) > + NVMF_OPT_DISABLE_SQFLOW|NVMF_OPT_FAIL_FAST_TMO) > > static struct nvme_ctrl * > nvmf_create_ctrl(struct device *dev, const char *buf) > @@ -1198,7 +1222,6 @@ > class_destroy(nvmf_class); > nvmf_host_put(nvmf_default_host); > > - BUILD_BUG_ON(sizeof(struct nvmf_common_command) != 64); Odd line to have in this patch. > BUILD_BUG_ON(sizeof(struct nvmf_connect_command) != 64); > BUILD_BUG_ON(sizeof(struct nvmf_property_get_command) != 64); > BUILD_BUG_ON(sizeof(struct nvmf_property_set_command) != 64); > > diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h > index 93f08d7..235c767 100644 > --- a/drivers/nvme/host/fabrics.h > +++ b/drivers/nvme/host/fabrics.h > @@ -15,6 +15,8 @@ > #define NVMF_DEF_RECONNECT_DELAY 10 > /* default to 600 seconds of reconnect attempts before giving up */ > #define NVMF_DEF_CTRL_LOSS_TMO 600 > +/* default to 30 seconds of fail fast IO commands */ > +#define NVMF_DEF_FAIL_FAST_TMO 30 > > /* > * Define a host as seen by the target. 
We allocate one at boot, but also > @@ -56,6 +58,7 @@ > NVMF_OPT_NR_WRITE_QUEUES = 1 << 17, > NVMF_OPT_NR_POLL_QUEUES = 1 << 18, > NVMF_OPT_TOS = 1 << 19, > + NVMF_OPT_FAIL_FAST_TMO = 1 << 20, > }; > > /** > @@ -89,6 +92,7 @@ > * @nr_write_queues: number of queues for write I/O > * @nr_poll_queues: number of queues for polling I/O > * @tos: type of service > + * @fast_fail_tmo_ns: Fast I/O fail timeout in nanoseconds; > */ > struct nvmf_ctrl_options { > unsigned mask; > @@ -111,6 +115,7 @@ > unsigned int nr_write_queues; > unsigned int nr_poll_queues; > int tos; > + u64 fail_fast_tmo_ns; > }; > > /* > > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > index 34ac79c..9b5f6d9 100644 > --- a/drivers/nvme/host/nvme.h > +++ b/drivers/nvme/host/nvme.h > @@ -282,6 +282,7 @@ > u16 icdoff; > u16 maxcmd; > int nr_reconnects; > + u64 start_reconnect_ns; > struct nvmf_ctrl_options *opts; > > struct page *discard_page; > > --- > Regards, > Victor _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 14+ messages in thread
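Two of the review points above — prefer an unsigned timeout where 0 means disabled rather than a -1 sentinel, and replace the hard-to-read compound condition with a small helper — can be sketched together. This is a user-space model, not the kernel patch itself: the names (`struct ctrl_model`, `fail_fast_tmo_to_ns()`, `fail_fast_expired()`) are illustrative stand-ins for the real `nvme_ctrl`/`nvmf_ctrl_options` fields, and `now_ns` stands in for `ktime_get_ns()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING, CTRL_DEAD };

struct ctrl_model {
	enum ctrl_state state;
	uint64_t fail_fast_tmo_ns;	/* 0 = fail-fast disabled, per review */
	uint64_t start_reconnect_ns;	/* set on transition to CONNECTING */
};

/* Connect-option conversion: any tmo <= 0 disables fail-fast, so the
 * stored value stays unsigned with 0 as the "disabled" sentinel. */
static uint64_t fail_fast_tmo_to_ns(int tmo_sec)
{
	if (tmo_sec <= 0)
		return 0;
	return (uint64_t)tmo_sec * NSEC_PER_SEC;
}

/* The compound condition in nvmf_fail_nonready_command() collapses into
 * one readable predicate: fail fast only while reconnecting, only if the
 * feature is enabled, and only once the window since entering the
 * CONNECTING state has elapsed. */
static bool fail_fast_expired(const struct ctrl_model *ctrl, uint64_t now_ns)
{
	return ctrl->state == CTRL_CONNECTING &&
	       ctrl->fail_fast_tmo_ns != 0 &&
	       now_ns - ctrl->start_reconnect_ns > ctrl->fail_fast_tmo_ns;
}
```

With a helper of this shape, the nonready check would read roughly `... && !fail_fast_expired(ctrl, ktime_get_ns()) && ...`, which is what the "small inline helper" comment asks for.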
* Re: [PATCH] nvme-fabrics: reject I/O to offline device 2019-12-09 15:30 ` Victor Gladkov 2019-12-17 18:03 ` James Smart @ 2019-12-17 21:46 ` Sagi Grimberg 2019-12-18 22:20 ` James Smart 1 sibling, 1 reply; 14+ messages in thread From: Sagi Grimberg @ 2019-12-17 21:46 UTC (permalink / raw) To: Victor Gladkov, Hannes Reinecke, James Smart, linux-nvme On 12/9/19 7:30 AM, Victor Gladkov wrote: > On 12/8/19 14:18 PM, Hannes Reinecke wrote: >> >> On 12/6/19 11:18 PM, Sagi Grimberg wrote: >>> >>>>> --- >>>>> diff --git a/drivers/nvme/host/fabrics.c >>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644 >>>>> --- a/drivers/nvme/host/fabrics.c >>>>> +++ b/drivers/nvme/host/fabrics.c >>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct >>>>> nvme_ctrl *ctrl, >>>>> { >>>>> if (ctrl->state != NVME_CTRL_DELETING && >>>>> ctrl->state != NVME_CTRL_DEAD && >>>>> + !(ctrl->state == NVME_CTRL_CONNECTING && >>>>> + ((ktime_get_ns() - rq->start_time_ns) > >>>>> jiffies_to_nsecs(rq->timeout))) && >>>>> !blk_noretry_request(rq) && !(rq->cmd_flags & >>>>> REQ_NVME_MPATH)) >>>>> return BLK_STS_RESOURCE; >>>>> >>>> >>>> Did you test this to ensure it's doing what you expect. I'm not sure >>>> that all the timers are set right at this point. Most I/O's timeout >>>> from a deadline time stamped at blk_mq_start_request(). But that >>>> routine is actually called by the transports post the >>>> nvmf_check_ready/fail_nonready calls. E.g. the io is not yet in >>>> flight, thus queued, and the blk-mq internal queuing doesn't count >>>> against the io timeout. I can't see anything that guarantees >>>> start_time_ns is set. >>> >>> I'm not sure this behavior for failing I/O always desired? some >>> consumers would actually not want the I/O to fail prematurely if we >>> are not multipathing... >>> >>> I think we need a fail_fast_tmo set in when establishing the >>> controller to get it right. >>> >> Agreed. 
This whole patch looks like someone is trying to reimplement >> fast_io_fail_tmo / dev_loss_tmo. >> As we're moving into unreliable fabrics I guess we'll need a similar mechanism. >> >> Cheers, >> >> Hannes > > > Following your suggestions, I added a new session parameter called "fast_fail_tmo". > The timeout is measured in seconds from the controller reconnect, any command beyond that timeout is rejected. > The new parameter value may be passed during ‘connect’, and its default value is 30 seconds. The default should be consistent with the existing behavior. > A value of -1 means no timeout (in similar to current behavior). > > --- > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c > index 74b8818..ed6b911 100644 > --- a/drivers/nvme/host/fabrics.c > +++ b/drivers/nvme/host/fabrics.c > @@ -406,6 +406,7 @@ > } > > ctrl->cntlid = le16_to_cpu(res.u16); > + ctrl->start_reconnect_ns = ktime_get_ns(); > > out_free_data: > kfree(data); > @@ -474,8 +475,12 @@ > bool nvmf_should_reconnect(struct nvme_ctrl *ctrl) > { > if (ctrl->opts->max_reconnects == -1 || > - ctrl->nr_reconnects < ctrl->opts->max_reconnects) > + ctrl->nr_reconnects < ctrl->opts->max_reconnects){ > + if(ctrl->nr_reconnects == 0) > + ctrl->start_reconnect_ns = ktime_get_ns(); > + > return true; > + } > > return false; > } > @@ -549,6 +554,8 @@ > { > if (ctrl->state != NVME_CTRL_DELETING && > ctrl->state != NVME_CTRL_DEAD && > + !(ctrl->state == NVME_CTRL_CONNECTING && ctrl->opts->fail_fast_tmo_ns >= 0 && > + ((ktime_get_ns() - ctrl->start_reconnect_ns) > ctrl->opts->fail_fast_tmo_ns)) && > !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH)) > return BLK_STS_RESOURCE; I cannot comprehend what is going on here... We should have a dedicated delayed_work that transitions the controller to a FAIL_FAST state and cancels the inflight requests again. This work should be triggered when the error is detected. 
* Re: [PATCH] nvme-fabrics: reject I/O to offline device 2019-12-17 21:46 ` Sagi Grimberg @ 2019-12-18 22:20 ` James Smart 0 siblings, 0 replies; 14+ messages in thread From: James Smart @ 2019-12-18 22:20 UTC (permalink / raw) To: Sagi Grimberg, Victor Gladkov, Hannes Reinecke, linux-nvme On 12/17/2019 1:46 PM, Sagi Grimberg wrote: > > > On 12/9/19 7:30 AM, Victor Gladkov wrote: >> On 12/8/19 14:18 PM, Hannes Reinecke wrote: >>> >>> On 12/6/19 11:18 PM, Sagi Grimberg wrote: >>>> >>>>>> --- >>>>>> diff --git a/drivers/nvme/host/fabrics.c >>>>>> b/drivers/nvme/host/fabrics.c index 74b8818..b58abc1 100644 >>>>>> --- a/drivers/nvme/host/fabrics.c >>>>>> +++ b/drivers/nvme/host/fabrics.c >>>>>> @@ -549,6 +549,8 @@ blk_status_t nvmf_fail_nonready_command(struct >>>>>> nvme_ctrl *ctrl, >>>>>> { >>>>>> if (ctrl->state != NVME_CTRL_DELETING && >>>>>> ctrl->state != NVME_CTRL_DEAD && >>>>>> + !(ctrl->state == NVME_CTRL_CONNECTING && >>>>>> + ((ktime_get_ns() - rq->start_time_ns) > >>>>>> jiffies_to_nsecs(rq->timeout))) && >>>>>> !blk_noretry_request(rq) && !(rq->cmd_flags & >>>>>> REQ_NVME_MPATH)) >>>>>> return BLK_STS_RESOURCE; >>>>>> >>>>> >>>>> Did you test this to ensure it's doing what you expect. I'm not sure >>>>> that all the timers are set right at this point. Most I/O's timeout >>>>> from a deadline time stamped at blk_mq_start_request(). But that >>>>> routine is actually called by the transports post the >>>>> nvmf_check_ready/fail_nonready calls. E.g. the io is not yet in >>>>> flight, thus queued, and the blk-mq internal queuing doesn't count >>>>> against the io timeout. I can't see anything that guarantees >>>>> start_time_ns is set. >>>> >>>> I'm not sure this behavior for failing I/O always desired? some >>>> consumers would actually not want the I/O to fail prematurely if we >>>> are not multipathing... >>>> >>>> I think we need a fail_fast_tmo set in when establishing the >>>> controller to get it right. >>>> >>> Agreed. 
This whole patch looks like someone is trying to reimplement >>> fast_io_fail_tmo / dev_loss_tmo. >>> As we're moving into unreliable fabrics I guess we'll need a similar >>> mechanism. >>> >>> Cheers, >>> >>> Hannes >> >> >> Following your suggestions, I added a new session parameter called >> "fast_fail_tmo". >> The timeout is measured in seconds from the controller reconnect, any >> command beyond that timeout is rejected. >> The new parameter value may be passed during ‘connect’, and its >> default value is 30 seconds. > > The default should be consistent with the existing behavior. > >> A value of -1 means no timeout (in similar to current behavior). >> >> --- >> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c >> index 74b8818..ed6b911 100644 >> --- a/drivers/nvme/host/fabrics.c >> +++ b/drivers/nvme/host/fabrics.c >> @@ -406,6 +406,7 @@ >> } >> >> ctrl->cntlid = le16_to_cpu(res.u16); >> + ctrl->start_reconnect_ns = ktime_get_ns(); >> >> out_free_data: >> kfree(data); >> @@ -474,8 +475,12 @@ >> bool nvmf_should_reconnect(struct nvme_ctrl *ctrl) >> { >> if (ctrl->opts->max_reconnects == -1 || >> - ctrl->nr_reconnects < ctrl->opts->max_reconnects) >> + ctrl->nr_reconnects < ctrl->opts->max_reconnects){ >> + if(ctrl->nr_reconnects == 0) >> + ctrl->start_reconnect_ns = ktime_get_ns(); >> + >> return true; >> + } >> >> return false; >> } >> @@ -549,6 +554,8 @@ >> { >> if (ctrl->state != NVME_CTRL_DELETING && >> ctrl->state != NVME_CTRL_DEAD && >> + !(ctrl->state == NVME_CTRL_CONNECTING && >> ctrl->opts->fail_fast_tmo_ns >= 0 && >> + ((ktime_get_ns() - ctrl->start_reconnect_ns) > >> ctrl->opts->fail_fast_tmo_ns)) && >> !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH)) >> return BLK_STS_RESOURCE; > > I cannot comprehend what is going on here... > > We should have a dedicated delayed_work that transitions the controller > to a FAIL_FAST state and cancels the inflight requests again. 
This > work should be triggered when the error is detected. I hope you're not suggesting a FAILFAST state. No new controller state is needed. I do agree that managing the time since transitioning to CONNECTING can be handled better and can address "abort all now" rather than waiting for retries to kick in. In other words: Add a controller flag of "failfast_expired" When entering CONNECTING, schedule a delayed work item based on failfast timeout value. If transition out of CONNECTING, terminate delayed work item and ensure failfast_expired is false. If delayed work item expires: set "failfast_expired" flag to true. Run through all inflight ios and cancel them. Update nvmf_fail_nonready_command() (above) per above, but with check on "!ctrl->failfast_expired". -- james _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 14+ messages in thread
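The delayed-work scheme described above can be modeled in user space to show the flag's lifecycle. In the kernel this would use `schedule_delayed_work()` / `cancel_delayed_work_sync()`; here the "work item" is reduced to an explicit expiry check so the state transitions can be exercised directly. All names (`failfast_model`, `enter_connecting()`, and so on) are illustrative, not the eventual kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

struct failfast_model {
	bool connecting;
	bool failfast_expired;
	uint64_t deadline_ns;	/* when the delayed work would fire; 0 = not armed */
};

/* Entering CONNECTING: arm the fail-fast "work" one timeout from now.
 * A tmo of 0 (disabled) leaves the work unarmed. */
static void enter_connecting(struct failfast_model *m, uint64_t now_ns,
			     uint64_t tmo_ns)
{
	m->connecting = true;
	m->failfast_expired = false;
	m->deadline_ns = tmo_ns ? now_ns + tmo_ns : 0;
}

/* Leaving CONNECTING (reconnected or deleting): cancel the work and
 * ensure the flag is false again. */
static void exit_connecting(struct failfast_model *m)
{
	m->connecting = false;
	m->failfast_expired = false;
	m->deadline_ns = 0;
}

/* Stand-in for the delayed work firing: if still connecting past the
 * deadline, latch the flag. The real work item would also run through
 * the inflight I/Os and cancel them at this point. */
static void failfast_work(struct failfast_model *m, uint64_t now_ns)
{
	if (m->connecting && m->deadline_ns && now_ns >= m->deadline_ns)
		m->failfast_expired = true;
}

/* The nonready check then reduces to "!failfast_expired", as suggested. */
static bool should_requeue(const struct failfast_model *m)
{
	return !m->failfast_expired;
}
```

No new controller state is needed in this shape: the flag rides alongside CONNECTING, and aborting "all now" happens in the work item rather than waiting for each command's retry to notice.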
* RE: [PATCH] nvme-fabrics: reject I/O to offline device 2019-12-08 12:31 ` Hannes Reinecke 2019-12-09 15:30 ` Victor Gladkov @ 2019-12-15 12:33 ` Victor Gladkov 1 sibling, 0 replies; 14+ messages in thread From: Victor Gladkov @ 2019-12-15 12:33 UTC (permalink / raw) To: Hannes Reinecke, Sagi Grimberg, James Smart, linux-nvme, james.smart On 12/9/19 17:30 PM, Victor Gladkov wrote: > > On 12/8/19 14:18 PM, Hannes Reinecke wrote: > > > > On 12/6/19 11:18 PM, Sagi Grimberg wrote: > > > ............................................... > > > I think we need a fail_fast_tmo set in when establishing the > > > controller to get it right. > > > > > Agreed. This whole patch looks like someone is trying to reimplement > > fast_io_fail_tmo / dev_loss_tmo. > > As we're moving into unreliable fabrics I guess we'll need a similar > mechanism. > > > > Cheers, > > > > Hannes > > > Following your suggestions, I added a new session parameter called > "fast_fail_tmo". > The timeout is measured in seconds from the controller reconnect, any > command beyond that timeout is rejected. > The new parameter value may be passed during ‘connect’, and its default > value is 30 seconds. > A value of -1 means no timeout (in similar to current behavior). > > --- > Regards, > Victor James, Hannes & Sagi, can you please review the last proposed patch? Regards, Victor
Thread overview: 14+ messages
2019-12-01  7:59 [PATCH] nvme-fabrics: reject I/O to offline device Victor Gladkov
2019-12-02 22:26 ` Chaitanya Kulkarni
2019-12-02 22:47 ` James Smart
2019-12-03 10:04 ` Victor Gladkov
2019-12-03 16:19 ` James Smart
2019-12-04  8:28 ` Victor Gladkov
2019-12-06  0:38 ` James Smart
2019-12-06 22:18 ` Sagi Grimberg
2019-12-08 12:31 ` Hannes Reinecke
2019-12-09 15:30 ` Victor Gladkov
2019-12-17 18:03 ` James Smart
2019-12-17 21:46 ` Sagi Grimberg
2019-12-18 22:20 ` James Smart
2019-12-15 12:33 ` Victor Gladkov