Re: [PATCH] nvme-fabrics: reject I/O to offline device

From: James Smart <james.smart@broadcom.com>
To: linux-nvme@lists.infradead.org
Subject: Re: [PATCH] nvme-fabrics: reject I/O to offline device
Date: Mon, 2 Dec 2019 14:47:07 -0800
Message-ID: <78d980de-b2b8-bd47-fc3f-20314653598e@broadcom.com>
In-Reply-To: <2caa40133c444771b706406b928ad88a@kioxia.com>

On 11/30/2019 11:59 PM, Victor Gladkov wrote:
> Issue Description:
> Commands get stuck while the host NVMe controller (TCP or RDMA) is in the reconnect state.
> The NVMe controller enters the reconnect state when it loses its connection with the target. It tries to reconnect every 10 seconds (default) until it reconnects successfully or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.
> This behavior differs from iSCSI, where commands issued during the reconnect state return with the following error: "rejecting I/O to offline device"
>
> Fix Description:
> Added a kernel module parameter "nvmef_reconnect_failfast" for nvme-fabrics module (default is true).
> Intervene in the decision whether to queue or retry an I/O command. The change takes the controller reconnect state into account: during the reconnect state, I/O commands fail either immediately (default) or once the I/O command timeout expires (depending on the module parameter value), and I/O retry is prevented. As a result, commands do not get stuck in the reconnect state.

This patch seems incorrect, at least as described. Multipathing 
inherently will "fastfail" and send to other paths. So the only way 
something is "stuck" is if it's the last path. If it is the last path, we definitely 
don't want to prematurely release i/o before we've given the subsystem 
every opportunity to reconnect.
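
For what it's worth, the check this hunk touches can be modelled in user 
space roughly as below. This is only a sketch of the unpatched condition 
as it appears in the context lines of the quoted diff; the enum, struct, 
and function names are made up for illustration and are not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states used in the check. */
enum ctrl_state_model {
    CTRL_LIVE,
    CTRL_CONNECTING,
    CTRL_DELETING,
    CTRL_DEAD
};

/* Hypothetical stand-in for the two request properties the check reads. */
struct req_model {
    bool noretry; /* models blk_noretry_request(rq) */
    bool mpath;   /* models rq->cmd_flags & REQ_NVME_MPATH */
};

/*
 * Models the unpatched condition for a command that arrives while the
 * controller is not ready.
 * Returns true when the command would be handed back for a later retry
 * (the BLK_STS_RESOURCE branch), false when it would be failed at once.
 */
static bool requeue_nonready(enum ctrl_state_model state,
                             const struct req_model *rq)
{
    return state != CTRL_DELETING &&
           state != CTRL_DEAD &&
           !rq->noretry &&
           !rq->mpath;
}
```

In this model a request carrying the multipath flag is never held on a 
connecting controller, so it fails right away and can go to another path, 
while a request without the flag is held for retry.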

What I hear you saying is you don't like the kernel default 
controller-loss-timeout of 600s. What was designed, if you didn't like 
the kernel default, was to use the per-connection "--ctrl-loss-tmo" 
option for "nvme connect".  The auto-connect scripts or the admin script 
that specifies the connection can set whatever value it likes.
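
To put numbers on it: with the defaults quoted above (600s loss timeout, 
10s between attempts) the host gets on the order of 60 reconnect attempts 
before giving up. The sketch below only illustrates that arithmetic. The 
function name is hypothetical, and the rounding up and the "negative means 
retry forever" convention are my assumptions about how the option is 
interpreted, not a copy of the kernel code.

```c
/*
 * Number of reconnect attempts implied by a controller-loss timeout and
 * a reconnect delay, both in seconds.
 *
 * Assumptions (labelled, not taken from the patch):
 *  - a negative timeout means "keep retrying forever", reported as -1
 *  - a non-positive delay is treated as invalid and yields 0 attempts
 *  - the division rounds up, so a partial interval still gets an attempt
 */
static int max_reconnects_model(int ctrl_loss_tmo, int reconnect_delay)
{
    if (ctrl_loss_tmo < 0)
        return -1;
    if (reconnect_delay <= 0)
        return 0;
    return (ctrl_loss_tmo + reconnect_delay - 1) / reconnect_delay;
}
```

So a connection set up with a 30s loss timeout and the same 10s delay would 
get about three attempts under these assumptions, which is the knob to 
turn if 600s is too long for a given deployment.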

If that seems hard to do, perhaps it's time to implement the options 
that allow for a config file to specify new values to be used on all 
connections, or on connections to specific subsystems, and so on. But I 
don't think the kernel needs to change.

-- james

>
> branch nvme-5.5
>
> ---
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 74b8818..ef89aff 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -13,6 +13,10 @@
>   #include "nvme.h"
>   #include "fabrics.h"
>
> +static bool nvmef_reconnect_failfast = 1;
> +module_param_named(nvmef_reconnect_failfast, nvmef_reconnect_failfast, bool, S_IRUGO);
> +MODULE_PARM_DESC(nvmef_reconnect_failfast, "failfast flag for I/O when controller is reconnecting, else use I/O command timeout (default true).");
> +
>   static LIST_HEAD(nvmf_transports);
>   static DECLARE_RWSEM(nvmf_transports_rwsem);
>
> @@ -549,6 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>   {
>          if (ctrl->state != NVME_CTRL_DELETING &&
>              ctrl->state != NVME_CTRL_DEAD &&
> +           !(ctrl->state == NVME_CTRL_CONNECTING && (((ktime_get_ns() - rq->start_time_ns) > jiffies_to_nsecs(rq->timeout)) || nvmef_reconnect_failfast)) &&
>              !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>                  return BLK_STS_RESOURCE;
>
>
> Regards,
> Victor
>
>
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
