* [PATCH v7] nvme-fabrics: reject I/O to offline device
@ 2020-07-08 15:07 Victor Gladkov
  2020-07-09 20:34 ` James Smart
  2020-08-13 15:00 ` James Smart
  0 siblings, 2 replies; 10+ messages in thread
From: Victor Gladkov @ 2020-07-08 15:07 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Ewan D. Milne, Hannes Reinecke

Commands get stuck while the host NVMe controller (TCP or RDMA) is in
the reconnect state. The NVMe controller enters the reconnect state
when it loses connection with the target. It tries to reconnect every
10 seconds (default) until reconnection succeeds or the reconnect
timeout is reached. The default reconnect timeout is 10 minutes.

To avoid this long delay caused by the default timeout, we introduce a
new session parameter "fast_io_fail_tmo". The timeout is measured in
seconds from the start of controller reconnect; any command pending
beyond that timeout is rejected. The new parameter value may be passed
during 'connect'.
The default value of -1 means no timeout (similar to current behavior),
while a value of 0 fails I/O right away.

We add a new controller flag, NVME_CTRL_FAILFAST_EXPIRED, and a
respective delayed work item that sets this flag.

When the controller enters the CONNECTING state, we schedule the
delayed work based on the failfast timeout value. On a transition out
of CONNECTING, we terminate the delayed work item and ensure
failfast_expired is false. If the delayed work item expires, the
"NVME_CTRL_FAILFAST_EXPIRED" flag is set to true.

We also update the nvmf_fail_nonready_command() and
nvme_available_path() functions to check the
"NVME_CTRL_FAILFAST_EXPIRED" controller flag.

Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

---
Changes From V6:

1. Changed nvme_start_failfast_work() and nvme_stop_failfast_work()
    according to Hannes Reinecke's review.

Changes From V5:

1. Drop the "off" string option for fast_io_fail_tmo.

Changes From V4:

1. Remove subsysnqn from dev_info; just keep "failfast expired"
    in nvme_failfast_work().
2. Remove excess lock in nvme_failfast_work().
3. Change the "timeout disabled" value to -1; '0' now fails I/O right away.
4. Add an "off" string for the fast_io_fail_tmo option as a -1 equivalent.

Changes From V3:

1. BUG FIX: Fix a bug in nvme_start_failfast_work() and
    nvme_stop_failfast_work() when accessing ctrl->opts, as it will fail
    for the PCIe transport when nvme_change_ctrl_state() is called
    from nvme_reset_work(), since we don't set ctrl->opts for the
    PCIe transport.
2. Line wrap in nvme_start_failfast_work(), nvme_parse_option() and
    for macro NVMF_ALLOWED_OPTS definition.
3. Just like all the state change code add a switch for newly
    added state handling outside of the state machine in
    nvme_state_change().
4. nvme_available_path() add /* fallthru */ after if..break inside
    switch which is under list_for_each_entry_rcu().
5. Align the newly added nvmf_ctrl_options member fast_io_fail_tmo.
6. Fix the tabs before if in nvme_available_path() and line wrap
    for the same.
7. In nvme_failfast_work() use state != NVME_CTRL_CONNECTING
    instead of == to get rid of the parentheses and avoid char > 80.
8. Get rid of the ";" at the end of the comment for @fast_io_fail_tmo.
9. Change the commit log style to match the one we have in the NVMe
    repo.

Changes from V2:

1. Several coding style and small fixes.
2. Fix the comment for NVMF_DEF_FAIL_FAST_TMO.
3. Don't call functionality from the state machine.
4. Rename fast_fail_tmo -> fast_io_fail_tmo to match SCSI
    semantics.

Changes from V1:

1. Add a new session parameter called "fast_fail_tmo". The timeout
    is measured in seconds from the controller reconnect; any command
    beyond that timeout is rejected. The new parameter value may be
    passed during 'connect', and its default value is 30 seconds.
    A value of 0 means no timeout (similar to current behavior).
2. Add a controller flag of "failfast_expired".
3. Add a dedicated delayed_work item that updates the "failfast_expired"
    controller flag.
4. When entering CONNECTING, schedule the delayed_work based on
    failfast timeout value. If transition out of CONNECTING, terminate
    delayed work item and ensure failfast_expired is false.
    If delayed work item expires: set "failfast_expired" flag to true.
5. Update nvmf_fail_nonready_command() to check the
    "failfast_expired" controller flag.

---
 drivers/nvme/host/core.c      | 49 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/fabrics.c   | 25 +++++++++++++++++++---
 drivers/nvme/host/fabrics.h   |  5 +++++
 drivers/nvme/host/multipath.c |  5 ++++-
 drivers/nvme/host/nvme.h      |  3 +++
 5 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f3c037f..ca990bb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -137,6 +137,37 @@ int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_try_sched_reset);

+static void nvme_failfast_work(struct work_struct *work)
+{
+	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
+			struct nvme_ctrl, failfast_work);
+
+	if (ctrl->state != NVME_CTRL_CONNECTING)
+		return;
+
+	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
+	dev_info(ctrl->device, "failfast expired\n");
+	nvme_kick_requeue_lists(ctrl);
+}
+
+static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
+{
+	if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
+		return;
+
+	schedule_delayed_work(&ctrl->failfast_work,
+			      ctrl->opts->fast_io_fail_tmo * HZ);
+}
+
+static inline void nvme_stop_failfast_work(struct nvme_ctrl *ctrl)
+{
+	if (!ctrl->opts)
+		return;
+
+	cancel_delayed_work_sync(&ctrl->failfast_work);
+	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
+}
+
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
@@ -387,8 +418,21 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	}

 	spin_unlock_irqrestore(&ctrl->lock, flags);
-	if (changed && ctrl->state == NVME_CTRL_LIVE)
+	if (changed) {
+		switch (ctrl->state) {
+			case NVME_CTRL_LIVE:
+				if (old_state == NVME_CTRL_CONNECTING)
+					nvme_stop_failfast_work(ctrl);
  				nvme_kick_requeue_lists(ctrl);
+				break;
+			case NVME_CTRL_CONNECTING:
+				if (old_state == NVME_CTRL_RESETTING)
+					nvme_start_failfast_work(ctrl);
+				break;
+			default:
+				break;
+		}
+	}
 	return changed;
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
@@ -4045,6 +4089,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 {
 	nvme_mpath_stop(ctrl);
 	nvme_stop_keep_alive(ctrl);
+	nvme_stop_failfast_work(ctrl);
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 }
@@ -4111,6 +4156,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	int ret;

 	ctrl->state = NVME_CTRL_NEW;
+	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
 	spin_lock_init(&ctrl->lock);
 	mutex_init(&ctrl->scan_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
@@ -4125,6 +4171,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	init_waitqueue_head(&ctrl->state_wq);

 	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
+	INIT_DELAYED_WORK(&ctrl->failfast_work, nvme_failfast_work);
 	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
 	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2a6c819..4afe173 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -549,6 +549,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->state != NVME_CTRL_DELETING &&
 	    ctrl->state != NVME_CTRL_DEAD &&
+	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;

@@ -612,6 +613,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_FAIL_FAST_TMO,	"fast_io_fail_tmo=%d"	},
 	{ NVMF_OPT_ERR,			NULL			}
 };

@@ -631,6 +633,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY;
 	opts->kato = NVME_DEFAULT_KATO;
 	opts->duplicate_connect = false;
+	opts->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO;
 	opts->hdr_digest = false;
 	opts->data_digest = false;
 	opts->tos = -1; /* < 0 == use transport default */
@@ -751,6 +754,17 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 				pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
 			ctrl_loss_tmo = token;
 			break;
+		case NVMF_OPT_FAIL_FAST_TMO:
+			if (match_int(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			if (token >= 0)
+				pr_warn("I/O will fail on reconnect controller after"
+					" %d sec\n", token);
+			opts->fast_io_fail_tmo = token;
+			break;
 		case NVMF_OPT_HOSTNQN:
 			if (opts->host) {
 				pr_err("hostnqn already user-assigned: %s\n",
@@ -881,11 +895,16 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 		opts->nr_poll_queues = 0;
 		opts->duplicate_connect = true;
 	}
-	if (ctrl_loss_tmo < 0)
+	if (ctrl_loss_tmo < 0) {
 		opts->max_reconnects = -1;
-	else
+	} else {
 		opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
 						opts->reconnect_delay);
+		if (ctrl_loss_tmo < opts->fast_io_fail_tmo)
+			pr_warn("failfast tmo (%d) larger than controller "
+				"loss tmo (%d)\n",
+				opts->fast_io_fail_tmo, ctrl_loss_tmo);
+	}

 	if (!opts->host) {
 		kref_get(&nvmf_default_host->ref);
@@ -985,7 +1004,7 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
 #define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
 				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
 				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW)
+				 NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_FAIL_FAST_TMO)

 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a0ec40a..05a1158 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -15,6 +15,8 @@
 #define NVMF_DEF_RECONNECT_DELAY	10
 /* default to 600 seconds of reconnect attempts before giving up */
 #define NVMF_DEF_CTRL_LOSS_TMO		600
+/* default is -1: the fail fast mechanism is disabled */
+#define NVMF_DEF_FAIL_FAST_TMO		-1

 /*
  * Define a host as seen by the target.  We allocate one at boot, but also
@@ -56,6 +58,7 @@ enum {
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
 };

 /**
@@ -89,6 +92,7 @@ enum {
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @fast_io_fail_tmo: Fast I/O fail timeout in seconds
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +115,7 @@ struct nvmf_ctrl_options {
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	int			fast_io_fail_tmo;
 };

 /*
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 54603bd..d8b7f45 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -278,9 +278,12 @@ static bool nvme_available_path(struct nvme_ns_head *head)

 	list_for_each_entry_rcu(ns, &head->list, siblings) {
 		switch (ns->ctrl->state) {
+		case NVME_CTRL_CONNECTING:
+			if (test_bit(NVME_CTRL_FAILFAST_EXPIRED,
+				     &ns->ctrl->flags))
+				break;
 		case NVME_CTRL_LIVE:
 		case NVME_CTRL_RESETTING:
-		case NVME_CTRL_CONNECTING:
 			/* fallthru */
 			return true;
 		default:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2e04a36..9112886 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -256,6 +256,7 @@ struct nvme_ctrl {
 	struct work_struct scan_work;
 	struct work_struct async_event_work;
 	struct delayed_work ka_work;
+	struct delayed_work failfast_work;
 	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 	unsigned long events;
@@ -290,6 +291,8 @@ struct nvme_ctrl {
 	u16 icdoff;
 	u16 maxcmd;
 	int nr_reconnects;
+	unsigned long flags;
+#define NVME_CTRL_FAILFAST_EXPIRED	0
 	struct nvmf_ctrl_options *opts;

 	struct page *discard_page;
--
1.8.3.1
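For reference, the interplay of the defaults touched by this patch can
be checked with a little arithmetic. This is an illustrative Python
sketch; DIV_ROUND_UP is mirrored by hand and the fast_io_fail_tmo
value is just an example:

```python
def div_round_up(n, d):
    # mirrors the kernel's DIV_ROUND_UP() macro for positive integers
    return -(-n // d)

reconnect_delay = 10   # NVMF_DEF_RECONNECT_DELAY, seconds
ctrl_loss_tmo = 600    # NVMF_DEF_CTRL_LOSS_TMO, seconds
fast_io_fail_tmo = 30  # example value passed at 'connect'

# nvmf_parse_options(): number of reconnect attempts before giving up
max_reconnects = div_round_up(ctrl_loss_tmo, reconnect_delay)
print(max_reconnects)  # 60

# The patch warns when fast_io_fail_tmo exceeds ctrl_loss_tmo, since
# the controller would be torn down before the failfast timer fires.
assert fast_io_fail_tmo <= ctrl_loss_tmo
```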


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-08 15:07 [PATCH v7] nvme-fabrics: reject I/O to offline device Victor Gladkov
@ 2020-07-09 20:34 ` James Smart
  2020-07-10  4:50   ` Sagi Grimberg
  2020-08-13 15:00 ` James Smart
  1 sibling, 1 reply; 10+ messages in thread
From: James Smart @ 2020-07-09 20:34 UTC (permalink / raw)
  To: Victor Gladkov, linux-nvme; +Cc: Sagi Grimberg, Ewan D. Milne, Hannes Reinecke



On 7/8/2020 8:07 AM, Victor Gladkov wrote:
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in
> reconnect state. NVMe controller enters into reconnect state when it
> loses connection with the target. It tries to reconnect every 10
> seconds (default) until successful reconnection or until reconnect
> time-out is reached. The default reconnect time out is 10 minutes.
>
> To fix this long delay due to the default timeout we introduce new
> session parameter "fast_io_fail_tmo". The timeout is measured in
> seconds from the controller reconnect, any command beyond that
> timeout is rejected. The new parameter value may be passed during
> 'connect'.
> The default value of 0 means no timeout (similar to current behavior).
>
> We add a new controller NVME_CTRL_FAILFAST_EXPIRED and respective
> delayed work that updates the NVME_CTRL_FAILFAST_EXPIRED flag.
>
> When the controller is entering the CONNECTING state, we schedule
> the delayed_work based on failfast timeout value. If the transition
> is out of CONNECTING, terminate delayed work item and ensure
> failfast_expired is false. If delayed work item expires then set
> "NVME_CTRL_FAILFAST_EXPIRED" flag to true.
>
> We also update nvmf_fail_nonready_command() and
> nvme_available_path() functions with check the
> "NVME_CTRL_FAILFAST_EXPIRED" controller flag.
>
> Signed-off-by: Victor Gladkov <victor.gladkov at kioxia.com>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
>
> ...
> ---
>   drivers/nvme/host/core.c      | 49 ++++++++++++++++++++++++++++++++++++++++++-
>   drivers/nvme/host/fabrics.c   | 25 +++++++++++++++++++---
>   drivers/nvme/host/fabrics.h   |  5 +++++
>   drivers/nvme/host/multipath.c |  5 ++++-
>   drivers/nvme/host/nvme.h      |  3 +++
>   5 files changed, 82 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f3c037f..ca990bb 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -137,6 +137,37 @@ int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
>   }
>   EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>
> +static void nvme_failfast_work(struct work_struct *work)
> +{
> +	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
> +			struct nvme_ctrl, failfast_work);
> +
> +	if (ctrl->state != NVME_CTRL_CONNECTING)
> +		return;
> +
> +	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> +	dev_info(ctrl->device, "failfast expired\n");
> +	nvme_kick_requeue_lists(ctrl);
> +}
> +
> +static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
> +{
> +	if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
> +		return;
> +
> +	schedule_delayed_work(&ctrl->failfast_work,
> +			      ctrl->opts->fast_io_fail_tmo * HZ);
> +}
> +
> +static inline void nvme_stop_failfast_work(struct nvme_ctrl *ctrl)
> +{
> +	if (!ctrl->opts)
> +		return;
> +
> +	cancel_delayed_work_sync(&ctrl->failfast_work);
> +	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> +}
> +
>   int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
>   {
>

Looks good to me.  The only nit I see is: why are we checking ctrl->opts in
the xxx_failfast_work() routines?  If that's actually NULL, that's very
bad news. It should only be NULL if the controller is in the process of
being deleted, which should have terminated all these routine sequences
before then.

Otherwise...
Reviewed-by: James Smart <james.smart@broadcom.com>

-- james



* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-09 20:34 ` James Smart
@ 2020-07-10  4:50   ` Sagi Grimberg
  2020-07-10  6:58     ` Hannes Reinecke
  2020-07-14 11:04     ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Sagi Grimberg @ 2020-07-10  4:50 UTC (permalink / raw)
  To: James Smart, Victor Gladkov, linux-nvme
  Cc: Keith Busch, Hannes Reinecke, Ewan D. Milne, Christoph Hellwig


> Looks good to me.  Only nit I see is - why are we checking ctrl->opts in 
> the xxx_failfast_work() routines ?

Which brings me to the question, is there any desire to have this
behavior in pci as well?


* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-10  4:50   ` Sagi Grimberg
@ 2020-07-10  6:58     ` Hannes Reinecke
  2020-07-14 11:04     ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-07-10  6:58 UTC (permalink / raw)
  To: Sagi Grimberg, James Smart, Victor Gladkov, linux-nvme
  Cc: Keith Busch, Christoph Hellwig, Ewan D. Milne

On 7/10/20 6:50 AM, Sagi Grimberg wrote:
> 
>> Looks good to me.  Only nit I see is - why are we checking ctrl->opts 
>> in the xxx_failfast_work() routines ?
> 
> Which brings me to the question, is there any desire to have this
> behavior in pci as well?

I was under the impression that an offline NVMe-PCI device is a pretty
terminal condition. This patch would only make sense if a reconnect
were able to recover from that situation.
For NVMe-PCI we would notify the application earlier; the device is
still dead, though, so I'm not sure that makes a huge difference.
Nor should it be a common occurrence.

Unless you talk about ARM servers, for which frying NVMe devices seems 
to be a hobby :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-10  4:50   ` Sagi Grimberg
  2020-07-10  6:58     ` Hannes Reinecke
@ 2020-07-14 11:04     ` Christoph Hellwig
  2020-07-22 22:57       ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2020-07-14 11:04 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: James Smart, linux-nvme, Victor Gladkov, Hannes Reinecke,
	Keith Busch, Christoph Hellwig, Ewan D. Milne

On Thu, Jul 09, 2020 at 09:50:13PM -0700, Sagi Grimberg wrote:
>
>> Looks good to me.  Only nit I see is - why are we checking ctrl->opts in 
>> the xxx_failfast_work() routines ?
>
> Which brings me to the question, is there any desire to have this
> behavior in pci as well?

Where is this discussion coming from?  In general, if something is not
transport-specific I think we should support it for all transports.


* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-14 11:04     ` Christoph Hellwig
@ 2020-07-22 22:57       ` Sagi Grimberg
  2020-08-09 15:32         ` Victor Gladkov
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2020-07-22 22:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Smart, Ewan D. Milne, Victor Gladkov, linux-nvme,
	Keith Busch, Hannes Reinecke


>>> Looks good to me.  Only nit I see is - why are we checking ctrl->opts in
>>> the xxx_failfast_work() routines ?
>>
>> Which brings me to the question, is there any desire to have this
>> behavior in pci as well?
> 
> Where is this discussion coming from?  In generaly if something is not
> transport specific I think we should support it for all transports.

Agreed.


* RE: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-22 22:57       ` Sagi Grimberg
@ 2020-08-09 15:32         ` Victor Gladkov
  2020-08-11 20:56           ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Victor Gladkov @ 2020-08-09 15:32 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke
  Cc: Keith Busch, Christoph Hellwig, James Smart, linux-nvme, Ewan D. Milne

On 7/8/2020 8:07 AM, Victor Gladkov wrote:
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in
> reconnect state. NVMe controller enters into reconnect state when it
> loses connection with the target. It tries to reconnect every 10
> seconds (default) until successful reconnection or until reconnect
> time-out is reached. The default reconnect time out is 10 minutes.
>
> To fix this long delay due to the default timeout we introduce new
> session parameter "fast_io_fail_tmo". The timeout is measured in
> seconds from the controller reconnect, any command beyond that
> timeout is rejected. The new parameter value may be passed during
> 'connect'.
> The default value of 0 means no timeout (similar to current behavior).

I'd like to remind you that this improvement is pending for commit.
Pay attention, please.

Regards,
Victor.

* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-08-09 15:32         ` Victor Gladkov
@ 2020-08-11 20:56           ` Sagi Grimberg
  2020-08-12 14:09             ` Victor Gladkov
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2020-08-11 20:56 UTC (permalink / raw)
  To: Victor Gladkov, Hannes Reinecke
  Cc: Keith Busch, Ewan D. Milne, Christoph Hellwig, linux-nvme, James Smart


>> Commands get stuck while Host NVMe controller (TCP or RDMA) is in
>> reconnect state. NVMe controller enters into reconnect state when it
>> loses connection with the target. It tries to reconnect every 10
>> seconds (default) until successful reconnection or until reconnect
>> time-out is reached. The default reconnect time out is 10 minutes.
>>
>> To fix this long delay due to the default timeout we introduce new
>> session parameter "fast_io_fail_tmo". The timeout is measured in
>> seconds from the controller reconnect, any command beyond that
>> timeout is rejected. The new parameter value may be passed during
>> 'connect'.
>> The default value of 0 means no timeout (similar to current behavior).
> 
> I'd like to remind you that this improvement is pending for commit.
> Pay attention, please.

I think we still have a question as to why this is fabrics-specific, and
if it is needed in fabrics, why it is not needed in pci as well.

Keith? Personally speaking, I also share Christoph's opinion that
if it's not clearly fabrics-specific, we should try to make pci
and fabrics unified.

Your thoughts on this?


* RE: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-08-11 20:56           ` Sagi Grimberg
@ 2020-08-12 14:09             ` Victor Gladkov
  0 siblings, 0 replies; 10+ messages in thread
From: Victor Gladkov @ 2020-08-12 14:09 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke
  Cc: Keith Busch, Ewan D. Milne, Christoph Hellwig, linux-nvme, James Smart

On 2020/8/11 23:56, Sagi Grimberg wrote:
> 
> I think we still have a question to why is this fabrics specific, and if this is
> needed in fabrics, why is it not needed in pci as well.
> 
> Keith? Personally speaking, I also share Chirstoph's opinion that if it's not
> clearly fabrics specific, we should try to make pci and fabrics unified.
> 
> Your thoughts on this?

PCI doesn't define a 'timeout'/'number of retries' for controller
reconnect, and the controller shuts down immediately if it times out
while starting. The NVME_CTRL_CONNECTING state is used as a transition
state in the nvme_reset_work() procedure to keep the similarities with
the fabric transports. See the quotes from pci.c below.

1. 
----------------------------------------------------
Quote from nvme_timeout() function (pci.c)
	/*
	 * Shutdown immediately if controller times out while starting. The
	 * reset work will see the pci device disabled when it gets the forced
	 * cancellation error. All outstanding requests are completed on
	 * shutdown, so we return BLK_EH_DONE.
	 */
	switch (dev->ctrl.state) {
	case NVME_CTRL_CONNECTING:
		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
		/* fall through */
----------------------------------------------------
2.
----------------------------------------------------
Quote from nvme_reset_work() function (pci.c)
	/*
	 * Introduce CONNECTING state from nvme-fc/rdma transports to mark the
	 * initializing procedure here.
	 */
	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
----------------------------------------------------



* Re: [PATCH v7] nvme-fabrics: reject I/O to offline device
  2020-07-08 15:07 [PATCH v7] nvme-fabrics: reject I/O to offline device Victor Gladkov
  2020-07-09 20:34 ` James Smart
@ 2020-08-13 15:00 ` James Smart
  1 sibling, 0 replies; 10+ messages in thread
From: James Smart @ 2020-08-13 15:00 UTC (permalink / raw)
  To: Victor Gladkov, linux-nvme; +Cc: Sagi Grimberg, Ewan D. Milne, Hannes Reinecke

On 7/8/2020 8:07 AM, Victor Gladkov wrote:
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in
> reconnect state. NVMe controller enters into reconnect state when it
> loses connection with the target. It tries to reconnect every 10
> seconds (default) until successful reconnection or until reconnect
> time-out is reached. The default reconnect time out is 10 minutes.
>
> To fix this long delay due to the default timeout we introduce new
> session parameter "fast_io_fail_tmo". The timeout is measured in
> seconds from the controller reconnect, any command beyond that
> timeout is rejected. The new parameter value may be passed during
> 'connect'.
> The default value of 0 means no timeout (similar to current behavior).
>
> We add a new controller NVME_CTRL_FAILFAST_EXPIRED and respective
> delayed work that updates the NVME_CTRL_FAILFAST_EXPIRED flag.
>
> When the controller is entering the CONNECTING state, we schedule
> the delayed_work based on failfast timeout value. If the transition
> is out of CONNECTING, terminate delayed work item and ensure
> failfast_expired is false. If delayed work item expires then set
> "NVME_CTRL_FAILFAST_EXPIRED" flag to true.
>
> We also update nvmf_fail_nonready_command() and
> nvme_available_path() functions with check the
> "NVME_CTRL_FAILFAST_EXPIRED" controller flag.
>
> Signed-off-by: Victor Gladkov <victor.gladkov at kioxia.com>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Victor,

When we talked about what fast fail does, we discussed how it differs
from SCSI's notion of fast fail, which was in support of mpio.  NVMe
didn't need SCSI's fast-fail behavior, as the fabric check-ready
routines spot the mpath attributes immediately. So this patch set is
for "normal" I/O that normally enters retries.  You replied with a very
important distinction about this patch that isn't described by the
above.

Please add this paragraph to the patch description:

 From email from Victor on 12/3/2019 at 2:04 am US PST:
Applications expect commands to complete with success or error within
a certain timeout (30 seconds by default).  The NVMe host enforces that
timeout while it is connected; nevertheless, during reconnection the
timeout is not enforced, and commands may get stuck for a long period
or even forever.

-- james




