* [PATCH v3] nvme: expand nvmf_check_if_ready checks
@ 2018-03-28 20:21 James Smart
  2018-03-29 20:57 ` Mike Snitzer
  2018-04-04 13:42 ` Sagi Grimberg
  0 siblings, 2 replies; 5+ messages in thread
From: James Smart @ 2018-03-28 20:21 UTC


The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows many cases to fail ios during windows
of reset or re-connection. In cases where no multipath options are
present, the error goes back to the caller - the filesystem or
application. Not good.

The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.

The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted.  When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.

Signed-off-by: James Smart <james.smart@broadcom.com>

---
v2:
  needed to set nvme status when rejecting io
v3:
  renamed qlive to queue_live and connectivity to is_connected
  converted from inline routine to fabrics exported routine.
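
For reference, the caller-side pattern is now the same across the
transports: the queue_rq path calls the fabrics helper directly with
its per-queue live flag and its view of connectivity. A minimal sketch
(the FC case from the hunks below; rdma and the loop target simply
pass true for is_connected):

	ret = nvmf_check_if_ready(&queue->ctrl->ctrl, rq,
			test_bit(NVME_FC_Q_LIVE, &queue->flags),
			ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE);
	if (unlikely(ret))
		return ret;	/* BLK_STS_RESOURCE or BLK_STS_IOERR */
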
---
 drivers/nvme/host/fabrics.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/fabrics.h | 33 ++------------------
 drivers/nvme/host/fc.c      | 12 ++-----
 drivers/nvme/host/rdma.c    | 14 ++-------
 drivers/nvme/target/loop.c  | 11 ++-----
 5 files changed, 85 insertions(+), 61 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index a1c58e35075e..73ac41f93ae0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -536,6 +536,82 @@ static struct nvmf_transport_ops *nvmf_lookup_transport(
 	return NULL;
 }
 
+blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl, struct request *rq,
+		bool queue_live, bool is_connected)
+{
+	struct nvme_command *cmd = nvme_req(rq)->cmd;
+
+	if (ctrl->state == NVME_CTRL_LIVE && is_connected)
+		return BLK_STS_OK;
+
+	switch (ctrl->state) {
+	case NVME_CTRL_DELETING:
+		goto reject_io;
+
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_CONNECTING:
+		if (!is_connected)
+			/*
+			 * This is the case of starting a new
+			 * association but connectivity was lost
+			 * before it was fully created. We need to
+			 * error the commands used to initialize the
+			 * controller so the reconnect can go into a
+			 * retry attempt. The commands should all be
+			 * marked REQ_FAILFAST_DRIVER, which will hit
+			 * the reject path below. Anything else will
+			 * be queued while the state settles.
+			 */
+			goto reject_or_queue_io;
+
+		if (queue_live ||
+		    (cmd->common.opcode == nvme_fabrics_command &&
+		     cmd->fabrics.fctype == nvme_fabrics_type_connect))
+			/*
+			 * let anything to a live queue through.
+			 * Typically this will be commands to the admin
+			 * queue which are either being used to initialize
+			 * the controller or are commands being issued
+			 * via the cli/ioctl path.
+			 *
+			 * if the q isn't live, allow only the connect
+			 * command through.
+			 */
+			return BLK_STS_OK;
+
+		/*
+		 * q isn't live to accept the command.
+		 * fall-thru to the reject_or_queue_io clause
+		 */
+		break;
+
+	/* these cases fall-thru
+	 * case NVME_CTRL_LIVE:
+	 * case NVME_CTRL_RESETTING:
+	 */
+	default:
+		break;
+	}
+
+reject_or_queue_io:
+	/*
+	 * Any other new io is something we're not in a state to send
+	 * to the device. Default action is to busy it and retry it
+	 * after the controller state is recovered. However, anything
+	 * marked for failfast or nvme multipath is immediately failed.
+	 * Note: commands used to initialize the controller will be
+	 *  marked for failfast.
+	 * Note: nvme cli/ioctl commands are marked for failfast.
+	 */
+	if (!blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
+		return BLK_STS_RESOURCE;
+
+reject_io:
+	nvme_req(rq)->status = NVME_SC_ABORT_REQ;
+	return BLK_STS_IOERR;
+}
+EXPORT_SYMBOL_GPL(nvmf_check_if_ready);
+
 static const match_table_t opt_tokens = {
 	{ NVMF_OPT_TRANSPORT,		"transport=%s"		},
 	{ NVMF_OPT_TRADDR,		"traddr=%s"		},
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a3145d90c1d2..ef46c915b7b5 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -157,36 +157,7 @@ void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
 void nvmf_free_options(struct nvmf_ctrl_options *opts);
 int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
-
-static inline blk_status_t nvmf_check_init_req(struct nvme_ctrl *ctrl,
-		struct request *rq)
-{
-	struct nvme_command *cmd = nvme_req(rq)->cmd;
-
-	/*
-	 * We cannot accept any other command until the connect command has
-	 * completed, so only allow connect to pass.
-	 */
-	if (!blk_rq_is_passthrough(rq) ||
-	    cmd->common.opcode != nvme_fabrics_command ||
-	    cmd->fabrics.fctype != nvme_fabrics_type_connect) {
-		/*
-		 * Connecting state means transport disruption or initial
-		 * establishment, which can take a long time and even might
-		 * fail permanently, fail fast to give upper layers a chance
-		 * to failover.
-		 * Deleting state means that the ctrl will never accept commands
-		 * again, fail it permanently.
-		 */
-		if (ctrl->state == NVME_CTRL_CONNECTING ||
-		    ctrl->state == NVME_CTRL_DELETING) {
-			nvme_req(rq)->status = NVME_SC_ABORT_REQ;
-			return BLK_STS_IOERR;
-		}
-		return BLK_STS_RESOURCE; /* try again later */
-	}
-
-	return BLK_STS_OK;
-}
+blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl,
+	struct request *rq, bool queue_live, bool is_connected);
 
 #endif /* _NVME_FABRICS_H */
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index c6e719b2f3ca..6cb26bcf6ec0 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2277,14 +2277,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 	return BLK_STS_OK;
 }
 
-static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
-		struct request *rq)
-{
-	if (unlikely(!test_bit(NVME_FC_Q_LIVE, &queue->flags)))
-		return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
-	return BLK_STS_OK;
-}
-
 static blk_status_t
 nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 			const struct blk_mq_queue_data *bd)
@@ -2300,7 +2292,9 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	u32 data_len;
 	blk_status_t ret;
 
-	ret = nvme_fc_is_ready(queue, rq);
+	ret = nvmf_check_if_ready(&queue->ctrl->ctrl, rq,
+		test_bit(NVME_FC_Q_LIVE, &queue->flags),
+		ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE);
 	if (unlikely(ret))
 		return ret;
 
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b8045c..e65b762723a3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1593,17 +1593,6 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
 	return BLK_EH_HANDLED;
 }
 
-/*
- * We cannot accept any other command until the Connect command has completed.
- */
-static inline blk_status_t
-nvme_rdma_is_ready(struct nvme_rdma_queue *queue, struct request *rq)
-{
-	if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &queue->flags)))
-		return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
-	return BLK_STS_OK;
-}
-
 static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 		const struct blk_mq_queue_data *bd)
 {
@@ -1619,7 +1608,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	WARN_ON_ONCE(rq->tag < 0);
 
-	ret = nvme_rdma_is_ready(queue, rq);
+	ret = nvmf_check_if_ready(&queue->ctrl->ctrl, rq,
+		test_bit(NVME_RDMA_Q_LIVE, &queue->flags), true);
 	if (unlikely(ret))
 		return ret;
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 861d1509b22b..e10987f87603 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -149,14 +149,6 @@ nvme_loop_timeout(struct request *rq, bool reserved)
 	return BLK_EH_HANDLED;
 }
 
-static inline blk_status_t nvme_loop_is_ready(struct nvme_loop_queue *queue,
-		struct request *rq)
-{
-	if (unlikely(!test_bit(NVME_LOOP_Q_LIVE, &queue->flags)))
-		return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
-	return BLK_STS_OK;
-}
-
 static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		const struct blk_mq_queue_data *bd)
 {
@@ -166,7 +158,8 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_loop_iod *iod = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
 
-	ret = nvme_loop_is_ready(queue, req);
+	ret = nvmf_check_if_ready(&queue->ctrl->ctrl, req,
+		test_bit(NVME_LOOP_Q_LIVE, &queue->flags), true);
 	if (unlikely(ret))
 		return ret;
 
-- 
2.13.1

* [PATCH v3] nvme: expand nvmf_check_if_ready checks
  2018-03-28 20:21 [PATCH v3] nvme: expand nvmf_check_if_ready checks James Smart
@ 2018-03-29 20:57 ` Mike Snitzer
  2018-03-29 22:13   ` James Smart
  2018-04-04 13:42 ` Sagi Grimberg
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2018-03-29 20:57 UTC


On Wed, Mar 28 2018 at  4:21pm -0400,
James Smart <jsmart2021@gmail.com> wrote:

> The nvmf_check_if_ready() checks that were added are very simplistic.
> As such, the routine allows many cases to fail ios during windows
> of reset or re-connection. In cases where no multipath options are
> present, the error goes back to the caller - the filesystem or
> application. Not good.
> 
> The common routine was rewritten and calling syntax slightly expanded
> so that per-transport is_ready routines don't need to be present.
> The transports now call the routine directly. The routine is now a
> fabrics routine rather than an inline function.
> 
> The routine now looks at controller state to decide the action to
> take. Some states mandate io failure. Others define the condition where
> a command can be accepted.  When the decision is unclear, a generic
> queue-or-reject check is made to look for failfast or multipath ios and
> only fails the io if it is so marked. Otherwise, the io will be queued
> and wait for the controller state to resolve.
> 
> Signed-off-by: James Smart <james.smart@broadcom.com>
> 
> ---
> v2:
>   needed to set nvme status when rejecting io
> v3:
>   renamed qlive to queue_live and connectivity to is_connected
>   converted from inline routine to fabrics exported routine.

Hey James,

I tried this patch against my test_01_nvme_offline test in the dm-mpath
"mptest" testsuite: git clone git://github.com/snitm/mptest.git

Unfortunately I'm still seeing that nvmefcloop path recovery takes ages
(as compared to native FC, via tcmloop).  And that errors aren't
noticed, and paths aren't switched by DM multipath, until after a
considerable stall while waiting for nvme (I assume connection
recovery, etc).

As such mptest's tests/test_01_nvme_offline still doesn't even come
close to achieving the level of fitness, or throughput, that the SCSI
one does (tests/test_01_sdev_offline).

Now it could easily be that the same SCSI test ported to NVMe, like I've
done, is just a misplaced attempt at having NVMe do something it
cannot.

Contrast mptest's lib/failpath_nvme_offline vs lib/failpath_sdev_offline
-- as you can see the lib/failpath_nvme_offline has quite a few sleeps
to prop up the test to allow fio to make _any_ progress at all.  Whereas
the tcmloop variant can quickly transition devices from "offline" to
"running" and vice-versa.  Could be my del_remote_ports to
add_remote_ports (and vice-versa) in a loop just isn't even close to
equivalent?  E.g. way more heavy than SCSI's "offline" to "running".

If you'd like to try mptest out here are some things to help you get
going quickly:

install targetcli, fio, nvme-cli, nvmetcli

1) To run NVMe fcloop tests, you need to modify runtest:
   export MULTIPATH_QUEUE_MODE="bio"
   export MULTIPATH_BACKEND_MODULE="nvmefcloop"
   export NVME_FCLOOP_DEVICE="/dev/some_test_device"
   (e.g. /dev/pmem0, /dev/nvme0n1, /dev/sdb or whatever)
   then:
   ./runtest tests/test_01_nvme_offline

2) To run SCSI tcmloop tests, you need to modify runtest:
   export MULTIPATH_BACKEND_MODULE="tcmloop"
   then:
   ./runtest tests/test_01_sdev_offline

3) Ideally the various sleeps in lib/failpath_nvme_offline would be
removed, but right now if you were to do that the test wouldn't make any
forward progress and the multipath device would fail to teardown due to
pending IO that never completes due to the devices getting torn down
quicker than any IO could be issued.

* [PATCH v3] nvme: expand nvmf_check_if_ready checks
  2018-03-29 20:57 ` Mike Snitzer
@ 2018-03-29 22:13   ` James Smart
  2018-03-30  0:23     ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: James Smart @ 2018-03-29 22:13 UTC


On 3/29/2018 1:57 PM, Mike Snitzer wrote:
> On Wed, Mar 28 2018 at  4:21pm -0400,
> James Smart <jsmart2021@gmail.com> wrote:
> 
>> The nvmf_check_if_ready() checks that were added are very simplistic.
>> As such, the routine allows many cases to fail ios during windows
>> of reset or re-connection. In cases where no multipath options are
>> present, the error goes back to the caller - the filesystem or
>> application. Not good.
>>
>> The common routine was rewritten and calling syntax slightly expanded
>> so that per-transport is_ready routines don't need to be present.
>> The transports now call the routine directly. The routine is now a
>> fabrics routine rather than an inline function.
>>
>> The routine now looks at controller state to decide the action to
>> take. Some states mandate io failure. Others define the condition where
>> a command can be accepted.  When the decision is unclear, a generic
>> queue-or-reject check is made to look for failfast or multipath ios and
>> only fails the io if it is so marked. Otherwise, the io will be queued
>> and wait for the controller state to resolve.
>>
>> Signed-off-by: James Smart <james.smart@broadcom.com>
>>
>> ---
>> v2:
>>    needed to set nvme status when rejecting io
>> v3:
>>    renamed qlive to queue_live and connectivity to is_connected
>>    converted from inline routine to fabrics exported routine.
> 
> Hey James,
> 
> I tried this patch against my test_01_nvme_offline test in the dm-mpath
> "mptest" testsuite: git clone git://github.com/snitm/mptest.git
> 
> Unfortunately I'm still seeing that nvmefcloop path recovery takes ages
> (as compared to native FC, via tcmloop).  And that errors aren't
> noticed, and paths aren't switched by DM multipath, until after a
> considerable stall while waiting for nvme (I assume connection
> recovery, etc).
> 
> As such mptest's tests/test_01_nvme_offline still doesn't even come
> close to achieving the level of fitness, or throughput, that the SCSI
> one does (tests/test_01_sdev_offline).
> 
> Now it could easily be that the same SCSI test ported to NVMe, like I've
> done, is just a misplaced attempt at having NVMe do something it
> cannot.
> 
> Contrast mptest's lib/failpath_nvme_offline vs lib/failpath_sdev_offline
> -- as you can see the lib/failpath_nvme_offline has quite a few sleeps
> to prop up the test to allow fio to make _any_ progress at all.  Whereas
> the tcmloop variant can quickly transition devices from "offline" to
> "running" and vice-versa.  Could be my del_remote_ports to
> add_remote_ports (and vice-versa) in a loop just isn't even close to
> equivalent?  E.g. way more heavy than SCSI's "offline" to "running".
> 
> If you'd like to try mptest out here are some things to help you get
> going quickly:
> 
> install targetcli, fio, nvme-cli, nvmetcli
> 
> 1) To run NVMe fcloop tests, you need to modify runtest:
>     export MULTIPATH_QUEUE_MODE="bio"
>     export MULTIPATH_BACKEND_MODULE="nvmefcloop"
>     export NVME_FCLOOP_DEVICE="/dev/some_test_device"
>     (e.g. /dev/pmem0, /dev/nvme0n1, /dev/sdb or whatever)
>     then:
>     ./runtest tests/test_01_nvme_offline
> 
> 2) To run SCSI tcmloop tests, you need to modify runtest:
>     export MULTIPATH_BACKEND_MODULE="tcmloop"
>     then:
>     ./runtest tests/test_01_sdev_offline
> 
> 3) Ideally the various sleeps in lib/failpath_nvme_offline would be
> removed, but right now if you were to do that the test wouldn't make any
> forward progress and the multipath device would fail to teardown due to
> pending IO that never completes due to the devices getting torn down
> quicker than any IO could be issued.
> 

Thanks... I'm not sure that this invalidates the patch, as the real 
reason for the patch is to stop the EIO cases without multipath where 
they should have been requeued - the opposite of what you want.  I'll 
take a further look.
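
To restate that in code terms: the queue-or-reject clause in the patch
only fails an io that is explicitly marked failfast or nvme-multipath;
anything else is busied so the block layer retries it once the
controller state resolves. Condensed from the patch's
reject_or_queue_io path, for illustration only:

	if (!blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
		return BLK_STS_RESOURCE;	/* busy - requeue and retry */

	nvme_req(rq)->status = NVME_SC_ABORT_REQ;
	return BLK_STS_IOERR;			/* fail the io immediately */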

-- james

* [PATCH v3] nvme: expand nvmf_check_if_ready checks
  2018-03-29 22:13   ` James Smart
@ 2018-03-30  0:23     ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2018-03-30  0:23 UTC


On Thu, Mar 29 2018 at  6:13pm -0400,
James Smart <jsmart2021@gmail.com> wrote:

> Thanks... I'm not sure that this invalidates the patch, as the real
> reason for the patch is to stop the EIO cases without multipath
> where they should have been requeued - the opposite of what you
> want.  I'll take a further look.

I really wasn't making any statement about the patch.  Given you cc'd me
on this patch I thought it was the FAILFAST stuff we talked about last
week.  So I was pointing out I'm not seeing any failfast improvement.
But I'm not confident my mptest test is completely valid... so I was
fishing for your (or others') help.

But looking closer the header does speak to honoring failfast if
present.  dm-multipath does set failfast, so why wouldn't that be
applicable?

Maybe you're saying the patch is more concerned with other details but
did make some attempt to honor failfast..

Mike

* [PATCH v3] nvme: expand nvmf_check_if_ready checks
  2018-03-28 20:21 [PATCH v3] nvme: expand nvmf_check_if_ready checks James Smart
  2018-03-29 20:57 ` Mike Snitzer
@ 2018-04-04 13:42 ` Sagi Grimberg
  1 sibling, 0 replies; 5+ messages in thread
From: Sagi Grimberg @ 2018-04-04 13:42 UTC




On 03/28/2018 11:21 PM, James Smart wrote:
> The nvmf_check_if_ready() checks that were added are very simplistic.
> As such, the routine allows many cases to fail ios during windows
> of reset or re-connection. In cases where no multipath options are
> present, the error goes back to the caller - the filesystem or
> application. Not good.
> 
> The common routine was rewritten and calling syntax slightly expanded
> so that per-transport is_ready routines don't need to be present.
> The transports now call the routine directly. The routine is now a
> fabrics routine rather than an inline function.
> 
> The routine now looks at controller state to decide the action to
> take. Some states mandate io failure. Others define the condition where
> a command can be accepted.  When the decision is unclear, a generic
> queue-or-reject check is made to look for failfast or multipath ios and
> only fails the io if it is so marked. Otherwise, the io will be queued
> and wait for the controller state to resolve.
> 
> Signed-off-by: James Smart <james.smart@broadcom.com>
> 
> ---
> v2:
>    needed to set nvme status when rejecting io
> v3:
>    renamed qlive to queue_live and connectivity to is_connected
>    converted from inline routine to fabrics exported routine.
> ---
>   drivers/nvme/host/fabrics.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/fabrics.h | 33 ++------------------
>   drivers/nvme/host/fc.c      | 12 ++-----
>   drivers/nvme/host/rdma.c    | 14 ++-------
>   drivers/nvme/target/loop.c  | 11 ++-----
>   5 files changed, 85 insertions(+), 61 deletions(-)
> 
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index a1c58e35075e..73ac41f93ae0 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -536,6 +536,82 @@ static struct nvmf_transport_ops *nvmf_lookup_transport(
>   	return NULL;
>   }
>   
> +blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl, struct request *rq,
> +		bool queue_live, bool is_connected)
> +{
> +	struct nvme_command *cmd = nvme_req(rq)->cmd;
> +
> +	if (ctrl->state == NVME_CTRL_LIVE && is_connected)
> +		return BLK_STS_OK;

Nit: it would be nice to have a likely() statement here as it's the
hot path, e.g. roughly:
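
	if (likely(ctrl->state == NVME_CTRL_LIVE && is_connected))
		return BLK_STS_OK;

Other than that, looks good,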

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
