* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
@ 2018-04-10 17:18 Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive" Max Gurtovoy
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


Hi all,
I've been debugging the KA mechanism lately and found a lack of
coordination between the target and host implementations.

Johannes,
Sorry for reverting your commit - I'll use nvme_start_keep_alive
for my fix.

I've noticed that there is no clear definition in the NVMe spec
regarding the keep-alive mechanism association. IMO, it should be
a property of the admin queue and should be triggered as soon as
the admin queue is configured successfully.
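Concretely, the placement I have in mind looks roughly like this (a
minimal sketch against a generic fabrics transport; the xxx_* names are
placeholders, and the real per-transport wiring is in the patches below):
--
/* sketch only: start keep-alive once the admin queue is usable,
 * stop it first when the admin queue is torn down */
static int xxx_configure_admin_queue(struct xxx_ctrl *ctrl)
{
	/* ... allocate the tagset, create and connect the admin queue ... */

	/* arm KA before the (potentially long) io-queue connect phase */
	nvme_start_keep_alive(&ctrl->ctrl);
	return 0;
}

static void xxx_destroy_admin_queue(struct xxx_ctrl *ctrl)
{
	nvme_stop_keep_alive(&ctrl->ctrl);
	/* ... stop the admin queue and free its resources ... */
}
--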

Idan/Christoph/Sagi,
Any thoughts on this proposal?
Either way, we should make the spec clear about it, otherwise we'll have
interoperability issues running different implementations/versions.

This patchset was tested using the RDMA transport only:
I've created 20 subsystems, 5 namespaces per subsystem, and exposed
them all through 8 portals (160 ctrls created in total) on 1 target.
I used 1 initiator (host) and connected successfully.
Later on I destroyed the target and caused a reconnection flow
on the initiator side.
After ~30-50 seconds, I configured the target again but the initiator
couldn't reconnect to it (after many retries).
The reason for this was that the keep-alive timer expired on the target
side, causing a ctrl fatal error, and the io-queue connect failed to find
the ctrl. This loop never converged.

With the patches below, the test passed successfully after 1-2
reconnection attempts.

I was able to test only with the RDMA fabric, so it would be great to have
a Tested-by from the FC folks as well (loop also needs to be tested).


Max Gurtovoy (5):
  Revert "nvme: unexport nvme_start_keep_alive"
  nvme: remove association between ctrl and keep-alive
  nvme-rdma: add keep-alive mechanism as admin_q property
  nvme-fc: add keep-alive mechanism as admin_q property
  nvme-loop: add keep-alive mechanism as admin_q property

 drivers/nvme/host/core.c   | 7 ++-----
 drivers/nvme/host/fc.c     | 5 +++++
 drivers/nvme/host/nvme.h   | 1 +
 drivers/nvme/host/rdma.c   | 5 +++--
 drivers/nvme/target/loop.c | 3 +++
 5 files changed, 14 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive"
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
@ 2018-04-10 17:18 ` Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive Max Gurtovoy
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


This reverts commit c8799eee39e7523e5e0be10f8950b11cb66085bd.

nvme_start_keep_alive() will be used by the transport drivers
to fix keep-alive synchronization between NVMe-oF target/host.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 3 ++-
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 38b450e..6211066 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -826,7 +826,7 @@ static void nvme_keep_alive_work(struct work_struct *work)
 	}
 }
 
-static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
+void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
 {
 	if (unlikely(ctrl->kato == 0))
 		return;
@@ -836,6 +836,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
 	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
 	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
 }
+EXPORT_SYMBOL_GPL(nvme_start_keep_alive);
 
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ced5b3d..0ecd30a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -422,6 +422,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		unsigned timeout, int qid, int at_head,
 		blk_mq_req_flags_t flags);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
+void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive" Max Gurtovoy
@ 2018-04-10 17:18 ` Max Gurtovoy
  2018-04-13 17:01   ` Christoph Hellwig
  2018-04-10 17:18 ` [PATCH v1 3/5] nvme-rdma: add keep-alive mechanism as admin_q property Max Gurtovoy
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


The keep-alive mechanism is an admin queue property and should be
activated/deactivated during admin queue creation/destruction.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6211066..674e746 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3369,7 +3369,6 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 
 void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 {
-	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
 	flush_work(&ctrl->scan_work);
 	cancel_work_sync(&ctrl->fw_act_work);
@@ -3380,9 +3379,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 
 void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 {
-	if (ctrl->kato)
-		nvme_start_keep_alive(ctrl);
-
 	if (ctrl->queue_count > 1) {
 		nvme_queue_scan(ctrl);
 		queue_work(nvme_wq, &ctrl->async_event_work);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 3/5] nvme-rdma: add keep-alive mechanism as admin_q property
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive" Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive Max Gurtovoy
@ 2018-04-10 17:18 ` Max Gurtovoy
  2018-04-10 17:18 ` [PATCH v1 4/5] nvme-fc: " Max Gurtovoy
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


Activate/deactivate it during admin queue creation/destruction
and remove the association with the nvme ctrl.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 758537e..cf65183 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -734,6 +734,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
+	nvme_stop_keep_alive(&ctrl->ctrl);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	if (remove) {
 		blk_cleanup_queue(ctrl->ctrl.admin_q);
@@ -801,6 +802,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	if (error)
 		goto out_cleanup_queue;
 
+	nvme_start_keep_alive(&ctrl->ctrl);
+
 	return 0;
 
 out_cleanup_queue:
@@ -959,8 +962,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
 
-	nvme_stop_keep_alive(&ctrl->ctrl);
-
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_stop_queues(&ctrl->ctrl);
 		blk_mq_tagset_busy_iter(&ctrl->tag_set,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 4/5] nvme-fc: add keep-alive mechanism as admin_q property
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
                   ` (2 preceding siblings ...)
  2018-04-10 17:18 ` [PATCH v1 3/5] nvme-rdma: add keep-alive mechanism as admin_q property Max Gurtovoy
@ 2018-04-10 17:18 ` Max Gurtovoy
  2018-04-10 17:18 ` [PATCH 5/5] nvme-loop: " Max Gurtovoy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


Activate/deactivate it during admin queue creation/destruction
and remove the association with the nvme ctrl.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/fc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0676d44..236a8f1 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2692,6 +2692,8 @@ static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
 		opts->queue_size = ctrl->ctrl.maxcmd;
 	}
 
+	nvme_start_keep_alive(&ctrl->ctrl);
+
 	ret = nvme_fc_init_aen_ops(ctrl);
 	if (ret)
 		goto out_term_aen_ops;
@@ -2720,6 +2722,7 @@ static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
 
 out_term_aen_ops:
 	nvme_fc_term_aen_ops(ctrl);
+	nvme_stop_keep_alive(&ctrl->ctrl);
 out_disconnect_admin_queue:
 	/* send a Disconnect(association) LS to fc-nvme target */
 	nvme_fc_xmt_disconnect_assoc(ctrl);
@@ -2804,6 +2807,8 @@ static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
 
 	nvme_fc_term_aen_ops(ctrl);
 
+	nvme_stop_keep_alive(&ctrl->ctrl);
+
 	/*
 	 * send a Disconnect(association) LS to fc-nvme target
 	 * Note: could have been sent at top of process, but
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/5] nvme-loop: add keep-alive mechanism as admin_q property
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
                   ` (3 preceding siblings ...)
  2018-04-10 17:18 ` [PATCH v1 4/5] nvme-fc: " Max Gurtovoy
@ 2018-04-10 17:18 ` Max Gurtovoy
  2018-04-11 13:04 ` [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Sagi Grimberg
  2018-04-13 17:06 ` Christoph Hellwig
  6 siblings, 0 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-10 17:18 UTC (permalink / raw)


Activate/deactivate it during admin queue creation/destruction
and remove the association with the nvme ctrl.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/target/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b9d5b69..f54c787c 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -279,6 +279,7 @@ static int nvme_loop_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 
 static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
 {
+	nvme_stop_keep_alive(&ctrl->ctrl);
 	clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags);
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
@@ -419,6 +420,8 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	if (error)
 		goto out_cleanup_queue;
 
+	nvme_start_keep_alive(&ctrl->ctrl);
+
 	return 0;
 
 out_cleanup_queue:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
                   ` (4 preceding siblings ...)
  2018-04-10 17:18 ` [PATCH 5/5] nvme-loop: " Max Gurtovoy
@ 2018-04-11 13:04 ` Sagi Grimberg
  2018-04-11 13:38   ` Max Gurtovoy
  2018-04-13 17:06 ` Christoph Hellwig
  6 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-11 13:04 UTC (permalink / raw)



> Hi all,
> I've been debugging the KA mechanism lately and found a lack of
> coordination between the target and host implementations.
> 
> Johannes,
> Sorry for reverting your commit - I'll use nvme_start_keep_alive
> for my fix.
> 
> I've noticed that there is no clear definition in the NVMe spec
> regarding the keep-alive mechanism association. IMO, it should be
> a property of the admin queue and should be triggered as soon as
> the admin queue configured successfuly.
> 
> Idan/Christoph/Sagi,
> Any thoughts on that proposal ?

I don't understand what you mean by a "property of the admin queue";
there is no such definition in the spec. In any event, the keep alive
mechanism was defined to detect host/controller health on a periodic
basis.

The spec clearly states that the host needs to send keep alives at
a faster rate than the keep alive timeout, accounting for things such
as transport roundtrip times, transport delays, keep alive timer
granularity and so on.
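(For reference, a paraphrased sketch of how the host side already builds
in such a margin today, not verbatim source: the fabrics connect command
advertises kato plus a fixed grace constant, while keep-alives are
scheduled every kato seconds.)
--
/* paraphrased sketch, not verbatim kernel code */
#define NVME_KATO_GRACE		10	/* seconds of slack for delays */

/* fabrics admin connect: advertise kato + grace (in milliseconds) */
cmd.connect.kato = ctrl->kato ?
	cpu_to_le32((ctrl->kato + NVME_KATO_GRACE) * 1000) : 0;

/* host keep-alive work: send a keep-alive every kato seconds */
schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
--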

To me, this is a clear case where your use-case requires a larger
keep alive timeout. I'm not sure why sprinkling the keep alive execution
around solves/helps anything. To me this looks completely redundant
(sorry).

We could theoretically make NVME_KATO_GRACE configurable, but I
seriously doubt it's useful at all...

As a side note, this should be solved by the "Traffic Based Keep
Alive" TP which is approaching ratification. This is yet another example
showing that having the keep alive timer firing regardless of command
execution is hurtful.

> This patchset was tested using RDMA transport only:
> I've created 20 subsystems, 5 namespaces per subsystem and exposed
> all through 8 portals (total 160 ctrl's created) on 1 target.
> I used 1 initiator (host) and connected successfuly.
> Later on I've destroyed the target and caused a reconnection flow
> in the initiator side.
> Ater ~30-50 seconds, I've configured the target again but the initiator
> couldn't reconnect to it (after many retries).
> The reason for this was that the keep-alive timer expired at the target
> side, caused ctrl fatal error and the io-queue connect failed to find
> the ctrl. This loop never converged.
> 
> With the patches below, the test passed successfully after 1/2
> reconnection attempts.

Hence my comment above. As I said, I think you should either increase
the keep alive timeout for this use-case, or wait for TBKAS to ratify.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 13:04 ` [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Sagi Grimberg
@ 2018-04-11 13:38   ` Max Gurtovoy
  2018-04-11 14:07     ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-11 13:38 UTC (permalink / raw)




On 4/11/2018 4:04 PM, Sagi Grimberg wrote:
> 
>> Hi all,
>> I've been debugging the KA mechanism lately and found a lack of
>> coordination between the target and host implementations.
>>
>> Johannes,
>> Sorry for reverting your commit - I'll use nvme_start_keep_alive
>> for my fix.
>>
>> I've noticed that there is no clear definition in the NVMe spec
>> regarding the keep-alive mechanism association. IMO, it should be
>> a property of the admin queue and should be triggered as soon as
>> the admin queue configured successfuly.
>>
>> Idan/Christoph/Sagi,
>> Any thoughts on that proposal ?
> 
> I don't understand what you mean by a "property of the admin queue",
> there is no such definition in the spec. In any event, the keep alive
> mechanism was defined to detect host/controller health on a periodic
> basis.
> 
> The spec clearly states that the host needs to send keep alive at
> a faster rate than the Keep alive timeout accounting things such
> as transport roundtrip times, transport delays, keep alive timer
> granularity and so on.
> 
> To me, this is a clear case where your use-case requires a larger
> keep alive timeout. I'm not sure why sprinkling the keep alive execution
> around solve/help anything. To me this looks completely redundant
> (sorry).

I think that starting the keep-alive timer on the target side after admin 
connect while starting it on the host side only after all io-queues 
connect is wrong.
In my solution I start the keep-alive mechanism on the host side also 
after admin connect (exactly as the target side does).
In this way I'll make sure the "health" is ok during io-queue connection 
establishment (establishment of 4000-5000 IO queues can take a long time, 
and we hope sending a KA packet will not take more than 15 seconds no 
matter what :) ).
Why have an inconsistent implementation?
Increasing the timeout is not a solution (maybe it can work around some 
scenarios). Also, TBKA comes to solve a different problem.
This is exactly why I think we should add this to the spec. Maybe Idan can 
suggest a proposal with the right terminology.

> 
> We could theoretically make NVME_KATO_GRACE configurable, but I
> seriously doubt its useful at all...
> 
> As a side note, Note that this should be solved by "Traffic Based Keep
> Alive" TP which is approaching ratification. This is yet another example
> showing that Having keep alive timer firing regardless of command
> execution is hurtful.

TBKA will work like nop-in/out in iSCSI, AFAIK. But again, it will not 
solve the issue I described above.

> 
>> This patchset was tested using RDMA transport only:
>> I've created 20 subsystems, 5 namespaces per subsystem and exposed
>> all through 8 portals (total 160 ctrl's created) on 1 target.
>> I used 1 initiator (host) and connected successfuly.
>> Later on I've destroyed the target and caused a reconnection flow
>> in the initiator side.
>> Ater ~30-50 seconds, I've configured the target again but the initiator
>> couldn't reconnect to it (after many retries).
>> The reason for this was that the keep-alive timer expired at the target
>> side, caused ctrl fatal error and the io-queue connect failed to find
>> the ctrl. This loop never converged.
>>
>> With the patches below, the test passed successfully after 1/2
>> reconnection attempts.
> 
> Hence my comment above. As I said, I think you should either increase
> the keep alive timeout for this use-case, or wait for TBKAS to ratify.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 13:38   ` Max Gurtovoy
@ 2018-04-11 14:07     ` Sagi Grimberg
  2018-04-11 14:40       ` Max Gurtovoy
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-11 14:07 UTC (permalink / raw)



>> I don't understand what you mean by a "property of the admin queue",
>> there is no such definition in the spec. In any event, the keep alive
>> mechanism was defined to detect host/controller health on a periodic
>> basis.
>>
>> The spec clearly states that the host needs to send keep alive at
>> a faster rate than the Keep alive timeout accounting things such
>> as transport roundtrip times, transport delays, keep alive timer
>> granularity and so on.
>>
>> To me, this is a clear case where your use-case requires a larger
>> keep alive timeout. I'm not sure why sprinkling the keep alive execution
>> around solve/help anything. To me this looks completely redundant
>> (sorry).
> 
> I think that starting keep-alive timer at the target side after admin 
> connect and starting keep-alive timer at the host side after all 
> io-queues connect is wrong.
> In my solution I start keep-alive mechanism in the host side also after 
> admin connect (exactly as the target side).

That does not guarantee anything either, as you were simply able to
minimize the effect.

An alternative patch would also be able to minimize the effect:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e1b708ee6783..a84198b1e0d8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl 
*ctrl)
         INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
         memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
         ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
-       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+       schedule_delayed_work(&ctrl->ka_work, 0);
  }

  void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
--

This way the first keep alive will execute immediately the first time
and not kato after...

> In this way I'll make sure the "health" is ok during the io-queues 
> connection establishment (establisment of 4000-5000 IO queues can take 
> long time and we hope sending a KA packet will not take more than 15 
> seconds no matter what :) ).

No you won't. nvme_start_keep_alive will execute a keep alive one kato after
it was started. And why do you need to detect the health during
connection establishment? Queue creation will fail if health is lost.

> Why to have inconsistant implementation ?
> Increasing the timeout is not a solution (maybe can WA some scenarios). 
> Also TBKA comes to solve different problem.

No, it came to solve a false positive of heartbeat loss as a result of
keep alive starvation, which is exactly what this scenario is.

>> As a side note, Note that this should be solved by "Traffic Based Keep
>> Alive" TP which is approaching ratification. This is yet another example
>> showing that Having keep alive timer firing regardless of command
>> execution is hurtful.
> 
> TBKA will work as nop-in/out in iSCSI AKAIK. But again, it will no solve 
> the issue I described above.

Yes it will, fabrics connect is a traffic indication and will reset the
keep alive timer.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 14:07     ` Sagi Grimberg
@ 2018-04-11 14:40       ` Max Gurtovoy
  2018-04-11 16:48         ` James Smart
  2018-04-12 12:14         ` Sagi Grimberg
  0 siblings, 2 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-11 14:40 UTC (permalink / raw)




On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
> 
>>> I don't understand what you mean by a "property of the admin queue",
>>> there is no such definition in the spec. In any event, the keep alive
>>> mechanism was defined to detect host/controller health on a periodic
>>> basis.
>>>
>>> The spec clearly states that the host needs to send keep alive at
>>> a faster rate than the Keep alive timeout accounting things such
>>> as transport roundtrip times, transport delays, keep alive timer
>>> granularity and so on.
>>>
>>> To me, this is a clear case where your use-case requires a larger
>>> keep alive timeout. I'm not sure why sprinkling the keep alive execution
>>> around solve/help anything. To me this looks completely redundant
>>> (sorry).
>>
>> I think that starting keep-alive timer at the target side after admin 
>> connect and starting keep-alive timer at the host side after all 
>> io-queues connect is wrong.
>> In my solution I start keep-alive mechanism in the host side also 
>> after admin connect (exactly as the target side).
> 
> That does not guarantee anything either. as you were simply able to
> minimize the effect.
> 
> An alternative patch would also be able to minimize the effect:
> -- 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e1b708ee6783..a84198b1e0d8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl 
> *ctrl)
>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
> +       schedule_delayed_work(&ctrl->ka_work, 0);
>   }
> 
>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
> -- 
> 
> This way the first keep alive will execute immediately the first time
> and not kato after...

This is a good patch, and I can add it to the series.
I still don't see why not to be symmetric and rely on the transport being 
able to send a KA packet fast enough. For systems that have 68/72/128 
cores, IO queue connection (which includes buffer allocation) to hundreds 
of subsystems can take a while, and until we have TBKA (and also after), 
KA should be started as soon as the admin queue is connected, on both 
sides (not only the target side). The GRACE is ok, but it is needed to 
cover the delay and the lack of synchronization between the timers, not 
the logic.

> 
>> In this way I'll make sure the "health" is ok during the io-queues 
>> connection establishment (establisment of 4000-5000 IO queues can take 
>> long time and we hope sending a KA packet will not take more than 15 
>> seconds no matter what :) ).
> 
> No you won't. nvme_start_keep_alive will execute a keep alive kato after
> it was started. and why do you need to detect the health during
> connection establishment? queue creation will fail if health is lost.
> 
>> Why to have inconsistant implementation ?
>> Increasing the timeout is not a solution (maybe can WA some 
>> scenarios). Also TBKA comes to solve different problem.
> 
> No, it came to solve a false positive of heart-bit lost as a result of
> keep alive starvation, which is exactly what this scenario is.
> 
>>> As a side note, Note that this should be solved by "Traffic Based Keep
>>> Alive" TP which is approaching ratification. This is yet another example
>>> showing that Having keep alive timer firing regardless of command
>>> execution is hurtful.
>>
>> TBKA will work as nop-in/out in iSCSI AKAIK. But again, it will no 
>> solve the issue I described above.
> 
> Yes it will, fabrics connect is a traffic indication and will reset the
> keep alive timer.

Very good. But it will reset a KA timer that is not even started (on the 
initiator)?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 14:40       ` Max Gurtovoy
@ 2018-04-11 16:48         ` James Smart
  2018-04-12  8:49           ` Max Gurtovoy
  2018-04-12 12:29           ` Sagi Grimberg
  2018-04-12 12:14         ` Sagi Grimberg
  1 sibling, 2 replies; 23+ messages in thread
From: James Smart @ 2018-04-11 16:48 UTC (permalink / raw)


On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>
>
> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>
>>>> I don't understand what you mean by a "property of the admin queue",
>>>> there is no such definition in the spec. In any event, the keep alive
>>>> mechanism was defined to detect host/controller health on a periodic
>>>> basis.
>>>>
>>>> The spec clearly states that the host needs to send keep alive at
>>>> a faster rate than the Keep alive timeout accounting things such
>>>> as transport roundtrip times, transport delays, keep alive timer
>>>> granularity and so on.
>>>>
>>>> To me, this is a clear case where your use-case requires a larger
>>>> keep alive timeout. I'm not sure why sprinkling the keep alive 
>>>> execution
>>>> around solve/help anything. To me this looks completely redundant
>>>> (sorry).
>>>
>>> I think that starting keep-alive timer at the target side after 
>>> admin connect and starting keep-alive timer at the host side after 
>>> all io-queues connect is wrong.
>>> In my solution I start keep-alive mechanism in the host side also 
>>> after admin connect (exactly as the target side).

well - true, but I believe it should be started after the controller is 
enabled - not just that the admin queue has been created. I don't think 
even the admin queue can be serviced till the adapter is enabled. 
Thus, I would agree with moving nvme_start_keep_alive() from 
nvme_start_ctrl() to nvme_enable_ctrl(). We would also want the 
"enablement" to detect the keepalive style (today's keepalive or TBKA) 
supported by the time that call is made. Given that needs to be known, 
I believe there will always be a small window where a couple of things 
have to be accessed before the KA starts.

I strongly support Sagi's statements on trying to keep the KA start/stop 
in common code rather than sprinkling it around. I agree with the 
current placement of the stop code in nvme_stop_ctrl(). The stop ctrl 
should always be called prior to the admin queue being torn down in the 
transport (as part of kicking off the reset of the controller).

Note: the FC patch is way off. Also note that I've killed the 
xxx_is_ready() routines in the updated if_ready patch I sent. So your 
mods wouldn't apply (soon).

>>
>> That does not guarantee anything either. as you were simply able to
>> minimize the effect.
>>
>> An alternative patch would also be able to minimize the effect:
>> -- 
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e1b708ee6783..a84198b1e0d8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct 
>> nvme_ctrl *ctrl)
>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>> +       schedule_delayed_work(&ctrl->ka_work, 0);
>>   }
>>
>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>> -- 
>>
>> This way the first keep alive will execute immediately the first time
>> and not kato after...
>
> This is a good patch, and I can add it to the series.

Uh... I wouldn't want to do this. You can easily burn lots of cycles 
sending a KA.

I would lean toward making people get used to setting KATO to a large 
enough value that it can be used equally well with TBKA - thus it should 
be something like this instead:
        schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);

This is somewhat independent from the grace value - as grace is a 
scaling factor to accommodate congestion and delayed processing.
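For illustration, that would only change the initial arming in
nvme_start_keep_alive() (sketch; this assumes the periodic re-arm in the
keep-alive completion path keeps using the full kato):
--
void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
{
	if (unlikely(ctrl->kato == 0))
		return;

	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
	/* first keep-alive fires halfway into the keep-alive window */
	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ / 2);
}
--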

-- james

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 16:48         ` James Smart
@ 2018-04-12  8:49           ` Max Gurtovoy
  2018-04-12 12:34             ` Sagi Grimberg
  2018-04-12 12:29           ` Sagi Grimberg
  1 sibling, 1 reply; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-12  8:49 UTC (permalink / raw)




On 4/11/2018 7:48 PM, James Smart wrote:
> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>
>>
>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>
>>>>> I don't understand what you mean by a "property of the admin queue",
>>>>> there is no such definition in the spec. In any event, the keep alive
>>>>> mechanism was defined to detect host/controller health on a periodic
>>>>> basis.
>>>>>
>>>>> The spec clearly states that the host needs to send keep alive at
>>>>> a faster rate than the Keep alive timeout accounting things such
>>>>> as transport roundtrip times, transport delays, keep alive timer
>>>>> granularity and so on.
>>>>>
>>>>> To me, this is a clear case where your use-case requires a larger
>>>>> keep alive timeout. I'm not sure why sprinkling the keep alive 
>>>>> execution
>>>>> around solve/help anything. To me this looks completely redundant
>>>>> (sorry).
>>>>
>>>> I think that starting keep-alive timer at the target side after 
>>>> admin connect and starting keep-alive timer at the host side after 
>>>> all io-queues connect is wrong.
>>>> In my solution I start keep-alive mechanism in the host side also 
>>>> after admin connect (exactly as the target side).
> 
> well - true, but I believe it should be started after the controller is 
> enabled - not just that the admin queue has been created. I don't think 
> even the admin queue can be serviced till the adapter is enabled. Thus, 
> I would agree with moving nvme_start_keep_alive() from nvme_start_ctrl() 
> to nvme_enable_ctrl(). We would also want the "enablement" to detect 
> the keepalive style (today's keepalive or TBKA) supported by the time 
> that call is made. Given that needs to be known, I believe there will 
> always be a small window where a couple of things have to be accessed 
> before the KA starts.

At least I see that you agree to start the keep-alive timer before creating 
the IO queues, so that's progress :).
The fact that I didn't put it in nvme_enable_ctrl is because the FC code 
never calls nvme_disable_ctrl (and that is probably where we should stop 
the timer).
I'm trying to solve an issue without changing the whole design of the FC 
implementation. We can do it incrementally.

> 
> I strongly support Sagi's statements on trying to keep the KA start/stop 
> in common code rather than sprinkling it around. I agree with the 
> current placement of the stop code in nvme_stop_ctrl(). The stop ctrl 
> should always be called prior to the admin queue being torn down in the 
> transport (as part of kicking off the reset of the controller).

If we add the KA start to nvme_enable_ctrl then the KA stop should be 
added to nvme_disable_ctrl. We should be symmetric.

> 
> Note: the FC patch is way off. Also note that I've killed the 
> xxx_is_ready() routines in the updated if_ready patch I sent. So your 
> mods wouldn't apply (soon).

I'm rebased on the latest nvme branch and your xxx_is_ready is not there.
If you have a better solution, please share it with us.

> 
>>>
>>> That does not guarantee anything either. as you were simply able to
>>> minimize the effect.
>>>
>>> An alternative patch would also be able to minimize the effect:
>>> -- 
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index e1b708ee6783..a84198b1e0d8 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct 
>>> nvme_ctrl *ctrl)
>>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>>> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>>> +       schedule_delayed_work(&ctrl->ka_work, 0);
>>>   }
>>>
>>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>>> -- 
>>>
>>> This way the first keep alive will execute immediately the first time
>>> and not kato after...
>>
>> This is a good patch, and I can add it to the series.
> 
> Uh... I wouldn't want to do this. You can easily burn lots of cycles 
> sending a KA.

I don't really mind losing some cycles during connection establishment 
to do the right thing.

> 
> I would lean toward making people get used to setting KATO to a large 
> enough value that it can be used equally well with TBKA - thus it should 
> be something like this instead:
>         schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);

People are using the default value. I guess I can live with (ctrl->kato 
* HZ) / 2 as the first delay if we start the timer before IO queue creation.

> 
> This is somewhat independent from the grace value - as grace is a 
> scaling factor to accommodate congestion and delayed processing.
> 
> -- james
> 

To summarize, there are different perspectives on this issue, and that is 
why I think this should be added to the spec to clear things up.
The spec should say explicitly that the KA timer should be started after 
admin_queue connect and before starting to perform io_connect.
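In other words, the host-side ordering I'm arguing for is (a sketch, not
actual code; the nvmf_connect_* calls stand for the existing fabrics
connect helpers):
--
/* proposed host connect ordering (sketch) */
nvmf_connect_admin_queue(ctrl);		/* admin queue is live */
nvme_start_keep_alive(ctrl);		/* KA armed on both sides from here */
for (i = 1; i < ctrl->queue_count; i++)
	nvmf_connect_io_queue(ctrl, i);	/* may take a long time */
--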

Christoph,
what do you think about the asymmetry we have here and the patches that 
fix it?

-Max.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 14:40       ` Max Gurtovoy
  2018-04-11 16:48         ` James Smart
@ 2018-04-12 12:14         ` Sagi Grimberg
  2018-04-12 12:18           ` Max Gurtovoy
  1 sibling, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-12 12:14 UTC (permalink / raw)



>>> TBKA will work as nop-in/out in iSCSI AKAIK. But again, it will no 
>>> solve the issue I described above.
>>
>> Yes it will, fabrics connect is a traffic indication and will reset the
>> keep alive timer.
> 
> Very good. But it will reset a KA timer that is not even started (in the 
> initiator) ?

No, the host's indication of traffic is completions, so the completion 
would reset it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 12:14         ` Sagi Grimberg
@ 2018-04-12 12:18           ` Max Gurtovoy
  2018-04-12 12:31             ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-12 12:18 UTC (permalink / raw)




On 4/12/2018 3:14 PM, Sagi Grimberg wrote:
> 
>>>> TBKA will work as nop-in/out in iSCSI AKAIK. But again, it will no 
>>>> solve the issue I described above.
>>>
>>> Yes it will, fabrics connect is a traffic indication and will reset the
>>> keep alive timer.
>>
>> Very good. But it will reset a KA timer that is not even started (in 
>> the initiator) ?
> 
> No, host indication of traffic is completions, so the completion would 
> reset it.

I meant that with the current host implementation (which starts the KA 
timer after IO queue connection establishment) there will be nothing to 
reset during io queue connection establishment and completions.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-11 16:48         ` James Smart
  2018-04-12  8:49           ` Max Gurtovoy
@ 2018-04-12 12:29           ` Sagi Grimberg
  2018-04-12 13:32             ` Max Gurtovoy
  1 sibling, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-12 12:29 UTC (permalink / raw)




On 04/11/2018 07:48 PM, James Smart wrote:
> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>
>>
>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>
>>>>> I don't understand what you mean by a "property of the admin queue",
>>>>> there is no such definition in the spec. In any event, the keep alive
>>>>> mechanism was defined to detect host/controller health on a periodic
>>>>> basis.
>>>>>
>>>>> The spec clearly states that the host needs to send keep alive at
>>>>> a faster rate than the Keep alive timeout accounting things such
>>>>> as transport roundtrip times, transport delays, keep alive timer
>>>>> granularity and so on.
>>>>>
>>>>> To me, this is a clear case where your use-case requires a larger
>>>>> keep alive timeout. I'm not sure why sprinkling the keep alive 
>>>>> execution
>>>>> around solve/help anything. To me this looks completely redundant
>>>>> (sorry).
>>>>
>>>> I think that starting keep-alive timer at the target side after 
>>>> admin connect and starting keep-alive timer at the host side after 
>>>> all io-queues connect is wrong.
>>>> In my solution I start keep-alive mechanism in the host side also 
>>>> after admin connect (exactly as the target side).
> 
> well - true, but I believe it should be started after the controller is 
> enabled - not just that the admin queue has been created. I don't think 
> even the admin queue can be serviced till the adapter is enabled. Thus, 
> I would agree with moving nvme_start_keep_alive() from nvme_start_ctrl() 
> to nvme_enable_ctrl().

I'm fine with that.

> We would also want the "enablement" to detect 
> the keepalive style (today's keepalive or TBKA) supported by the time 
> that call is made. Given that needs to be known, I believe there will 
> always be a small window where a couple of things have to be accessed 
> before the KA starts.

I honestly don't understand why this should even be an issue. If someone
is running a heavy load where the keep alive timeout is not sufficient,
then they should use a larger keep alive.

> I strongly support Sagi's statements on trying to keep the KA start/stop 
> in common code rather than sprinkling it around. I agree with the 
> current placement of the stop code in nvme_stop_ctrl(). The stop ctrl 
> should always be called prior to the admin queue being torn down in the 
> transport (as part of kicking off the reset of the controller).

Yea, we need to keep it where it is. The start can move to controller
enable, and in fact the target should do the same and update the timer
in nvmet_start_ctrl.
--
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index fe151d672241..a81bf4d5e60c 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -643,6 +643,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
         }

         ctrl->csts = NVME_CSTS_RDY;
+       mod_delayed_work(system_wq, &ctrl->ka_work, ctrl->kato * HZ);
  }

  static void nvmet_clear_ctrl(struct nvmet_ctrl *ctrl)
--

This would get the correct behavior and still be able to
detect host death before the controller was enabled.

This, together with moving the keep alive timer start into nvme_enable_ctrl,
should be enough, without propagating the handling to every transport
driver; we have enough of that already.
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e1b708ee6783..4b5c3f7addeb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1739,7 +1739,14 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
         ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
         if (ret)
                 return ret;
-       return nvme_wait_ready(ctrl, cap, true);
+
+       ret = nvme_wait_ready(ctrl, cap, true);
+       if (ret)
+               return ret;
+
+       nvme_start_keep_alive(ctrl);
+
+       return 0;
  }
  EXPORT_SYMBOL_GPL(nvme_enable_ctrl);

@@ -3393,9 +3400,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);

  void nvme_start_ctrl(struct nvme_ctrl *ctrl)
  {
-       if (ctrl->kato)
-               nvme_start_keep_alive(ctrl);
-
         if (ctrl->queue_count > 1) {
                 nvme_queue_scan(ctrl);
                 queue_work(nvme_wq, &ctrl->async_event_work);
--

>>> That does not guarantee anything either. as you were simply able to
>>> minimize the effect.
>>>
>>> An alternative patch would also be able to minimize the effect:
>>> -- 
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index e1b708ee6783..a84198b1e0d8 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct 
>>> nvme_ctrl *ctrl)
>>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>>> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>>> +       schedule_delayed_work(&ctrl->ka_work, 0);
>>>   }
>>>
>>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>>> -- 
>>>
>>> This way the first keep alive will execute immediately the first time
>>> and not kato after...
>>
>> This is a good patch, and I can add it to the series.
> 
> Uh... I wouldn't want to do this. You can easily burn lots of cycles 
> sending a KA.

It's only for the first invocation, but I think it's useless anyway...

> I would lean toward making people get used to setting KATO to a large 
> enough value that it can be used equally well with TBKA - thus it should 
> be something like this instead:
>         schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);
> 
> This is somewhat independent from the grace value - as grace is a 
> scaling factor to accommodate congestion and delayed processing.

This is fine too.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 12:18           ` Max Gurtovoy
@ 2018-04-12 12:31             ` Sagi Grimberg
  0 siblings, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-12 12:31 UTC (permalink / raw)



>>>>> TBKA will work as nop-in/out in iSCSI AKAIK. But again, it will no 
>>>>> solve the issue I described above.
>>>>
>>>> Yes it will, fabrics connect is a traffic indication and will reset the
>>>> keep alive timer.
>>>
>>> Very good. But it will reset a KA timer that is not even started (in 
>>> the initiator) ?
>>
>> No, host indication of traffic is completions, so the completion would 
>> reset it.
> 
> I meant that in the current host implementation (that starts KA timer 
> after IO queues connection establishment) there will be nothing to reset 
> during io queues connection establishment and completions.

It's not exactly a reset, it's more like a check without a keep alive
timeout interval. You can read the updates when the proposal
is ratified (and the code).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12  8:49           ` Max Gurtovoy
@ 2018-04-12 12:34             ` Sagi Grimberg
  2018-04-12 17:28               ` James Smart
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-12 12:34 UTC (permalink / raw)




On 04/12/2018 11:49 AM, Max Gurtovoy wrote:
> 
> 
> On 4/11/2018 7:48 PM, James Smart wrote:
>> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>>
>>>
>>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>>
>>>>>> I don't understand what you mean by a "property of the admin queue",
>>>>>> there is no such definition in the spec. In any event, the keep alive
>>>>>> mechanism was defined to detect host/controller health on a periodic
>>>>>> basis.
>>>>>>
>>>>>> The spec clearly states that the host needs to send keep alive at
>>>>>> a faster rate than the Keep alive timeout accounting things such
>>>>>> as transport roundtrip times, transport delays, keep alive timer
>>>>>> granularity and so on.
>>>>>>
>>>>>> To me, this is a clear case where your use-case requires a larger
>>>>>> keep alive timeout. I'm not sure why sprinkling the keep alive 
>>>>>> execution
>>>>>> around solve/help anything. To me this looks completely redundant
>>>>>> (sorry).
>>>>>
>>>>> I think that starting keep-alive timer at the target side after 
>>>>> admin connect and starting keep-alive timer at the host side after 
>>>>> all io-queues connect is wrong.
>>>>> In my solution I start keep-alive mechanism in the host side also 
>>>>> after admin connect (exactly as the target side).
>>
>> well - true, but I believe it should be started after the controller 
>> is enabled - not just that the admin queue has been created. I don't 
>> think even the admin queue can be serviced till the adapter is 
>> enabled. Thus, I would agree with moving nvme_start_keep_alive() from 
>> nvme_start_ctrl() to nvme_enable_ctrl(). We would also want the 
>> "enablement" to detect the keepalive style (today's keepalive or TBKA) 
>> supported by the time that call is made. Given that needs to be 
>> known, I believe there will always be a small window where a couple of 
>> things have to be accessed before the KA starts.
> 
> At least I see that you agree to start keep-alive timer before creating 
> the IO queues so that's a progress :).
> The fact that I didn't put it in nvme_enable_ctrl is because the FC code 
> never calls nvme_disable_ctrl (and there we probably should stop the 
> timer).
> I'm trying to solve an issue without changing the whole design of the FC 
> implementation. We can do it incrementaly.

Please don't touch any of the transport drivers for any of this, it's the
wrong place to handle keep alive.

>>
>> I strongly support Sagi's statements on trying to keep the KA 
>> start/stop in common code rather than sprinkling it around. I agree 
>> with the current placement of the stop code in nvme_stop_ctrl(). The 
>> stop ctrl should always be called prior to the admin queue being torn 
>> down in the transport (as part of kicking off the reset of the 
>> controller).
> 
> If we'll add KA start to nvme_enable_ctrl then the KA stop should be 
> added to nvme_disable_ctrl. We should be symmetric.

Not a must. Even after the controller is disabled we want to
have a keep alive to it, as we would need to enable it again at
some point.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 12:29           ` Sagi Grimberg
@ 2018-04-12 13:32             ` Max Gurtovoy
  2018-04-12 15:17               ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-12 13:32 UTC (permalink / raw)




On 4/12/2018 3:29 PM, Sagi Grimberg wrote:
> 
> 
> On 04/11/2018 07:48 PM, James Smart wrote:
>> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>>
>>>
>>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>>
>>>>>> I don't understand what you mean by a "property of the admin queue",
>>>>>> there is no such definition in the spec. In any event, the keep alive
>>>>>> mechanism was defined to detect host/controller health on a periodic
>>>>>> basis.
>>>>>>
>>>>>> The spec clearly states that the host needs to send keep alive at
>>>>>> a faster rate than the Keep alive timeout accounting things such
>>>>>> as transport roundtrip times, transport delays, keep alive timer
>>>>>> granularity and so on.
>>>>>>
>>>>>> To me, this is a clear case where your use-case requires a larger
>>>>>> keep alive timeout. I'm not sure why sprinkling the keep alive 
>>>>>> execution
>>>>>> around solve/help anything. To me this looks completely redundant
>>>>>> (sorry).
>>>>>
>>>>> I think that starting keep-alive timer at the target side after 
>>>>> admin connect and starting keep-alive timer at the host side after 
>>>>> all io-queues connect is wrong.
>>>>> In my solution I start keep-alive mechanism in the host side also 
>>>>> after admin connect (exactly as the target side).
>>
>> well - true, but I believe it should be started after the controller 
>> is enabled - not just that the admin queue has been created. I don't 
>> think even the admin queue can be serviced till the adapter is 
>> enabled. Thus, I would agree with moving nvme_start_keep_alive() from 
>> nvme_start_ctrl() to nvme_enable_ctrl().
> 
> I'm fine with that.
> 
>> We would also want the "enablement" to detect the keepalive style 
>> (today's keepalive or TBKA) supported by the time that call is made.   
>> Given that needs to be known, I believe there will always be a small 
>> window where a couple of things have to be accessed before the KA starts.
> 
> I honestly don't understand why should this even be an issue. If someone
> is running a heavy load where keep alive timeout is not sufficient, then
> it should use a larger keep alive.
> 
>> I strongly support Sagi's statements on trying to keep the KA 
>> start/stop in common code rather than sprinkling it around. I agree 
>> with the current placement of the stop code in nvme_stop_ctrl(). The 
>> stop ctrl should always be called prior to the admin queue being torn 
>> down in the transport (as part of kicking off the reset of the 
>> controller).
> 
> Yea, we need to keep it where it is. the start can move to controller
> enable, and in fact, the target should also do that for the target
> and update the timer in nvmet_start_ctrl.
> -- 
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index fe151d672241..a81bf4d5e60c 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -643,6 +643,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
>          }
> 
>          ctrl->csts = NVME_CSTS_RDY;
> +       mod_delayed_work(system_wq, &ctrl->ka_work, ctrl->kato * HZ);
>   }
> 
>   static void nvmet_clear_ctrl(struct nvmet_ctrl *ctrl)
> -- 
> 
> This would get the correct behavior and still be able to
> detect host death before the controller was enabled.
> 
> This together with moving the keep alive timer start in nvme_enable_ctrl
> should be enough and not propagating the handle to every transport
> driver, we have enough of that already.

Great, this is what I wanted to do in the first place, but I was 
concerned about stopping the keep-alive in nvme_disable_ctrl.
Are you sending a new series, or should I?

> -- 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e1b708ee6783..4b5c3f7addeb 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1739,7 +1739,14 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 
> cap)
>         ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, 
> ctrl->ctrl_config);
>         if (ret)
>                 return ret;
> -       return nvme_wait_ready(ctrl, cap, true);
> +
> +       ret = nvme_wait_ready(ctrl, cap, true);
> +       if (ret)
> +               return ret;
> +
> +       nvme_start_keep_alive(ctrl);
> +
> +       return 0;
>   }
>   EXPORT_SYMBOL_GPL(nvme_enable_ctrl);
> 
> @@ -3393,9 +3400,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
> 
>   void nvme_start_ctrl(struct nvme_ctrl *ctrl)
>   {
> -       if (ctrl->kato)
> -               nvme_start_keep_alive(ctrl);
> -
>         if (ctrl->queue_count > 1) {
>                 nvme_queue_scan(ctrl);
>                 queue_work(nvme_wq, &ctrl->async_event_work);
> -- 
> 
>>>> That does not guarantee anything either. as you were simply able to
>>>> minimize the effect.
>>>>
>>>> An alternative patch would also be able to minimize the effect:
>>>> -- 
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index e1b708ee6783..a84198b1e0d8 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct 
>>>> nvme_ctrl *ctrl)
>>>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>>>> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>>>> +       schedule_delayed_work(&ctrl->ka_work, 0);
>>>>   }
>>>>
>>>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>>>> -- 
>>>>
>>>> This way the first keep alive will execute immediately the first time
>>>> and not kato after...
>>>
>>> This is a good patch, and I can add it to the series.
>>
>> Uh... I wouldn't want to do this. You can easily burn lots of cycles 
>> sending a KA.
> 
> Its only for the first invocation. but I think its useless anyways...

So this one is out.

> 
>> I would lean toward making people get used to setting KATO to a large 
>> enough value that it can be used equally well with TBKA - thus it 
>> should be something like this instead:
>>         schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);
>>
>> This is somewhat independent from the grace value - as grace is a 
>> scaling factor to accommodate congestion and delayed processing.
> 
> This is fine too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 13:32             ` Max Gurtovoy
@ 2018-04-12 15:17               ` Sagi Grimberg
  2018-04-12 16:43                 ` Max Gurtovoy
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-04-12 15:17 UTC (permalink / raw)



>> This together with moving the keep alive timer start in nvme_enable_ctrl
>> should be enough and not propagating the handle to every transport
>> driver, we have enough of that already.
> 
> Great, this is what I wanted to do in the first place but I was 
> concerned about stopping the keep-alive in nvme_disable_ctrl.
> Are you sending a new series or I will ?

I can send it. Given that it's your report and your intention in the
first place, can I place you as the author of these?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 15:17               ` Sagi Grimberg
@ 2018-04-12 16:43                 ` Max Gurtovoy
  0 siblings, 0 replies; 23+ messages in thread
From: Max Gurtovoy @ 2018-04-12 16:43 UTC (permalink / raw)




On 4/12/2018 6:17 PM, Sagi Grimberg wrote:
> 
>>> This together with moving the keep alive timer start in nvme_enable_ctrl
>>> should be enough and not propagating the handle to every transport
>>> driver, we have enough of that already.
>>
>> Great, this is what I wanted to do in the first place but I was 
>> concerned about stopping the keep-alive in nvme_disable_ctrl.
>> Are you sending a new series or I will ?
> 
> I can send it. Given that its your report and your intention in the
> first place, can I place you as the author of these?

Yes. I've also run my test on it successfully.
I'll run some more tests using the RDMA transport and update if there are 
any issues.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-12 12:34             ` Sagi Grimberg
@ 2018-04-12 17:28               ` James Smart
  0 siblings, 0 replies; 23+ messages in thread
From: James Smart @ 2018-04-12 17:28 UTC (permalink / raw)


On 4/12/2018 5:34 AM, Sagi Grimberg wrote:
>
>> At least I see that you agree to start keep-alive timer before 
>> creating the IO queues so that's a progress :).
>> The fact that I didn't put it in nvme_enable_ctrl is because the FC 
>> code never calls nvme_disable_ctrl (and there we probably should stop 
>> the timer).
>> I'm trying to solve an issue without changing the whole design of the 
>> FC implementation. We can do it incrementaly.
>
> Please don't touch any of the transport drivers for any of this, its the
> wrong place to handle keep alive.

The reason FC doesn't call nvme_disable_ctrl() is because I don't 
believe it has much value. It's at best a "good citizen" thing that can 
encounter more problems than it's worth. All that disable does is 
perform a pseudo reg write (property_set) to disable the controller. But 
to perform that write requires the link-side nvmeof association to not 
be failed/in error, the admin connection valid, the controller not 
failed in any way (it may already be, due to the error) and able to send a 
response back. In almost all cases other than an admin-requested reset, 
those things aren't necessarily valid. So calling it, and having it 
wait for a completion, isn't a great idea imho.

Thus the use of jumping to nvme_stop_ctrl() instead. It is the real 
function that disables io and stops work against the controller until a 
new association can get in place. So I agree with where the stop KA is 
now - in nvme_stop_ctrl().


>
>>>
>>> I strongly support Sagi's statements on trying to keep the KA 
>>> start/stop in common code rather than sprinkling it around. I agree 
>>> with the current placement of the stop code in nvme_stop_ctrl(). 
>>> The stop ctrl should always be called prior to the admin queue being 
>>> torn down in the transport (as part of kicking off the reset of the 
>>> controller).
>>
>> If we'll add KA start to nvme_enable_ctrl then the KA stop should be 
>> added to nvme_disable_ctrl. We should be symmetric.
>
> Not a must. even after the controller is disabled we want to
> have a keep alive to it as we would need to enable it again at
> some point.

As indicated above - no - don't move it.

-- james

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive
  2018-04-10 17:18 ` [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive Max Gurtovoy
@ 2018-04-13 17:01   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-04-13 17:01 UTC (permalink / raw)


On Tue, Apr 10, 2018@08:18:06PM +0300, Max Gurtovoy wrote:
> Keep-alive mechanism is an admin queue property and
> should be activated/deactivated during admin queue
> creation/destruction.

On its own this patch looks bogus.  You later fix this up, but the move
really needs to be in one patch instead of being split into parts that
make things unusable in between.

Also, please use the 74 or however many chars are available per line for
the commit log.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
  2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
                   ` (5 preceding siblings ...)
  2018-04-11 13:04 ` [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Sagi Grimberg
@ 2018-04-13 17:06 ` Christoph Hellwig
  6 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-04-13 17:06 UTC (permalink / raw)


On Tue, Apr 10, 2018@08:18:04PM +0300, Max Gurtovoy wrote:
> I've noticed that there is no clear definition in the NVMe spec
> regarding the keep-alive mechanism association. IMO, it should be
> a property of the admin queue and should be triggered as soon as
> the admin queue configured successfuly.

It is a property of the controller, sent over the admin queue.
I'm not sure what else it could be, but if things are confusing
please reach out to the technical working group.

Note that we can only send NVMe commands (rather than Fabrics commands)
when the controller is enabled.  Based on that I think that we need
to fix keep alive, but in a different way than in this series.

So instead of starting keep alive in nvme_start_ctrl we should
start it in nvme_enable_ctrl, and similarly on the stop side with
disable/shutdown.  On the target side we can also only start to
expect a keep-alive once the controller has been enabled,
and need to stop expecting one once the controller has been disabled.
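
A minimal sketch of that placement (against the current enable/disable
helpers; illustrative only, not a final patch):
--
int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
{
	int ret;

	/* ... program CC.EN and wait for CSTS.RDY ... */
	ret = nvme_wait_ready(ctrl, cap, true);
	if (ret)
		return ret;

	/* the controller can process admin commands now, arm keep-alive */
	nvme_start_keep_alive(ctrl);
	return 0;
}

int nvme_disable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
{
	/* no point keeping a disabled controller alive */
	nvme_stop_keep_alive(ctrl);

	/* ... clear CC.EN and wait for CSTS.RDY to clear ... */
	return nvme_wait_ready(ctrl, cap, false);
}
--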

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive" Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive Max Gurtovoy
2018-04-13 17:01   ` Christoph Hellwig
2018-04-10 17:18 ` [PATCH v1 3/5] nvme-rdma: add keep-alive mechanism as admin_q property Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 4/5] nvme-fc: " Max Gurtovoy
2018-04-10 17:18 ` [PATCH 5/5] nvme-loop: " Max Gurtovoy
2018-04-11 13:04 ` [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Sagi Grimberg
2018-04-11 13:38   ` Max Gurtovoy
2018-04-11 14:07     ` Sagi Grimberg
2018-04-11 14:40       ` Max Gurtovoy
2018-04-11 16:48         ` James Smart
2018-04-12  8:49           ` Max Gurtovoy
2018-04-12 12:34             ` Sagi Grimberg
2018-04-12 17:28               ` James Smart
2018-04-12 12:29           ` Sagi Grimberg
2018-04-12 13:32             ` Max Gurtovoy
2018-04-12 15:17               ` Sagi Grimberg
2018-04-12 16:43                 ` Max Gurtovoy
2018-04-12 12:14         ` Sagi Grimberg
2018-04-12 12:18           ` Max Gurtovoy
2018-04-12 12:31             ` Sagi Grimberg
2018-04-13 17:06 ` Christoph Hellwig
