* [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
@ 2018-04-10 17:18 Max Gurtovoy
From: Max Gurtovoy @ 2018-04-10 17:18 UTC


Hi all,
I've been debugging the KA mechanism lately and found a lack of
coordination between the target and host implementations.

Johannes,
Sorry for reverting your commit - I'll use nvme_start_keep_alive
for my fix.

I've noticed that the NVMe spec has no clear definition of what the
keep-alive mechanism is associated with. IMO, it should be a property
of the admin queue and should be triggered as soon as the admin queue
is configured successfully.

Idan/Christoph/Sagi,
Any thoughts on this proposal?
In any case we should make the spec clear about it, otherwise we'll have
interoperability issues when running different implementations/versions.

This patchset was tested using the RDMA transport only:
I created 20 subsystems with 5 namespaces per subsystem and exposed
all of them through 8 portals (160 ctrls in total) on 1 target.
I used 1 initiator (host) and connected successfully.
Later on I destroyed the target, which triggered a reconnection flow
on the initiator side.
After ~30-50 seconds I configured the target again, but the initiator
couldn't reconnect to it (even after many retries).
The reason was that the keep-alive timer expired on the target
side and caused a ctrl fatal error, so the io-queue connect failed to
find the ctrl. This loop never converged.
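
The failure above can be illustrated with a rough timing model. This is
only a sketch and not kernel code: struct ka_model,
ctrl_survives_connect() and all of the numbers used with it are
hypothetical. It assumes the target arms its keep-alive timer when the
admin queue is configured and re-arms it on every keep-alive it
receives, and it asks whether the ctrl is still alive when the last
io-queue connect arrives.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code. All times are in milliseconds. */
struct ka_model {
	unsigned int kato_ms;       /* keep-alive timeout enforced by the target */
	unsigned int ka_period_ms;  /* host keep-alive interval once it has started */
	unsigned int io_connect_ms; /* time from admin queue setup to last io-queue connect */
};

/*
 * Assumption: the target arms its timer at t = 0 (admin queue configured)
 * and re-arms it on each keep-alive received. ka_start_ms is the time at
 * which the host begins sending keep-alives.
 *
 * Returns true if the target-side timer never expires before the
 * io-queues finish connecting.
 */
static bool ctrl_survives_connect(const struct ka_model *m,
				  unsigned int ka_start_ms)
{
	unsigned int deadline = m->kato_ms; /* armed at admin queue setup */
	unsigned int t;

	if (m->ka_period_ms == 0)
		return false; /* invalid model: no keep-alive interval */

	for (t = ka_start_ms; t < m->io_connect_ms; t += m->ka_period_ms) {
		if (t >= deadline)
			return false; /* timer fired before this keep-alive arrived */
		deadline = t + m->kato_ms; /* target re-arms its timer */
	}

	/* The ctrl must still exist when the last io-queue connect arrives. */
	return deadline > m->io_connect_ms;
}
```

With made-up values of kato_ms = 10000, ka_period_ms = 5000 and
io_connect_ms = 30000, passing ka_start_ms = 0 (keep-alive tied to the
admin queue) returns true, while passing ka_start_ms = 30000
(keep-alive started only after the io-queues are connected) returns
false, which corresponds to the ctrl being torn down before the
io-queue connect can find it.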

With the patches below, the test passed successfully after one or two
reconnection attempts.

I was able to test it only with the RDMA fabric, so it would be great to
have a Tested-by from the FC guys as well (loop also needs to be tested).


Max Gurtovoy (5):
  Revert "nvme: unexport nvme_start_keep_alive"
  nvme: remove association between ctrl and keep-alive
  nvme-rdma: add keep-alive mechanism as admin_q property
  nvme-fc: add keep-alive mechanism as admin_q property
  nvme-loop: add keep-alive mechanism as admin_q property

 drivers/nvme/host/core.c   | 7 ++-----
 drivers/nvme/host/fc.c     | 5 +++++
 drivers/nvme/host/nvme.h   | 1 +
 drivers/nvme/host/rdma.c   | 5 +++--
 drivers/nvme/target/loop.c | 3 +++
 5 files changed, 14 insertions(+), 7 deletions(-)

-- 
1.8.3.1

Thread overview: 23+ messages
2018-04-10 17:18 [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 1/5] Revert "nvme: unexport nvme_start_keep_alive" Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 2/5] nvme: remove association between ctrl and keep-alive Max Gurtovoy
2018-04-13 17:01   ` Christoph Hellwig
2018-04-10 17:18 ` [PATCH v1 3/5] nvme-rdma: add keep-alive mechanism as admin_q property Max Gurtovoy
2018-04-10 17:18 ` [PATCH v1 4/5] nvme-fc: " Max Gurtovoy
2018-04-10 17:18 ` [PATCH 5/5] nvme-loop: " Max Gurtovoy
2018-04-11 13:04 ` [PATCH v1 0/5] Fix keep-alive mechanism for fabrics Sagi Grimberg
2018-04-11 13:38   ` Max Gurtovoy
2018-04-11 14:07     ` Sagi Grimberg
2018-04-11 14:40       ` Max Gurtovoy
2018-04-11 16:48         ` James Smart
2018-04-12  8:49           ` Max Gurtovoy
2018-04-12 12:34             ` Sagi Grimberg
2018-04-12 17:28               ` James Smart
2018-04-12 12:29           ` Sagi Grimberg
2018-04-12 13:32             ` Max Gurtovoy
2018-04-12 15:17               ` Sagi Grimberg
2018-04-12 16:43                 ` Max Gurtovoy
2018-04-12 12:14         ` Sagi Grimberg
2018-04-12 12:18           ` Max Gurtovoy
2018-04-12 12:31             ` Sagi Grimberg
2018-04-13 17:06 ` Christoph Hellwig