linux-nvme.lists.infradead.org archive mirror
* [PATCH v2 0/8] fix possible controller reset hangs in nvme-tcp/nvme-rdma
@ 2020-08-06 19:11 Sagi Grimberg
From: Sagi Grimberg @ 2020-08-06 19:11 UTC
  To: linux-nvme, Christoph Hellwig, Keith Busch, James Smart

When a controller reset runs during I/O, we may hang if the controller
suddenly becomes unresponsive during the reset and/or reconnection
stages. This happens for two reasons: the timeout handler does not fail
inflight commands properly, and the controller reset sequence cannot be
aborted when the controller becomes unresponsive (so we can never
recover, even if the controller later becomes responsive again).

This set fixes nvme-tcp and nvme-rdma for exactly the same scenarios.

Patch 1 stops a live queue from rejecting commands; the rejected
commands were mistakenly requeued forever while we were either
resetting or connecting to a controller.

Patch 2 makes nvme_wait_freeze_timeout tell its callers whether the
freeze completed or the timeout elapsed; patches 5 and 8 rely on this.
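A rough model of what patch 2 enables: one timeout budget is shared across all namespace queues, and the wait reports what is left, so a caller can tell a completed freeze from an elapsed timeout. `wait_freeze_timeout` and its `drain_time` input are hypothetical simulation devices, not the kernel's signature.

```c
#include <assert.h>

/*
 * Model of a freeze wait with a shared timeout budget. drain_time[i] is
 * how long queue i takes to freeze; a negative value means it never
 * drains (e.g. the controller died). Returns the budget left over, or 0
 * if the wait timed out. A tie counts as a timeout so that 0 always
 * means "not frozen".
 */
long wait_freeze_timeout(const long *drain_time, int nr_queues, long budget)
{
    assert(budget > 0 && nr_queues >= 0);
    for (int i = 0; i < nr_queues; i++) {
        if (drain_time[i] < 0 || drain_time[i] >= budget)
            return 0;               /* timed out on queue i */
        budget -= drain_time[i];    /* the rest carries to the next queue */
    }
    return budget;
}
```

A caller can then write `if (!wait_freeze_timeout(...))` to bail out rather than proceed with queues that never froze.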

Patches 3, 4, 6 and 7 fix the timeout handler in nvme-tcp (3, 4) and
nvme-rdma (6, 7) so that it correctly and safely fails requests that
are part of a serial (blocking) initialization or teardown sequence.
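The timeout handler change can be sketched as a decision function. This is a simplified user-space model under stated assumptions: the `EH_DONE`/`EH_RESET_TIMER` verdicts are modelled on the block layer's `BLK_EH_DONE` and `BLK_EH_RESET_TIMER`, while `struct ctrl`, `struct req` and `STATUS_HOST_ABORTED` are hypothetical. The point it shows: when the controller is not live, the request may belong to a blocking reset or connect sequence that is itself waiting on it, so the handler stops the queue and fails the request directly instead of scheduling yet another recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified controller states and timeout verdicts. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING };
enum eh_verdict { EH_DONE, EH_RESET_TIMER };

#define STATUS_HOST_ABORTED 1   /* hypothetical error status */

struct req {
    bool completed;
    int status;                 /* 0 = success */
};

struct ctrl {
    enum ctrl_state state;
    bool queue_stopped;
    bool recovery_scheduled;
};

enum eh_verdict timeout_handler(struct ctrl *c, struct req *rq)
{
    assert(c && rq);
    if (c->state != CTRL_LIVE) {
        /*
         * A reset/connect sequence may be blocked waiting on this very
         * command; scheduling another recovery would not unblock it.
         * Stop the queue (no late completion from the wire) and fail it.
         */
        c->queue_stopped = true;
        if (!rq->completed) {
            rq->status = STATUS_HOST_ABORTED;
            rq->completed = true;
        }
        return EH_DONE;
    }
    /* Live controller: error recovery tears down and fails the I/O. */
    c->recovery_scheduled = true;
    return EH_RESET_TIMER;
}
```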

Patches 5 and 8 handle a controller that stops responding in the middle
of connection establishment (nvme-tcp and nvme-rdma, respectively).
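The effect of aborting a stuck reconnect can be modelled as follows, assuming a hypothetical `reconnect_attempt` step that fails when the freeze wait times out, and a hypothetical retry loop standing in for the drivers' reconnect work (the real retry policy is governed by fabrics options such as `ctrl_loss_tmo`, which this sketch does not model). It illustrates why failing cleanly matters: a failed attempt can be retried, so the controller can recover once it responds again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified controller states. */
enum ctrl_state { CTRL_CONNECTING, CTRL_LIVE, CTRL_DELETING };

/*
 * One reconnect attempt (model). freeze_left is what the freeze wait
 * returned; 0 means it timed out. In this model only a reconnect to an
 * existing controller waits for a freeze.
 */
int reconnect_attempt(bool new_ctrl, long freeze_left)
{
    if (!new_ctrl && freeze_left == 0)
        return -ENODEV;   /* fail the attempt instead of hanging */
    return 0;
}

/*
 * Drive reconnect attempts. responsive[i] says whether the controller
 * answers during attempt i. Gives up after the initial attempt plus
 * max_retries retries.
 */
enum ctrl_state reconnect_loop(const bool *responsive, int nr_attempts,
                               int max_retries)
{
    assert(nr_attempts >= 0 && max_retries >= 0);
    for (int i = 0; i < nr_attempts && i <= max_retries; i++) {
        /* A responsive controller lets the freeze finish with time left. */
        long freeze_left = responsive[i] ? 1 : 0;
        if (reconnect_attempt(false, freeze_left) == 0)
            return CTRL_LIVE;
        /* A real driver would wait for a reconnect delay here. */
    }
    return CTRL_DELETING;
}
```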

James, please have a look as well to see what needs to be addressed
for fc.

Changes from v1:
- added patches 3,6 to protect against possible (but rare) double
  completions for timed out requests.
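The double-completion guard added in v2 can be sketched with a single mutex that both the teardown path and the timeout path take around their "check whether completed, then complete" step. This is a user-space model under assumptions: `teardown_lock`, the structs and the function names are hypothetical, and POSIX `pthread_mutex_t` stands in for whatever locking the drivers actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define STATUS_HOST_ABORTED 1   /* hypothetical error status */

struct req {
    bool completed;
    int status;
    int nr_completions;         /* instrumentation for the demo */
};

struct ctrl {
    pthread_mutex_t teardown_lock;  /* hypothetical: fences teardown vs. timeout */
    bool queue_stopped;
};

/* Caller must hold teardown_lock: the check and the completion
 * must not interleave with another completer. */
static void complete_if_pending(struct req *rq, int status)
{
    if (!rq->completed) {
        rq->status = status;
        rq->completed = true;
        rq->nr_completions++;
    }
}

/* Teardown path: stop the queue and fail everything still inflight. */
void teardown_queue(struct ctrl *c, struct req *inflight, int n)
{
    assert(n >= 0);
    pthread_mutex_lock(&c->teardown_lock);
    c->queue_stopped = true;
    for (int i = 0; i < n; i++)
        complete_if_pending(&inflight[i], STATUS_HOST_ABORTED);
    pthread_mutex_unlock(&c->teardown_lock);
}

/* Timeout path: same lock, same "complete only if still pending" rule. */
void timeout_complete(struct ctrl *c, struct req *rq)
{
    pthread_mutex_lock(&c->teardown_lock);
    c->queue_stopped = true;
    complete_if_pending(rq, STATUS_HOST_ABORTED);
    pthread_mutex_unlock(&c->teardown_lock);
}
```

Calling `timeout_complete` on a request and then `teardown_queue` over the same request leaves it completed exactly once. A single-threaded run only demonstrates the idempotent completion; the lock is what keeps that guarantee when the two paths run concurrently.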

Sagi Grimberg (8):
  nvme-fabrics: allow to queue requests for live queues
  nvme: have nvme_wait_freeze_timeout return if it timed out
  nvme-tcp: serialize controller teardown double completion
  nvme-tcp: fix timeout handler
  nvme-tcp: fix reset hang if controller died in the middle of a reset
  nvme-rdma: serialize controller teardown sequences
  nvme-rdma: fix timeout handler
  nvme-rdma: fix reset hang if controller died in the middle of a reset

 drivers/nvme/host/core.c    |  3 +-
 drivers/nvme/host/fabrics.c | 13 ++++--
 drivers/nvme/host/nvme.h    |  2 +-
 drivers/nvme/host/rdma.c    | 85 +++++++++++++++++++++++++--------
 drivers/nvme/host/tcp.c     | 93 ++++++++++++++++++++++++++++---------
 5 files changed, 147 insertions(+), 49 deletions(-)






Thread overview: 30+ messages
2020-08-06 19:11 [PATCH v2 0/8] fix possible controller reset hangs in nvme-tcp/nvme-rdma Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 1/8] nvme-fabrics: allow to queue requests for live queues Sagi Grimberg
2020-08-14  6:44   ` Christoph Hellwig
2020-08-14  7:08     ` Sagi Grimberg
2020-08-14  7:22       ` Christoph Hellwig
2020-08-14 15:55         ` James Smart
2020-08-14 17:49         ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 2/8] nvme: have nvme_wait_freeze_timeout return if it timed out Sagi Grimberg
2020-08-14  6:45   ` Christoph Hellwig
2020-08-14  7:09     ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 3/8] nvme-tcp: serialize controller teardown double completion Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 4/8] nvme-tcp: fix timeout handler Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 5/8] nvme-tcp: fix reset hang if controller died in the middle of a reset Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 6/8] nvme-rdma: serialize controller teardown sequences Sagi Grimberg
2020-08-14  6:45   ` Christoph Hellwig
2020-08-14 21:12   ` James Smart
2020-08-19  0:35     ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 7/8] nvme-rdma: fix timeout handler Sagi Grimberg
2020-08-14  6:52   ` Christoph Hellwig
2020-08-14  7:14     ` Sagi Grimberg
2020-08-14 23:19       ` James Smart
2020-08-19  0:26         ` Sagi Grimberg
2020-08-14 23:27   ` James Smart
2020-08-14 23:30     ` James Smart
2020-08-19  0:39       ` Sagi Grimberg
2020-08-19  0:38     ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 8/8] nvme-rdma: fix reset hang if controller died in the middle of a reset Sagi Grimberg
2020-08-14  6:53   ` Christoph Hellwig
2020-08-11 22:16 ` [PATCH v2 0/8] fix possible controller reset hangs in nvme-tcp/nvme-rdma Sagi Grimberg
2020-08-13 15:39   ` Christoph Hellwig
