linux-nvme.lists.infradead.org archive mirror
* [PATCH v2 for-5.8-rc 0/6] address deadlocks in high stress ns scanning and ana updates
@ 2020-06-24  0:18 Sagi Grimberg
From: Sagi Grimberg @ 2020-06-24  0:18 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Anton Eidelman

Changes from v1:
- Fixed compilation error in patch #4
- Added patch #5 to resolve a use-after-free condition

Hey All,

The following patches address deadlocks observed while stress testing a
connect/disconnect storm concurrently with rapid ANA path switches (paths
may transition between live<->inaccessible) on a large number of
namespaces (100+).

The test mainly triggers three flows:
1. ongoing ns scanning, in the presence of concurrent ANA path state changes
   and controller removals (disconnect).
2. ongoing ns scanning (or ANA processing) in the presence of concurrent
   controller removal (disconnect).
3. ongoing ANA processing in the presence of concurrent controller removal
   (disconnect).

What we observed is that disconnecting while scan_work and/or ana_work are
running can easily deadlock. The main reason is that scan_work and ana_work
may both register the gendisk, which triggers I/O (partition scans). Given that
a controller removal (disconnect) may be running at the same time, that I/O may
block. The problem with blocking head->disk I/O under the locks taken by
ana_work and scan_work is that no other path can then update its path state
and, by doing so, unblock the blocked I/O.
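
To illustrate the pattern, here is a minimal user-space sketch using pthreads.
It is not driver code: state_lock, io_lock, path_live, disk_registered and all
function names are hypothetical stand-ins chosen for this example. The worker
changes state under the lock, drops the lock, and only then issues the blocking
I/O, so the path update can take the lock and unblock it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the nvme driver. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t path_live_cv = PTHREAD_COND_INITIALIZER;
static bool path_live;
static bool disk_registered;

/* Models I/O that blocks until some path becomes live. */
static void blocking_io(void)
{
	pthread_mutex_lock(&io_lock);
	while (!path_live)
		pthread_cond_wait(&path_live_cv, &io_lock);
	pthread_mutex_unlock(&io_lock);
}

/* Models another path's state update; it needs state_lock to proceed. */
static void *path_update(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	pthread_mutex_lock(&io_lock);
	path_live = true;
	pthread_cond_broadcast(&path_live_cv);
	pthread_mutex_unlock(&io_lock);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

/* Models a scan/ANA-style worker. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	disk_registered = true;	/* state change done under the lock */
	pthread_mutex_unlock(&state_lock);

	/*
	 * Blocking I/O is issued only after state_lock is dropped. Calling
	 * blocking_io() before the unlock can deadlock: if the worker wins
	 * the race for state_lock, path_update() can never take it.
	 */
	blocking_io();
	return NULL;
}

/* Runs both threads; returns 0 if both finished and the state is as expected. */
static int run_demo(void)
{
	pthread_t w, u;

	if (pthread_create(&w, NULL, worker, NULL))
		return -1;
	if (pthread_create(&u, NULL, path_update, NULL)) {
		path_update(NULL);	/* unblock the worker before bailing out */
		pthread_join(w, NULL);
		return -1;
	}
	pthread_join(w, NULL);
	pthread_join(u, NULL);
	return (path_live && disk_registered) ? 0 : -1;
}
```

Calling run_demo() from a main() and building with -pthread should return 0.
Moving the blocking_io() call above the unlock in worker() reproduces the hang
whenever the worker takes state_lock first.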

With this patchset applied, the test passes without any deadlocks.

The last patch is posted as an RFC. While it solves a real problem, it
essentially adds state to the controller that does not go through the normal
controller state. The reason is that the controller state also affects
ongoing mpath I/O, which is the original cause of the deadlock. We are open
to better alternatives if any exist.

Anton Eidelman (3):
  nvme-multipath: fix deadlock between ana_work and scan_work
  nvme-multipath: fix deadlock due to head->lock
  nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Sagi Grimberg (3):
  nvme: fix possible deadlock when I/O is blocked
  nvme: don't protect ns mutation with ns->head->lock
  nvme-multipath: fix bogus request queue reference put

 drivers/nvme/host/core.c      | 11 +++++++-
 drivers/nvme/host/multipath.c | 48 +++++++++++++++++++++++++----------
 drivers/nvme/host/nvme.h      |  3 +++
 3 files changed, 47 insertions(+), 15 deletions(-)

-- 
2.25.1



Thread overview: 25+ messages
2020-06-24  0:18 [PATCH v2 for-5.8-rc 0/6] address deadlocks in high stress ns scanning and ana updates Sagi Grimberg
2020-06-24  0:18 ` [PATCH v2 for-5.8-rc 1/6] nvme: fix possible deadlock when I/O is blocked Sagi Grimberg
2020-06-24  6:29   ` Christoph Hellwig
2020-06-24  6:54     ` Sagi Grimberg
2020-06-24  6:57       ` Christoph Hellwig
2020-06-24  7:09         ` Sagi Grimberg
2020-07-07 10:57       ` Anthony Iliopoulos
2020-07-08 14:42         ` Christoph Hellwig
2020-07-10  4:47           ` Sagi Grimberg
2020-07-14 11:12         ` Christoph Hellwig
2020-06-24  0:18 ` [PATCH v2 for-5.8-rc 2/6] nvme-multipath: fix deadlock between ana_work and scan_work Sagi Grimberg
2020-06-24  6:34   ` Christoph Hellwig
2020-06-24  6:56     ` Sagi Grimberg
2020-06-24  0:18 ` [PATCH v2 for-5.8-rc 3/6] nvme: don't protect ns mutation with ns->head->lock Sagi Grimberg
2020-06-24  6:37   ` Christoph Hellwig
2020-06-24  6:58     ` Sagi Grimberg
2020-06-24  8:24     ` Sagi Grimberg
2020-06-24  0:18 ` [PATCH v2 for-5.8-rc 4/6] nvme-multipath: fix deadlock due to head->lock Sagi Grimberg
2020-06-24  6:39   ` Christoph Hellwig
2020-06-24  7:00     ` Sagi Grimberg
2020-06-24  0:18 ` [PATCH v2 for-5.8-rc 5/6] nvme-multipath: fix bogus request queue reference put Sagi Grimberg
2020-06-24  6:40   ` Christoph Hellwig
2020-06-24  0:18 ` [PATCH v2 RFC 6/6] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work Sagi Grimberg
2020-06-24  6:43   ` Christoph Hellwig
2020-06-24  7:13     ` Sagi Grimberg
