linux-nvme.lists.infradead.org archive mirror
* [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions
@ 2019-08-30 19:19 Sagi Grimberg
From: Sagi Grimberg @ 2019-08-30 19:19 UTC
  To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke

Hey all,

This series handles the reset and scanning race saga.

The approach is to have the relevant admin commands return a proper
status code that reflects that we had a transport error, and to not
remove the namespace if that is indeed the case.

This should give a reliable way to tell whether revalidate_disk failed
because of a transport error or not.

I am able to reproduce this race with the following command (using
tcp/rdma):

  for j in `seq 50`; do
    nvme connect-all
    for i in `seq 50`; do nvme reset /dev/nvme0; done
    nvme disconnect-all
  done

With this patch set I was able to pass the test without reproducing the hang
that Hannes reported.

This version follows Christoph's suggestion to ignore non-DNR (transient)
failures in nvme_revalidate_disk.

Changes from v9:
- fixed a leak of positive NVMe status codes to the block layer (also in
  the allocation path, for consistency)
- instead of checking the revalidate_disk return code, ignore transient
  (non-DNR) errors in nvme_revalidate_disk (suggested by Christoph)
- changed the ordering of the patches a bit
- collected some more review tags

Changes from v8:
- fixed nvme_revalidate_disk to never leak nvme status to the block layer
- used __nvme_revalidate_disk in nvme_validate_ns to also check for an NVMe
  status when the return value is positive
- added patch to rename __nvme_revalidate_disk to nvme_revalidate_ns
- added patch that makes nvme_error_status take a status instead of a request struct
- added review tags

Changes from v7:
- added patch to split out revalidate_disk to ->revalidate_disk()
  and check_disk_size
- split nvme_validate_ns to call nvme_revalidate_disk and the new
  check_disk_size callout (only if nvme_revalidate_disk succeeded)

Changes from v6:
- dropped patch for nvme_submit_sync_cmd returning blk_status_t, it
  is now returning nvme status or negative errno again
- made nvme_identify_ns return status code and get id struct by reference
- made nvme_validate_ns check for -ENOMEM or NVME_SC_HOST_PATH_ERROR
  to decide whether it should remove the namespace.
- added review tags

Changes from v5:
- don't return blk_status_t from nvme_submit_user_cmd

Changes from v4:
- return nvme_error_status in __nvme_submit_sync_cmd and cast to
  errno in nvme_identify_ns
- modified status print in nvme_report_ns_ids

Changes from v3:
- return blk_status_to_errno instead of blk_status_t in sync cmds
- check for a normal errno return from revalidate_disk; this covers
  transport errors, but also spurious allocation errors and any other
  type of transient error.

Changes from v2:
- added fc patch from James (can you test please?)
- made nvme_identify_ns return id or ERR_PTR (Hannes)

Changes from v1:
- different approach

James Smart (1):
  nvme-fc: Fail transport errors with NVME_SC_HOST_PATH

Sagi Grimberg (6):
  nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed
  nvme: pass status to nvme_error_status
  nvme: make nvme_identify_ns propagate errors back
  nvme: make nvme_report_ns_ids propagate error back
  nvme: fix ns removal hang when failing to revalidate due to a
    transient error

 drivers/nvme/host/core.c | 83 ++++++++++++++++++++++++++--------------
 drivers/nvme/host/fc.c   | 37 ++++++++++++++----
 drivers/nvme/host/tcp.c  |  2 +-
 3 files changed, 85 insertions(+), 37 deletions(-)

-- 
2.17.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

Thread overview: 11+ messages
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 2/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 3/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 4/7] nvme: pass status to nvme_error_status Sagi Grimberg
2019-09-02  8:25   ` Christoph Hellwig
2019-08-30 19:19 ` [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
2019-09-02  8:25   ` Christoph Hellwig
2019-08-30 19:19 ` [PATCH v10 6/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
2019-09-02  8:26   ` Christoph Hellwig