From: james.smart@broadcom.com (James Smart)
Subject: [PATCH v6 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error
Date: Thu, 15 Aug 2019 14:17:14 -0700
Message-ID: <a60abc02-d4d3-3803-f198-33c7047454e5@broadcom.com>
In-Reply-To: <20190813064304.7344-8-sagi@grimberg.me>
On 8/12/2019 11:43 PM, Sagi Grimberg wrote:
> If a controller reset is racing with a namespace revalidation, the
> revalidation (admin) I/O will surely fail, but we should not remove the
> namespace as we will execute the I/O when the controller is back up.
> Same for spurious allocation errors (return -ENOMEM).
>
> Fix this by checking the specific error code that revalidate_disk
> returns, and if it is a transient error (for example ENOLINK correlates
> to BLK_STS_TRANSPORT or ENOMEM correlates to BLK_STS_RESOURCE or an
> allocation failure), do not remove the namespace as it will either
> recover when the controller is back up and schedule a subsequent scan,
> or the controller is going away and the namespaces will be removed anyways.
>
> This fixes a hang when namespace scanning races with a controller reset,
> and also spurious I/O errors in path failover conditions where the
> controller reset is racing with the namespace scan work with multipath
> enabled.
>
> Reported-by: Hannes Reinecke <hare at suse.de>
> Reviewed-by: Hannes Reinecke <hare at suse.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>
This looks fine:
Reviewed-by: James Smart <james.smart at broadcom.com>
Q: Do we need to do something about nvme_update_formats(), which calls
nvme_set_queue_dying() if nvme_revalidate_disk() fails? It's not
removal, but....
-- james
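For readers following the thread, the decision described in the quoted commit message can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the code from the patch: the helper names revalidate_error_is_transient() and should_remove_ns() are hypothetical, and the set of transient errors is limited to the two named in the description (ENOLINK and ENOMEM).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not taken from the patch): classify a negative
 * errno returned by a revalidation attempt as transient or not.
 */
static bool revalidate_error_is_transient(int ret)
{
	switch (ret) {
	case -ENOLINK:	/* transport error, e.g. a controller reset in progress */
	case -ENOMEM:	/* resource shortage or allocation failure */
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical decision helper: remove the namespace only when
 * revalidation failed with a non-transient error. On a transient
 * error the namespace is kept, on the assumption that a later scan
 * will revalidate it or the controller teardown will remove it.
 */
static bool should_remove_ns(int ret)
{
	return ret != 0 && !revalidate_error_is_transient(ret);
}
```

With these assumptions, should_remove_ns(-ENOLINK) and should_remove_ns(-ENOMEM) are false, while an error such as -ENODEV still leads to removal. Whether the nvme_update_formats() path raised above needs the same filtering is left open by this sketch.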