Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
@ 2021-04-16 22:06 Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 1/4] nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set Mike Snitzer
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 22:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: dm-devel, linux-block, linux-nvme

Hi,

This patchset reflects changes needed to make NVMe error handling and
ANA state updates work well with dm-multipath (which always sets
REQ_FAILFAST_TRANSPORT).

RHEL8 has been carrying an older ~5.9 based version of this patchset
(since RHEL8.3, August 2020).

RHEL9 is coming, would really prefer that these changes land upstream
rather than carry them within RHEL.

All review/feedback welcome.

Thanks,
Mike

v2 -> v3:
- Added Reviewed-by tags to BLK_STS_DO_NOT_RETRY patch.
- Eliminated __nvme_end_req() and added code comment to
  nvme_failup_req() in FAILUP handling patch.

Chao Leng (1):
  nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set

Mike Snitzer (3):
  nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set
  nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT
  nvme: decouple basic ANA log page re-read support from native
    multipathing

 drivers/nvme/host/core.c      | 42 +++++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c | 16 +++++++++++-----
 drivers/nvme/host/nvme.h      |  4 ++++
 include/linux/blk_types.h     |  8 ++++++++
 4 files changed, 62 insertions(+), 8 deletions(-)

-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/4] nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set
  2021-04-16 22:06 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
@ 2021-04-16 22:06 ` Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 2/4] nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set Mike Snitzer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 22:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: dm-devel, linux-block, linux-nvme

If the DNR bit is set we should not retry the command.

We care about the retryable vs not retryable distinction at the block
layer so propagate the equivalent of the DNR bit by introducing
BLK_STS_DO_NOT_RETRY. Update blk_path_error() to _not_ retry if it
is set.

This change runs with the suggestion made here:
https://lore.kernel.org/linux-nvme/20190813170144.GA10269@lst.de/

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c  | 3 +++
 include/linux/blk_types.h | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..540d6fd8ffef 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -237,6 +237,9 @@ static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 
 static blk_status_t nvme_error_status(u16 status)
 {
+	if (unlikely(status & NVME_SC_DNR))
+		return BLK_STS_DO_NOT_RETRY;
+
 	switch (status & 0x7ff) {
 	case NVME_SC_SUCCESS:
 		return BLK_STS_OK;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..1ca724948c56 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -142,6 +142,13 @@ typedef u8 __bitwise blk_status_t;
  */
 #define BLK_STS_ZONE_ACTIVE_RESOURCE	((__force blk_status_t)16)
 
+/*
+ * BLK_STS_DO_NOT_RETRY is returned from the driver in the completion path
+ * if the device returns a status indicating that if the same command is
+ * re-submitted it is expected to fail.
+ */
+#define BLK_STS_DO_NOT_RETRY	((__force blk_status_t)17)
+
 /**
  * blk_path_error - returns true if error may be path related
  * @error: status the request was completed with
@@ -157,6 +164,7 @@ typedef u8 __bitwise blk_status_t;
 static inline bool blk_path_error(blk_status_t error)
 {
 	switch (error) {
+	case BLK_STS_DO_NOT_RETRY:
 	case BLK_STS_NOTSUPP:
 	case BLK_STS_NOSPC:
 	case BLK_STS_TARGET:
-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 2/4] nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set
  2021-04-16 22:06 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 1/4] nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set Mike Snitzer
@ 2021-04-16 22:06 ` Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 4/4] nvme: decouple basic ANA log page re-read support from native multipathing Mike Snitzer
  3 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 22:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: dm-devel, linux-block, linux-nvme, Chao Leng

From: Chao Leng <lengchao@huawei.com>

REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
does not define the local retry mechanism. SCSI implements a fuzzy
local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
higher-level multipathing software to perform failover/retry.

NVMe is different with SCSI about this. It defines a local retry
mechanism and path error codes, so NVMe should retry local for non
path error. If path related error, whether to retry and how to retry
is still determined by higher-level multipathing's failover.

Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
because NVMe's local retry is needed -- as is NVMe specific logic to
categorize whether an error is path related.

In this way, the mechanism of NVMe multipath or other multipath are
now equivalent. The mechanism is: non path related error will be
retried locally, path related error is handled by multipath.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[snitzer: edited header for grammar and clarity, also added code comment]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/nvme/host/core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 540d6fd8ffef..4134cf3c7e48 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -306,7 +306,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 	if (likely(nvme_req(req)->status == 0))
 		return COMPLETE;
 
-	if (blk_noretry_request(req) ||
+	/*
+	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
+	 * handles multipathing. Unlike SCSI, NVMe's error handling was
+	 * specifically designed to handle local retry for non-path errors.
+	 * As such, allow NVMe's local retry mechanism to be used for
+	 * requests marked with REQ_FAILFAST_TRANSPORT.
+	 */
+	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
 	    (nvme_req(req)->status & NVME_SC_DNR) ||
 	    nvme_req(req)->retries >= nvme_max_retries)
 		return COMPLETE;
-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT
  2021-04-16 22:06 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 1/4] nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 2/4] nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set Mike Snitzer
@ 2021-04-16 22:06 ` Mike Snitzer
  2021-04-16 22:06 ` [PATCH v3 4/4] nvme: decouple basic ANA log page re-read support from native multipathing Mike Snitzer
  3 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 22:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: dm-devel, linux-block, linux-nvme

If REQ_FAILFAST_TRANSPORT is set it means the driver should not retry
IO that completed with transport errors. REQ_FAILFAST_TRANSPORT is
set by multipathing software (e.g. dm-multipath) before it issues IO.

Update NVMe to allow failover of requests marked with either
REQ_NVME_MPATH or REQ_FAILFAST_TRANSPORT. This allows such requests
to be given a disposition of either FAILOVER or FAILUP respectively.
FAILUP handling ensures a retryable error is returned up from NVMe.

Introduce nvme_failup_req() for use in nvme_complete_rq() if
nvme_decide_disposition() returns FAILUP. nvme_failup_req() ensures
the request is completed with a retryable IO error when appropriate.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/nvme/host/core.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4134cf3c7e48..605ffba6835f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -299,6 +299,7 @@ enum nvme_disposition {
 	COMPLETE,
 	RETRY,
 	FAILOVER,
+	FAILUP,
 };
 
 static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
@@ -318,10 +319,11 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 	    nvme_req(req)->retries >= nvme_max_retries)
 		return COMPLETE;
 
-	if (req->cmd_flags & REQ_NVME_MPATH) {
+	if (req->cmd_flags & (REQ_NVME_MPATH | REQ_FAILFAST_TRANSPORT)) {
 		if (nvme_is_path_error(nvme_req(req)->status) ||
 		    blk_queue_dying(req->q))
-			return FAILOVER;
+			return (req->cmd_flags & REQ_NVME_MPATH) ?
+				FAILOVER : FAILUP;
 	} else {
 		if (blk_queue_dying(req->q))
 			return COMPLETE;
@@ -343,6 +345,25 @@ static inline void nvme_end_req(struct request *req)
 	blk_mq_end_request(req, status);
 }
 
+static void nvme_failup_req(struct request *req)
+{
+	blk_status_t status = nvme_error_status(nvme_req(req)->status);
+
+	/* Ensure a retryable path error is returned */
+	if (WARN_ON_ONCE(!blk_path_error(status))) {
+		/*
+		 * If here, nvme_is_path_error() returned true.
+		 * So nvme_error_status() translation needs updating
+		 * relative to blk_path_error(), or vice versa.
+		 */
+		pr_debug("Request meant for failover but blk_status_t (errno=%d) was not retryable.\n",
+			 blk_status_to_errno(status));
+		nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
+	}
+
+	nvme_end_req(req);
+}
+
 void nvme_complete_rq(struct request *req)
 {
 	trace_nvme_complete_rq(req);
@@ -361,6 +382,9 @@ void nvme_complete_rq(struct request *req)
 	case FAILOVER:
 		nvme_failover_req(req);
 		return;
+	case FAILUP:
+		nvme_failup_req(req);
+		return;
 	}
 }
 EXPORT_SYMBOL_GPL(nvme_complete_rq);
-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 4/4] nvme: decouple basic ANA log page re-read support from native multipathing
  2021-04-16 22:06 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
                   ` (2 preceding siblings ...)
  2021-04-16 22:06 ` [PATCH v3 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT Mike Snitzer
@ 2021-04-16 22:06 ` Mike Snitzer
  3 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 22:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: dm-devel, linux-block, linux-nvme

Whether or not ANA is present is a choice of the target implementation;
the host (and whether it supports multipathing) has _zero_ influence on
this. If the target declares a path as 'inaccessible' the path _is_
inaccessible to the host. As such, ANA support should be functional
even if native multipathing is not.

Introduce ability to always re-read ANA log page as required due to ANA
error and make current ANA state available via sysfs -- even if native
multipathing is disabled on the host (e.g. nvme_core.multipath=N).
This is achieved by factoring out nvme_update_ana() and calling it in
nvme_complete_rq() for all FAILOVER requests.

This affords userspace access to the current ANA state independent of
which layer might be doing multipathing. This makes 'nvme list-subsys'
show ANA state for all NVMe subsystems with multiple controllers. It
also allows userspace multipath-tools to rely on the NVMe driver for
ANA support while dm-multipath takes care of multipathing.

And as always, if embedded NVMe users do not want any performance
overhead associated with ANA or native NVMe multipathing they can
disable CONFIG_NVME_MULTIPATH.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/nvme/host/core.c      |  2 ++
 drivers/nvme/host/multipath.c | 16 +++++++++++-----
 drivers/nvme/host/nvme.h      |  4 ++++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 605ffba6835f..9a878a599897 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -349,6 +349,8 @@ static void nvme_failup_req(struct request *req)
 {
 	blk_status_t status = nvme_error_status(nvme_req(req)->status);
 
+	nvme_update_ana(req);
+
 	/* Ensure a retryable path error is returned */
 	if (WARN_ON_ONCE(!blk_path_error(status))) {
 		/*
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a1d476e1ac02..7d94250264aa 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -65,23 +65,29 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
-void nvme_failover_req(struct request *req)
+void nvme_update_ana(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
 	u16 status = nvme_req(req)->status & 0x7ff;
-	unsigned long flags;
-
-	nvme_mpath_clear_current_path(ns);
 
 	/*
 	 * If we got back an ANA error, we know the controller is alive but not
-	 * ready to serve this namespace.  Kick of a re-read of the ANA
+	 * ready to serve this namespace.  Kick off a re-read of the ANA
 	 * information page, and just try any other available path for now.
 	 */
 	if (nvme_is_ana_error(status) && ns->ctrl->ana_log_buf) {
 		set_bit(NVME_NS_ANA_PENDING, &ns->flags);
 		queue_work(nvme_wq, &ns->ctrl->ana_work);
 	}
+}
+
+void nvme_failover_req(struct request *req)
+{
+	struct nvme_ns *ns = req->q->queuedata;
+	unsigned long flags;
+
+	nvme_mpath_clear_current_path(ns);
+	nvme_update_ana(req);
 
 	spin_lock_irqsave(&ns->head->requeue_lock, flags);
 	blk_steal_bios(&ns->head->requeue_list, req);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..4eed8536625c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -664,6 +664,7 @@ void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags);
 void nvme_failover_req(struct request *req);
+void nvme_update_ana(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
@@ -714,6 +715,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 static inline void nvme_failover_req(struct request *req)
 {
 }
+static inline void nvme_update_ana(struct request *req)
+{
+}
 static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
 }
-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
  2021-05-01 11:58       ` Hannes Reinecke
@ 2021-05-01 15:19         ` Mike Snitzer
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2021-05-01 15:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Laurence Oberman, Christoph Hellwig, Jens Axboe, dm-devel,
	linux-block, linux-nvme

On Sat, May 01 2021 at  7:58am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 4/20/21 5:46 PM, Laurence Oberman wrote:
> [ .. ]
> >
> >Let me add some reasons why as primarily a support person that this is
> >important and try avoid another combative situation.
> >
> >Customers depend on managing device-mapper-multipath the way it is now
> >even with the advent of nvme-over-F/C. Years of administration and
> >management for multiple Enterprise O/S vendor customers (Suse/Red Hat,
> >Oracle) all depend on managing multipath access in a transparent way.
> >
> >I respect everybody's point of view here but native does change log
> >alerting and recovery and that is what will take time for customers to
> >adopt.
> >
> >It is going to take time for Enterprise customers to transition so all
> >we want is an option for them. At some point they will move to native
> >but we always like to keep in step with upstream as much as possible.
> >
> >Of course we could live with RHEL-only for while but that defeats our
> >intention to be as close to upstream as possible.
> >
> >If we could have this accepted upstream for now perhaps when customers
> >are ready to move to native only we could phase this out.
> >
> >Any technical reason why this would not fly is of course important to
> >consider but perhaps for now we have a parallel option until we dont.
> >
> Curiously, we (as in we as SLES developers) have found just the opposite.
> NVMe is a new technology, and out of necessity there will not be any
> existing installations where we have to be compatible with.
> We have switched to native NVMe multipathing with SLE15, and decided
> to educate customers that NVMe is a different concept than SCSI, and
> one shouldn't try treat both the same way.

As you well know: dm-multipath was first engineered to handle SCSI, and
it was too tightly coupled at the start, but the scsi_dh interface
provided sorely missing abstraction. With NVMe, dm-multipath was
further enhanced to not do work only needed for SCSI.

Seems SUSE has forgotten that dm-multipath has taken strides to properly
abstract away SCSI specific details, at least this patchset forgot it
(because proper layering/abstraction is too hard? that mantra is what
gave native NVMe multipath life BTW):
https://patchwork.kernel.org/project/dm-devel/list/?series=475271

Long story short, there is utility in dm-multipath being transport
agnostic with specialized protocol specific bits properly abstracted.

If you or others don't care about any of that anymore, that's fine! But
it doesn't mean others don't. Thankfully both can exist perfectly fine,
sadly that clearly isn't possible without absurd tribal fighting (at
least in the context of NVMe).

And to be clear Hannes: your quick review of this patchset couldn't have
been less helpful or informed. Yet it enabled NVMe maintainers to ignore
technical review (you gave them cover).

The lack of proper technical review of this patchset was astonishing but
hch's dysfunctional attack that took its place really _should_ concern
others. Seems it doesn't, must be nice to not have a dog in the fight
other than philosophical ideals that enable turning a blind eye.

> This was helped by the
> fact the SLE15 is a new release, so customers were accustomed to
> having to change bits and pieces in their infrastructure to support
> new releases.

Sounds like you either have very few customers and/or they don't use
layers that were engineered with dm-multipath being an integral layer
in the IO stack. That's fine, but that doesn't prove anything other
than your limited experience.

> Overall it worked reasonably well; we sure found plenty of bugs, but
> that was kind of expected, and for bad or worse nearly all of them
> turned out to be upstream issues. Which was good for us (nothing
> beats being able to blame things on upstream, if one is careful to
> not linger too much on the fact that one is part of upstream); and
> upstream these things will need to be fixed anyway.
> So we had a bit of a mixed experience, but customers seemed to be
> happy enough with this step.
> 
> Sorry about that :-)

Nothing to be sorry about, good on you and the others at SUSE
engineering for improving native NVMe multipathing. Red Hat supports it
too, so your and others' efforts are appreciated there.

Mike


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
  2021-04-20 15:46     ` Laurence Oberman
@ 2021-05-01 11:58       ` Hannes Reinecke
  2021-05-01 15:19         ` Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2021-05-01 11:58 UTC (permalink / raw)
  To: Laurence Oberman, Mike Snitzer, Christoph Hellwig
  Cc: Jens Axboe, dm-devel, linux-block, linux-nvme

On 4/20/21 5:46 PM, Laurence Oberman wrote:
[ .. ]
> 
> Let me add some reasons why as primarily a support person that this is
> important and try avoid another combative situation.
> 
> Customers depend on managing device-mapper-multipath the way it is now
> even with the advent of nvme-over-F/C. Years of administration and
> management for multiple Enterprise O/S vendor customers (Suse/Red Hat,
> Oracle) all depend on managing multipath access in a transparent way.
> 
> I respect everybody's point of view here but native does change log
> alerting and recovery and that is what will take time for customers to
> adopt.
> 
> It is going to take time for Enterprise customers to transition so all
> we want is an option for them. At some point they will move to native
> but we always like to keep in step with upstream as much as possible.
> 
> Of course we could live with RHEL-only for while but that defeats our
> intention to be as close to upstream as possible.
> 
> If we could have this accepted upstream for now perhaps when customers
> are ready to move to native only we could phase this out.
> 
> Any technical reason why this would not fly is of course important to
> consider but perhaps for now we have a parallel option until we dont.
> 
Curiously, we (as in we as SLES developers) have found just the opposite.
NVMe is a new technology, and out of necessity there will not be any 
existing installations where we have to be compatible with.
We have switched to native NVMe multipathing with SLE15, and decided to 
educate customers that NVMe is a different concept than SCSI, and one 
shouldn't try treat both the same way. This was helped by the fact the 
SLE15 is a new release, so customers were accustomed to having to change 
bits and pieces in their infrastructure to support new releases.

Overall it worked reasonably well; we sure found plenty of bugs, but 
that was kind of expected, and for bad or worse nearly all of them 
turned out to be upstream issues. Which was good for us (nothing beats 
being able to blame things on upstream, if one is careful to not linger 
too much on the fact that one is part of upstream); and upstream these 
things will need to be fixed anyway.
So we had a bit of a mixed experience, but customers seemed to be happy 
enough with this step.

Sorry about that :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
  2021-04-20 14:38   ` Mike Snitzer
@ 2021-04-20 15:46     ` Laurence Oberman
  2021-05-01 11:58       ` Hannes Reinecke
  0 siblings, 1 reply; 11+ messages in thread
From: Laurence Oberman @ 2021-04-20 15:46 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig
  Cc: Jens Axboe, dm-devel, linux-block, linux-nvme

On Tue, 2021-04-20 at 10:38 -0400, Mike Snitzer wrote:
> On Tue, Apr 20 2021 at  5:37am -0400,
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > > RHEL9 is coming, would really prefer that these changes land
> > > upstream
> > > rather than carry them within RHEL.
> > 
> > We've told from the very beginning that dm-multipth on nvme is not
> > a support configuration.
> 
> You have some high quality revisionist history there. But other than
> pointing that out I'm not going to dwell on our past discussions on
> how
> NVMe multipathing would be.
> 
> > Red Hat decided to ignore that and live with the pain.
> 
> Red Hat supports both native nvme-multipath _and_ DM-multipath on
> NVMe.
> 
> The only "pain" I've been living with is trying to get you to be
> impartial and allow others to provide Linux multipathing as they see
> fit.
> 
> > Your major version change is a chance to fix this up on the Red Hat
> > side, not to resubmit bogus patches upstream.
> 
> Please spare me the vapid and baseless assertion about patches you
> refuse to review technically without political motivation.
> 
> > In other words: please get your house in order NOW.
> 
> My simple 3 patch submission was an attempt to do so. Reality is the
> Linux NVMe maintainers need to get their collective house in order.
> 
> Until sanity prevails these NVMe changes will be carried in RHEL. And
> if
> you go out of your way to cause trivial, or elaborate, conflicts now
> that you _know_ that changes that are being carried it will be
> handled
> without issue.
> 
> Sad this is where we are but it is what it is.
> 
> Linux is about choice that is founded upon need. Hostile action that
> unilaterally limits choice is antithetical to Linux and Open Source.
> 
> Mike
> 

Hello

Let me add some reasons why as primarily a support person that this is
important and try avoid another combative situation. 

Customers depend on managing device-mapper-multipath the way it is now
even with the advent of nvme-over-F/C. Years of administration and
management for multiple Enterprise O/S vendor customers (Suse/Red Hat,
Oracle) all depend on managing multipath access in a transparent way.

I respect everybody's point of view here but native does change log
alerting and recovery and that is what will take time for customers to
adopt. 

It is going to take time for Enterprise customers to transition so all
we want is an option for them. At some point they will move to native
but we always like to keep in step with upstream as much as possible.

Of course we could live with RHEL-only for while but that defeats our
intention to be as close to upstream as possible.

If we could have this accepted upstream for now perhaps when customers
are ready to move to native only we could phase this out.

Any technical reason why this would not fly is of course important to
consider but perhaps for now we have a parallel option until we dont.

With due respect to all concerned.

Laurence Oberman


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
  2021-04-20  9:37 ` Christoph Hellwig
@ 2021-04-20 14:38   ` Mike Snitzer
  2021-04-20 15:46     ` Laurence Oberman
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2021-04-20 14:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, dm-devel, linux-block, linux-nvme

On Tue, Apr 20 2021 at  5:37am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> > RHEL9 is coming, would really prefer that these changes land upstream
> > rather than carry them within RHEL.
> 
> We've told from the very beginning that dm-multipth on nvme is not
> a support configuration.

You have some high quality revisionist history there. But other than
pointing that out I'm not going to dwell on our past discussions on how
NVMe multipathing would be.

> Red Hat decided to ignore that and live with the pain.

Red Hat supports both native nvme-multipath _and_ DM-multipath on NVMe.

The only "pain" I've been living with is trying to get you to be
impartial and allow others to provide Linux multipathing as they see
fit.

> Your major version change is a chance to fix this up on the Red Hat
> side, not to resubmit bogus patches upstream.

Please spare me the vapid and baseless assertion about patches you
refuse to review technically without political motivation.

> In other words: please get your house in order NOW.

My simple 3 patch submission was an attempt to do so. Reality is the
Linux NVMe maintainers need to get their collective house in order.

Until sanity prevails these NVMe changes will be carried in RHEL. And if
you go out of your way to cause trivial, or elaborate, conflicts now
that you _know_ that changes that are being carried it will be handled
without issue.

Sad this is where we are but it is what it is.

Linux is about choice that is founded upon need. Hostile action that
unilaterally limits choice is antithetical to Linux and Open Source.

Mike


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
  2021-04-16 23:53 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
@ 2021-04-20  9:37 ` Christoph Hellwig
  2021-04-20 14:38   ` Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-04-20  9:37 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, dm-devel, linux-block, linux-nvme

> RHEL9 is coming, would really prefer that these changes land upstream
> rather than carry them within RHEL.

We've told from the very beginning that dm-multipth on nvme is not
a support configuration.  Red Hat decided to ignore that and live with the
pain.  Your major version change is a chance to fix this up on the Red Hat
side, not to resubmit bogus patches upstream.

In other words: please get your house in order NOW.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
@ 2021-04-16 23:53 Mike Snitzer
  2021-04-20  9:37 ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2021-04-16 23:53 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: dm-devel, linux-block, linux-nvme

Hi,

This patchset reflects changes needed to make NVMe error handling and
ANA state updates work well with dm-multipath (which always sets
REQ_FAILFAST_TRANSPORT).

RHEL8 has been carrying an older ~5.9 based version of this patchset
(since RHEL8.3, August 2020).

RHEL9 is coming, would really prefer that these changes land upstream
rather than carry them within RHEL.

All review/feedback welcome.

Thanks,
Mike

v3 -> v4, less is more:
- folded REQ_FAILFAST_TRANSPORT local retry and FAILUP patches
- simplified nvme_failup_req(), removes needless blk_path_error() et al
- removed comment block in nvme_decide_disposition()

v2 -> v3:
- Added Reviewed-by tags to BLK_STS_DO_NOT_RETRY patch.
- Eliminated __nvme_end_req() and added code comment to
  nvme_failup_req() in FAILUP handling patch.

Mike Snitzer (3):
  nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set
  nvme: allow local retry and proper failover for REQ_FAILFAST_TRANSPORT
  nvme: decouple basic ANA log page re-read support from native
    multipathing

 drivers/nvme/host/core.c      | 22 +++++++++++++++++++---
 drivers/nvme/host/multipath.c | 16 +++++++++++-----
 drivers/nvme/host/nvme.h      |  4 ++++
 include/linux/blk_types.h     |  8 ++++++++
 4 files changed, 42 insertions(+), 8 deletions(-)

-- 
2.15.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-16 22:06 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
2021-04-16 22:06 ` [PATCH v3 1/4] nvme: return BLK_STS_DO_NOT_RETRY if the DNR bit is set Mike Snitzer
2021-04-16 22:06 ` [PATCH v3 2/4] nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set Mike Snitzer
2021-04-16 22:06 ` [PATCH v3 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT Mike Snitzer
2021-04-16 22:06 ` [PATCH v3 4/4] nvme: decouple basic ANA log page re-read support from native multipathing Mike Snitzer
2021-04-16 23:53 [PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath Mike Snitzer
2021-04-20  9:37 ` Christoph Hellwig
2021-04-20 14:38   ` Mike Snitzer
2021-04-20 15:46     ` Laurence Oberman
2021-05-01 11:58       ` Hannes Reinecke
2021-05-01 15:19         ` Mike Snitzer

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git