linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme-multipath: do not reset controller on unknown status
@ 2020-02-12 13:41 Hannes Reinecke
  2020-02-12 17:53 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2020-02-12 13:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, John Meneghini, Christoph Hellwig, linux-nvme,
	Hannes Reinecke

We're seeing occasional controller resets during straight I/O,
but only when multipath is active.
The problem here is that nvme-multipath will reset the controller
on every unknown status, which really is an odd behaviour, seeing
that the host already received a perfectly good status; it's just
that it's not smart enough to understand it.
And resetting wouldn't help at all; the error status will continue
to be received.
So we should rather pass up any unknown error to the generic
routines and let them deal with this situation.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: John Meneghini <john.meneghini@netapp.com>
---
 drivers/nvme/host/core.c      |  4 ++--
 drivers/nvme/host/multipath.c | 18 ++++++++++--------
 drivers/nvme/host/nvme.h      |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5dc32b72e7fa..edb081781ae7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
 		if ((req->cmd_flags & REQ_NVME_MPATH) &&
 		    blk_path_error(status)) {
-			nvme_failover_req(req);
-			return;
+			if (nvme_failover_req(req))
+				return;
 		}
 
 		if (!blk_queue_dying(req->q)) {
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 797c18337d96..71e8acae78eb 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -64,16 +64,16 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
-void nvme_failover_req(struct request *req)
+bool nvme_failover_req(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
 	u16 status = nvme_req(req)->status;
 	unsigned long flags;
+	bool handled = false;
 
 	spin_lock_irqsave(&ns->head->requeue_lock, flags);
 	blk_steal_bios(&ns->head->requeue_list, req);
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
-	blk_mq_end_request(req, 0);
 
 	switch (status & 0x7ff) {
 	case NVME_SC_ANA_TRANSITION:
@@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
 		 * mark the the path as pending and kick of a re-read of the ANA
 		 * log page ASAP.
 		 */
+		blk_mq_end_request(req, 0);
 		nvme_mpath_clear_current_path(ns);
 		if (ns->ctrl->ana_log_buf) {
 			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
 			queue_work(nvme_wq, &ns->ctrl->ana_work);
 		}
+		handled = true;
 		break;
 	case NVME_SC_HOST_PATH_ERROR:
 	case NVME_SC_HOST_ABORTED_CMD:
@@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
 		 * Temporary transport disruption in talking to the controller.
 		 * Try to send on a new path.
 		 */
+		blk_mq_end_request(req, 0);
 		nvme_mpath_clear_current_path(ns);
+		handled = true;
 		break;
 	default:
-		/*
-		 * Reset the controller for any non-ANA error as we don't know
-		 * what caused the error.
-		 */
-		nvme_reset_ctrl(ns->ctrl);
+		/* Delegate to common error handling */
 		break;
 	}
 
-	kblockd_schedule_work(&ns->head->requeue_work);
+	if (handled)
+		kblockd_schedule_work(&ns->head->requeue_work);
+	return handled;
 }
 
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..7e28084f71af 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
 void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags);
-void nvme_failover_req(struct request *req);
+bool nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
-- 
2.16.4


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-12 13:41 [PATCH] nvme-multipath: do not reset controller on unknown status Hannes Reinecke
@ 2020-02-12 17:53 ` Christoph Hellwig
  2020-02-12 19:33   ` Sagi Grimberg
  2020-02-13  6:53   ` Hannes Reinecke
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-02-12 17:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, John Meneghini, Sagi Grimberg, linux-nvme,
	Christoph Hellwig

On Wed, Feb 12, 2020 at 02:41:40PM +0100, Hannes Reinecke wrote:
> We're seeing occasional controller resets during straight I/O,
> but only when multipath is active.
> The problem here is that nvme-multipath will reset the controller
> on every unknown status, which really is an odd behaviour, seeing
> that the host already received a perfectly good status; it's just
> that it's not smart enough to understand it.
> And resetting wouldn't help at all; the error status will continue
> to be received.
> So we should rather pass up any unknown error to the generic
> routines and let them deal with this situation.

What unknown status do you see?

> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: John Meneghini <john.meneghini@netapp.com>
> ---
>  drivers/nvme/host/core.c      |  4 ++--
>  drivers/nvme/host/multipath.c | 18 ++++++++++--------
>  drivers/nvme/host/nvme.h      |  2 +-
>  3 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 5dc32b72e7fa..edb081781ae7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
>  	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>  		if ((req->cmd_flags & REQ_NVME_MPATH) &&
>  		    blk_path_error(status)) {
> -			nvme_failover_req(req);
> -			return;
> +			if (nvme_failover_req(req))
> +				return;

This reads weird, why not:

		if ((req->cmd_flags & REQ_NVME_MPATH) &&
		    blk_path_error(status) && nvme_failover_req(req))
			return;

?

But see below, with your updated checks (assuming this makes sense
as we need more explanation) we don't even need the blk_path_error
call.

> -void nvme_failover_req(struct request *req)
> +bool nvme_failover_req(struct request *req)
>  {
>  	struct nvme_ns *ns = req->q->queuedata;
>  	u16 status = nvme_req(req)->status;
>  	unsigned long flags;
> +	bool handled = false;
>  
>  	spin_lock_irqsave(&ns->head->requeue_lock, flags);
>  	blk_steal_bios(&ns->head->requeue_list, req);
>  	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> -	blk_mq_end_request(req, 0);

You can't just steal the bios without completing the request.

>  
>  	switch (status & 0x7ff) {
>  	case NVME_SC_ANA_TRANSITION:
> @@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
>  		 * mark the the path as pending and kick of a re-read of the ANA
>  		 * log page ASAP.
>  		 */
> +		blk_mq_end_request(req, 0);
>  		nvme_mpath_clear_current_path(ns);
>  		if (ns->ctrl->ana_log_buf) {
>  			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
>  			queue_work(nvme_wq, &ns->ctrl->ana_work);
>  		}
> +		handled = true;
>  		break;
>  	case NVME_SC_HOST_PATH_ERROR:
>  	case NVME_SC_HOST_ABORTED_CMD:
> @@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
>  		 * Temporary transport disruption in talking to the controller.
>  		 * Try to send on a new path.
>  		 */
> +		blk_mq_end_request(req, 0);
>  		nvme_mpath_clear_current_path(ns);
> +		handled = true;
>  		break;
>  	default:
> -		/*
> -		 * Reset the controller for any non-ANA error as we don't know
> -		 * what caused the error.
> -		 */
> -		nvme_reset_ctrl(ns->ctrl);
> +		/* Delegate to common error handling */
>  		break;
>  	}
>  
> -	kblockd_schedule_work(&ns->head->requeue_work);
> +	if (handled)
> +		kblockd_schedule_work(&ns->head->requeue_work);
> +	return handled;

I think you can defer the blk_mq_end_request to here as well.  Then
just return false from the default case and you don't need the handled
variable.

So in the end this should be something like this:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5dc32b72e7fa..68443564b930 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -291,11 +291,8 @@ void nvme_complete_rq(struct request *req)
 		nvme_req(req)->ctrl->comp_seen = true;
 
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-		if ((req->cmd_flags & REQ_NVME_MPATH) &&
-		    blk_path_error(status)) {
-			nvme_failover_req(req);
+		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
 			return;
-		}
 
 		if (!blk_queue_dying(req->q)) {
 			nvme_retry_req(req);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 797c18337d96..7b62b14985c2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -64,18 +64,12 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
-void nvme_failover_req(struct request *req)
+bool nvme_failover_req(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
-	u16 status = nvme_req(req)->status;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ns->head->requeue_lock, flags);
-	blk_steal_bios(&ns->head->requeue_list, req);
-	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
-	blk_mq_end_request(req, 0);
-
-	switch (status & 0x7ff) {
+	switch (nvme_req(req)->status & 0x7ff) {
 	case NVME_SC_ANA_TRANSITION:
 	case NVME_SC_ANA_INACCESSIBLE:
 	case NVME_SC_ANA_PERSISTENT_LOSS:
@@ -103,15 +97,16 @@ void nvme_failover_req(struct request *req)
 		nvme_mpath_clear_current_path(ns);
 		break;
 	default:
-		/*
-		 * Reset the controller for any non-ANA error as we don't know
-		 * what caused the error.
-		 */
-		nvme_reset_ctrl(ns->ctrl);
-		break;
+		return false;
 	}
 
+	spin_lock_irqsave(&ns->head->requeue_lock, flags);
+	blk_steal_bios(&ns->head->requeue_list, req);
+	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+	blk_mq_end_request(req, 0);
+
 	kblockd_schedule_work(&ns->head->requeue_work);
+	return true;
 }
 
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..d800b9a51c2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
 void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags);
-void nvme_failover_req(struct request *req);
+bool nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
@@ -599,8 +599,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
 }
 
-static inline void nvme_failover_req(struct request *req)
+static inline bool nvme_failover_req(struct request *req)
 {
+	return false;
 }
 static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1b8014ab0b25..85b7e080c20b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1071,7 +1071,7 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
 		unsigned int rem = length % maxp;
 		unsigned chain = true;
 
-		if (sg_is_last(s))
+		if (req->num_queued_sgs + 1 == req->request.num_mapped_sgs)
 			chain = false;
 
 		if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9cfc668f91f4..047c9864c3c7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -152,6 +152,35 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 	},
 };
 
+static void btrfs_scratch_superblocks(struct block_device *bdev,
+		const char *device_path)
+{
+	struct buffer_head *bh;
+	struct btrfs_super_block *disk_super;
+	int copy_num;
+
+	if (!bdev)
+		return;
+
+	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
+		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
+			continue;
+
+		disk_super = (struct btrfs_super_block *)bh->b_data;
+
+		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
+		set_buffer_dirty(bh);
+		sync_dirty_buffer(bh);
+		brelse(bh);
+	}
+
+	/* Notify udev that device has changed */
+	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
+
+	/* Update ctime/mtime for device path for libblkid */
+	update_dev_time(device_path);
+}
+
 const char *btrfs_bg_type_to_raid_name(u64 flags)
 {
 	const int index = btrfs_bg_flags_to_raid_index(flags);
@@ -7317,36 +7346,6 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
-{
-	struct buffer_head *bh;
-	struct btrfs_super_block *disk_super;
-	int copy_num;
-
-	if (!bdev)
-		return;
-
-	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
-		copy_num++) {
-
-		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
-			continue;
-
-		disk_super = (struct btrfs_super_block *)bh->b_data;
-
-		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
-		set_buffer_dirty(bh);
-		sync_dirty_buffer(bh);
-		brelse(bh);
-	}
-
-	/* Notify udev that device has changed */
-	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
-
-	/* Update ctime/mtime for device path for libblkid */
-	update_dev_time(device_path);
-}
-
 /*
  * Update the size and bytes used for each device where it changed.  This is
  * delayed since we would otherwise get errors while writing out the
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 409f4816fb89..9394cfde7ace 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -473,7 +473,6 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans);
 void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_device *srcdev);
 void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev);
 void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev);
-void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path);
 int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info,
 			   u64 logical, u64 len);
 unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-12 17:53 ` Christoph Hellwig
@ 2020-02-12 19:33   ` Sagi Grimberg
  2020-02-13  7:02     ` Hannes Reinecke
  2020-02-13  6:53   ` Hannes Reinecke
  1 sibling, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2020-02-12 19:33 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke
  Cc: Keith Busch, linux-nvme, John Meneghini



On 2/12/20 9:53 AM, Christoph Hellwig wrote:
> On Wed, Feb 12, 2020 at 02:41:40PM +0100, Hannes Reinecke wrote:
>> We're seeing occasional controller resets during straight I/O,
>> but only when multipath is active.
>> The problem here is that nvme-multipath will reset the controller
>> on every unknown status, which really is an odd behaviour, seeing
>> that the host already received a perfectly good status; it's just
>> that it's not smart enough to understand it.
>> And resetting wouldn't help at all; the error status will continue
>> to be received.
>> So we should rather pass up any unknown error to the generic
>> routines and let them deal with this situation.
> 
> What unknown status do you see?

My question exactly. I don't understand: what is the good status that
the host is not smart enough to understand?

I have a vague recollection that John sent some questions in that area
in the past...

> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> Cc: John Meneghini <john.meneghini@netapp.com>
>> ---
>>   drivers/nvme/host/core.c      |  4 ++--
>>   drivers/nvme/host/multipath.c | 18 ++++++++++--------
>>   drivers/nvme/host/nvme.h      |  2 +-
>>   3 files changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 5dc32b72e7fa..edb081781ae7 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
>>   	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>>   		if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>   		    blk_path_error(status)) {
>> -			nvme_failover_req(req);
>> -			return;
>> +			if (nvme_failover_req(req))
>> +				return;
> 
> This reads weird, why not:
> 
> 		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> 		    blk_path_error(status) && nvme_failover_req(req))
> 			return;
> 
> ?
> 
> But see below, with your updated checks (assuming this makes sense
> as we need more explanation) we don't even need the blk_path_error
> call.

I think it reads better that we call nvme_failover_req only for actual
path errors.

> 
>> -void nvme_failover_req(struct request *req)
>> +bool nvme_failover_req(struct request *req)
>>   {
>>   	struct nvme_ns *ns = req->q->queuedata;
>>   	u16 status = nvme_req(req)->status;
>>   	unsigned long flags;
>> +	bool handled = false;
>>   
>>   	spin_lock_irqsave(&ns->head->requeue_lock, flags);
>>   	blk_steal_bios(&ns->head->requeue_list, req);
>>   	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>> -	blk_mq_end_request(req, 0);
> 
> You can't just steal the bios without completing the request.
> 
>>   
>>   	switch (status & 0x7ff) {
>>   	case NVME_SC_ANA_TRANSITION:
>> @@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
>>   		 * mark the the path as pending and kick of a re-read of the ANA
>>   		 * log page ASAP.
>>   		 */
>> +		blk_mq_end_request(req, 0);
>>   		nvme_mpath_clear_current_path(ns);
>>   		if (ns->ctrl->ana_log_buf) {
>>   			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
>>   			queue_work(nvme_wq, &ns->ctrl->ana_work);
>>   		}
>> +		handled = true;
>>   		break;
>>   	case NVME_SC_HOST_PATH_ERROR:
>>   	case NVME_SC_HOST_ABORTED_CMD:
>> @@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
>>   		 * Temporary transport disruption in talking to the controller.
>>   		 * Try to send on a new path.
>>   		 */
>> +		blk_mq_end_request(req, 0);
>>   		nvme_mpath_clear_current_path(ns);
>> +		handled = true;
>>   		break;
>>   	default:
>> -		/*
>> -		 * Reset the controller for any non-ANA error as we don't know
>> -		 * what caused the error.
>> -		 */
>> -		nvme_reset_ctrl(ns->ctrl);
>> +		/* Delegate to common error handling */
>>   		break;

What will happen in the common case? Right now it will just retry
it on the same path; is that the desired behavior?

I guess we need to understand the phenomenon better.

Right now the code says, we don't know what went wrong here, so we
are going to reset the controller because it acts weird, which can
make some sense, given that the host doesn't understand what is going
on...

>>   	}
>>   
>> -	kblockd_schedule_work(&ns->head->requeue_work);
>> +	if (handled)
>> +		kblockd_schedule_work(&ns->head->requeue_work);
>> +	return handled;
> 
> I think you can defer the blk_mq_end_request to here as well.  Then
> just return false from the default case and you don't need the handled
> variable.
> 
> So in the end this should be something like this:
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 5dc32b72e7fa..68443564b930 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -291,11 +291,8 @@ void nvme_complete_rq(struct request *req)
>   		nvme_req(req)->ctrl->comp_seen = true;
>   
>   	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
> -		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> -		    blk_path_error(status)) {
> -			nvme_failover_req(req);
> +		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
>   			return;
> -		}
>   
>   		if (!blk_queue_dying(req->q)) {
>   			nvme_retry_req(req);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 797c18337d96..7b62b14985c2 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -64,18 +64,12 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   	}
>   }
>   
> -void nvme_failover_req(struct request *req)
> +bool nvme_failover_req(struct request *req)
>   {
>   	struct nvme_ns *ns = req->q->queuedata;
> -	u16 status = nvme_req(req)->status;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> -	blk_steal_bios(&ns->head->requeue_list, req);
> -	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> -	blk_mq_end_request(req, 0);
> -
> -	switch (status & 0x7ff) {
> +	switch (nvme_req(req)->status & 0x7ff) {
>   	case NVME_SC_ANA_TRANSITION:
>   	case NVME_SC_ANA_INACCESSIBLE:
>   	case NVME_SC_ANA_PERSISTENT_LOSS:
> @@ -103,15 +97,16 @@ void nvme_failover_req(struct request *req)
>   		nvme_mpath_clear_current_path(ns);
>   		break;
>   	default:
> -		/*
> -		 * Reset the controller for any non-ANA error as we don't know
> -		 * what caused the error.
> -		 */
> -		nvme_reset_ctrl(ns->ctrl);
> -		break;
> +		return false;
>   	}
>   
> +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> +	blk_steal_bios(&ns->head->requeue_list, req);
> +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> +	blk_mq_end_request(req, 0);
> +
>   	kblockd_schedule_work(&ns->head->requeue_work);
> +	return true;
>   }
>   
>   void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1024fec7914c..d800b9a51c2c 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
>   void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
>   void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   			struct nvme_ctrl *ctrl, int *flags);
> -void nvme_failover_req(struct request *req);
> +bool nvme_failover_req(struct request *req);
>   void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
>   int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
>   void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
> @@ -599,8 +599,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   	sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
>   }
>   
> -static inline void nvme_failover_req(struct request *req)
> +static inline bool nvme_failover_req(struct request *req)
>   {
> +	return false;
>   }
>   static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
>   {

I'm assuming the below doesn't belong..

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 1b8014ab0b25..85b7e080c20b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1071,7 +1071,7 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
>   		unsigned int rem = length % maxp;
>   		unsigned chain = true;
>   
> -		if (sg_is_last(s))
> +		if (req->num_queued_sgs + 1 == req->request.num_mapped_sgs)
>   			chain = false;
>   
>   		if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9cfc668f91f4..047c9864c3c7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -152,6 +152,35 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>   	},
>   };
>   
> +static void btrfs_scratch_superblocks(struct block_device *bdev,
> +		const char *device_path)
> +{
> +	struct buffer_head *bh;
> +	struct btrfs_super_block *disk_super;
> +	int copy_num;
> +
> +	if (!bdev)
> +		return;
> +
> +	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
> +		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
> +			continue;
> +
> +		disk_super = (struct btrfs_super_block *)bh->b_data;
> +
> +		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
> +		set_buffer_dirty(bh);
> +		sync_dirty_buffer(bh);
> +		brelse(bh);
> +	}
> +
> +	/* Notify udev that device has changed */
> +	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
> +
> +	/* Update ctime/mtime for device path for libblkid */
> +	update_dev_time(device_path);
> +}
> +
>   const char *btrfs_bg_type_to_raid_name(u64 flags)
>   {
>   	const int index = btrfs_bg_flags_to_raid_index(flags);
> @@ -7317,36 +7346,6 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
>   	return 0;
>   }
>   
> -void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
> -{
> -	struct buffer_head *bh;
> -	struct btrfs_super_block *disk_super;
> -	int copy_num;
> -
> -	if (!bdev)
> -		return;
> -
> -	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
> -		copy_num++) {
> -
> -		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
> -			continue;
> -
> -		disk_super = (struct btrfs_super_block *)bh->b_data;
> -
> -		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
> -		set_buffer_dirty(bh);
> -		sync_dirty_buffer(bh);
> -		brelse(bh);
> -	}
> -
> -	/* Notify udev that device has changed */
> -	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
> -
> -	/* Update ctime/mtime for device path for libblkid */
> -	update_dev_time(device_path);
> -}
> -
>   /*
>    * Update the size and bytes used for each device where it changed.  This is
>    * delayed since we would otherwise get errors while writing out the
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 409f4816fb89..9394cfde7ace 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -473,7 +473,6 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans);
>   void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_device *srcdev);
>   void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev);
>   void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev);
> -void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path);
>   int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info,
>   			   u64 logical, u64 len);
>   unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> 



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-12 17:53 ` Christoph Hellwig
  2020-02-12 19:33   ` Sagi Grimberg
@ 2020-02-13  6:53   ` Hannes Reinecke
  1 sibling, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2020-02-13  6:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, John Meneghini

On 2/12/20 6:53 PM, Christoph Hellwig wrote:
> On Wed, Feb 12, 2020 at 02:41:40PM +0100, Hannes Reinecke wrote:
>> We're seeing occasional controller resets during straight I/O,
>> but only when multipath is active.
>> The problem here is that nvme-multipath will reset the controller
>> on every unknown status, which really is an odd behaviour, seeing
>> that the host already received a perfectly good status; it's just
>> that it's not smart enough to understand it.
>> And resetting wouldn't help at all; the error status will continue
>> to be received.
>> So we should rather pass up any unknown error to the generic
>> routines and let them deal with this situation.
> 
> What unknown status do you see?
> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> Cc: John Meneghini <john.meneghini@netapp.com>
>> ---
>>  drivers/nvme/host/core.c      |  4 ++--
>>  drivers/nvme/host/multipath.c | 18 ++++++++++--------
>>  drivers/nvme/host/nvme.h      |  2 +-
>>  3 files changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 5dc32b72e7fa..edb081781ae7 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
>>  	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>>  		if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>  		    blk_path_error(status)) {
>> -			nvme_failover_req(req);
>> -			return;
>> +			if (nvme_failover_req(req))
>> +				return;
> 
> This reads weird, why not:
> 
> 		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> 		    blk_path_error(status) && nvme_failover_req(req))
> 			return;
> 
> ?
> 
> But see below, with your updated checks (assuming this makes sense
> as we need more explanation) we don't even need the blk_path_error
> call.
> 
>> -void nvme_failover_req(struct request *req)
>> +bool nvme_failover_req(struct request *req)
>>  {
>>  	struct nvme_ns *ns = req->q->queuedata;
>>  	u16 status = nvme_req(req)->status;
>>  	unsigned long flags;
>> +	bool handled = false;
>>  
>>  	spin_lock_irqsave(&ns->head->requeue_lock, flags);
>>  	blk_steal_bios(&ns->head->requeue_list, req);
>>  	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>> -	blk_mq_end_request(req, 0);
> 
> You can't just steal the bios without completing the request.
> 
Ah. True.

>>  
>>  	switch (status & 0x7ff) {
>>  	case NVME_SC_ANA_TRANSITION:
>> @@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
>>  		 * mark the the path as pending and kick of a re-read of the ANA
>>  		 * log page ASAP.
>>  		 */
>> +		blk_mq_end_request(req, 0);
>>  		nvme_mpath_clear_current_path(ns);
>>  		if (ns->ctrl->ana_log_buf) {
>>  			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
>>  			queue_work(nvme_wq, &ns->ctrl->ana_work);
>>  		}
>> +		handled = true;
>>  		break;
>>  	case NVME_SC_HOST_PATH_ERROR:
>>  	case NVME_SC_HOST_ABORTED_CMD:
>> @@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
>>  		 * Temporary transport disruption in talking to the controller.
>>  		 * Try to send on a new path.
>>  		 */
>> +		blk_mq_end_request(req, 0);
>>  		nvme_mpath_clear_current_path(ns);
>> +		handled = true;
>>  		break;
>>  	default:
>> -		/*
>> -		 * Reset the controller for any non-ANA error as we don't know
>> -		 * what caused the error.
>> -		 */
>> -		nvme_reset_ctrl(ns->ctrl);
>> +		/* Delegate to common error handling */
>>  		break;
>>  	}
>>  
>> -	kblockd_schedule_work(&ns->head->requeue_work);
>> +	if (handled)
>> +		kblockd_schedule_work(&ns->head->requeue_work);
>> +	return handled;
> 
> I think you can defer the blk_mq_end_request to here as well.  Then
> just return false from the default case and you don't need the handled
> variable.
> 
> So in the end this should be something like this:
> 
[ .. ]
Maybe without the btrfs bits :-)

I'll give it a spin.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-12 19:33   ` Sagi Grimberg
@ 2020-02-13  7:02     ` Hannes Reinecke
  2020-02-13  7:19       ` Sagi Grimberg
  2020-02-13 17:02       ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2020-02-13  7:02 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme, John Meneghini

On 2/12/20 8:33 PM, Sagi Grimberg wrote:
> 
> 
> On 2/12/20 9:53 AM, Christoph Hellwig wrote:
>> On Wed, Feb 12, 2020 at 02:41:40PM +0100, Hannes Reinecke wrote:
>>> We're seeing occasional controller resets during straight I/O,
>>> but only when multipath is active.
>>> The problem here is the nvme-multipath will reset the controller
>>> on every unknown status, which really is an odd behaviour, seeing
>>> that the host already received a perfectly good status; it's just
>>> that it's not smart enough to understand it.
>>> And resetting wouldn't help at all; the error status will continue
>>> to be received.
>>> So we should rather pass up any unknown error to the generic
>>> routines and let them deal with this situation.
>>
>> What unknown status do you see?
> 
> My question exactly, I don't understand what is the good status that
> the host is not smart enough to understand?
> 
> I have a vague recollection that John sent some questions in that area
> in the past...
> 
>>
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> Cc: John Meneghini <john.meneghini@netapp.com>
>>> ---
>>>   drivers/nvme/host/core.c      |  4 ++--
>>>   drivers/nvme/host/multipath.c | 18 ++++++++++--------
>>>   drivers/nvme/host/nvme.h      |  2 +-
>>>   3 files changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 5dc32b72e7fa..edb081781ae7 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
>>>       if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>>>           if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>>               blk_path_error(status)) {
>>> -            nvme_failover_req(req);
>>> -            return;
>>> +            if (nvme_failover_req(req))
>>> +                return;
>>
>> This reads weird, why not:
>>
>>         if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>             blk_path_error(status) && nvme_failover_req(req))
>>             return;
>>
>> ?
>>
>> But see below, with your updated checks (assuming this makes sense
>> as we need more explanation) we don't even need the blk_path_error
>> call.
> 
> I think it reads better that we call nvme_failover_req only for actual
> path errors.
> 
>>
>>> -void nvme_failover_req(struct request *req)
>>> +bool nvme_failover_req(struct request *req)
>>>   {
>>>       struct nvme_ns *ns = req->q->queuedata;
>>>       u16 status = nvme_req(req)->status;
>>>       unsigned long flags;
>>> +    bool handled = false;
>>>         spin_lock_irqsave(&ns->head->requeue_lock, flags);
>>>       blk_steal_bios(&ns->head->requeue_list, req);
>>>       spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>>> -    blk_mq_end_request(req, 0);
>>
>> You can't just steal the bios without completing the request.
>>
>>>         switch (status & 0x7ff) {
>>>       case NVME_SC_ANA_TRANSITION:
>>> @@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
>>>            * mark the the path as pending and kick of a re-read of
>>> the ANA
>>>            * log page ASAP.
>>>            */
>>> +        blk_mq_end_request(req, 0);
>>>           nvme_mpath_clear_current_path(ns);
>>>           if (ns->ctrl->ana_log_buf) {
>>>               set_bit(NVME_NS_ANA_PENDING, &ns->flags);
>>>               queue_work(nvme_wq, &ns->ctrl->ana_work);
>>>           }
>>> +        handled = true;
>>>           break;
>>>       case NVME_SC_HOST_PATH_ERROR:
>>>       case NVME_SC_HOST_ABORTED_CMD:
>>> @@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
>>>            * Temporary transport disruption in talking to the
>>> controller.
>>>            * Try to send on a new path.
>>>            */
>>> +        blk_mq_end_request(req, 0);
>>>           nvme_mpath_clear_current_path(ns);
>>> +        handled = true;
>>>           break;
>>>       default:
>>> -        /*
>>> -         * Reset the controller for any non-ANA error as we don't know
>>> -         * what caused the error.
>>> -         */
>>> -        nvme_reset_ctrl(ns->ctrl);
>>> +        /* Delegate to common error handling */
>>>           break;
> 
> What will happen in the common case? right now it will just retry
> it on the same path, is that the desired behavior?
> 
> I guess we need to understand the phenomenon better.
> 
> Right now the code says, we don't know what went wrong here, so we
> are going to reset the controller because it acts weird, which can
> make some sense, given that the host doesn't understand what is going
> on...
> 
But this is precisely the case I'm arguing against here.
One of the lessons learned from SCSI is that reset only makes sense if
the system misbehaves and resetting it would make this error go away.

Receiving a status code which we don't know about does _not_ fall into
this category; the very fact that we have a status code proves that the
system does _not_ misbehave.

So what exactly will be resolved by resetting?
There actually is a fair chance that we'll be getting the very same
error again...

Consequently I think that resetting the system here is wrong, and we
should just handle it as a normal but unknown error.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-13  7:02     ` Hannes Reinecke
@ 2020-02-13  7:19       ` Sagi Grimberg
  2020-02-13 17:02       ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2020-02-13  7:19 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, John Meneghini


>> What will happen in the common case? right now it will just retry
>> it on the same path, is that the desired behavior?
>>
>> I guess we need to understand the phenomenon better.
>>
>> Right now the code says, we don't know what went wrong here, so we
>> are going to reset the controller because it acts weird, which can
>> make some sense, given that the host doesn't understand what is going
>> on...
>>
> But this is precisely the case I'm arguing against here.
> One of the lessons learned from SCSI is that reset only makes sense if
> the system misbehaves and resetting it would make this error go away.
> 
> Receiving a status code which we don't know about does _not_ fall into
> this category; the very fact that we have a status code proves that the
> system does _not_ misbehave.

Fair enough...

> So what exactly will be resolved by resetting?
> There actually is a fair chance that we'll be getting the very same
> error again...

Or not, we don't know... That is exactly the point. But I agree with
you that resetting the controller may not be the best action to take here.

But you didn't answer my question: what is the expected behavior here?
Right now, with your patch, it is to blindly retry on the same path. Is that
desired? Is that always desired? Please share more details on the issue.

> Consequently I think that resetting the system here is wrong, and we
> should just handle it as a normal but unknown error.

What is this unknown error? Was it what John complained about a few
months back, where the array returned a status code that translated
to BLK_STS_IOERR, which is considered a path error, and the host reset
the controller?

I think I'm starting to remember that this sort of approach was
suggested... I forget the details though...



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-13  7:02     ` Hannes Reinecke
  2020-02-13  7:19       ` Sagi Grimberg
@ 2020-02-13 17:02       ` Keith Busch
  2020-02-14 14:22         ` Meneghini, John
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Busch @ 2020-02-13 17:02 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, John Meneghini, Sagi Grimberg, linux-nvme,
	Christoph Hellwig

On Thu, Feb 13, 2020 at 08:02:20AM +0100, Hannes Reinecke wrote:
> But this is precisely the case I'm arguing against here.
> One of the lessons learned from SCSI is that reset only makes sense if
> the system misbehaves and resetting it would make this error go away.
> 
> Receiving a status code which we don't know about does _not_ fall into
> this category; the very fact that we have a status code proves that the
> system does _not_ misbehave.
>
> So what exactly will be resolved by resetting?
> There actually is a fair chance that we'll be getting the very same
> error again...

I agree, the types of issues a reset may resolve don't seem applicable
if we're actually getting a response. Is there even a single defined NVMe
status code where a reset would be an appropriate escalation?
 



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-13 17:02       ` Keith Busch
@ 2020-02-14 14:22         ` Meneghini, John
  2020-02-19 15:03           ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Meneghini, John @ 2020-02-14 14:22 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke
  Cc: Keith Busch, Meneghini, John, Sagi Grimberg, linux-nvme,
	Christoph Hellwig

This is exactly the problem I discussed in the last patch I sent out (maybe my only patch) to fix ACRE.

Keith and I also "discussed" this topic at ALPSS.

At issue here is that, in the current code, as soon as (req->cmd_flags & REQ_NVME_MPATH) is set, the error
handling in nvme_complete_rq() completely changes. This means that a subsystem will receive one kind of error
handling treatment when it presents one subsystem port, and then a completely different treatment when
presenting more than one subsystem port (assuming namespace sharing).

I think the error handling on the host should be identical between (REQ_NVME_MPATH) and (~REQ_NVME_MPATH)
- modulo the path-related NVMe errors. 

        /*
         * Path-related Errors: 
         */
        NVME_SC_ANA_PERSISTENT_LOSS     = 0x301,
        NVME_SC_ANA_INACCESSIBLE        = 0x302,
        NVME_SC_ANA_TRANSITION          = 0x303,
        NVME_SC_HOST_PATH_ERROR         = 0x370,
        NVME_SC_HOST_ABORTED_CMD        = 0x371,

We can't have one error handling policy for namespace sharing and another for non-shared namespaces.

Resetting the controller like this only leads to a fatal embrace.

> I agree, the types of issues a reset may resolve don't seem applicable
> if we're actually getting a response. Is there even a single defined NVMe
> status code where a reset would be an appropriate escalation?

I think, in all cases, errors that are not handled by nvme_retry_req(req) return
BLK_STS_IOERR.  This is what happens when REQ_NVME_MPATH is false, so I think
multipath controllers should get the same treatment.

/John

On 2/13/20, 12:03 PM, "Keith Busch" <kbusch@kernel.org> wrote:    
    
    On Thu, Feb 13, 2020 at 08:02:20AM +0100, Hannes Reinecke wrote:
    > But this is precisely the case I'm arguing against here.
    > One of the lessons learned from SCSI is that reset only makes sense if
    > the system misbehaves and resetting it would make this error go away.
    >
    > Receiving a status code which we don't know about does _not_ fall into
    > this category; the very fact that we have a status code proves that the
    > system does _not_ misbehave.
    >
    > So what exactly will be resolved by resetting?
    > There actually is a fair chance that we'll be getting the very same
    > error again...
    
    I agree, the types of issues a reset may resolve don't seem applicable
    > if we're actually getting a response. Is there even a single defined NVMe
    status code where a reset would be an appropriate escalation?
    
    



* Re: [PATCH] nvme-multipath: do not reset controller on unknown status
  2020-02-14 14:22         ` Meneghini, John
@ 2020-02-19 15:03           ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-02-19 15:03 UTC (permalink / raw)
  To: Meneghini, John
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Hannes Reinecke,
	Keith Busch, Christoph Hellwig

On Fri, Feb 14, 2020 at 02:22:12PM +0000, Meneghini, John wrote:
> This is exactly the problem I discussed in the last patch I sent out (maybe my only patch) to fix ACRE.

And said problem needs to be documented in the patch commit log.



end of thread, other threads:[~2020-02-19 15:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 13:41 [PATCH] nvme-multipath: do not reset controller on unknown status Hannes Reinecke
2020-02-12 17:53 ` Christoph Hellwig
2020-02-12 19:33   ` Sagi Grimberg
2020-02-13  7:02     ` Hannes Reinecke
2020-02-13  7:19       ` Sagi Grimberg
2020-02-13 17:02       ` Keith Busch
2020-02-14 14:22         ` Meneghini, John
2020-02-19 15:03           ` Christoph Hellwig
2020-02-13  6:53   ` Hannes Reinecke
