linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Reduce ATA disk resume time
@ 2022-06-28 22:21 Bart Van Assche
  2022-06-28 22:21 ` [PATCH 1/3] scsi: core: Move the definition of SCSI_QUEUE_DELAY Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-28 22:21 UTC (permalink / raw)
  To: Martin K . Petersen; +Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche

Hi Martin,

Recently it was reported that patch "scsi: core: pm: Rely on the device driver
core for async power management" causes resuming to take longer if an ATA disk
is present. This patch series fixes that regression. Please consider this
patch series for kernel v5.20.

Thanks,

Bart.

Bart Van Assche (3):
  scsi: core: Move the definition of SCSI_QUEUE_DELAY
  scsi: core: Retry after a delay if the device is becoming ready
  scsi: sd: Rework asynchronous resume support

 drivers/scsi/scsi_error.c |  4 +-
 drivers/scsi/scsi_lib.c   | 14 +++----
 drivers/scsi/sd.c         | 79 ++++++++++++++++++++++++++++++---------
 drivers/scsi/sd.h         |  5 +++
 4 files changed, 75 insertions(+), 27 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] scsi: core: Move the definition of SCSI_QUEUE_DELAY
  2022-06-28 22:21 [PATCH 0/3] Reduce ATA disk resume time Bart Van Assche
@ 2022-06-28 22:21 ` Bart Van Assche
  2022-06-28 22:21 ` [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready Bart Van Assche
  2022-06-28 22:21 ` [PATCH 3/3] scsi: sd: Rework asynchronous resume support Bart Van Assche
  2 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-28 22:21 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche, Ming Lei,
	Hannes Reinecke, John Garry

Move the definition of SCSI_QUEUE_DELAY to just above the function that
uses it.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6ffc9e4258a8..2aca0a838ca5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -75,13 +75,6 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
 	return ret;
 }
 
-/*
- * When to reinvoke queueing after a resource shortage. It's 3 msecs to
- * not change behaviour from the previous unplug mechanism, experimentation
- * may prove this needs changing.
- */
-#define SCSI_QUEUE_DELAY	3
-
 static void
 scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 {
@@ -1648,6 +1641,13 @@ static void scsi_mq_put_budget(struct request_queue *q, int budget_token)
 	sbitmap_put(&sdev->budget_map, budget_token);
 }
 
+/*
+ * When to reinvoke queueing after a resource shortage. It's 3 msecs to
+ * not change behaviour from the previous unplug mechanism, experimentation
+ * may prove this needs changing.
+ */
+#define SCSI_QUEUE_DELAY 3
+
 static int scsi_mq_get_budget(struct request_queue *q)
 {
 	struct scsi_device *sdev = q->queuedata;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready
  2022-06-28 22:21 [PATCH 0/3] Reduce ATA disk resume time Bart Van Assche
  2022-06-28 22:21 ` [PATCH 1/3] scsi: core: Move the definition of SCSI_QUEUE_DELAY Bart Van Assche
@ 2022-06-28 22:21 ` Bart Van Assche
  2022-06-29  1:21   ` Ming Lei
  2022-06-28 22:21 ` [PATCH 3/3] scsi: sd: Rework asynchronous resume support Bart Van Assche
  2 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2022-06-28 22:21 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche, Ming Lei,
	Hannes Reinecke, John Garry

If a logical unit reports that it is becoming ready, retry the command
after a delay instead of retrying immediately.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 49ef864df581..fb7e363c4c00 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -625,10 +625,10 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 			return NEEDS_RETRY;
 		/*
 		 * if the device is in the process of becoming ready, we
-		 * should retry.
+		 * should retry after a delay.
 		 */
 		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
-			return NEEDS_RETRY;
+			return ADD_TO_MLQUEUE;
 		/*
 		 * if the device is not started, we need to wake
 		 * the error handler to start the motor

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-28 22:21 [PATCH 0/3] Reduce ATA disk resume time Bart Van Assche
  2022-06-28 22:21 ` [PATCH 1/3] scsi: core: Move the definition of SCSI_QUEUE_DELAY Bart Van Assche
  2022-06-28 22:21 ` [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready Bart Van Assche
@ 2022-06-28 22:21 ` Bart Van Assche
  2022-06-29  6:02   ` Hannes Reinecke
  2022-06-30 16:23   ` John Garry
  2 siblings, 2 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-28 22:21 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche, Ming Lei,
	Hannes Reinecke, John Garry, ericspero, jason600.groome

For some technologies, e.g. an ATA bus, resuming can take multiple
seconds. Waiting for resume to finish can cause a very noticeable delay.
Hence this patch that restores the behavior from before patch "scsi:
core: pm: Rely on the device driver core for async power management" for
most SCSI devices.

This patch introduces a behavior change: if the START command fails, do
not consider this as a SCSI disk resume failure.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: ericspero@icloud.com
Cc: jason600.groome@gmail.com
Tested-by: jason600.groome@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 79 ++++++++++++++++++++++++++++++++++++-----------
 drivers/scsi/sd.h |  5 +++
 2 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 895b56c8f25e..06888b675e71 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -103,6 +103,7 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
 static void sd_config_write_same(struct scsi_disk *);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
+static void sd_start_done_work(struct work_struct *work);
 static int  sd_probe(struct device *);
 static int  sd_remove(struct device *);
 static void sd_shutdown(struct device *);
@@ -3463,6 +3464,7 @@ static int sd_probe(struct device *dev)
 	sdkp->max_retries = SD_MAX_RETRIES;
 	atomic_set(&sdkp->openers, 0);
 	atomic_set(&sdkp->device->ioerr_cnt, 0);
+	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
 
 	if (!sdp->request_queue->rq_timeout) {
 		if (sdp->type != TYPE_MOD)
@@ -3585,12 +3587,64 @@ static void scsi_disk_release(struct device *dev)
 	kfree(sdkp);
 }
 
+/* Process sense data after a START command finished. */
+static void sd_start_done_work(struct work_struct *work)
+{
+	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
+					      start_done_work);
+	struct scsi_sense_hdr sshdr;
+	int res = sdkp->start_result;
+
+	if (res == 0)
+		return;
+
+	sd_print_result(sdkp, "Start/Stop Unit failed", res);
+	if (res > 0 && scsi_normalize_sense(sdkp->start_sense_buffer,
+					    sdkp->start_sense_len, &sshdr))
+		sd_print_sense_hdr(sdkp, &sshdr);
+}
+
+/* A START command finished. May be called from interrupt context. */
+static void sd_start_done(struct request *req, blk_status_t status)
+{
+	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
+	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
+
+	sdkp->start_result = scmd->result;
+	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
+	sdkp->start_sense_len = scmd->sense_len;
+	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, scmd->sense_len);
+	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
+}
+
+/* Submit a START command asynchronously. */
+static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
+{
+	struct scsi_device *sdev = sdkp->device;
+	struct request_queue *q = sdev->request_queue;
+	struct request *req;
+	struct scsi_cmnd *scmd;
+
+	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	scmd = blk_mq_rq_to_pdu(req);
+	scmd->cmd_len = cmd_len;
+	memcpy(scmd->cmnd, cmd, cmd_len);
+	scmd->allowed = sdkp->max_retries;
+	req->timeout = SD_TIMEOUT;
+	req->rq_flags |= RQF_PM | RQF_QUIET;
+	req->end_io = sd_start_done;
+	blk_execute_rq_nowait(req, /*at_head=*/true);
+
+	return 0;
+}
+
 static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 {
 	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
-	struct scsi_sense_hdr sshdr;
 	struct scsi_device *sdp = sdkp->device;
-	int res;
 
 	if (start)
 		cmd[4] |= 1;	/* START */
@@ -3601,23 +3655,10 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 	if (!scsi_device_online(sdp))
 		return -ENODEV;
 
-	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
-			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
-	if (res) {
-		sd_print_result(sdkp, "Start/Stop Unit failed", res);
-		if (res > 0 && scsi_sense_valid(&sshdr)) {
-			sd_print_sense_hdr(sdkp, &sshdr);
-			/* 0x3a is medium not present */
-			if (sshdr.asc == 0x3a)
-				res = 0;
-		}
-	}
+	/* Wait until processing of sense data has finished. */
+	flush_work(&sdkp->start_done_work);
 
-	/* SCSI error codes must not go to the generic layer */
-	if (res)
-		return -EIO;
-
-	return 0;
+	return sd_submit_start(sdkp, cmd, sizeof(cmd));
 }
 
 /*
@@ -3644,6 +3685,8 @@ static void sd_shutdown(struct device *dev)
 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
 		sd_start_stop_device(sdkp, 0);
 	}
+
+	flush_work(&sdkp->start_done_work);
 }
 
 static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5eea762f84d1..b89187761d61 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -150,6 +150,11 @@ struct scsi_disk {
 	unsigned	urswrz : 1;
 	unsigned	security : 1;
 	unsigned	ignore_medium_access_errors : 1;
+
+	int		start_result;
+	u32		start_sense_len;
+	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
+	struct work_struct start_done_work;
 };
 #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready
  2022-06-28 22:21 ` [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready Bart Van Assche
@ 2022-06-29  1:21   ` Ming Lei
  2022-06-29 22:06     ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2022-06-29  1:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, Hannes Reinecke,
	John Garry

On Tue, Jun 28, 2022 at 03:21:30PM -0700, Bart Van Assche wrote:
> If a logical unit reports that it is becoming ready, retry the command
> after a delay instead of retrying immediately.
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 49ef864df581..fb7e363c4c00 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -625,10 +625,10 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  			return NEEDS_RETRY;
>  		/*
>  		 * if the device is in the process of becoming ready, we
> -		 * should retry.
> +		 * should retry after a delay.
>  		 */
>  		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
> -			return NEEDS_RETRY;
> +			return ADD_TO_MLQUEUE;

The above code & commit log just said changing to retry after a delay, but not
explains why, care to document reason why the delay is useful?

Thanks
Ming


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-28 22:21 ` [PATCH 3/3] scsi: sd: Rework asynchronous resume support Bart Van Assche
@ 2022-06-29  6:02   ` Hannes Reinecke
  2022-06-30 16:09     ` Bart Van Assche
  2022-06-30 16:23   ` John Garry
  1 sibling, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2022-06-29  6:02 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Ming Lei, John Garry, ericspero,
	jason600.groome

On 6/29/22 00:21, Bart Van Assche wrote:
> For some technologies, e.g. an ATA bus, resuming can take multiple
> seconds. Waiting for resume to finish can cause a very noticeable delay.
> Hence this patch that restores the behavior from before patch "scsi:
> core: pm: Rely on the device driver core for async power management" for
> most SCSI devices.
> 
> This patch introduces a behavior change: if the START command fails, do
> not consider this as a SCSI disk resume failure.
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: ericspero@icloud.com
> Cc: jason600.groome@gmail.com
> Tested-by: jason600.groome@gmail.com
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/sd.c | 79 ++++++++++++++++++++++++++++++++++++-----------
>   drivers/scsi/sd.h |  5 +++
>   2 files changed, 66 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 895b56c8f25e..06888b675e71 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,6 +103,7 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>   static void sd_config_write_same(struct scsi_disk *);
>   static int  sd_revalidate_disk(struct gendisk *);
>   static void sd_unlock_native_capacity(struct gendisk *disk);
> +static void sd_start_done_work(struct work_struct *work);
>   static int  sd_probe(struct device *);
>   static int  sd_remove(struct device *);
>   static void sd_shutdown(struct device *);
> @@ -3463,6 +3464,7 @@ static int sd_probe(struct device *dev)
>   	sdkp->max_retries = SD_MAX_RETRIES;
>   	atomic_set(&sdkp->openers, 0);
>   	atomic_set(&sdkp->device->ioerr_cnt, 0);
> +	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>   
>   	if (!sdp->request_queue->rq_timeout) {
>   		if (sdp->type != TYPE_MOD)
> @@ -3585,12 +3587,64 @@ static void scsi_disk_release(struct device *dev)
>   	kfree(sdkp);
>   }
>   
> +/* Process sense data after a START command finished. */
> +static void sd_start_done_work(struct work_struct *work)
> +{
> +	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> +					      start_done_work);
> +	struct scsi_sense_hdr sshdr;
> +	int res = sdkp->start_result;
> +
> +	if (res == 0)
> +		return;
> + > +	sd_print_result(sdkp, "Start/Stop Unit failed", res);

Surely START/STOP unit can succeed, no?

> +	if (res > 0 && scsi_normalize_sense(sdkp->start_sense_buffer,
> +					    sdkp->start_sense_len, &sshdr))
> +		sd_print_sense_hdr(sdkp, &sshdr);
> +}
> +
> +/* A START command finished. May be called from interrupt context. */
> +static void sd_start_done(struct request *req, blk_status_t status)
> +{
> +	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> +	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> +
> +	sdkp->start_result = scmd->result;
> +	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> +	sdkp->start_sense_len = scmd->sense_len;
> +	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, scmd->sense_len);
> +	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> +}
> +
> +/* Submit a START command asynchronously. */
> +static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> +{
> +	struct scsi_device *sdev = sdkp->device;
> +	struct request_queue *q = sdev->request_queue;
> +	struct request *req;
> +	struct scsi_cmnd *scmd;
> +
> +	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	scmd = blk_mq_rq_to_pdu(req);
> +	scmd->cmd_len = cmd_len;
> +	memcpy(scmd->cmnd, cmd, cmd_len);
> +	scmd->allowed = sdkp->max_retries;
> +	req->timeout = SD_TIMEOUT;
> +	req->rq_flags |= RQF_PM | RQF_QUIET;
> +	req->end_io = sd_start_done;
> +	blk_execute_rq_nowait(req, /*at_head=*/true);
> +
> +	return 0;
> +}
> +
>   static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   {
>   	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> -	struct scsi_sense_hdr sshdr;
>   	struct scsi_device *sdp = sdkp->device;
> -	int res;
>   
>   	if (start)
>   		cmd[4] |= 1;	/* START */
> @@ -3601,23 +3655,10 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   	if (!scsi_device_online(sdp))
>   		return -ENODEV;
>   
> -	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> -			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> -	if (res) {
> -		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -		if (res > 0 && scsi_sense_valid(&sshdr)) {
> -			sd_print_sense_hdr(sdkp, &sshdr);
> -			/* 0x3a is medium not present */
> -			if (sshdr.asc == 0x3a)
> -				res = 0;
> -		}
> -	}
> +	/* Wait until processing of sense data has finished. */
> +	flush_work(&sdkp->start_done_work);
>   
> -	/* SCSI error codes must not go to the generic layer */
> -	if (res)
> -		return -EIO;
> -
> -	return 0;
> +	return sd_submit_start(sdkp, cmd, sizeof(cmd));
>   }
>   
>   /*
> @@ -3644,6 +3685,8 @@ static void sd_shutdown(struct device *dev)
>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>   		sd_start_stop_device(sdkp, 0);
>   	}
> +
> +	flush_work(&sdkp->start_done_work);
>   }
>   
>   static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 5eea762f84d1..b89187761d61 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,6 +150,11 @@ struct scsi_disk {
>   	unsigned	urswrz : 1;
>   	unsigned	security : 1;
>   	unsigned	ignore_medium_access_errors : 1;
> +
> +	int		start_result;
> +	u32		start_sense_len;
> +	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> +	struct work_struct start_done_work;
>   };
>   #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>   

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready
  2022-06-29  1:21   ` Ming Lei
@ 2022-06-29 22:06     ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-29 22:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, Hannes Reinecke,
	John Garry

On 6/28/22 18:21, Ming Lei wrote:
> On Tue, Jun 28, 2022 at 03:21:30PM -0700, Bart Van Assche wrote:
>> If a logical unit reports that it is becoming ready, retry the command
>> after a delay instead of retrying immediately.
>>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Cc: John Garry <john.garry@huawei.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   drivers/scsi/scsi_error.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 49ef864df581..fb7e363c4c00 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -625,10 +625,10 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   			return NEEDS_RETRY;
>>   		/*
>>   		 * if the device is in the process of becoming ready, we
>> -		 * should retry.
>> +		 * should retry after a delay.
>>   		 */
>>   		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
>> -			return NEEDS_RETRY;
>> +			return ADD_TO_MLQUEUE;
> 
> The above code & commit log just said changing to retry after a delay, but not
> explains why, care to document reason why the delay is useful?

I came up with this patch because I was concerned about the impact of the LOGICAL
UNIT IS IN PROCESS OF BECOMING READY response on the START command. While reviewing
SBC-4 I noticed that that response can be produced for any command but not for the
START command with IMM=0. So I think this patch can be dropped since
sd_start_stop_device() does not set the IMM flag.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-29  6:02   ` Hannes Reinecke
@ 2022-06-30 16:09     ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-30 16:09 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Ming Lei, John Garry, ericspero,
	jason600.groome

On 6/28/22 23:02, Hannes Reinecke wrote:
> On 6/29/22 00:21, Bart Van Assche wrote:
>> +/* Process sense data after a START command finished. */
>> +static void sd_start_done_work(struct work_struct *work)
>> +{
>> +    struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
>> +                          start_done_work);
>> +    struct scsi_sense_hdr sshdr;
>> +    int res = sdkp->start_result;
>> +
>> +    if (res == 0)
>> +        return;
>> +
>> +    sd_print_result(sdkp, "Start/Stop Unit failed", res);
> 
> Surely START/STOP unit can succeed, no?

Yes, hence the "if (res == 0) return;" code. Did I perhaps misunderstand
your question?

Bart.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-28 22:21 ` [PATCH 3/3] scsi: sd: Rework asynchronous resume support Bart Van Assche
  2022-06-29  6:02   ` Hannes Reinecke
@ 2022-06-30 16:23   ` John Garry
  2022-06-30 18:57     ` Bart Van Assche
  1 sibling, 1 reply; 11+ messages in thread
From: John Garry @ 2022-06-30 16:23 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Ming Lei, Hannes Reinecke, ericspero,
	jason600.groome

On 28/06/2022 23:21, Bart Van Assche wrote:
> For some technologies, e.g. an ATA bus, resuming can take multiple
> seconds. Waiting for resume to finish can cause a very noticeable delay.
> Hence this patch that restores the behavior from before patch "scsi:
> core: pm: Rely on the device driver core for async power management" for
> most SCSI devices.
> 
> This patch introduces a behavior change: if the START command fails, do
> not consider this as a SCSI disk resume failure.
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: ericspero@icloud.com
> Cc: jason600.groome@gmail.com
> Tested-by: jason600.groome@gmail.com
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/sd.c | 79 ++++++++++++++++++++++++++++++++++++-----------
>   drivers/scsi/sd.h |  5 +++
>   2 files changed, 66 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 895b56c8f25e..06888b675e71 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,6 +103,7 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>   static void sd_config_write_same(struct scsi_disk *);
>   static int  sd_revalidate_disk(struct gendisk *);
>   static void sd_unlock_native_capacity(struct gendisk *disk);
> +static void sd_start_done_work(struct work_struct *work);
>   static int  sd_probe(struct device *);
>   static int  sd_remove(struct device *);
>   static void sd_shutdown(struct device *);
> @@ -3463,6 +3464,7 @@ static int sd_probe(struct device *dev)
>   	sdkp->max_retries = SD_MAX_RETRIES;
>   	atomic_set(&sdkp->openers, 0);
>   	atomic_set(&sdkp->device->ioerr_cnt, 0);
> +	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>   
>   	if (!sdp->request_queue->rq_timeout) {
>   		if (sdp->type != TYPE_MOD)
> @@ -3585,12 +3587,64 @@ static void scsi_disk_release(struct device *dev)
>   	kfree(sdkp);
>   }
>   
> +/* Process sense data after a START command finished. */
> +static void sd_start_done_work(struct work_struct *work)
> +{
> +	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> +					      start_done_work);
> +	struct scsi_sense_hdr sshdr;
> +	int res = sdkp->start_result;
> +
> +	if (res == 0)
> +		return;
> +
> +	sd_print_result(sdkp, "Start/Stop Unit failed", res);
> +	if (res > 0 && scsi_normalize_sense(sdkp->start_sense_buffer,
> +					    sdkp->start_sense_len, &sshdr))
> +		sd_print_sense_hdr(sdkp, &sshdr);

nit: maybe you can reduce indentation, like:

	if (res < 0)
		return;

	if (scsi_normalize_sense(sdkp->start_sense_buffer,
			sdkp->start_sense_len, &sshdr)) {
		sd_print_sense_hdr(sdkp, &sshdr);
	}

> +}
> +
> +/* A START command finished. May be called from interrupt context. */
> +static void sd_start_done(struct request *req, blk_status_t status)
> +{
> +	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> +	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> +
> +	sdkp->start_result = scmd->result;
> +	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);

If scmd->sense_len > SCSI_SENSE_BUFFERSIZE, do you really want to go on 
to copy at sdkp->start_sense_buffer (which is of size 
SCSI_SENSE_BUFFERSIZE)? Won't that cause a corruption?

> +	sdkp->start_sense_len = scmd->sense_len;
> +	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, scmd->sense_len);
> +	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> +}
> +
> +/* Submit a START command asynchronously. */
> +static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> +{
> +	struct scsi_device *sdev = sdkp->device;
> +	struct request_queue *q = sdev->request_queue;
> +	struct request *req;
> +	struct scsi_cmnd *scmd;
> +
> +	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	scmd = blk_mq_rq_to_pdu(req);
> +	scmd->cmd_len = cmd_len;
> +	memcpy(scmd->cmnd, cmd, cmd_len);
> +	scmd->allowed = sdkp->max_retries;
> +	req->timeout = SD_TIMEOUT;
> +	req->rq_flags |= RQF_PM | RQF_QUIET;
> +	req->end_io = sd_start_done;
> +	blk_execute_rq_nowait(req, /*at_head=*/true);
> +
> +	return 0;
> +}
> +
>   static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   {
>   	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> -	struct scsi_sense_hdr sshdr;
>   	struct scsi_device *sdp = sdkp->device;
> -	int res;
>   
>   	if (start)
>   		cmd[4] |= 1;	/* START */
> @@ -3601,23 +3655,10 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   	if (!scsi_device_online(sdp))
>   		return -ENODEV;
>   
> -	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> -			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> -	if (res) {
> -		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -		if (res > 0 && scsi_sense_valid(&sshdr)) {
> -			sd_print_sense_hdr(sdkp, &sshdr);
> -			/* 0x3a is medium not present */
> -			if (sshdr.asc == 0x3a)
> -				res = 0;
> -		}
> -	}
> +	/* Wait until processing of sense data has finished. */
> +	flush_work(&sdkp->start_done_work);
>   
> -	/* SCSI error codes must not go to the generic layer */
> -	if (res)
> -		return -EIO;
> -
> -	return 0;
> +	return sd_submit_start(sdkp, cmd, sizeof(cmd));
>   }
>   
>   /*
> @@ -3644,6 +3685,8 @@ static void sd_shutdown(struct device *dev)
>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>   		sd_start_stop_device(sdkp, 0);
>   	}
> +
> +	flush_work(&sdkp->start_done_work);
>   }
>   
>   static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 5eea762f84d1..b89187761d61 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,6 +150,11 @@ struct scsi_disk {
>   	unsigned	urswrz : 1;
>   	unsigned	security : 1;
>   	unsigned	ignore_medium_access_errors : 1;
> +
> +	int		start_result;
> +	u32		start_sense_len;
> +	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> +	struct work_struct start_done_work;
>   };
>   #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>   
> .


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-30 16:23   ` John Garry
@ 2022-06-30 18:57     ` Bart Van Assche
  2022-06-30 19:28       ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2022-06-30 18:57 UTC (permalink / raw)
  To: John Garry, Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Ming Lei, Hannes Reinecke, ericspero,
	jason600.groome

On 6/30/22 09:23, John Garry wrote:
> On 28/06/2022 23:21, Bart Van Assche wrote:
>> +/* A START command finished. May be called from interrupt context. */
>> +static void sd_start_done(struct request *req, blk_status_t status)
>> +{
>> +    const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
>> +    struct scsi_disk *sdkp = scsi_disk(req->q->disk);
>> +
>> +    sdkp->start_result = scmd->result;
>> +    WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> 
> If scmd->sense_len > SCSI_SENSE_BUFFERSIZE, do you really want to go on 
> to copy at sdkp->start_sense_buffer (which is of size 
> SCSI_SENSE_BUFFERSIZE)? Won't that cause a corruption?

scsi_mq_init_request() allocates a buffer with size 
SCSI_SENSE_BUFFERSIZE. SCSI LLDs copy sense data into that buffer. I am 
not aware of any SCSI LLD that modifies the cmd->sense_buffer pointer. 
So if scmd->sense_len would be larger than SCSI_SENSE_BUFFERSIZE that 
either indicates that the LLD reported a sense length that is too large 
or that it wrote outside the bounds of the sense buffer. Do we really 
need to add a protection in the SCSI core against buggy LLDs?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] scsi: sd: Rework asynchronous resume support
  2022-06-30 18:57     ` Bart Van Assche
@ 2022-06-30 19:28       ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2022-06-30 19:28 UTC (permalink / raw)
  To: John Garry, Martin K . Petersen
  Cc: Jaegeuk Kim, linux-scsi, Ming Lei, Hannes Reinecke, ericspero,
	jason600.groome

On 6/30/22 11:57, Bart Van Assche wrote:
> On 6/30/22 09:23, John Garry wrote:
>> On 28/06/2022 23:21, Bart Van Assche wrote:
>>> +/* A START command finished. May be called from interrupt context. */
>>> +static void sd_start_done(struct request *req, blk_status_t status)
>>> +{
>>> +    const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
>>> +    struct scsi_disk *sdkp = scsi_disk(req->q->disk);
>>> +
>>> +    sdkp->start_result = scmd->result;
>>> +    WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
>>
>> If scmd->sense_len > SCSI_SENSE_BUFFERSIZE, do you really want to go 
>> on to copy at sdkp->start_sense_buffer (which is of size 
>> SCSI_SENSE_BUFFERSIZE)? Won't that cause a corruption?
> 
> scsi_mq_init_request() allocates a buffer with size 
> SCSI_SENSE_BUFFERSIZE. SCSI LLDs copy sense data into that buffer. I am 
> not aware of any SCSI LLD that modifies the cmd->sense_buffer pointer. 
> So if scmd->sense_len would be larger than SCSI_SENSE_BUFFERSIZE that 
> either indicates that the LLD reported a sense length that is too large 
> or that it wrote outside the bounds of the sense buffer. Do we really 
> need to add a protection in the SCSI core against buggy LLDs?

A result of the above is that SCSI_SENSE_BUFFERSIZE bytes can be copied 
instead of scmd->sense_len. I will make that change.

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-06-30 19:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-28 22:21 [PATCH 0/3] Reduce ATA disk resume time Bart Van Assche
2022-06-28 22:21 ` [PATCH 1/3] scsi: core: Move the definition of SCSI_QUEUE_DELAY Bart Van Assche
2022-06-28 22:21 ` [PATCH 2/3] scsi: core: Retry after a delay if the device is becoming ready Bart Van Assche
2022-06-29  1:21   ` Ming Lei
2022-06-29 22:06     ` Bart Van Assche
2022-06-28 22:21 ` [PATCH 3/3] scsi: sd: Rework asynchronous resume support Bart Van Assche
2022-06-29  6:02   ` Hannes Reinecke
2022-06-30 16:09     ` Bart Van Assche
2022-06-30 16:23   ` John Garry
2022-06-30 18:57     ` Bart Van Assche
2022-06-30 19:28       ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).