All of lore.kernel.org
 help / color / mirror / Atom feed
From: Niklas Cassel <niklas.cassel@wdc.com>
To: "James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>,
	linux-scsi@vger.kernel.org, Niklas Cassel <niklas.cassel@wdc.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: [PATCH 23/25] scsi: sd: handle read/write CDL timeout failures
Date: Thu,  8 Dec 2022 11:59:39 +0100	[thread overview]
Message-ID: <20221208105947.2399894-24-niklas.cassel@wdc.com> (raw)
In-Reply-To: <20221208105947.2399894-1-niklas.cassel@wdc.com>

Commands using a duration limit descriptor that has limit policies set
to a value other than 0x0 may be failed by the device if one of the
limits are exceeded. For such commands, since the failure is the result
of the user duration limit configuration and workload, the commands
should not be retried and terminated immediately. Furthermore, to allow
the user to differentiate these "soft" failures from hard errors due to
hardware problem, a different error code than EIO should be returned.

There are 2 cases to consider:
(1) The failure is due to a limit policy failing the command with a
check condition sense key, that is, any limit policy other than 0xD.
For this case, scsi_check_sense() is modified to detect failures with
the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
failures, a SUCCESS disposition is returned so that
scsi_finish_command() is called to terminate the command.

(2) The failure is due to a limit policy set to 0xD, which result in the
command being terminated with a GOOD status, COMPLETED sense key, and
DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
the scsi_check_sense() is modified to return a SUCCESS disposition so
that scsi_finish_command() is called to terminate the command.
In addition, scsi_decide_disposition() has to be modified to see if a
command being terminated with GOOD status has sense data.
This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
according to spec, even if GOOD status commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
scsi_noretry_cmd(), so that a command that failed because of a CDL
timeout cannot be retried. The SCSI ML byte is also checked in
scsi_result_to_blk_status() to complete the command request with the
BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
errors for the failed commands.

Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  4 ++++
 drivers/scsi/scsi_priv.h  |  1 +
 3 files changed, 51 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 51aa5c1e31b5..1bdab5385985 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
  */
 enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 {
+	struct request *req = scsi_cmd_to_rq(scmd);
 	struct scsi_device *sdev = scmd->device;
 	struct scsi_sense_hdr sshdr;
 
@@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		if (sshdr.asc == 0x10) /* DIF */
 			return SUCCESS;
 
+		/*
+		 * Check aborts due to command duration limit policy:
+		 * ABORTED COMMAND additional sense code with the
+		 * COMMAND TIMEOUT BEFORE PROCESSING or
+		 * COMMAND TIMEOUT DURING PROCESSING or
+		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
+		 * additional sense code qualifiers.
+		 */
+		if (sshdr.asc == 0x2e &&
+		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
+			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
+			req->cmd_flags |= REQ_FAILFAST_DEV;
+			req->rq_flags |= RQF_QUIET;
+			return SUCCESS;
+		}
+
 		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
 			return ADD_TO_MLQUEUE;
 		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
@@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		}
 		return SUCCESS;
 
+	case COMPLETED:
+		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
+			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
+			req->cmd_flags |= REQ_FAILFAST_DEV;
+			req->rq_flags |= RQF_QUIET;
+			return SUCCESS;
+		}
+		return SUCCESS;
+
 	default:
 		return SUCCESS;
 	}
@@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
 	switch (get_status_byte(scmd)) {
 	case SAM_STAT_GOOD:
 		scsi_handle_queue_ramp_up(scmd->device);
+		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
+			/*
+			 * If we have sense data, call scsi_check_sense() in
+			 * order to set the correct SCSI ML byte (if any).
+			 * No point in checking the return value, since the
+			 * command has already completed successfully.
+			 */
+			scsi_check_sense(scmd);
 		fallthrough;
 	case SAM_STAT_COMMAND_TERMINATED:
 		return SUCCESS;
@@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
 	}
 
+	/* Never retry commands aborted due to a duration limit timeout */
+	if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
+		return true;
+
 	if (!scsi_status_is_check_condition(scmd->result))
 		return false;
 
@@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
 		if (scmd->cmnd[0] == REPORT_LUNS)
 			scmd->device->sdev_target->expecting_lun_change = 0;
 		scsi_handle_queue_ramp_up(scmd->device);
+		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
+			/*
+			 * If we have sense data, call scsi_check_sense() in
+			 * order to set the correct SCSI ML byte (if any).
+			 * No point in checking the return value, since the
+			 * command has already completed successfully.
+			 */
+			scsi_check_sense(scmd);
 		fallthrough;
 	case SAM_STAT_COMMAND_TERMINATED:
 		return SUCCESS;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e64fd8f495d7..4f317c6593aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result)
 		return BLK_STS_MEDIUM;
 	case SCSIML_STAT_TGT_FAILURE:
 		return BLK_STS_TARGET;
+	case SCSIML_STAT_DL_TIMEOUT:
+		return BLK_STS_DURATION_LIMIT;
 	}
 
 	switch (host_byte(result)) {
@@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
 				blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
 			}
 			break;
+		case COMPLETED:
+			fallthrough;
 		default:
 			action = ACTION_FAIL;
 			break;
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 4f97e126c6fb..f8da92428ff6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -27,6 +27,7 @@ enum scsi_ml_status {
 	SCSIML_STAT_NOSPC		= 0x02,	/* Space allocation on the dev failed */
 	SCSIML_STAT_MED_ERROR		= 0x03,	/* Medium error */
 	SCSIML_STAT_TGT_FAILURE		= 0x04,	/* Permanent target failure */
+	SCSIML_STAT_DL_TIMEOUT		= 0x05, /* Command Duration Limit timeout */
 };
 
 static inline u8 get_scsi_ml_byte(int result)
-- 
2.38.1


  parent reply	other threads:[~2022-12-08 11:04 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-08 10:59 [PATCH 00/25] Add Command Duration Limits support Niklas Cassel
2022-12-08 10:59 ` [PATCH 01/25] ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH Niklas Cassel
2022-12-21 11:47   ` John Garry
2022-12-08 10:59 ` [PATCH 02/25] ata: libata: move NCQ related ATA_DFLAGs Niklas Cassel
2022-12-08 10:59 ` [PATCH 03/25] ata: libata: simplify qc_fill_rtf port operation interface Niklas Cassel
2022-12-21 11:48   ` John Garry
2022-12-08 10:59 ` [PATCH 04/25] ata: libata: fix broken NCQ command status handling Niklas Cassel
2022-12-08 10:59 ` [PATCH 05/25] ata: libata: respect successfully completed commands during errors Niklas Cassel
2022-12-08 10:59 ` [PATCH 06/25] ata: libata: allow ata_scsi_set_sense() to not set CHECK_CONDITION Niklas Cassel
2022-12-08 10:59 ` [PATCH 07/25] ata: libata: allow ata_eh_request_sense() " Niklas Cassel
2022-12-08 10:59 ` [PATCH 08/25] ata: libata-scsi: do not overwrite SCSI ML and status bytes Niklas Cassel
2022-12-08 10:59 ` [PATCH 09/25] ata: libata-scsi: improve ata_scsiop_maint_in() Niklas Cassel
2022-12-08 10:59 ` [PATCH 10/25] scsi: core: allow libata to complete successful commands via EH Niklas Cassel
2022-12-08 10:59 ` [PATCH 11/25] scsi: move get_scsi_ml_byte() to scsi_priv.h Niklas Cassel
2022-12-08 23:58   ` Mike Christie
2022-12-28 20:41     ` Niklas Cassel
2022-12-29 18:55       ` Mike Christie
2022-12-29 20:19         ` Niklas Cassel
2022-12-08 10:59 ` [PATCH 12/25] scsi: support retrieving sub-pages of mode pages Niklas Cassel
2022-12-08 10:59 ` [PATCH 13/25] scsi: support service action in scsi_report_opcode() Niklas Cassel
2022-12-08 10:59 ` [PATCH 14/25] block: introduce duration-limits priority class Niklas Cassel
2022-12-08 10:59 ` [PATCH 15/25] block: introduce BLK_STS_DURATION_LIMIT Niklas Cassel
2022-12-08 10:59 ` [PATCH 16/25] ata: libata: detect support for command duration limits Niklas Cassel
2022-12-08 10:59 ` [PATCH 17/25] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() Niklas Cassel
2022-12-08 10:59 ` [PATCH 18/25] ata: libata-scsi: add support for CDL pages mode sense Niklas Cassel
2022-12-08 10:59 ` [PATCH 19/25] ata: libata: add ATA feature control sub-page translation Niklas Cassel
2022-12-08 10:59 ` [PATCH 20/25] ata: libata: set read/write commands CDL index Niklas Cassel
2022-12-08 10:59 ` [PATCH 21/25] scsi: sd: detect support for command duration limits Niklas Cassel
2022-12-08 10:59 ` [PATCH 22/25] scsi: sd: set read/write commands CDL index Niklas Cassel
2022-12-08 10:59 ` Niklas Cassel [this message]
2022-12-09  0:13   ` [PATCH 23/25] scsi: sd: handle read/write CDL timeout failures Mike Christie
2022-12-09  0:26     ` Damien Le Moal
2022-12-08 10:59 ` [PATCH 24/25] ata: libata: handle completion of CDL commands using policy 0xD Niklas Cassel
2022-12-08 10:59 ` [PATCH 25/25] Documentation: sysfs-block-device: document command duration limits Niklas Cassel
2022-12-09  3:22   ` Bagas Sanjaya
2022-12-09  3:31     ` Damien Le Moal
2022-12-08 18:18 ` [PATCH 00/25] Add Command Duration Limits support Chaitanya Kulkarni
2022-12-09  0:29   ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221208105947.2399894-24-niklas.cassel@wdc.com \
    --to=niklas.cassel@wdc.com \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=hare@suse.de \
    --cc=jejb@linux.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.