Linux-ide Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] libata: don't request sense data on !ZAC ATA devices
@ 2019-06-24 16:32 Tejun Heo
  2019-06-24 20:27 ` Damien Le Moal
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Tejun Heo @ 2019-06-24 16:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-ide, Hannes Reinecke, kernel-team

ZAC support added sense data requesting on error for both ZAC and ATA
devices. This seems to cause erratic error handling behaviors on some
SSDs where the device reports sense data availability and then
delivers the wrong content making EH take the wrong actions.  The
failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
reproduced.

There is no value in requesting sense data from non-ZAC ATA devices
while there's a significant risk of introducing EH misbehaviors which
are difficult to reproduce and fix.  Let's do the sense data dancing
only for ZAC devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hannes Reinecke <hare@kernel.org>
---
 drivers/ata/libata-eh.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 9d687e1d4325..3bfd9da58473 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1469,7 +1469,7 @@ static int ata_eh_read_log_10h(struct ata_device *dev,
 	tf->hob_lbah = buf[10];
 	tf->nsect = buf[12];
 	tf->hob_nsect = buf[13];
-	if (ata_id_has_ncq_autosense(dev->id))
+	if (dev->class == ATA_DEV_ZAC && ata_id_has_ncq_autosense(dev->id))
 		tf->auxiliary = buf[14] << 16 | buf[15] << 8 | buf[16];
 
 	return 0;
@@ -1716,7 +1716,8 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
 	memcpy(&qc->result_tf, &tf, sizeof(tf));
 	qc->result_tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_LBA | ATA_TFLAG_LBA48;
 	qc->err_mask |= AC_ERR_DEV | AC_ERR_NCQ;
-	if ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary) {
+	if (dev->class == ATA_DEV_ZAC &&
+	    ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary)) {
 		char sense_key, asc, ascq;
 
 		sense_key = (qc->result_tf.auxiliary >> 16) & 0xff;
@@ -1770,10 +1771,11 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
 	}
 
 	switch (qc->dev->class) {
-	case ATA_DEV_ATA:
 	case ATA_DEV_ZAC:
 		if (stat & ATA_SENSE)
 			ata_eh_request_sense(qc, qc->scsicmd);
+		/* fall through */
+	case ATA_DEV_ATA:
 		if (err & ATA_ICRC)
 			qc->err_mask |= AC_ERR_ATA_BUS;
 		if (err & (ATA_UNC | ATA_AMNF))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 16:32 [PATCH] libata: don't request sense data on !ZAC ATA devices Tejun Heo
@ 2019-06-24 20:27 ` Damien Le Moal
  2019-06-24 20:57   ` Tejun Heo
  2019-06-25  6:05 ` Hannes Reinecke
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2019-06-24 20:27 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe; +Cc: linux-ide, Hannes Reinecke, kernel-team

On 2019/06/25 1:33, Tejun Heo wrote:
> ZAC support added sense data requesting on error for both ZAC and ATA
> devices. This seems to cause erratic error handling behaviors on some
> SSDs where the device reports sense data availability and then
> delivers the wrong content making EH take the wrong actions.  The
> failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
> reproduced.
> 
> There is no value in requesting sense data from non-ZAC ATA devices
> while there's a significant risk of introducing EH misbehaviors which
> are difficult to reproduce and fix.  Let's do the sense data dancing
> only for ZAC devices.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Hannes Reinecke <hare@kernel.org>
> ---
>  drivers/ata/libata-eh.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 9d687e1d4325..3bfd9da58473 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1469,7 +1469,7 @@ static int ata_eh_read_log_10h(struct ata_device *dev,
>  	tf->hob_lbah = buf[10];
>  	tf->nsect = buf[12];
>  	tf->hob_nsect = buf[13];
> -	if (ata_id_has_ncq_autosense(dev->id))
> +	if (dev->class == ATA_DEV_ZAC && ata_id_has_ncq_autosense(dev->id))
>  		tf->auxiliary = buf[14] << 16 | buf[15] << 8 | buf[16];
>  
>  	return 0;
> @@ -1716,7 +1716,8 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
>  	memcpy(&qc->result_tf, &tf, sizeof(tf));
>  	qc->result_tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_LBA | ATA_TFLAG_LBA48;
>  	qc->err_mask |= AC_ERR_DEV | AC_ERR_NCQ;
> -	if ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary) {
> +	if (dev->class == ATA_DEV_ZAC &&
> +	    ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary)) {
>  		char sense_key, asc, ascq;
>  
>  		sense_key = (qc->result_tf.auxiliary >> 16) & 0xff;
> @@ -1770,10 +1771,11 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
>  	}
>  
>  	switch (qc->dev->class) {
> -	case ATA_DEV_ATA:
>  	case ATA_DEV_ZAC:
>  		if (stat & ATA_SENSE)
>  			ata_eh_request_sense(qc, qc->scsicmd);
> +		/* fall through */
> +	case ATA_DEV_ATA:
>  		if (err & ATA_ICRC)
>  			qc->err_mask |= AC_ERR_ATA_BUS;
>  		if (err & (ATA_UNC | ATA_AMNF))
> 

For NCQ commands, I believe it is mandatory to request sense data for the failed
command to get the device out of error mode. So isn't this approach breaking
anything for well behaving drives ? Wouldn't it be better to blacklist the
misbehaving SSD you observed the problem with ?


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 20:27 ` Damien Le Moal
@ 2019-06-24 20:57   ` Tejun Heo
  2019-06-24 21:59     ` Damien Le Moal
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2019-06-24 20:57 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Jens Axboe, linux-ide, Hannes Reinecke, kernel-team

Hello, Damien.

On Mon, Jun 24, 2019 at 08:27:02PM +0000, Damien Le Moal wrote:
> For NCQ commands, I believe it is mandatory to request sense data for the failed
> command to get the device out of error mode. So isn't this approach breaking

Hah, that's a news to me.  We never had that code path before ZAC
support was added, so I'm kinda skeptical that'd be the case.

> anything for well behaving drives ? Wouldn't it be better to blacklist the
> misbehaving SSD you observed the problem with ?

Provided I'm not wrong with the assumption, there's virtually no
benefit in doing this and that's gonna be a *really* difficult
blacklist to develop.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 20:57   ` Tejun Heo
@ 2019-06-24 21:59     ` Damien Le Moal
  0 siblings, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2019-06-24 21:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-ide, Hannes Reinecke, kernel-team

Tejun,

On 2019/06/25 5:57, Tejun Heo wrote:
> Hello, Damien.
> 
> On Mon, Jun 24, 2019 at 08:27:02PM +0000, Damien Le Moal wrote:
>> For NCQ commands, I believe it is mandatory to request sense data for the failed
>> command to get the device out of error mode. So isn't this approach breaking
> 
> Hah, that's a news to me.  We never had that code path before ZAC
> support was added, so I'm kinda skeptical that'd be the case.

I checked again the ACS specs, and your are right, REQUEST SENSE DATA EXT is
optional in general, dependent on support of the Sense Data Reporting feature set.

For NCQ command errors, from ACS:

"If an error occurs while the device is processing an NCQ command, then the
device shall return command aborted for all NCQ commands that are in the queue
and shall return command aborted for any subsequent commands, except a command
from the GPL feature set (see 4.10) that reads the NCQ Command Error log (see
9.13), until the device completes that command without error."

So as long as NCQ command error log page is read, the device queue will get out
of error mode and new commands can be issued. There is no need for REQUEST SENSE
DATA EXT. I got confused with the fact that the Sense data reporting feature is
mandatory with ZAC drives (that is defined in ZAC, not ACS).

>> anything for well behaving drives ? Wouldn't it be better to blacklist the
>> misbehaving SSD you observed the problem with ?
> 
> Provided I'm not wrong with the assumption, there's virtually no
> benefit in doing this and that's gonna be a *really* difficult
> blacklist to develop.

You are not wrong :)
Will test your patch on our test rig which generates (in purpose) a lot of
command failures on ZAC drives. We can also give it a run with generated errors
on regular disks.

Cheers.

> 
> Thanks.
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 16:32 [PATCH] libata: don't request sense data on !ZAC ATA devices Tejun Heo
  2019-06-24 20:27 ` Damien Le Moal
@ 2019-06-25  6:05 ` Hannes Reinecke
  2019-06-25 12:25 ` Damien Le Moal
  2019-06-25 15:35 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2019-06-25  6:05 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe; +Cc: linux-ide, Hannes Reinecke, kernel-team

On 6/24/19 6:32 PM, Tejun Heo wrote:
> ZAC support added sense data requesting on error for both ZAC and ATA
> devices. This seems to cause erratic error handling behaviors on some
> SSDs where the device reports sense data availability and then
> delivers the wrong content making EH take the wrong actions.  The
> failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
> reproduced.
> 
> There is no value in requesting sense data from non-ZAC ATA devices
> while there's a significant risk of introducing EH misbehaviors which
> are difficult to reproduce and fix.  Let's do the sense data dancing
> only for ZAC devices.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Hannes Reinecke <hare@kernel.org>
> ---
>  drivers/ata/libata-eh.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
Ah well. I hoped those bothering to implement sense data would do it
properly; seems I've been mistaken.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 16:32 [PATCH] libata: don't request sense data on !ZAC ATA devices Tejun Heo
  2019-06-24 20:27 ` Damien Le Moal
  2019-06-25  6:05 ` Hannes Reinecke
@ 2019-06-25 12:25 ` Damien Le Moal
  2019-06-25 15:35 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2019-06-25 12:25 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe; +Cc: linux-ide, Hannes Reinecke, kernel-team

On 2019/06/25 1:33, Tejun Heo wrote:
> ZAC support added sense data requesting on error for both ZAC and ATA
> devices. This seems to cause erratic error handling behaviors on some
> SSDs where the device reports sense data availability and then
> delivers the wrong content making EH take the wrong actions.  The
> failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
> reproduced.
> 
> There is no value in requesting sense data from non-ZAC ATA devices
> while there's a significant risk of introducing EH misbehaviors which
> are difficult to reproduce and fix.  Let's do the sense data dancing
> only for ZAC devices.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Hannes Reinecke <hare@kernel.org>
> ---
>  drivers/ata/libata-eh.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 9d687e1d4325..3bfd9da58473 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1469,7 +1469,7 @@ static int ata_eh_read_log_10h(struct ata_device *dev,
>  	tf->hob_lbah = buf[10];
>  	tf->nsect = buf[12];
>  	tf->hob_nsect = buf[13];
> -	if (ata_id_has_ncq_autosense(dev->id))
> +	if (dev->class == ATA_DEV_ZAC && ata_id_has_ncq_autosense(dev->id))
>  		tf->auxiliary = buf[14] << 16 | buf[15] << 8 | buf[16];
>  
>  	return 0;
> @@ -1716,7 +1716,8 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
>  	memcpy(&qc->result_tf, &tf, sizeof(tf));
>  	qc->result_tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_LBA | ATA_TFLAG_LBA48;
>  	qc->err_mask |= AC_ERR_DEV | AC_ERR_NCQ;
> -	if ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary) {
> +	if (dev->class == ATA_DEV_ZAC &&
> +	    ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary)) {
>  		char sense_key, asc, ascq;
>  
>  		sense_key = (qc->result_tf.auxiliary >> 16) & 0xff;
> @@ -1770,10 +1771,11 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
>  	}
>  
>  	switch (qc->dev->class) {
> -	case ATA_DEV_ATA:
>  	case ATA_DEV_ZAC:
>  		if (stat & ATA_SENSE)
>  			ata_eh_request_sense(qc, qc->scsicmd);
> +		/* fall through */
> +	case ATA_DEV_ATA:
>  		if (err & ATA_ICRC)
>  			qc->err_mask |= AC_ERR_ATA_BUS;
>  		if (err & (ATA_UNC | ATA_AMNF))
> 

No problems with tests.

Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] libata: don't request sense data on !ZAC ATA devices
  2019-06-24 16:32 [PATCH] libata: don't request sense data on !ZAC ATA devices Tejun Heo
                   ` (2 preceding siblings ...)
  2019-06-25 12:25 ` Damien Le Moal
@ 2019-06-25 15:35 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-06-25 15:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, Hannes Reinecke, kernel-team

On 6/24/19 10:32 AM, Tejun Heo wrote:
> ZAC support added sense data requesting on error for both ZAC and ATA
> devices. This seems to cause erratic error handling behaviors on some
> SSDs where the device reports sense data availability and then
> delivers the wrong content making EH take the wrong actions.  The
> failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
> reproduced.
> 
> There is no value in requesting sense data from non-ZAC ATA devices
> while there's a significant risk of introducing EH misbehaviors which
> are difficult to reproduce and fix.  Let's do the sense data dancing
> only for ZAC devices.

Applied, thanks Tejun.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-24 16:32 [PATCH] libata: don't request sense data on !ZAC ATA devices Tejun Heo
2019-06-24 20:27 ` Damien Le Moal
2019-06-24 20:57   ` Tejun Heo
2019-06-24 21:59     ` Damien Le Moal
2019-06-25  6:05 ` Hannes Reinecke
2019-06-25 12:25 ` Damien Le Moal
2019-06-25 15:35 ` Jens Axboe

Linux-ide Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ide/0 linux-ide/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ide linux-ide/ https://lore.kernel.org/linux-ide \
		linux-ide@vger.kernel.org linux-ide@archiver.kernel.org
	public-inbox-index linux-ide


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ide


AGPL code for this site: git clone https://public-inbox.org/ public-inbox