Linux-Hwmon Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives
@ 2020-07-18 12:32 Maciej S. Szmigiero
  2020-07-18 15:13 ` Guenter Roeck
  0 siblings, 1 reply; 2+ messages in thread
From: Maciej S. Szmigiero @ 2020-07-18 12:32 UTC (permalink / raw)
  To: Jean Delvare, Guenter Roeck; +Cc: linux-hwmon, linux-kernel

It has been observed that Toshiba DT01ACA family drives have
WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
power-cycled under heavy write loads when their temperature is getting
polled in SCT mode. The SMART mode seems to be fine, though.

Let's make sure we don't use SCT mode for these drives then.

While only the 3 TB model was actually caught exhibiting the problem let's
play safe here to avoid data corruption and extend the ban to the whole
family.

Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
---

Notes:
    This behavior was observed on two different DT01ACA3 drives.
    
    Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
    but sometimes the whole drive freezes. Merely disconnecting and
    reconnecting SATA interface cable then does not unfreeze the drive.
    
    One has to disconnect and reconnect the drive power connector for the
    drive to be detected again (suggesting the drive firmware itself has
    crashed).
    
    This only happens when the drive temperature is polled very often (like
    every second), so occasional SCT usage via smartmontools is probably
    safe.
    
    Changes from v1:
    'SCT blacklist' -> 'SCT avoid models'
    
    Changes from v2:
    * Switch to prefix matching and use it to match the DT01ACAx family,
    
    * Use "!" instead of "== 0",
    
    * Add a comment about the contents of the "model" field.

 drivers/hwmon/drivetemp.c | 43 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
index 0d4f3d97ffc6..72c760373957 100644
--- a/drivers/hwmon/drivetemp.c
+++ b/drivers/hwmon/drivetemp.c
@@ -285,6 +285,42 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
 	return err;
 }
 
+static const char * const sct_avoid_models[] = {
+/*
+ * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
+ * freeze until power-cycled under heavy write loads when their temperature is
+ * getting polled in SCT mode. The SMART mode seems to be fine, though.
+ *
+ * While only the 3 TB model (DT01ACA3) was actually caught exhibiting the
+ * problem let's play safe here to avoid data corruption and ban the whole
+ * DT01ACAx family.
+
+ * The models from this array are prefix-matched.
+ */
+	"TOSHIBA DT01ACA",
+};
+
+static bool drivetemp_sct_avoid(struct drivetemp_data *st)
+{
+	struct scsi_device *sdev = st->sdev;
+	unsigned int ctr;
+
+	if (!sdev->model)
+		return false;
+
+	/*
+	 * The "model" field contains just the raw SCSI INQUIRY response
+	 * "product identification" field, which has a width of 16 bytes.
+	 * This field is space-filled, but is NOT NULL-terminated.
+	 */
+	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
+		if (!strncmp(sdev->model, sct_avoid_models[ctr],
+			     strlen(sct_avoid_models[ctr])))
+			return true;
+
+	return false;
+}
+
 static int drivetemp_identify_sata(struct drivetemp_data *st)
 {
 	struct scsi_device *sdev = st->sdev;
@@ -326,6 +362,13 @@ static int drivetemp_identify_sata(struct drivetemp_data *st)
 	/* bail out if this is not a SATA device */
 	if (!is_ata || !is_sata)
 		return -ENODEV;
+
+	if (have_sct && drivetemp_sct_avoid(st)) {
+		dev_notice(&sdev->sdev_gendev,
+			   "will avoid using SCT for temperature monitoring\n");
+		have_sct = false;
+	}
+
 	if (!have_sct)
 		goto skip_sct;
 

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH v3] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives
  2020-07-18 12:32 [PATCH v3] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives Maciej S. Szmigiero
@ 2020-07-18 15:13 ` Guenter Roeck
  0 siblings, 0 replies; 2+ messages in thread
From: Guenter Roeck @ 2020-07-18 15:13 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: Jean Delvare, linux-hwmon, linux-kernel

On Sat, Jul 18, 2020 at 02:32:10PM +0200, Maciej S. Szmigiero wrote:
> It has been observed that Toshiba DT01ACA family drives have
> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
> power-cycled under heavy write loads when their temperature is getting
> polled in SCT mode. The SMART mode seems to be fine, though.
> 
> Let's make sure we don't use SCT mode for these drives then.
> 
> While only the 3 TB model was actually caught exhibiting the problem let's
> play safe here to avoid data corruption and extend the ban to the whole
> family.
> 
> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
> Cc: stable@vger.kernel.org
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>

Applied.

Thanks,
Guenter

> ---
> 
> Notes:
>     This behavior was observed on two different DT01ACA3 drives.
>     
>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>     but sometimes the whole drive freezes. Merely disconnecting and
>     reconnecting SATA interface cable then does not unfreeze the drive.
>     
>     One has to disconnect and reconnect the drive power connector for the
>     drive to be detected again (suggesting the drive firmware itself has
>     crashed).
>     
>     This only happens when the drive temperature is polled very often (like
>     every second), so occasional SCT usage via smartmontools is probably
>     safe.
>     
>     Changes from v1:
>     'SCT blacklist' -> 'SCT avoid models'
>     
>     Changes from v2:
>     * Switch to prefix matching and use it to match the DT01ACAx family,
>     
>     * Use "!" instead of "== 0",
>     
>     * Add a comment about the contents of the "model" field.
> 
>  drivers/hwmon/drivetemp.c | 43 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> index 0d4f3d97ffc6..72c760373957 100644
> --- a/drivers/hwmon/drivetemp.c
> +++ b/drivers/hwmon/drivetemp.c
> @@ -285,6 +285,42 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>  	return err;
>  }
>  
> +static const char * const sct_avoid_models[] = {
> +/*
> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
> + * freeze until power-cycled under heavy write loads when their temperature is
> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
> + *
> + * While only the 3 TB model (DT01ACA3) was actually caught exhibiting the
> + * problem let's play safe here to avoid data corruption and ban the whole
> + * DT01ACAx family.
> +
> + * The models from this array are prefix-matched.
> + */
> +	"TOSHIBA DT01ACA",
> +};
> +
> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
> +{
> +	struct scsi_device *sdev = st->sdev;
> +	unsigned int ctr;
> +
> +	if (!sdev->model)
> +		return false;
> +
> +	/*
> +	 * The "model" field contains just the raw SCSI INQUIRY response
> +	 * "product identification" field, which has a width of 16 bytes.
> +	 * This field is space-filled, but is NOT NULL-terminated.
> +	 */
> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
> +		if (!strncmp(sdev->model, sct_avoid_models[ctr],
> +			     strlen(sct_avoid_models[ctr])))
> +			return true;
> +
> +	return false;
> +}
> +
>  static int drivetemp_identify_sata(struct drivetemp_data *st)
>  {
>  	struct scsi_device *sdev = st->sdev;
> @@ -326,6 +362,13 @@ static int drivetemp_identify_sata(struct drivetemp_data *st)
>  	/* bail out if this is not a SATA device */
>  	if (!is_ata || !is_sata)
>  		return -ENODEV;
> +
> +	if (have_sct && drivetemp_sct_avoid(st)) {
> +		dev_notice(&sdev->sdev_gendev,
> +			   "will avoid using SCT for temperature monitoring\n");
> +		have_sct = false;
> +	}
> +
>  	if (!have_sct)
>  		goto skip_sct;
>  

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, back to index

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-18 12:32 [PATCH v3] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives Maciej S. Szmigiero
2020-07-18 15:13 ` Guenter Roeck

Linux-Hwmon Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hwmon/0 linux-hwmon/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hwmon linux-hwmon/ https://lore.kernel.org/linux-hwmon \
		linux-hwmon@vger.kernel.org
	public-inbox-index linux-hwmon

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hwmon


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git