linux-doc.vger.kernel.org archive mirror
* [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
@ 2019-12-09  5:21 Guenter Roeck
  2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
  2019-12-11  4:08 ` [PATCH 0/1] Summary: hwmon driver " Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Guenter Roeck @ 2019-12-09  5:21 UTC (permalink / raw)
  To: linux-hwmon
  Cc: Jean Delvare, linux-doc, linux-kernel, linux-scsi, linux-ide,
	Guenter Roeck

In the past, several attempts have been made to add support for reporting
SCSI/[S]ATA drive temperatures to the Linux kernel. Such support is
desirable for two reasons: it lets userspace read drive temperatures
without root privileges and in a standard format, and it allows the
reported temperatures to be tied into the thermal subsystem.

The most recent attempt was [1] by Linus Walleij. It went through a total
of seven iterations. At the end, it was rejected for a number of reasons;
see the provided link for details. This implementation resides in the
SCSI core. It originally resided in libata but was moved to SCSI per
maintainer request, where it was ultimately rejected.

The feedback on this approach suggested using the SCSI Temperature log page
[0x0d] as the means to access drive temperature information. It is unknown
whether any real SCSI drive implements this log page. The feedback also
suggested obtaining the temperature from ATA drives, converting it into the
SCSI temperature log page in libata-scsi, and using that information in a
hardware monitoring driver. The format and method for doing so are documented
in [3]. This is not currently implemented in the Linux kernel.

An earlier submission of a driver to report SCSI/SATA drive temperatures
was made back in 2009 by Constantin Baranov [2]. This submission resides
in the hardware monitoring subsystem. It does not rely on changes in the
SCSI subsystem or in libata-scsi. Instead, it registers itself with the
SCSI subsystem using scsi_register_interface(). It was rejected primarily
because it executes ATA passthrough commands without verification that it
is actually connected to an ATA drive.

Both submissions use SMART attributes to read drive temperature information.
[1] also tries to identify temperature limits from those attributes.
Unfortunately, SMART attributes are not well defined, resulting in relatively
complex code trying to identify the exact format of the reported data.
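
For illustration, the basic attribute lookup that both submissions build on
can be sketched in plain C as follows. This is a minimal userspace sketch,
not code from either submission: smart_find_temp() and the macro names are
made up for this example, and the table layout (2-byte header, 30 entries of
12 bytes, first raw byte at entry offset 5, byte sum of the sector equal to
zero) is assumed to match what the driver in patch 1/1 parses.

```c
#include <stddef.h>
#include <stdint.h>

#define SMART_SECTOR_SIZE	512	/* one ATA sector */
#define SMART_MAX_ATTRS		30	/* entries in the attribute table */
#define SMART_ATTR_SIZE		12	/* bytes per entry */
#define SMART_ATTR_BASE		2	/* table starts after a 2-byte header */
#define SMART_ATTR_RAW0		5	/* first raw byte within an entry */

/*
 * Hypothetical helper: look up the drive temperature in a SMART
 * "read values" sector. Attribute 194 is preferred over 190.
 * Returns 0 and stores millidegrees Celsius in *temp, or -1 on a
 * checksum error or if neither attribute is present.
 */
int smart_find_temp(const uint8_t buf[SMART_SECTOR_SIZE], long *temp)
{
	uint8_t csum = 0;
	uint8_t raw = 0;
	int found = 0;
	size_t i;

	/* All bytes of the sector, including the checksum, must sum to 0. */
	for (i = 0; i < SMART_SECTOR_SIZE; i++)
		csum += buf[i];
	if (csum)
		return -1;

	for (i = 0; i < SMART_MAX_ATTRS; i++) {
		const uint8_t *e = buf + SMART_ATTR_BASE + i * SMART_ATTR_SIZE;

		if (e[0] == 194) {
			raw = e[SMART_ATTR_RAW0];
			found = 1;
			break;		/* 194 wins, stop searching */
		}
		if (e[0] == 190) {
			raw = e[SMART_ATTR_RAW0];
			found = 1;	/* keep looking for 194 */
		}
	}
	if (!found)
		return -1;

	*temp = (long)raw * 1000;
	return 0;
}
```

Only the first raw byte is used here; the vendor-specific variations mentioned
above (fractional bytes, limits packed into the remaining raw bytes) are what
make a more complete parser complex.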

With the available information and feedback, we can make a number of
observations and conclusions.
a) Taking the available (S)ATA drive temperature information and converting
   it to a SCSI log page is an interesting idea. On the downside, it would
   add a substantial amount of complexity to libata-scsi. The code would
   either have to be optional, or it would have to be built into the kernel
   even if it is never used on a given system. Without access to SCSI drives
   supporting this feature, it would be all but impossible to test the code
   against such a drive: neither the code in libata-scsi nor the driver
   using that information could be checked for correctness.
   Overall it would be much easier and much less risky to implement such
   code on the receiving side (i.e., in a driver reporting the temperatures)
   instead of trying to convert the information from one format to another
   first. In summary, the conversion approach is neither practical nor
   feasible. On top of that, there is no guarantee that code implementing
   this functionality would ever be accepted into the kernel for this very
   reason.
b) The code needed to read and analyze SCSI temperature log pages is quite
   complex (see smartmontools [5]). There is no existing support code
   in the Linux kernel; such code would have to be written. This makes
   the approach discussed in a) even more risky and less practical.
c) Overall, any attempt to report temperature information for anything
   but SATA drives in the kernel is not practical due to the complexity
   involved, and due to the inability to test the resulting code with
   non-SATA drives.
d) Using SMART data for anything but basic temperature reporting is not
   really feasible due to the lack of standardization. Any attempt to do
   this would add a substantial amount of code, ambiguity, and risk.

This submission implements a driver to report the temperature of SATA
drives through the hardware monitoring subsystem. It is implemented as a
stand-alone hwmon driver. The driver uses the mechanism from submission [2],
scsi_register_interface(), to register with the SCSI subsystem. By using
this mechanism, changes in the SCSI or ATA subsystems are not required.
To reduce risk and complexity, it only instantiates after reliably
validating that it is connected to a SATA drive. It does not attempt to
report the temperature of non-SATA drives.

The driver uses the SCT Command Transport feature set as specified in
ATA8-ACS [4] to read and report the temperature as well as temperature
limits and lowest/highest temperature information (if available) for
SATA drives. If a drive does not support SCT Command Transport, the driver
attempts to access a limited set of well known SMART attributes to read
the drive temperature. In that case, only the current drive temperature
is reported.
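
As a rough illustration of the SCT path, the temperature bytes in the SCT
status response can be decoded as sketched below. This is a standalone
sketch, not the driver code: the struct and function names are invented for
this example, and the byte offsets (200, 201, 202) and the "invalid" marker
0x80 are taken from the definitions in patch 1/1.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCT_STATUS_TEMP		200	/* current temperature */
#define SCT_STATUS_TEMP_LOWEST	201	/* lowest, this power cycle */
#define SCT_STATUS_TEMP_HIGHEST	202	/* highest, this power cycle */
#define SCT_TEMP_INVALID	0x80	/* "no value" marker */

/* Hypothetical container for the decoded values (millidegrees C). */
struct sct_temps {
	bool have_current, have_lowest, have_highest;
	long current, lowest, highest;
};

static inline bool sct_temp_is_valid(uint8_t raw)
{
	return raw != SCT_TEMP_INVALID;
}

/*
 * Each value is a signed 8-bit number in degrees C. The cast to int8_t
 * assumes the usual two's complement conversion.
 */
static inline long sct_temp_to_millideg(uint8_t raw)
{
	return (long)(int8_t)raw * 1000;
}

void sct_decode_status(const uint8_t status[512], struct sct_temps *out)
{
	out->have_current = sct_temp_is_valid(status[SCT_STATUS_TEMP]);
	out->have_lowest = sct_temp_is_valid(status[SCT_STATUS_TEMP_LOWEST]);
	out->have_highest = sct_temp_is_valid(status[SCT_STATUS_TEMP_HIGHEST]);

	out->current = sct_temp_to_millideg(status[SCT_STATUS_TEMP]);
	out->lowest = sct_temp_to_millideg(status[SCT_STATUS_TEMP_LOWEST]);
	out->highest = sct_temp_to_millideg(status[SCT_STATUS_TEMP_HIGHEST]);
}
```

A caller would check the have_* flags before using a value; a field whose
raw byte is 0x80 decodes to -128000 and is meant to be ignored.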

---
References:
[1] https://patchwork.kernel.org/patch/10688021/
[2] https://lore.kernel.org/lkml/20090913040104.ab1d0b69.const@mimas.ru/
[3] http://www.t10.org/cgi-bin/ac.pl?t=f&f=sat5r02.pdf
    Information technology - SCSI / ATA Translation - 5 (SAT-5),
    section 10.3.8 (Temperature log page).
[4] http://www.t13.org/documents/uploadeddocuments/docs2008/d1699r6a-ata8-acs.pdf
    ANS T13/1699-D "Information technology - AT Attachment 8 - ATA/ATAPI Command
    Set (ATA8-ACS)"
[5] https://github.com/mirror/smartmontools.git


* [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09  5:21 [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives Guenter Roeck
@ 2019-12-09  5:21 ` Guenter Roeck
  2019-12-09  5:28   ` Randy Dunlap
                     ` (2 more replies)
  2019-12-11  4:08 ` [PATCH 0/1] Summary: hwmon driver " Martin K. Petersen
  1 sibling, 3 replies; 20+ messages in thread
From: Guenter Roeck @ 2019-12-09  5:21 UTC (permalink / raw)
  To: linux-hwmon
  Cc: Jean Delvare, linux-doc, linux-kernel, linux-scsi, linux-ide,
	Guenter Roeck, Chris Healy, Linus Walleij

Reading the hard drive temperature has been supported for years
by userspace tools such as smartmontools or hddtemp. The downsides of
such tools are that they need to run with super-user privileges, that
the temperatures are not reported by standard tools such as 'sensors'
or 'libsensors', and that drive temperatures are not available for use
in the kernel's thermal subsystem.

This driver solves these problems by adding support for reading the
temperature of SATA drives from the kernel using the hwmon API and
by registering a thermal zone for each drive.

With this driver, the hard disk temperature can be read using the
unprivileged 'sensors' application:

$ sensors satatemp-scsi-1-0
satatemp-scsi-1-0
Adapter: SCSI adapter
temp1:        +23.0°C

or directly from sysfs:

$ grep . /sys/class/hwmon/hwmon9/{name,temp1_input}
/sys/class/hwmon/hwmon9/name:satatemp
/sys/class/hwmon/hwmon9/temp1_input:23000
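
A program can read the same value without any special privileges. The
following is a minimal userspace sketch, not part of the patch; the two
function names are made up for this example, and the hwmon9 index in the
path above is specific to the example system and will differ elsewhere.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a hwmon temp*_input file (millidegrees Celsius).
 * Returns 0 on success, -1 if no number could be parsed.
 */
int parse_millidegrees(const char *text, long *out)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno || end == text)
		return -1;

	*out = value;
	return 0;
}

/*
 * Read one hwmon temperature attribute, for example
 * /sys/class/hwmon/hwmon9/temp1_input. Returns 0 on success, -1 on error.
 */
int read_hwmon_temp(const char *path, long *out)
{
	char line[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_millidegrees(line, out);
	fclose(f);
	return ret;
}
```

Dividing the result by 1000 gives degrees Celsius, so the value 23000 shown
above corresponds to the 23.0 degrees C printed by 'sensors'.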

If the drive supports SCT Command Transport and reports temperature limits,
those are reported as well:

satatemp-scsi-0-0
Adapter: SCSI adapter
temp1:        +27.0°C  (low  =  +0.0°C, high = +60.0°C)
                       (crit low = -41.0°C, crit = +85.0°C)
                       (lowest = +23.0°C, highest = +34.0°C)

Cc: Chris Healy <cphealy@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/index.rst    |   1 +
 Documentation/hwmon/satatemp.rst |  48 +++
 drivers/hwmon/Kconfig            |  10 +
 drivers/hwmon/Makefile           |   1 +
 drivers/hwmon/satatemp.c         | 575 +++++++++++++++++++++++++++++++
 5 files changed, 635 insertions(+)
 create mode 100644 Documentation/hwmon/satatemp.rst
 create mode 100644 drivers/hwmon/satatemp.c

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 230ad59b462b..ecf1832dd013 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -133,6 +133,7 @@ Hardware Monitoring Kernel Drivers
    pxe1610
    pwm-fan
    raspberrypi-hwmon
+   satatemp
    sch5627
    sch5636
    scpi-hwmon
diff --git a/Documentation/hwmon/satatemp.rst b/Documentation/hwmon/satatemp.rst
new file mode 100644
index 000000000000..59b105f3c79a
--- /dev/null
+++ b/Documentation/hwmon/satatemp.rst
@@ -0,0 +1,48 @@
+Kernel driver satatemp
+======================
+
+
+References
+----------
+
+ANS T13/1699-D
+Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)
+
+ANS Project T10/BSR INCITS 513
+Information technology - SCSI Primary Commands - 4 (SPC-4)
+
+ANS Project INCITS 557
+Information technology - SCSI / ATA Translation - 5 (SAT-5)
+
+
+Description
+-----------
+
+This driver supports reporting the temperature of SATA drives.
+If supported, it uses the SCT Command Transport feature to read
+the current drive temperature and, if available, temperature limits
+as well as historic minimum and maximum temperatures. If SCT Command
+Transport is not supported, the driver uses SMART attributes to read
+the drive temperature.
+
+
+Sysfs entries
+-------------
+
+Only the temp1_input attribute is always available. Other attributes are
+available only if reported by the drive. All temperatures are reported in
+milli-degrees Celsius.
+
+=======================	=====================================================
+temp1_input		Current drive temperature
+temp1_lcrit		Minimum temperature limit. Operating the device below
+			this temperature may cause physical damage to the
+			device.
+temp1_min		Minimum recommended continuous operating limit
+temp1_max		Maximum recommended continuous operating temperature
+temp1_crit		Maximum temperature limit. Operating the device above
+			this temperature may cause physical damage to the
+			device.
+temp1_lowest		Minimum temperature seen this power cycle
+temp1_highest		Maximum temperature seen this power cycle
+=======================	=====================================================
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 13a6b4afb4b3..4c63eb7ba96a 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1346,6 +1346,16 @@ config SENSORS_RASPBERRYPI_HWMON
 	  This driver can also be built as a module. If so, the module
 	  will be called raspberrypi-hwmon.
 
+config SENSORS_SATATEMP
+	tristate "SATA hard disk drives with temperature sensors"
+	depends on SCSI && ATA
+	help
+	  If you say yes you get support for the temperature sensor on
+	  SATA hard disk drives.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called smarttemp.
+
 config SENSORS_SHT15
 	tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
 	depends on GPIOLIB || COMPILE_TEST
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 40c036ea45e6..fe55b8f76af9 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -148,6 +148,7 @@ obj-$(CONFIG_SENSORS_S3C)	+= s3c-hwmon.o
 obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
 obj-$(CONFIG_SENSORS_SCH5627)	+= sch5627.o
 obj-$(CONFIG_SENSORS_SCH5636)	+= sch5636.o
+obj-$(CONFIG_SENSORS_SATATEMP)	+= satatemp.o
 obj-$(CONFIG_SENSORS_SHT15)	+= sht15.o
 obj-$(CONFIG_SENSORS_SHT21)	+= sht21.o
 obj-$(CONFIG_SENSORS_SHT3x)	+= sht3x.o
diff --git a/drivers/hwmon/satatemp.c b/drivers/hwmon/satatemp.c
new file mode 100644
index 000000000000..4a6bdcc86988
--- /dev/null
+++ b/drivers/hwmon/satatemp.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hwmon client for SATA hard disk drives with temperature sensors
+ * Copyright (C) 2019 Zodiac Inflight Innovations
+ *
+ * With input from:
+ *    Hwmon client for S.M.A.R.T. hard disk drives with temperature sensors.
+ *    (C) 2018 Linus Walleij
+ *
+ *    hwmon: Driver for SCSI/ATA temperature sensors
+ *    by Constantin Baranov <const@mimas.ru>, submitted September 2009
+ *
+ * The primary means to read hard drive temperatures and temperature limits
+ * is the SCT Command Transport feature set as specified in ATA8-ACS.
+ * It can be used to read the current drive temperature, temperature limits,
+ * and historic minimum and maximum temperatures. The SCT Command Transport
+ * feature set is documented in "AT Attachment 8 - ATA/ATAPI Command Set
+ * (ATA8-ACS)".
+ *
+ * If the SCT Command Transport feature set is not available, drive temperatures
+ * may be readable through SMART attributes. Since SMART attributes are not well
+ * defined, this method is only used as fallback mechanism.
+ *
+ * There are three SMART attributes which may report drive temperatures.
+ * Those are defined as follows (from
+ * http://www.cropel.com/library/smart-attribute-list.aspx).
+ *
+ * 190	Temperature	Temperature, monitored by a sensor somewhere inside
+ *			the drive. Raw value typically holds the actual
+ *			temperature (hexadecimal) in its rightmost two digits.
+ *
+ * 194	Temperature	Temperature, monitored by a sensor somewhere inside
+ *			the drive. Raw value typically holds the actual
+ *			temperature (hexadecimal) in its rightmost two digits.
+ *
+ * 231	Temperature	Temperature, monitored by a sensor somewhere inside
+ *			the drive. Raw value typically holds the actual
+ *			temperature (hexadecimal) in its rightmost two digits.
+ *
+ * Wikipedia defines attributes a bit differently.
+ *
+ * 190	Temperature	Value is equal to (100-temp. °C), allowing manufacturer
+ *	Difference or	to set a minimum threshold which corresponds to a
+ *	Airflow		maximum temperature. This also follows the convention of
+ *	Temperature	100 being a best-case value and lower values being
+ *			undesirable. However, some older drives may instead
+ *			report raw Temperature (identical to 0xC2) or
+ *			Temperature minus 50 here.
+ * 194	Temperature or	Indicates the device temperature, if the appropriate
+ *	Temperature	sensor is fitted. Lowest byte of the raw value contains
+ *	Celsius		the exact temperature value (Celsius degrees).
+ * 231	Life Left	Indicates the approximate SSD life left, in terms of
+ *	(SSDs) or	program/erase cycles or available reserved blocks.
+ *	Temperature	A normalized value of 100 represents a new drive, with
+ *			a threshold value at 10 indicating a need for
+ *			replacement. A value of 0 may mean that the drive is
+ *			operating in read-only mode to allow data recovery.
+ *			Previously (pre-2010) occasionally used for Drive
+ *			Temperature (more typically reported at 0xC2).
+ *
+ * Common denominator is that the first raw byte reports the temperature
+ * in degrees C on almost all drives. Some drives may report a fractional
+ * temperature in the second raw byte.
+ *
+ * Known exceptions (from libatasmart):
+ * - SAMSUNG SV0412H and SAMSUNG SV1204H report the temperature in tenths
+ *   of a degree C in the first two raw bytes.
+ * - A few Maxtor drives report an unknown or bad value in attribute 194.
+ * - Certain Apple SSD drives report an unknown value in attribute 190.
+ *   Only certain firmware versions are affected.
+ *
+ * Those exceptions affect older ATA drives and are currently ignored.
+ * Also, the second raw byte (possibly reporting the fractional temperature)
+ * is currently ignored.
+ *
+ * Many drives also report temperature limits in additional SMART data raw
+ * bytes. The format of those is not well defined and varies widely.
+ * The driver does not currently attempt to report those limits.
+ *
+ * According to data in smartmontools, attribute 231 is rarely used to report
+ * drive temperatures. At the same time, several drives report SSD life left
+ * in attribute 231, but do not support temperature sensors. For this reason,
+ * attribute 231 is currently ignored.
+ *
+ * Following above definitions, temperatures are reported as follows.
+ * - If SCT Command Transport is supported, it is used to read the
+ *   temperature and, if available, temperature limits.
+ * - Otherwise, if SMART attribute 194 is supported, it is used to read
+ *   the temperature.
+ * - Otherwise, if SMART attribute 190 is supported, it is used to read
+ *   the temperature.
+ */
+
+#include <linux/ata.h>
+#include <linux/bits.h>
+#include <linux/device.h>
+#include <linux/hwmon.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_proto.h>
+
+struct satatemp_data {
+	struct list_head list;		/* list of instantiated devices */
+	struct mutex lock;		/* protect data buffer accesses */
+	struct scsi_device *sdev;	/* SCSI device */
+	struct device *dev;		/* instantiating device */
+	struct device *hwdev;		/* hardware monitoring device */
+	u8 smartdata[ATA_SECT_SIZE];	/* local buffer */
+	int (*get_temp)(struct satatemp_data *st, u32 attr, long *val);
+	bool have_temp_lowest;		/* lowest temp in SCT status */
+	bool have_temp_highest;		/* highest temp in SCT status */
+	bool have_temp_min;		/* have min temp */
+	bool have_temp_max;		/* have max temp */
+	bool have_temp_lcrit;		/* have lower critical limit */
+	bool have_temp_crit;		/* have critical limit */
+	int temp_min;			/* min temp */
+	int temp_max;			/* max temp */
+	int temp_lcrit;			/* lower critical limit */
+	int temp_crit;			/* critical limit */
+};
+
+static LIST_HEAD(satatemp_devlist);
+
+#define ATA_MAX_SMART_ATTRS	30
+#define SMART_TEMP_PROP_190	190
+#define SMART_TEMP_PROP_194	194
+
+#define SCT_STATUS_REQ_ADDR	0xe0
+#define  SCT_STATUS_VERSION_LOW		0	/* log byte offsets */
+#define  SCT_STATUS_VERSION_HIGH	1
+#define  SCT_STATUS_TEMP		200
+#define  SCT_STATUS_TEMP_LOWEST		201
+#define  SCT_STATUS_TEMP_HIGHEST	202
+#define SCT_READ_LOG_ADDR	0xe1
+#define  SMART_READ_LOG			0xd5
+#define  SMART_WRITE_LOG		0xd6
+
+#define INVALID_TEMP		0x80
+
+#define temp_is_valid(temp)	((temp) != INVALID_TEMP)
+#define temp_from_sct(temp)	(((s8)(temp)) * 1000)
+
+static inline bool ata_id_smart_supported(u16 *id)
+{
+	return id[ATA_ID_COMMAND_SET_1] & BIT(0);
+}
+
+static inline bool ata_id_smart_enabled(u16 *id)
+{
+	return id[ATA_ID_CFS_ENABLE_1] & BIT(0);
+}
+
+static int satatemp_scsi_command(struct satatemp_data *st,
+				 u8 ata_command, u8 feature,
+				 u8 lba_low, u8 lba_mid, u8 lba_high)
+{
+	static u8 scsi_cmd[MAX_COMMAND_SIZE];
+	int data_dir;
+
+	memset(scsi_cmd, 0, sizeof(scsi_cmd));
+	scsi_cmd[0] = ATA_16;
+	if (ata_command == ATA_CMD_SMART && feature == SMART_WRITE_LOG) {
+		scsi_cmd[1] = (5 << 1);	/* PIO Data-out */
+		/*
+		 * No off.line or cc, write to dev, block count in sector count
+		 * field.
+		 */
+		scsi_cmd[2] = 0x06;
+		data_dir = DMA_TO_DEVICE;
+	} else {
+		scsi_cmd[1] = (4 << 1);	/* PIO Data-in */
+		/*
+		 * No off.line or cc, read from dev, block count in sector count
+		 * field.
+		 */
+		scsi_cmd[2] = 0x0e;
+		data_dir = DMA_FROM_DEVICE;
+	}
+	scsi_cmd[4] = feature;
+	scsi_cmd[6] = 1;	/* 1 sector */
+	scsi_cmd[8] = lba_low;
+	scsi_cmd[10] = lba_mid;
+	scsi_cmd[12] = lba_high;
+	scsi_cmd[14] = ata_command;
+
+	return scsi_execute_req(st->sdev, scsi_cmd, data_dir,
+				st->smartdata, ATA_SECT_SIZE, NULL, HZ, 5,
+				NULL);
+}
+
+static int satatemp_ata_command(struct satatemp_data *st, u8 feature, u8 select)
+{
+	return satatemp_scsi_command(st, ATA_CMD_SMART, feature, select,
+				     ATA_SMART_LBAM_PASS, ATA_SMART_LBAH_PASS);
+}
+
+static int satatemp_get_smarttemp(struct satatemp_data *st, u32 attr,
+				  long *temp)
+{
+	u8 *buf = st->smartdata;
+	bool have_temp = false;
+	u8 temp_raw;
+	u8 csum;
+	int err;
+	int i;
+
+	err = satatemp_ata_command(st, ATA_SMART_READ_VALUES, 0);
+	if (err)
+		return err;
+
+	/* Checksum the read value table */
+	csum = 0;
+	for (i = 0; i < ATA_SECT_SIZE; i++)
+		csum += buf[i];
+	if (csum) {
+		dev_dbg(&st->sdev->sdev_gendev,
+			"checksum error reading SMART values\n");
+		return -EIO;
+	}
+
+	for (i = 0; i < ATA_MAX_SMART_ATTRS; i++) {
+		u8 *attr = buf + i * 12;
+		int id = attr[2];
+
+		if (!id)
+			continue;
+
+		if (id == SMART_TEMP_PROP_190) {
+			temp_raw = attr[7];
+			have_temp = true;
+		}
+		if (id == SMART_TEMP_PROP_194) {
+			temp_raw = attr[7];
+			have_temp = true;
+			break;
+		}
+	}
+
+	if (have_temp) {
+		*temp = temp_raw * 1000;
+		return 0;
+	}
+
+	return -ENXIO;
+}
+
+static int satatemp_get_scttemp(struct satatemp_data *st, u32 attr, long *val)
+{
+	u8 *buf = st->smartdata;
+	int err;
+
+	err = satatemp_ata_command(st, SMART_READ_LOG, SCT_STATUS_REQ_ADDR);
+	if (err)
+		return err;
+	switch (attr) {
+	case hwmon_temp_input:
+		*val = temp_from_sct(buf[SCT_STATUS_TEMP]);
+		break;
+	case hwmon_temp_lowest:
+		*val = temp_from_sct(buf[SCT_STATUS_TEMP_LOWEST]);
+		break;
+	case hwmon_temp_highest:
+		*val = temp_from_sct(buf[SCT_STATUS_TEMP_HIGHEST]);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+	return err;
+}
+
+static int satatemp_identify(struct satatemp_data *st)
+{
+	struct scsi_device *sdev = st->sdev;
+	u8 *buf = st->smartdata;
+	bool is_ata, is_sata;
+	bool have_sct_data_table;
+	bool have_sct_temp;
+	bool have_smart;
+	bool have_sct;
+	u16 *ata_id;
+	u16 version;
+	long temp;
+	u8 *vpd;
+	int err;
+
+	/* bail out immediately if there is no inquiry data */
+	if (!sdev->inquiry || sdev->inquiry_len < 16)
+		return -ENODEV;
+
+	/*
+	 * Inquiry data sanity checks (per SAT-5):
+	 * - peripheral qualifier must be 0
+	 * - peripheral device type must be 0x0 (Direct access block device)
+	 * - SCSI Vendor ID is "ATA     "
+	 */
+	if (sdev->inquiry[0] ||
+	    strncmp(&sdev->inquiry[8], "ATA     ", 8))
+		return -ENODEV;
+
+	vpd = kzalloc(1024, GFP_KERNEL);
+	if (!vpd)
+		return -ENOMEM;
+
+	err = scsi_get_vpd_page(sdev, 0x89, vpd, 1024);
+	if (err) {
+		kfree(vpd);
+		return err;
+	}
+
+	/*
+	 * More sanity checks.
+	 * For VPD offsets and values see ANS Project INCITS 557,
+	 * "Information technology - SCSI / ATA Translation - 5 (SAT-5)".
+	 */
+	if (vpd[1] != 0x89 || vpd[2] != 0x02 || vpd[3] != 0x38 ||
+	    vpd[36] != 0x34 || vpd[56] != ATA_CMD_ID_ATA) {
+		kfree(vpd);
+		return -ENODEV;
+	}
+	ata_id = (u16 *)&vpd[60];
+	is_ata = ata_id_is_ata(ata_id);
+	is_sata = ata_id_is_sata(ata_id);
+	have_sct = ata_id_sct_supported(ata_id);
+	have_sct_data_table = ata_id_sct_data_tables(ata_id);
+	have_smart = ata_id_smart_supported(ata_id) &&
+				ata_id_smart_enabled(ata_id);
+
+	kfree(vpd);
+
+	/* bail out if this is not a SATA device */
+	if (!is_ata || !is_sata)
+		return -ENODEV;
+	if (!have_sct)
+		goto skip_sct;
+
+	err = satatemp_ata_command(st, SMART_READ_LOG, SCT_STATUS_REQ_ADDR);
+	if (err)
+		goto skip_sct;
+
+	version = (buf[SCT_STATUS_VERSION_HIGH] << 8) |
+		  buf[SCT_STATUS_VERSION_LOW];
+	if (version != 2 && version != 3)
+		goto skip_sct;
+
+	have_sct_temp = temp_is_valid(buf[SCT_STATUS_TEMP]);
+	if (!have_sct_temp)
+		goto skip_sct;
+
+	st->have_temp_lowest = temp_is_valid(buf[SCT_STATUS_TEMP_LOWEST]);
+	st->have_temp_highest = temp_is_valid(buf[SCT_STATUS_TEMP_HIGHEST]);
+
+	if (!have_sct_data_table)
+		goto skip_sct_data;
+
+	/* Request and read temperature history table */
+	memset(buf, '\0', sizeof(st->smartdata));
+	buf[0] = 5;	/* data table command */
+	buf[2] = 1;	/* read table */
+	buf[4] = 2;	/* temperature history table */
+
+	err = satatemp_ata_command(st, SMART_WRITE_LOG, SCT_STATUS_REQ_ADDR);
+	if (err)
+		goto skip_sct_data;
+
+	err = satatemp_ata_command(st, SMART_READ_LOG, SCT_READ_LOG_ADDR);
+	if (err)
+		goto skip_sct_data;
+
+	/*
+	 * Temperature limits per AT Attachment 8 -
+	 * ATA/ATAPI Command Set (ATA8-ACS)
+	 */
+	st->have_temp_max = temp_is_valid(buf[6]);
+	st->have_temp_crit = temp_is_valid(buf[7]);
+	st->have_temp_min = temp_is_valid(buf[8]);
+	st->have_temp_lcrit = temp_is_valid(buf[9]);
+
+	st->temp_max = temp_from_sct(buf[6]);
+	st->temp_crit = temp_from_sct(buf[7]);
+	st->temp_min = temp_from_sct(buf[8]);
+	st->temp_lcrit = temp_from_sct(buf[9]);
+
+skip_sct_data:
+	if (have_sct_temp) {
+		st->get_temp = satatemp_get_scttemp;
+		return 0;
+	}
+skip_sct:
+	if (!have_smart)
+		return -ENODEV;
+	st->get_temp = satatemp_get_smarttemp;
+	return satatemp_get_smarttemp(st, hwmon_temp_input, &temp);
+}
+
+static int satatemp_read(struct device *dev, enum hwmon_sensor_types type,
+			 u32 attr, int channel, long *val)
+{
+	struct satatemp_data *st = dev_get_drvdata(dev);
+	int err = 0;
+
+	if (type != hwmon_temp)
+		return -EINVAL;
+
+	switch (attr) {
+	case hwmon_temp_input:
+	case hwmon_temp_lowest:
+	case hwmon_temp_highest:
+		mutex_lock(&st->lock);
+		err = st->get_temp(st, attr, val);
+		mutex_unlock(&st->lock);
+		break;
+	case hwmon_temp_lcrit:
+		*val = st->temp_lcrit;
+		break;
+	case hwmon_temp_min:
+		*val = st->temp_min;
+		break;
+	case hwmon_temp_max:
+		*val = st->temp_max;
+		break;
+	case hwmon_temp_crit:
+		*val = st->temp_crit;
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+	return err;
+}
+
+static umode_t satatemp_is_visible(const void *data,
+				   enum hwmon_sensor_types type,
+				   u32 attr, int channel)
+{
+	const struct satatemp_data *st = data;
+
+	switch (type) {
+	case hwmon_temp:
+		switch (attr) {
+		case hwmon_temp_input:
+			return 0444;
+		case hwmon_temp_lowest:
+			if (st->have_temp_lowest)
+				return 0444;
+			break;
+		case hwmon_temp_highest:
+			if (st->have_temp_highest)
+				return 0444;
+			break;
+		case hwmon_temp_min:
+			if (st->have_temp_min)
+				return 0444;
+			break;
+		case hwmon_temp_max:
+			if (st->have_temp_max)
+				return 0444;
+			break;
+		case hwmon_temp_lcrit:
+			if (st->have_temp_lcrit)
+				return 0444;
+			break;
+		case hwmon_temp_crit:
+			if (st->have_temp_crit)
+				return 0444;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static const struct hwmon_channel_info *satatemp_info[] = {
+	HWMON_CHANNEL_INFO(chip,
+			   HWMON_C_REGISTER_TZ),
+	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT |
+			   HWMON_T_LOWEST | HWMON_T_HIGHEST |
+			   HWMON_T_MIN | HWMON_T_MAX |
+			   HWMON_T_LCRIT | HWMON_T_CRIT),
+	NULL
+};
+
+static const struct hwmon_ops satatemp_ops = {
+	.is_visible = satatemp_is_visible,
+	.read = satatemp_read,
+};
+
+static const struct hwmon_chip_info satatemp_chip_info = {
+	.ops = &satatemp_ops,
+	.info = satatemp_info,
+};
+
+/*
+ * The device argument points to sdev->sdev_dev. Its parent is
+ * sdev->sdev_gendev, which we can use to get the scsi_device pointer.
+ */
+static int satatemp_add(struct device *dev, struct class_interface *intf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev->parent);
+	struct satatemp_data *st;
+	int err;
+
+	st = kzalloc(sizeof(*st), GFP_KERNEL);
+	if (!st)
+		return -ENOMEM;
+
+	st->sdev = sdev;
+	st->dev = dev;
+	mutex_init(&st->lock);
+
+	if (satatemp_identify(st)) {
+		err = -ENODEV;
+		goto abort;
+	}
+
+	st->hwdev = hwmon_device_register_with_info(dev->parent, "satatemp",
+						    st, &satatemp_chip_info,
+						    NULL);
+	if (IS_ERR(st->hwdev)) {
+		err = PTR_ERR(st->hwdev);
+		goto abort;
+	}
+
+	list_add(&st->list, &satatemp_devlist);
+	return 0;
+
+abort:
+	kfree(st);
+	return err;
+}
+
+static void satatemp_remove(struct device *dev, struct class_interface *intf)
+{
+	struct satatemp_data *st, *tmp;
+
+	list_for_each_entry_safe(st, tmp, &satatemp_devlist, list) {
+		if (st->dev == dev) {
+			list_del(&st->list);
+			hwmon_device_unregister(st->hwdev);
+			kfree(st);
+			break;
+		}
+	}
+}
+
+static struct class_interface satatemp_interface = {
+	.add_dev = satatemp_add,
+	.remove_dev = satatemp_remove,
+};
+
+static int __init satatemp_init(void)
+{
+	return scsi_register_interface(&satatemp_interface);
+}
+
+static void __exit satatemp_exit(void)
+{
+	scsi_unregister_interface(&satatemp_interface);
+}
+
+module_init(satatemp_init);
+module_exit(satatemp_exit);
+
+MODULE_AUTHOR("Guenter Roeck <linux@roeck-us.net>");
+MODULE_DESCRIPTION("ATA temperature monitor");
+MODULE_LICENSE("GPL");
-- 
2.17.1



* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
@ 2019-12-09  5:28   ` Randy Dunlap
  2019-12-09  6:00     ` Guenter Roeck
  2019-12-09 17:08   ` Bart Van Assche
  2019-12-12 22:33   ` Linus Walleij
  2 siblings, 1 reply; 20+ messages in thread
From: Randy Dunlap @ 2019-12-09  5:28 UTC (permalink / raw)
  To: Guenter Roeck, linux-hwmon
  Cc: Jean Delvare, linux-doc, linux-kernel, linux-scsi, linux-ide,
	Chris Healy, Linus Walleij

Hi,

On 12/8/19 9:21 PM, Guenter Roeck wrote:
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 13a6b4afb4b3..4c63eb7ba96a 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1346,6 +1346,16 @@ config SENSORS_RASPBERRYPI_HWMON
>  	  This driver can also be built as a module. If so, the module
>  	  will be called raspberrypi-hwmon.
>  
> +config SENSORS_SATATEMP
> +	tristate "SATA hard disk drives with temperature sensors"
> +	depends on SCSI && ATA
> +	help
> +	  If you say yes you get support for the temperature sensor on
> +	  SATA hard disk drives.
> +
> +	  This driver can also be built as a module. If so, the module
> +	  will be called smarttemp.

Makefile seems to say satatemp.

> +
>  config SENSORS_SHT15
>  	tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
>  	depends on GPIOLIB || COMPILE_TEST
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index 40c036ea45e6..fe55b8f76af9 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -148,6 +148,7 @@ obj-$(CONFIG_SENSORS_S3C)	+= s3c-hwmon.o
>  obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
>  obj-$(CONFIG_SENSORS_SCH5627)	+= sch5627.o
>  obj-$(CONFIG_SENSORS_SCH5636)	+= sch5636.o
> +obj-$(CONFIG_SENSORS_SATATEMP)	+= satatemp.o
>  obj-$(CONFIG_SENSORS_SHT15)	+= sht15.o
>  obj-$(CONFIG_SENSORS_SHT21)	+= sht21.o
>  obj-$(CONFIG_SENSORS_SHT3x)	+= sht3x.o


-- 
~Randy


* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09  5:28   ` Randy Dunlap
@ 2019-12-09  6:00     ` Guenter Roeck
  0 siblings, 0 replies; 20+ messages in thread
From: Guenter Roeck @ 2019-12-09  6:00 UTC (permalink / raw)
  To: Randy Dunlap, linux-hwmon
  Cc: Jean Delvare, linux-doc, linux-kernel, linux-scsi, linux-ide,
	Chris Healy, Linus Walleij

On 12/8/19 9:28 PM, Randy Dunlap wrote:
> Hi,
> 
> On 12/8/19 9:21 PM, Guenter Roeck wrote:
>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>> index 13a6b4afb4b3..4c63eb7ba96a 100644
>> --- a/drivers/hwmon/Kconfig
>> +++ b/drivers/hwmon/Kconfig
>> @@ -1346,6 +1346,16 @@ config SENSORS_RASPBERRYPI_HWMON
>>   	  This driver can also be built as a module. If so, the module
>>   	  will be called raspberrypi-hwmon.
>>   
>> +config SENSORS_SATATEMP
>> +	tristate "SATA hard disk drives with temperature sensors"
>> +	depends on SCSI && ATA
>> +	help
>> +	  If you say yes you get support for the temperature sensor on
>> +	  SATA hard disk drives.
>> +
>> +	  This driver can also be built as a module. If so, the module
>> +	  will be called smarttemp.
> 
> Makefile seems to say satatemp.
> 

Oops. Thanks for the note. Will fix.

Guenter

>> +
>>   config SENSORS_SHT15
>>   	tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
>>   	depends on GPIOLIB || COMPILE_TEST
>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>> index 40c036ea45e6..fe55b8f76af9 100644
>> --- a/drivers/hwmon/Makefile
>> +++ b/drivers/hwmon/Makefile
>> @@ -148,6 +148,7 @@ obj-$(CONFIG_SENSORS_S3C)	+= s3c-hwmon.o
>>   obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
>>   obj-$(CONFIG_SENSORS_SCH5627)	+= sch5627.o
>>   obj-$(CONFIG_SENSORS_SCH5636)	+= sch5636.o
>> +obj-$(CONFIG_SENSORS_SATATEMP)	+= satatemp.o
>>   obj-$(CONFIG_SENSORS_SHT15)	+= sht15.o
>>   obj-$(CONFIG_SENSORS_SHT21)	+= sht21.o
>>   obj-$(CONFIG_SENSORS_SHT3x)	+= sht3x.o
> 
> 



* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
  2019-12-09  5:28   ` Randy Dunlap
@ 2019-12-09 17:08   ` Bart Van Assche
  2019-12-09 19:20     ` Guenter Roeck
  2019-12-12 22:33   ` Linus Walleij
  2 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2019-12-09 17:08 UTC (permalink / raw)
  To: Guenter Roeck, linux-hwmon
  Cc: Jean Delvare, linux-doc, linux-kernel, linux-scsi, linux-ide,
	Chris Healy, Linus Walleij

On 12/8/19 9:21 PM, Guenter Roeck wrote:
> +static int satatemp_scsi_command(struct satatemp_data *st,
> +				 u8 ata_command, u8 feature,
> +				 u8 lba_low, u8 lba_mid, u8 lba_high)
> +{
> +	static u8 scsi_cmd[MAX_COMMAND_SIZE];
> +	int data_dir;

Declaring scsi_cmd[] static makes an otherwise thread-safe function
thread-unsafe. Has it been considered to allocate scsi_cmd[] on the stack?

> +	/*
> +	 * Inquiry data sanity checks (per SAT-5):
> +	 * - peripheral qualifier must be 0
> +	 * - peripheral device type must be 0x0 (Direct access block device)
> +	 * - SCSI Vendor ID is "ATA     "
> +	 */
> +	if (sdev->inquiry[0] ||
> +	    strncmp(&sdev->inquiry[8], "ATA     ", 8))
> +		return -ENODEV;

It's possible that we will need a quirk mechanism to disable temperature
monitoring for certain ATA devices. Has it been considered to make
scsi_add_lun() set a flag that indicates whether or not temperatures
should be monitored and to check that flag from inside this function?
I'm asking this because an identical strncmp() check exists in
scsi_add_lun().

> +static int satatemp_read(struct device *dev, enum hwmon_sensor_types type,
> +			 u32 attr, int channel, long *val)
> +{
> +	struct satatemp_data *st = dev_get_drvdata(dev);

Which device does 'dev' represent? What guarantees that the drvdata
won't be used for another purpose, e.g. by the SCSI core?

> +/*
> + * The device argument points to sdev->sdev_dev. Its parent is
> + * sdev->sdev_gendev, which we can use to get the scsi_device pointer.
> + */
> +static int satatemp_add(struct device *dev, struct class_interface *intf)
> +{
> +	struct scsi_device *sdev = to_scsi_device(dev->parent);
> +	struct satatemp_data *st;
> +	int err;
> +
> +	st = kzalloc(sizeof(*st), GFP_KERNEL);
> +	if (!st)
> +		return -ENOMEM;
> +
> +	st->sdev = sdev;
> +	st->dev = dev;
> +	mutex_init(&st->lock);
> +
> +	if (satatemp_identify(st)) {
> +		err = -ENODEV;
> +		goto abort;
> +	}
> +
> +	st->hwdev = hwmon_device_register_with_info(dev->parent, "satatemp",
> +						    st, &satatemp_chip_info,
> +						    NULL);
> +	if (IS_ERR(st->hwdev)) {
> +		err = PTR_ERR(st->hwdev);
> +		goto abort;
> +	}
> +
> +	list_add(&st->list, &satatemp_devlist);
> +	return 0;
> +
> +abort:
> +	kfree(st);
> +	return err;
> +}

How much does synchronously submitting SCSI commands from inside the
device probing call back slow down SCSI device discovery? What is the
impact of this code on systems with a large number of ATA devices?

Thanks,

Bart.



* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09 17:08   ` Bart Van Assche
@ 2019-12-09 19:20     ` Guenter Roeck
  2019-12-10 16:10       ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2019-12-09 19:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide, Chris Healy, Linus Walleij

On Mon, Dec 09, 2019 at 09:08:13AM -0800, Bart Van Assche wrote:
> On 12/8/19 9:21 PM, Guenter Roeck wrote:
> > +static int satatemp_scsi_command(struct satatemp_data *st,
> > +				 u8 ata_command, u8 feature,
> > +				 u8 lba_low, u8 lba_mid, u8 lba_high)
> > +{
> > +	static u8 scsi_cmd[MAX_COMMAND_SIZE];
> > +	int data_dir;
> 
> Declaring scsi_cmd[] static makes an otherwise thread-safe function
> thread-unsafe. Has it been considered to allocate scsi_cmd[] on the stack?
> 
No idea why I declared that variable 'static'. I removed it.
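
For illustration only, a minimal userspace sketch of the non-static
variant: the CDB is filled into a caller-owned buffer (typically on the
caller's stack), so concurrent callers share no state. The byte layout is
that of the SAT ATA PASS-THROUGH (16) command. The function and macro
names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>
#include <string.h>

#define CDB_LEN		16	/* ATA PASS-THROUGH (16) uses a 16-byte CDB */
#define OP_ATA_16	0x85	/* SCSI opcode for ATA PASS-THROUGH (16) */
#define ATA_CMD_SMART	0xb0
#define SMART_READ_LOG	0xd5
#define SMART_WRITE_LOG	0xd6

/* Fill a caller-owned CDB. No static storage, so the function is reentrant. */
static void build_ata16_cdb(uint8_t cdb[CDB_LEN], uint8_t ata_command,
			    uint8_t feature, uint8_t lba_low,
			    uint8_t lba_mid, uint8_t lba_high)
{
	memset(cdb, 0, CDB_LEN);
	cdb[0] = OP_ATA_16;
	if (ata_command == ATA_CMD_SMART && feature == SMART_WRITE_LOG) {
		cdb[1] = 5 << 1;	/* protocol: PIO Data-out */
		cdb[2] = 0x06;		/* BYT_BLOK=1, T_LENGTH=sector count */
	} else {
		cdb[1] = 4 << 1;	/* protocol: PIO Data-in */
		cdb[2] = 0x0e;		/* T_DIR=1 (from device), BYT_BLOK=1,
					 * T_LENGTH=sector count */
	}
	cdb[4] = feature;	/* FEATURES (7:0) */
	cdb[6] = 1;		/* SECTOR COUNT (7:0): one 512-byte sector */
	cdb[8] = lba_low;
	cdb[10] = lba_mid;
	cdb[12] = lba_high;
	cdb[14] = ata_command;
}
```

A caller would declare "uint8_t cdb[CDB_LEN];" locally and pass it in, so
each invocation works on its own copy.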

> > +	/*
> > +	 * Inquiry data sanity checks (per SAT-5):
> > +	 * - peripheral qualifier must be 0
> > +	 * - peripheral device type must be 0x0 (Direct access block device)
> > +	 * - SCSI Vendor ID is "ATA     "
> > +	 */
> > +	if (sdev->inquiry[0] ||
> > +	    strncmp(&sdev->inquiry[8], "ATA     ", 8))
> > +		return -ENODEV;
> 
> It's possible that we will need a quirk mechanism to disable temperature
> monitoring for certain ATA devices. Has it been considered to make
> scsi_add_lun() set a flag that indicates whether or not temperatures
> should be monitored and to check that flag from inside this function?
> I'm asking this because an identical strncmp() check exists in
> scsi_add_lun().
> 
I am aware that we may at some point need quirks for some SATA devices.
From my perspective, the place for such quirks would be this driver,
possibly using the ATA ID string in the inquiry data structure and,
if needed, the firmware revision as identifier.
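
As a rough standalone sketch of the check quoted above (not the driver
code itself): byte 0 of the standard INQUIRY data carries the peripheral
qualifier and device type, and bytes 8-15 carry the vendor ID. The
function name and the explicit length argument are assumptions made for
the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Standard INQUIRY data:
 *   byte 0      peripheral qualifier (bits 7:5), device type (bits 4:0)
 *   bytes 8-15  vendor identification, space padded
 * A SAT-translated ATA disk is expected to report 0 in byte 0 and
 * "ATA     " as vendor.
 */
static bool inquiry_looks_like_ata(const uint8_t *inq, size_t len)
{
	if (len < 16)
		return false;	/* too short to contain the vendor ID */
	if (inq[0] != 0)
		return false;	/* not a connected direct access device */
	return memcmp(&inq[8], "ATA     ", 8) == 0;
}
```

A driver-local quirk lookup, as suggested above, could run right after
this check and compare further identification strings against a list of
known-bad devices.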

> > +static int satatemp_read(struct device *dev, enum hwmon_sensor_types type,
> > +			 u32 attr, int channel, long *val)
> > +{
> > +	struct satatemp_data *st = dev_get_drvdata(dev);
> 
> Which device does 'dev' represent? What guarantees that the drvdata
> won't be used for another purpose, e.g. by the SCSI core?
> 
'dev' is the hardware monitoring device. The driver data is set in
hwmon_device_register_with_info(); it is the third argument of that
function. It won't be used outside the context of this driver.

> > +/*
> > + * The device argument points to sdev->sdev_dev. Its parent is
> > + * sdev->sdev_gendev, which we can use to get the scsi_device pointer.
> > + */
> > +static int satatemp_add(struct device *dev, struct class_interface *intf)
> > +{
> > +	struct scsi_device *sdev = to_scsi_device(dev->parent);
> > +	struct satatemp_data *st;
> > +	int err;
> > +
> > +	st = kzalloc(sizeof(*st), GFP_KERNEL);
> > +	if (!st)
> > +		return -ENOMEM;
> > +
> > +	st->sdev = sdev;
> > +	st->dev = dev;
> > +	mutex_init(&st->lock);
> > +
> > +	if (satatemp_identify(st)) {
> > +		err = -ENODEV;
> > +		goto abort;
> > +	}
> > +
> > +	st->hwdev = hwmon_device_register_with_info(dev->parent, "satatemp",
> > +						    st, &satatemp_chip_info,
> > +						    NULL);
> > +	if (IS_ERR(st->hwdev)) {
> > +		err = PTR_ERR(st->hwdev);
> > +		goto abort;
> > +	}
> > +
> > +	list_add(&st->list, &satatemp_devlist);
> > +	return 0;
> > +
> > +abort:
> > +	kfree(st);
> > +	return err;
> > +}
> 
> How much does synchronously submitting SCSI commands from inside the
> device probing call back slow down SCSI device discovery? What is the
> impact of this code on systems with a large number of ATA devices?
> 

Interesting question. In general, any SCSI commands would only be executed
for SATA drives since the very first check in satatemp_identify() uses
sdev->inquiry and aborts if the drive in question is not an ATA drive.
When connected to SATA drives, I measured the execution time of
satatemp_identify() to be between ~900 uS and 1,700 uS on a system with
Ryzen 3900 CPU.

In more detail:
- Time to read VPD page: ~10-20 uS
- Time to execute SMART_READ_LOG/SCT_STATUS_REQ_ADDR: ~140-150 uS
- Time to execute SMART_WRITE_LOG/SCT_STATUS_REQ_ADDR: ~600-1,500 uS
- Time to execute SMART_READ_LOG/SCT_READ_LOG_ADDR: ~100-130 uS

Does that answer your question ?

Please note that I think that this is irrelevant in this context.
The driver is only instantiated if loaded explicitly, so whoever uses it
will be in a position to decide if the benefit of using it will outweigh
its cost.

If instantiation time ever becomes a real problem, for example if someone
with a large number of SATA drives in a system wants to use the driver
and is concerned about instantiation time, we can make the second part
of its registration (ie everything after identifying SATA drives)
asynchronous. That would, however, add a substantial amount of complexity
to the driver, and we should only do it if it is really warranted.
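
To put the numbers above into perspective, here is a small arithmetic
sketch. The per-step bounds are the ones listed above; they add up to 850
and 1,800 uS, which brackets the measured ~900-1,700 uS. The drive count
used in the usage note is purely hypothetical, and the helper names are
made up for this example.

```c
#include <stddef.h>

/* Per-step timings from the list above, in microseconds. */
struct step_us {
	const char *name;
	unsigned int lo;	/* lower bound */
	unsigned int hi;	/* upper bound */
};

static const struct step_us steps[] = {
	{ "read VPD page",			 10,   20 },
	{ "SMART_READ_LOG/SCT_STATUS_REQ_ADDR",	140,  150 },
	{ "SMART_WRITE_LOG/SCT_STATUS_REQ_ADDR",	600, 1500 },
	{ "SMART_READ_LOG/SCT_READ_LOG_ADDR",	100,  130 },
};

/* Sum the lower and upper bounds of all steps. */
static void sum_steps(unsigned int *lo, unsigned int *hi)
{
	size_t i;

	*lo = 0;
	*hi = 0;
	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		*lo += steps[i].lo;
		*hi += steps[i].hi;
	}
}

/* Total overhead in microseconds if all drives are probed one by one. */
static unsigned long probe_overhead_us(unsigned int ndrives,
				       unsigned int per_drive_us)
{
	return (unsigned long)ndrives * per_drive_us;
}
```

With a hypothetical 100 SATA drives and the measured worst case of 1,700
uS per drive, probe_overhead_us(100, 1700) yields 170,000 uS, i.e. about
170 ms if the probes run strictly one after another.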

Thanks,
Guenter


* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09 19:20     ` Guenter Roeck
@ 2019-12-10 16:10       ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2019-12-10 16:10 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide, Chris Healy, Linus Walleij

On 12/9/19 2:20 PM, Guenter Roeck wrote:
> On Mon, Dec 09, 2019 at 09:08:13AM -0800, Bart Van Assche wrote:
>> How much does synchronously submitting SCSI commands from inside the
>> device probing call back slow down SCSI device discovery? What is the
>> impact of this code on systems with a large number of ATA devices?
> 
> Interesting question. In general, any SCSI commands would only be executed
> for SATA drives since the very first check in satatemp_identify() uses
> sdev->inquiry and aborts if the drive in question is not an ATA drive.
> When connected to SATA drives, I measured the execution time of
> satatemp_identify() to be between ~900 uS and 1,700 uS on a system with
> Ryzen 3900 CPU.
> 
> In more detail:
> - Time to read VPD page: ~10-20 uS
> - Time to execute SMART_READ_LOG/SCT_STATUS_REQ_ADDR: ~140-150 uS
> - Time to execute SMART_WRITE_LOG/SCT_STATUS_REQ_ADDR: ~600-1,500 uS
> - Time to execute SMART_READ_LOG/SCT_READ_LOG_ADDR: ~100-130 uS
> 
> Does that answer your question ?
> 
> Please note that I think that this is irrelevant in this context.
> The driver is only instantiated if loaded explicitly, so whoever uses it
> will be in a position to decide if the benefit of using it will outweigh
> its cost.
> 
> If instantiation time ever becomes a real problem, for example if someone
> with a large number of SATA drives in a system wants to use the driver
> and is concerned about instantiation time, we can make the second part
> of its registration (ie everything after identifying SATA drives)
> asynchronous. That would, however, add a substantial amount of complexity
> to the driver, and we should only do it if it is really warranted.

Hi Guenter,

Thank you for having answered my question in great detail. I think this
overhead is low enough to be acceptable.

Bart.


* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-09  5:21 [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives Guenter Roeck
  2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
@ 2019-12-11  4:08 ` Martin K. Petersen
  2019-12-11  5:57   ` Guenter Roeck
  1 sibling, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-11  4:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide


Hi Guenter,

> The most recent attempt was [1] by Linus Walleij. It went through a total
> of seven iterations. At the end, it was rejected for a number of reasons;
> see the provided link for details. This implementation resides in the
> SCSI core. It originally resided in libata but was moved to SCSI per
> maintainer request, where it was ultimately rejected.

While I am sure I come across as a curmudgeon, regressions are a major
concern for me. That, and making sure we pick the right architecture. I
thought we were making good progress in that department when Linus
abandoned the effort.

> The feedback on this approach suggests to use the SCSI Temperature log
> page [0x0d] as means to access drive temperature information. It is
> unknown if this is implemented in any real SCSI drive.

Almost every SCSI drive has it.

> The feedback also suggests to obtain temperature from ATA drives,
> convert it into the SCSI temperature log page in libata-scsi, and to
> use that information in a hardware monitoring driver. The format and
> method to do this is documented in [3]. This is not currently
> implemented in the Linux kernel.

Correct, but I have no qualms over exporting the SCSI temperature log
page. The devices that export that page are generally well-behaved.

My concerns are wrt. identifying whether SMART data is available for
USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
controllers that hide the real drives in various ways).

I am not sure why the SCSI temperature log page parsing would be
complex. I will have to go check smartmontools to see what that is all
about. The spec is as simple as can be.
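
For reference, a hedged userspace sketch of what parsing that page could
look like. It assumes the layout of the SPC Temperature log page (page
code 0x0d): a 4-byte page header followed by log parameters, where
parameter 0x0000 is the current temperature and 0x0001 the reference
temperature, each with a 4-byte parameter header, one reserved byte and
one temperature byte in degrees Celsius (0xff meaning "not available").
The function name and return convention are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define TEMP_LOG_PAGE		0x0d
#define TEMP_UNAVAILABLE	0xff

/*
 * Walk the parameters of a Temperature log page.
 * Returns 0 if the buffer holds that page, -1 otherwise.
 * *cur and *ref receive degrees Celsius, or -1 if not reported.
 */
static int parse_temp_log_page(const uint8_t *buf, size_t len,
			       int *cur, int *ref)
{
	size_t page_len, off;

	*cur = -1;
	*ref = -1;
	if (len < 4 || (buf[0] & 0x3f) != TEMP_LOG_PAGE)
		return -1;

	/* Bytes 2-3: number of bytes following the 4-byte page header. */
	page_len = ((size_t)buf[2] << 8) | buf[3];
	if (page_len + 4 < len)
		len = page_len + 4;

	for (off = 4; off + 4 <= len; ) {
		unsigned int code = ((unsigned int)buf[off] << 8) | buf[off + 1];
		size_t plen = buf[off + 3];	/* parameter length */

		if (off + 4 + plen > len)
			break;			/* truncated parameter */
		if (plen >= 2 && buf[off + 5] != TEMP_UNAVAILABLE) {
			if (code == 0x0000)
				*cur = buf[off + 5];
			else if (code == 0x0001)
				*ref = buf[off + 5];
		}
		off += 4 + plen;
	}
	return 0;
}
```

The harder part, as discussed in this thread, is not the parsing but
deciding whether it is safe to request the page from a given device.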

Anyway. I think the overall approach wrt. SCT and falling back to
well-known SMART fields is reasonably sane and fine for libata. But I
don't understand the pushback wrt. using the SCSI temperature log page
as a conduit. I think it would be fine if this worked out of the box for
both SCSI and ATA drives.

The elephant in the room remains USB. And coming up with a way we can
reliably detect whether it is safe to start poking at the device to
discover if SMART is provided. If we eventually want to pursue USB, I
think your heuristic stuff needs to be a library that can be leveraged
by both libata and USB. But that doesn't have to be part of the initial
effort.

And finally, my concerns wrt. reacting to bad sensors remain. Not too
familiar with hwmon, but I would still like any actions based on
reported temperatures to be under user control and not the kernel.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-11  4:08 ` [PATCH 0/1] Summary: hwmon driver " Martin K. Petersen
@ 2019-12-11  5:57   ` Guenter Roeck
  2019-12-17  2:35     ` Martin K. Petersen
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2019-12-11  5:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide

Hi Martin,

On 12/10/19 8:08 PM, Martin K. Petersen wrote:
> 
> Hi Guenter,
> 
>> The most recent attempt was [1] by Linus Walleij. It went through a total
>> of seven iterations. At the end, it was rejected for a number of reasons;
>> see the provided link for details. This implementation resides in the
>> SCSI core. It originally resided in libata but was moved to SCSI per
>> maintainer request, where it was ultimately rejected.
> 
> While I am sure I come across as a curmudgeon, regressions are a major
> concern for me. That, and making sure we pick the right architecture. I
> thought we were making good progress in that department when Linus
> abandoned the effort.
> 

If anything, I am surprised that he did not give up earlier. Personally
I did not see a path to success after v7 of the patch set was rejected.

I also no longer believe that temperature monitoring of SATA drives
should be implemented within the ATA or SCSI subsystem. I came to the
conclusion that it is much better suited as a separate hardware monitoring
driver. As a separate driver, its instantiation is in full control of
the user. If it causes trouble (or, as mentioned separately, if it adds
too much instantiation time, or if it is considered to be too large),
it can simply be disabled in a given system by blacklisting it (or,
rather, by not explicitly loading it in the first place). With that,
there is no real compatibility concern. If and when drives are detected
which report bad information, such drives can be added to a blacklist
without impact on the core SCSI or ATA code. Until that happens, not
loading the driver solves the problem on any affected system.

>> The feedback on this approach suggests to use the SCSI Temperature log
>> page [0x0d] as means to access drive temperature information. It is
>> unknown if this is implemented in any real SCSI drive.
> 
> Almost every SCSI drive has it.
> 
Good to hear.

>> The feedback also suggests to obtain temperature from ATA drives,
>> convert it into the SCSI temperature log page in libata-scsi, and to
>> use that information in a hardware monitoring driver. The format and
>> method to do this is documented in [3]. This is not currently
>> implemented in the Linux kernel.
> 
> Correct, but I have no qualms over exporting the SCSI temperature log
> page. The devices that export that page are generally well-behaved.
> 
Also good to hear. However, for my part, I have no means to test such
code since I don't have any SCSI drives.

> My concerns are wrt. identifying whether SMART data is available for
> USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
> controllers that hide the real drives in various ways).
> 

The one USB/UAS connected SATA drive I have (a WD passport) reports
itself as "WD      ", not as "ATA     ". I would expect other drives
to do the same. That drive reports (via smartctl) that it supports
both SCT and SMART data. It doesn't report temperatures through SCT,
but it does report the drive temperature with SMART attribute 194.
I did not attempt to add support for this and similar drives since
I don't know if I can reliably detect it. The potential benefit
compared to the risk seemed too low (we would be getting into
possible regression space) for me to try. Such code (effectively
it boils down to relaxing SATA drive detection) can always be added
at a later time.

> I am not sure why the SCSI temperature log page parsing would be
> complex. I will have to go check smartmontools to see what that is all

Not so much the parsing as detecting whether the information is there.

> about. The spec is as simple as can be.
> 

Possibly. I personally also find it quite vague. It is definitely not
something I would want to try to implement without being able to see what
the data actually looks like as reported by a real drive, and without
being able to test the code.

> Anyway. I think the overall approach wrt. SCT and falling back to
> well-known SMART fields is reasonably sane and fine for libata. But I
> don't understand the pushback wrt. using the SCSI temperature log page
> as a conduit. I think it would be fine if this worked out of the box for

This is not a pushback per se. It is simply a matter of ability (or lack
of it) to test any such code.

Regarding "conduit", I assume you mean converting SATA/SCT information
into SCSI temperature pages and reporting temperature purely based
on those. I personally think that this would be the wrong approach:
It would effectively require code in the ATA core which is not really
needed there. This would bloat the ATA code with no real advantage.
In my opinion, available temperature information should be interpreted
where it is needed, and only there, not in several places. I see that
as much less risky and error prone than spreading the code to multiple
places.

> both SCSI and ATA drives.
> 
The elegance of my approach is that adding support for reading temperatures
from SCSI drives (or, for that matter, USB/UAS drives) would be
straightforward. All one would need to do is to implement the necessary
detection code as well as a function to actually read the information
from the drive. This can be done at any time, and, again, it should be
done by someone with the ability to test the code.

> The elephant in the room remains USB. And coming up with a way we can
> reliably detect whether it is safe to start poking at the device to
> discover if SMART is provided. If we eventually want to pursue USB, I
> think your heuristic stuff needs to be a library that can be leveraged
> by both libata and USB. But that doesn't have to be part of the initial
> effort.
> 
> And finally, my concerns wrt. reacting to bad sensors remain. Not too
> familiar with hwmon, but I would still like any actions based on
> reported temperatures to be under user control and not the kernel.
> 
All sensors can report bad information, and quite often they do.
This is actually quite normal in any given system. That doesn't mean
that the available (connected) sensors should be ignored.

Also, when it comes to actions, the one subsystem performing any actions
in the kernel based on temperature sensor information is the thermal
subsystem, and that is on purpose implemented in the kernel.
The hardware monitoring subsystem, on its own, is purely passive
and only reports sensor information; it does not act on it. Any action
will either be done by userspace (eg with fancontrol) or by the thermal
subsystem.
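
To illustrate the userspace side, a minimal sketch assuming the standard
hwmon sysfs convention that tempN_input files contain the temperature in
millidegrees Celsius as a decimal string. The helper names and the idea
of a caller-supplied limit are hypothetical; reading the file itself is
left to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text read from a hwmon tempN_input file, e.g. "35000\n".
 * Returns 0 and stores millidegrees Celsius, or -1 on malformed input.
 */
static int parse_temp_input(const char *text, long *mdeg)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text)
		return -1;	/* overflow or no digits at all */
	if (*end != '\n' && *end != '\0')
		return -1;	/* trailing garbage */
	*mdeg = val;
	return 0;
}

/* The policy stays in userspace: the limit is whatever the user chooses. */
static int over_limit(long mdeg, long limit_mdeg)
{
	return mdeg > limit_mdeg;
}
```

A monitoring tool would read the attribute periodically, call
parse_temp_input() on the result, and act only if over_limit() says so.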

Overall, I understand the desire to also support temperature reporting
for SCSI and USB/UAS drives. As hardware monitoring maintainer, I'd
be happy to accept patches implementing that support. However, I don't
see this as immediately necessary, and I would want to have some
reassurance that such code is well tested and doesn't cause any
regressions (especially since concerns about possible regressions were
mentioned several times in the context of the previous submissions).

Thanks,
Guenter


* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
  2019-12-09  5:28   ` Randy Dunlap
  2019-12-09 17:08   ` Bart Van Assche
@ 2019-12-12 22:33   ` Linus Walleij
  2019-12-12 23:21     ` Martin K. Petersen
  2 siblings, 1 reply; 20+ messages in thread
From: Linus Walleij @ 2019-12-12 22:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-hwmon, Jean Delvare, Linux Doc Mailing List, linux-kernel,
	linux-scsi, linux-ide, Chris Healy

Hi Guenter,

needless to say I am a big fan of this patch, so:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

It's a nice addition with the SCT command, I never
figured that part out. Also nice how you register the
scsi class interface I never saw that before, it makes it
a very neat plug-in.

The comments are more discussion points on how to
(maybe) take it further after this.

On Mon, Dec 9, 2019 at 6:21 AM Guenter Roeck <linux@roeck-us.net> wrote:

> If the drive supports SCT transport and reports temperature limits,
> those are reported as well.

If I understand the patch correctly it will prefer to use
SCT transport to read the temperature, and only fall back
to the SMART attributes if this is not working, so I guess the
commit message should state the heuristics used here.

> +++ b/Documentation/hwmon/satatemp.rst

Excellent doc.

> + * If the SCT Command Transport feature set is not available, drive temperatures
> + * may be readable through SMART attributes. Since SMART attributes are not well
> + * defined, this method is only used as fallback mechanism.

So maybe cut/paste this into the commit message as well so people understand
the commit fully.

> +       for (i = 0; i < ATA_MAX_SMART_ATTRS; i++) {
> +               u8 *attr = buf + i * 12;
> +               int id = attr[2];
> +
> +               if (!id)
> +                       continue;
> +
> +               if (id == SMART_TEMP_PROP_190) {
> +                       temp_raw = attr[7];
> +                       have_temp = true;
> +               }
> +               if (id == SMART_TEMP_PROP_194) {
> +                       temp_raw = attr[7];
> +                       have_temp = true;
> +                       break;
> +               }
> +       }
> +
> +       if (have_temp) {
> +               *temp = temp_raw * 1000;
> +               return 0;
> +       }

This looks like it will work fine, I had some heuristics to determine
the vendor-specific max/min temperatures in property 194 in my
patch, but I can certainly add that back in later.
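
For readers who want to experiment with the fallback logic outside the
kernel, the quoted loop can be modelled in plain C roughly as follows.
This is a sketch, not the patch code: it assumes a 512-byte SMART data
sector holding 30 attribute entries of 12 bytes each, and the function
name and the -1 error return are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMART_ATTR_COUNT	30	/* assumed number of 12-byte entries */
#define SMART_TEMP_PROP_190	190
#define SMART_TEMP_PROP_194	194

/*
 * Scan a SMART data buffer for a temperature attribute.
 * Attribute 194 is preferred; 190 is used only if 194 is absent.
 * Returns 0 and stores millidegrees Celsius, or -1 if neither exists.
 */
static int smart_temp_millicelsius(const uint8_t *buf, long *temp)
{
	bool have_temp = false;
	uint8_t temp_raw = 0;
	int i;

	for (i = 0; i < SMART_ATTR_COUNT; i++) {
		const uint8_t *attr = buf + i * 12;
		int id = attr[2];

		if (!id)
			continue;	/* unused slot */

		if (id == SMART_TEMP_PROP_190) {
			temp_raw = attr[7];
			have_temp = true;	/* keep looking for 194 */
		}
		if (id == SMART_TEMP_PROP_194) {
			temp_raw = attr[7];
			have_temp = true;
			break;			/* 194 wins */
		}
	}

	if (!have_temp)
		return -1;
	*temp = temp_raw * 1000L;	/* degrees C to millidegrees C */
	return 0;
}
```

Note that only the lowest raw byte is used as temperature; any
vendor-specific min/max values encoded in the remaining raw bytes are
ignored here.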

> +static const struct hwmon_channel_info *satatemp_info[] = {
> +       HWMON_CHANNEL_INFO(chip,
> +                          HWMON_C_REGISTER_TZ),

I suppose this means I will also have a temperature zone as
I want :D

When I read the comments from the previous thread I got the
impression the SCSI people wanted me to use something like
the SCT transport and the hook in the SMART thing in the
libata back-end specifically for [S]ATA in response to the
SCT read log command.

In  drivers/ata/libata-scsi.c I suppose.

I guess one thing doesn't exclude the other though.

We can attempt to move the code for [S]ATA over to libata
at some point and respond to the SCT read log command
from within the library in that case.

I don't understand if that means the SCT read log also works
on some SCSI drives, or if it is just a slot-in thing for
ATA translation that has no meaning on SCSI drives.
But that can be resolved by people who want to use this
for SCSI drives and not by us.

Yours,
Linus Walleij


* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-12 22:33   ` Linus Walleij
@ 2019-12-12 23:21     ` Martin K. Petersen
  2019-12-13  4:18       ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-12 23:21 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Guenter Roeck, linux-hwmon, Jean Delvare, Linux Doc Mailing List,
	linux-kernel, linux-scsi, linux-ide, Chris Healy


Linus,

> It's a nice addition with the SCT command, I never figured that part
> out. Also nice how you register the scsi class interface I never saw
> that before, it makes it a very neat plug-in.

Yep, I agree that the patch looks pretty good in general. There are just
a few wrinkles in the detection heuristics I would like to tweak. More
on that later.

Yesterday I added support for the SCSI temperature log page and am
working through some kinks wrt. making this work for USB as well.

> When I read the comments from the previous thread I got the impression
> the SCSI people wanted me to use something like the SCT transport and
> the hook in the SMART thing in the libata back-end specifically for
> [S]ATA in response to the SCT read log command.

Our recommendation was for libata-scsi.c to export the SCSI temperature
log page, just like we do for all the other ATA parameters.

However, in tinkering with this the last couple of days, I find myself
torn on the subject. For two reasons. First of all, there is no 1:1
sensor mapping unless you implement the slightly more complex
environmental log page. Which isn't a big deal, except out of the
hundred or so SCSI devices I have here there isn't a single one that
supports it. So in practice this interface would probably only exist
for the purpose of the libata SATL.

The other reason the libata approach is slightly less attractive is that
we need all the same SMART parsing for USB as well. So while it is
cleaner to hide everything ATA in libata, the reality of USB-ATA bridges
gets in the way. That is why I previously suggested having a libsmart or
similar with those common bits.

Anyway, based on what I've worked on today, I'm not sure that libata is
necessarily the way to go. Sorry about giving bad advice! We've
successfully implemented translations for everything else in libata over
the years without too much trouble. And it's not really that the
translation is bad. It's more the need to support it for USB as well
that makes things clunky.

> I don't understand if that means the SCT read log also works
> on some SCSI drives, or if it is just a slot-in thing for
> ATA translation that has no meaning on SCSI drives.

It's an ATA command.

One concern I have is wrt. sensor naming. Maybe my /usr/bin/sensors
command is too old. But it's pretty hopeless to get sensor readings for
100 drives without being able to tell which sensor is for which
device. Haven't looked into that yet. The links exist in
/sys/class/hwmon that would allow vendor/model/serial to be queried.

Oh, and another issue. While technically legal according to the spec, I
am not sure it's a good idea to export a sensor per scsi_device. I have
moved things to scsi_target instead to avoid having bazillions of
sensors show up. Multi-actuator drives are already shipping.

If I recall correctly, though, you had some sort
of multi-LUN external disk box that warranted you working on this in the
first place. Is that correct? Can you refresh my memory?

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-12 23:21     ` Martin K. Petersen
@ 2019-12-13  4:18       ` Guenter Roeck
  2019-12-17  2:47         ` Martin K. Petersen
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2019-12-13  4:18 UTC (permalink / raw)
  To: Martin K. Petersen, Linus Walleij
  Cc: linux-hwmon, Jean Delvare, Linux Doc Mailing List, linux-kernel,
	linux-scsi, linux-ide, Chris Healy

On 12/12/19 3:21 PM, Martin K. Petersen wrote:
> 
> Linus,
> 
>> It's a nice addition with the SCT command, I never figured that part
>> out. Also nice how you register the scsi class interface I never saw
>> that before, it makes it a very neat plug-in.
> 
> Yep, I agree that the patch looks pretty good in general. There are just
> a few wrinkles in the detection heuristics I would like to tweak. More
> on that later.
> 
> Yesterday I added support for the SCSI temperature log page and am
> working through some kinks wrt. making this work for USB as well.
> 
>> When I read the comments from the previous thread I got the impression
>> the SCSI people wanted me to use something like the SCT transport and
>> the hook in the SMART thing in the libata back-end specifically for
>> [S]ATA in response to the SCT read log command.
> 
> Our recommendation was for libata-scsi.c to export the SCSI temperature
> log page, just like we do for all the other ATA parameters.
> 
> However, in tinkering with this the last couple of days, I find myself
> torn on the subject. For two reasons. First of all, there is no 1:1
> sensor mapping unless you implement the slightly more complex
> environmental log page. Which isn't a big deal, except out of the
> hundred or so SCSI devices I have here there isn't a single one that
> supports it. So in practice this interface would probably only exist
> for the purpose of the libata SATL.
> 
> The other reason the libata approach is slightly less attractive is that
> we need all the same SMART parsing for USB as well. So while it is
> cleaner to hide everything ATA in libata, the reality of USB-ATA bridges
> gets in the way. That is why I previously suggested having a libsmart or
> similar with those common bits.
> 
> Anyway, based on what I've worked on today, I'm not sure that libata is
> necessarily the way to go. Sorry about giving bad advice! We've
> successfully implemented translations for everything else in libata over
> the years without too much trouble. And it's not really that the
> translation is bad. It's more the need to support it for USB as well
> that makes things clunky.
> 
>> I don't understand if that means the SCT read log also works
>> on some SCSI drives, or if it is just a slot-in thing for
>> ATA translation that has no meaning on SCSI drives.
> 
> It's an ATA command.
> 
> One concern I have is wrt. sensor naming. Maybe my /usr/bin/sensors
> command is too old. But it's pretty hopeless to get sensor readings for

You'll need the command (and libsensors) from the lm-sensors package version
3.5 or later for it to recognize SCSI/ATA drives.

> 100 drives without being able to tell which sensor is for which
> device. Haven't looked into that yet. The links exist in
> /sys/class/hwmon that would allow vendor/model/serial to be queried.
> 

There is a device/ subdirectory which points to that information.
Is that what you are looking for ? "sensors" displays something
like satatemp-scsi-5-0, which matches sd 5:0:0:0.
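
To illustrate the device/ link described above, here is a minimal sketch
that walks the hwmon class directory and pairs each drive sensor with the
SCSI address, vendor and model of its parent device. It assumes the
standard hwmon attributes "name" and "temp1_input" (millidegrees Celsius)
and the SCSI sysfs attributes "vendor" and "model"; the function names
and the driver name filter are hypothetical.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_drive_sensors(hwmon_root="/sys/class/hwmon",
                       driver_names=("satatemp", "drivetemp")):
    """Map each matching hwmon device to the drive it hangs off of."""
    sensors = []
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        # Skip hwmon devices registered by unrelated drivers.
        if read_attr(hwmon / "name") not in driver_names:
            continue
        raw = read_attr(hwmon / "temp1_input")  # millidegrees Celsius
        sensors.append({
            "hwmon": hwmon.name,
            # device/ resolves to the parent SCSI device, e.g. .../5:0:0:0
            "address": (hwmon / "device").resolve().name,
            "vendor": read_attr(hwmon / "device" / "vendor"),
            "model": read_attr(hwmon / "device" / "model"),
            "temp_c": int(raw) / 1000 if raw is not None else None,
        })
    return sensors
```

Calling list_drive_sensors() with no arguments reads the live sysfs tree;
passing a different hwmon_root makes it usable against a copied or mocked
directory.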

> Oh, and another issue. While technically legal according to the spec, I
> am not sure it's a good idea to export a sensor per scsi_device. I have
> moved things to scsi_target instead to avoid having bazillions of
> sensors show up. Multi-actuator drives are already shipping.
> 

Not sure I understand what you mean by 'bazillions of sensors' and
'sensor per scsi_device'. Can you elaborate ? I see one sensor per drive,
which is what I would expect.

Thanks,
Guenter

> If I recall correctly, though, you had some sort of multi-LUN external
> disk box that warranted you working on this in the first place. Is that
> correct? Can you refresh my memory?
> 


* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-11  5:57   ` Guenter Roeck
@ 2019-12-17  2:35     ` Martin K. Petersen
  2019-12-17  3:57       ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-17  2:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, linux-doc,
	linux-kernel, linux-scsi, linux-ide


Guenter,

> If and when drives are detected which report bad information, such
> drives can be added to a blacklist without impact on the core SCSI or
> ATA code. Until that happens, not loading the driver solves the
> problem on any affected system.

My only concern with that is that we'll have blacklisting in several
places. We already have ATA and SCSI blacklists. If we now add a third
place, that's going to be a maintenance nightmare.

More on that below.

>> My concerns are wrt. identifying whether SMART data is available for
>> USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
>> controllers that hide the real drives in various ways).

OK, so I spent my weekend tinkering with 15+ years of accumulated USB
devices. And my conclusion is that no, we can't in any sensible manner,
support USB storage monitoring in the kernel. There is no heuristic that
I can find that identifies that "this is a hard drive or an SSD and
attempting one of the various SMART methods may be safe". As opposed to
"this is a USB key that's likely to lock up if you try". And that's
ignoring the drives with USB-ATA bridges that I managed to wedge in my
attempt at sending down commands.

Even smartmontools fails to work on a huge part of my vintage
collection, thanks to a wide variety of bridges with random, custom
interfaces.

So my stance on all this is that I'm fine with your general approach for
ATA. I will post a patch adding the required bits for SCSI. And if a
device does not implement either of the two standard methods, people
should use smartmontools.

Wrt. name, since I've added SCSI support, satatemp is a bit of a
misnomer. drivetemp, maybe? No particular preference.

> The one USB/UAS connected SATA drive I have (a WD passport) reports
> itself as "WD      ", not as "ATA     ". I would expect other drives
> to do the same.

Yes. Most vendors are too fond of their brand names to put "ATA" in
there. So my suggestion is to relax the heuristic to trigger on the ATA
Information VPD page only and ignore the name.
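
As an illustration of the relaxed heuristic, the following minimal sketch
decides whether a device looks like ATA purely from its Supported VPD
Pages response (page 0x00), by checking for the ATA Information page
(0x89), and never looks at the INQUIRY vendor string. The function names
are hypothetical, and how the page 0x00 bytes are obtained is left out.

```python
ATA_INFORMATION_VPD = 0x89  # page code of the ATA Information VPD page


def supported_vpd_pages(page0: bytes) -> list:
    """Parse a Supported VPD Pages response (page 0x00) into page codes."""
    if len(page0) < 4 or page0[1] != 0x00:
        raise ValueError("not a Supported VPD Pages response")
    # Bytes 2-3 hold the big-endian length of the page code list.
    length = int.from_bytes(page0[2:4], "big")
    return list(page0[4:4 + length])


def looks_like_ata(page0: bytes) -> bool:
    """True if the device advertises the ATA Information VPD page.

    The vendor identification string is deliberately not consulted.
    """
    return ATA_INFORMATION_VPD in supported_vpd_pages(page0)
```

With this check a bridge reporting "WD      " as vendor is treated the
same as one reporting "ATA     ", as long as page 0x89 is advertised.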

Also, there are some devices that will lock up the way you access that
VPD page. So a tweak is also required there.

To avoid the multiple blacklists and heuristic collections my suggestion
is that I introduce a helper function in SCSI (based on what I did in
the disk driver) that can be called to identify whether something is an
ATA device. And then the hwmon driver can call that and we can keep the
heuristics in one place.

If a device turns out to be problematic wrt. getting the ATA VPD for the
purpose of SMART, for instance, it will also need to be blacklisted for
other reasons in SCSI. So I would really like to keep the heuristics in
one place.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-13  4:18       ` Guenter Roeck
@ 2019-12-17  2:47         ` Martin K. Petersen
  2019-12-17  4:20           ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-17  2:47 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Martin K. Petersen, Linus Walleij, linux-hwmon, Jean Delvare,
	Linux Doc Mailing List, linux-kernel, linux-scsi, linux-ide,
	Chris Healy


Guenter,

> Not sure I understand what you mean by 'bazillions of sensors' and
> 'sensor per scsi_device'. Can you elaborate ? I see one sensor per
> drive, which is what I would expect.

Yes, but for storage arrays, hanging off of struct scsi_device means you
would get a sensor for each volume you create. Even though you
presumably only have one physical "box" to monitor (ignoring for a
moment that the drives inside the box may have their own sensors that
may or may not be visible to the host).

Also, multi-actuator disk drives are shipping. They present themselves
to the host as a target with multiple LUNs. Once again you'll probably
have one temperature sensor for the physical drive but many virtual
disks being presented to the OS. So you'd end up with, for instance, 4
sensors in hwmon even though physically there is only one.

It's a tough call since there may be hardware configurations where
distinct per-LUN temperature is valid (some quirky JBODs represent disk
drives as different LUNs instead of different targets, for instance).

How expensive will it be to have - say - 100 hwmon sensors instantiated
for a drive tray?

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-17  2:35     ` Martin K. Petersen
@ 2019-12-17  3:57       ` Guenter Roeck
  2019-12-17  5:50         ` Damien Le Moal
  2019-12-18  3:42         ` Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Guenter Roeck @ 2019-12-17  3:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide

On 12/16/19 6:35 PM, Martin K. Petersen wrote:
> 
> Guenter,
> 
>> If and when drives are detected which report bad information, such
>> drives can be added to a blacklist without impact on the core SCSI or
>> ATA code. Until that happens, not loading the driver solves the
>> problem on any affected system.
> 
> My only concern with that is that we'll have blacklisting in several
> places. We already have ATA and SCSI blacklists. If we now add a third
> place, that's going to be a maintenance nightmare.
> 
> More on that below.
> 
>>> My concerns are wrt. identifying whether SMART data is available for
>>> USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
>>> controllers that hide the real drives in various ways).
> 
> OK, so I spent my weekend tinkering with 15+ years of accumulated USB
> devices. And my conclusion is that no, we can't in any sensible manner,
> support USB storage monitoring in the kernel. There is no heuristic that
> I can find that identifies that "this is a hard drive or an SSD and
> attempting one of the various SMART methods may be safe". As opposed to
> "this is a USB key that's likely to lock up if you try". And that's
> ignoring the drives with USB-ATA bridges that I managed to wedge in my
> attempt at sending down commands.
> 
> Even smartmontools is failing to work on a huge part of my vintage
> collection.  Thanks to a wide variety of bridges with random, custom
> interfaces.
> 
> So my stance on all this is that I'm fine with your general approach for
> ATA. I will post a patch adding the required bits for SCSI. And if a
> device does not implement either of the two standard methods, people
> should use smartmontools.
> 
> Wrt. name, since I've added SCSI support, satatemp is a bit of a
> misnomer. drivetemp, maybe? No particular preference.
> 
Agreed, if we extend this to SCSI, satatemp is less than perfect.
drivetemp ? disktemp ? I am open to suggestions, with maybe a small
personal preference for disktemp out of those two.

>> The one USB/UAS connected SATA drive I have (a WD passport) reports
>> itself as "WD      ", not as "ATA     ". I would expect other drives
>> to do the same.
> 
> Yes. Most vendors are too fond of their brand names to put "ATA" in
> there. So my suggestion is to relax the heuristic to trigger on the ATA
> Information VPD page only and ignore the name.
> 

Fine with me. I wanted to be as restrictive as possible.

> Also, there are some devices that will lock up the way you access that
> VPD page. So a tweak is also required there.
> 
Do you have details ? Do I need to add a call to scsi_device_supports_vpd(),
maybe ?

> To avoid the multiple blacklists and heuristic collections my suggestion
> is that I introduce a helper function in SCSI (based on what I did in
> the disk driver) that can be called to identify whether something is an
> ATA device. And then the hwmon driver can call that and we can keep the
> heuristics in one place.
> 
> If a device turns out to be problematic wrt. getting the ATA VPD for the
> purpose of SMART, for instance, it will also need to be blacklisted for
> other reasons in SCSI. So I would really like to keep the heuristics in
> one place.
> 
Fine with me. My only concern is that I don't want the driver to disappear
into nowhere-land (again).

Thanks,
Guenter

* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-17  2:47         ` Martin K. Petersen
@ 2019-12-17  4:20           ` Guenter Roeck
  2019-12-18  3:39             ` Martin K. Petersen
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2019-12-17  4:20 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Linus Walleij, linux-hwmon, Jean Delvare, Linux Doc Mailing List,
	linux-kernel, linux-scsi, linux-ide, Chris Healy

On 12/16/19 6:47 PM, Martin K. Petersen wrote:
> 
> Guenter,
> 
>> Not sure I understand what you mean by 'bazillions of sensors' and
>> 'sensor per scsi_device'. Can you elaborate ? I see one sensor per
>> drive, which is what I would expect.
> 
> Yes, but for storage arrays, hanging off of struct scsi_device means you
> would get a sensor for each volume you create. Even though you
> presumably only have one physical "box" to monitor (ignoring for a
> moment that the drives inside the box may have their own sensors that
> may or may not be visible to the host).
> 
> Also, multi-actuator disk drives are shipping. They present themselves
> to the host as a target with multiple LUNs. Once again you'll probably
> have one temperature sensor for the physical drive but many virtual
> disks being presented to the OS. So you'd end up with, for instance, 4
> sensors in hwmon even though physically there is only one.
> 
> It's a tough call since there may be hardware configurations where
> distinct per-LUN temperature is valid (some quirky JBODs represent disk
> drives as different LUNs instead of different targets, for instance).
> 
> How expensive will it be to have - say - 100 hwmon sensors instantiated
> for a drive tray?
> 

If that drive tray has 100 physical drives, that is what I would expect
to see. The most expensive part is the device entry, and there are already
several of those for each scsi device. I have seen systems with hundreds
of hwmon devices (backbone switches tend to be quite generous with
voltage, current, power, and temperature sensors), so I am not
particularly concerned in that regard. If there are 100 physical drives,
you would actually want to see the temperature of each drive separately,
as one of them might be overheating due to some internal failure.

If the storage array is represented to the system as a single huge physical
drive, which is then split into logical entities not related to physical
drives, I guess that would represent a problem for system management overall.
Maybe such boxes have separate thermal monitoring ? Either way, the
question is whether it is possible to distinguish the pseudo-physical
drive from the virtual drives (or volumes).

I would not mind tying the hardware monitoring device to something other
than the scsi device if the scsi device does not always have a physical
representation. Is there a way to determine if a scsi device is virtual
or real ? Obviously it does not make sense to report the same temperature
multiple times, and we would want only a single temperature reported
for each physical drive. At the same time, I absolutely want to avoid
a situation where a single hardware monitoring device would report
the temperature of multiple drives. The concern here is crossing OIR
(online insertion and removal) boundaries. A single hardware monitoring
device should never cross an OIR boundary.

Thanks,
Guenter

* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-17  3:57       ` Guenter Roeck
@ 2019-12-17  5:50         ` Damien Le Moal
  2019-12-17 15:47           ` Guenter Roeck
  2019-12-18  3:42         ` Martin K. Petersen
  1 sibling, 1 reply; 20+ messages in thread
From: Damien Le Moal @ 2019-12-17  5:50 UTC (permalink / raw)
  To: Guenter Roeck, Martin K. Petersen
  Cc: linux-hwmon, Jean Delvare, linux-doc, linux-kernel, linux-scsi,
	linux-ide

On 2019/12/17 12:57, Guenter Roeck wrote:
> On 12/16/19 6:35 PM, Martin K. Petersen wrote:
>>
>> Guenter,
>>
>>> If and when drives are detected which report bad information, such
>>> drives can be added to a blacklist without impact on the core SCSI or
>>> ATA code. Until that happens, not loading the driver solves the
>>> problem on any affected system.
>>
>> My only concern with that is that we'll have blacklisting in several
>> places. We already have ATA and SCSI blacklists. If we now add a third
>> place, that's going to be a maintenance nightmare.
>>
>> More on that below.
>>
>>>> My concerns are wrt. identifying whether SMART data is available for
>>>> USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
>>>> controllers that hide the real drives in various ways).
>>
>> OK, so I spent my weekend tinkering with 15+ years of accumulated USB
>> devices. And my conclusion is that no, we can't in any sensible manner,
>> support USB storage monitoring in the kernel. There is no heuristic that
>> I can find that identifies that "this is a hard drive or an SSD and
>> attempting one of the various SMART methods may be safe". As opposed to
>> "this is a USB key that's likely to lock up if you try". And that's
>> ignoring the drives with USB-ATA bridges that I managed to wedge in my
>> attempt at sending down commands.
>>
>> Even smartmontools is failing to work on a huge part of my vintage
>> collection.  Thanks to a wide variety of bridges with random, custom
>> interfaces.
>>
>> So my stance on all this is that I'm fine with your general approach for
>> ATA. I will post a patch adding the required bits for SCSI. And if a
>> device does not implement either of the two standard methods, people
>> should use smartmontools.
>>
>> Wrt. name, since I've added SCSI support, satatemp is a bit of a
>> misnomer. drivetemp, maybe? No particular preference.
>>
> Agreed, if we extend this to SCSI, satatemp is less than perfect.
> drivetemp ? disktemp ? I am open to suggestions, with maybe a small
> personal preference for disktemp out of those two.

"disk" tends to imply HDD, excluding SSDs. So my vote goes to
"drivetemp", or even the more generic "devtemp".


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-17  5:50         ` Damien Le Moal
@ 2019-12-17 15:47           ` Guenter Roeck
  0 siblings, 0 replies; 20+ messages in thread
From: Guenter Roeck @ 2019-12-17 15:47 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, linux-doc,
	linux-kernel, linux-scsi, linux-ide

On Tue, Dec 17, 2019 at 05:50:17AM +0000, Damien Le Moal wrote:
> On 2019/12/17 12:57, Guenter Roeck wrote:
> > On 12/16/19 6:35 PM, Martin K. Petersen wrote:
> >>
> >> Guenter,
> >>
> >>> If and when drives are detected which report bad information, such
> >>> drives can be added to a blacklist without impact on the core SCSI or
> >>> ATA code. Until that happens, not loading the driver solves the
> >>> problem on any affected system.
> >>
> >> My only concern with that is that we'll have blacklisting in several
> >> places. We already have ATA and SCSI blacklists. If we now add a third
> >> place, that's going to be a maintenance nightmare.
> >>
> >> More on that below.
> >>
> >>>> My concerns are wrt. identifying whether SMART data is available for
> >>>> USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
> >>>> controllers that hide the real drives in various ways).
> >>
> >> OK, so I spent my weekend tinkering with 15+ years of accumulated USB
> >> devices. And my conclusion is that no, we can't in any sensible manner,
> >> support USB storage monitoring in the kernel. There is no heuristic that
> >> I can find that identifies that "this is a hard drive or an SSD and
> >> attempting one of the various SMART methods may be safe". As opposed to
> >> "this is a USB key that's likely to lock up if you try". And that's
> >> ignoring the drives with USB-ATA bridges that I managed to wedge in my
> >> attempt at sending down commands.
> >>
> >> Even smartmontools is failing to work on a huge part of my vintage
> >> collection.  Thanks to a wide variety of bridges with random, custom
> >> interfaces.
> >>
> >> So my stance on all this is that I'm fine with your general approach for
> >> ATA. I will post a patch adding the required bits for SCSI. And if a
> >> device does not implement either of the two standard methods, people
> >> should use smartmontools.
> >>
> >> Wrt. name, since I've added SCSI support, satatemp is a bit of a
> >> misnomer. drivetemp, maybe? No particular preference.
> >>
> > Agreed, if we extend this to SCSI, satatemp is less than perfect.
> > drivetemp ? disktemp ? I am open to suggestions, with maybe a small
> > personal preference for disktemp out of those two.
> 
> "disk" tends to imply HDD, excluding SSDs. So my vote goes to
> "drivetemp", or even the more generic "devtemp".
> 
"devtemp" would apply to all devices with temperature sensors, which
would be a bit too generic. I'll take that as a vote for "drivetemp".

Guenter

* Re: [PATCH 1/1] hwmon: Driver for temperature sensors on SATA drives
  2019-12-17  4:20           ` Guenter Roeck
@ 2019-12-18  3:39             ` Martin K. Petersen
  0 siblings, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-18  3:39 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Martin K. Petersen, Linus Walleij, linux-hwmon, Jean Delvare,
	Linux Doc Mailing List, linux-kernel, linux-scsi, linux-ide,
	Chris Healy


Guenter,

> If there are 100 physical drives, you would actually want to see the
> temperature of each drive separately, as one of them might be
> overheating due to some internal failure.

Yep. However, for "big boxes" you'll typically get that information from
SAF-TE or SES enclosure services and not from the drive itself.

SES allows you to monitor power supplies, drive bays, hot swap events,
thermals, etc. We have a SES driver in SCSI that exposes all these
things in sysfs. It is not currently tied into hwmon.

> If the storage array is represented to the system as a single huge
> physical drive, which is then split into logical entities not related
> to physical drives, I guess that would represent a problem for system
> management overall.

Yep. That's why there's dedicated plumbing in smartmontools to handle
various RAID controller interfaces for accessing physical drive
information. It's typically highly vendor-specific.

> I would not mind tying the hardware monitoring device to something
> other than the scsi device if the scsi device does not always have a
> physical representation. Is there a way to determine if a scsi device
> is virtual or real ?

Not really. Target is usually a pretty good approximation, although some
arrays introduce virtual targets because of limited LUN (scsi_device)
numbering capabilities. However, arrays generally don't support per-LUN
temperature because it makes no sense.

I'm trying to gauge how much of a pain potentially redundant sensors would
be for userland monitoring tooling vs. how many oddball devices we'd not
be able to support if we were to use scsi_target as parent (or restrict
the sensor binding to LUN 0).
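
The trade-off can be made concrete with a small sketch that counts the
sensors each binding choice would create for a given set of
host:channel:target:lun addresses. This is only a model of the three
options named above (per scsi_device, per scsi_target, LUN 0 only); the
function names are hypothetical and no kernel interface is involved.

```python
from collections import defaultdict


def parse_hctl(address):
    """Split 'host:channel:target:lun' into a tuple of four ints."""
    host, channel, target, lun = (int(p) for p in address.split(":"))
    return host, channel, target, lun


def sensors_per_device(addresses):
    """One sensor per scsi_device, i.e. one per LUN."""
    return sorted(parse_hctl(a) for a in addresses)


def sensors_per_target(addresses):
    """One sensor per scsi_target: LUNs sharing host:channel:target collapse."""
    targets = defaultdict(list)
    for a in addresses:
        host, channel, target, lun = parse_hctl(a)
        targets[(host, channel, target)].append(lun)
    return {key: sorted(luns) for key, luns in targets.items()}


def sensors_lun0_only(addresses):
    """Bind only to LUN 0; a target that exposes no LUN 0 gets no sensor."""
    return sorted(h for h in map(parse_hctl, addresses) if h[3] == 0)
```

For a four-LUN multi-actuator drive at 5:0:0:0 through 5:0:0:3, the
per-device scheme yields four sensors while the other two yield one. A
JBOD that presents distinct physical drives as LUNs of one target would
lose sensors under either of the latter two schemes.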

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives
  2019-12-17  3:57       ` Guenter Roeck
  2019-12-17  5:50         ` Damien Le Moal
@ 2019-12-18  3:42         ` Martin K. Petersen
  1 sibling, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2019-12-18  3:42 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, linux-doc,
	linux-kernel, linux-scsi, linux-ide


Guenter,

>> Also, there are some devices that will lock up the way you access that
>> VPD page. So a tweak is also required there.
>>
> Do you have details ? Do I need to add a call to scsi_device_supports_vpd(),
> maybe ?

Some devices lock up if you ask for too much data. I actually discovered
a VPD handling regression in 5.5 while working on a series of prep
patches for you today. Working on a fix. I'll try to get a patch series
out for review tomorrow.
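
One cautious way to avoid asking for too much data is to read the 4-byte
VPD header first and then request exactly the length it announces. The
sketch below shows that two-step pattern only; it is not the fix referred
to above. The inquiry callable, the function name and the max_len cap are
assumptions made for illustration.

```python
def read_vpd_page(inquiry, page, max_len=1024):
    """Fetch a VPD page in two steps so no more is requested than exists.

    `inquiry(page, alloc_len)` is a hypothetical callable that issues an
    INQUIRY with EVPD=1 and returns at most alloc_len bytes.
    """
    # Step 1: header only (device type, page code, 2-byte page length).
    header = inquiry(page, 4)
    if len(header) < 4 or header[1] != page:
        raise ValueError("device did not return the requested VPD page")
    # Step 2: request header plus payload, capped at max_len.
    total = min(4 + int.from_bytes(header[2:4], "big"), max_len)
    if total <= 4:
        return bytes(header)
    return bytes(inquiry(page, total))
```

The second request never exceeds the size the device itself reported,
which is the property that matters for devices that misbehave on
oversized allocation lengths.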

-- 
Martin K. Petersen	Oracle Linux Engineering

Thread overview: 20+ messages
2019-12-09  5:21 [PATCH 0/1] Summary: hwmon driver for temperature sensors on SATA drives Guenter Roeck
2019-12-09  5:21 ` [PATCH 1/1] hwmon: Driver " Guenter Roeck
2019-12-09  5:28   ` Randy Dunlap
2019-12-09  6:00     ` Guenter Roeck
2019-12-09 17:08   ` Bart Van Assche
2019-12-09 19:20     ` Guenter Roeck
2019-12-10 16:10       ` Bart Van Assche
2019-12-12 22:33   ` Linus Walleij
2019-12-12 23:21     ` Martin K. Petersen
2019-12-13  4:18       ` Guenter Roeck
2019-12-17  2:47         ` Martin K. Petersen
2019-12-17  4:20           ` Guenter Roeck
2019-12-18  3:39             ` Martin K. Petersen
2019-12-11  4:08 ` [PATCH 0/1] Summary: hwmon driver " Martin K. Petersen
2019-12-11  5:57   ` Guenter Roeck
2019-12-17  2:35     ` Martin K. Petersen
2019-12-17  3:57       ` Guenter Roeck
2019-12-17  5:50         ` Damien Le Moal
2019-12-17 15:47           ` Guenter Roeck
2019-12-18  3:42         ` Martin K. Petersen
