Linux-NVME Archive on lore.kernel.org
From: Guenter Roeck <linux@roeck-us.net>
To: Akinobu Mita <akinobu.mita@gmail.com>,
	linux-nvme@lists.infradead.org, linux-hwmon@vger.kernel.org
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Jean Delvare <jdelvare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH] nvme: hwmon: provide temperature min and max values for each sensor
Date: Sun, 10 Nov 2019 08:30:23 -0800
Message-ID: <d3c0c9a7-00b9-0465-16e1-6fd7ba97dfd0@roeck-us.net>
In-Reply-To: <1573395466-19526-1-git-send-email-akinobu.mita@gmail.com>

On 11/10/19 6:17 AM, Akinobu Mita wrote:
> According to the NVMe specification, the over temperature threshold and
> under temperature threshold features shall be implemented for Composite
> Temperature if a non-zero WCTEMP field value is reported in the Identify
> Controller data structure.  The features are also implemented for all
> implemented temperature sensors (i.e., all Temperature Sensor fields that
> report a non-zero value).
> 
> This provides the over temperature threshold and under temperature
> threshold for each sensor as temperature min and max values of hwmon
> sysfs attributes.
> 
> The WCTEMP is already provided as a temperature max value for Composite
> Temperature, but this change isn't incompatible.  Because the default
> value of the over temperature threshold for Composite Temperature is
> the WCTEMP.
> 
> This also provides alarm attributes for each temperature sensor.  But all
> alarm conditions are same, because there is only a single bit in
> Critical Warning field that indicates one of the temperature is outside of
> a temperature threshold.
> 

I think it would be more appropriate to report the alarm only for the
composite temperature, since we don't really know which individual sensor
the alarm is associated with.

> Example output from the "sensors" command:
> 
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
>                         (crit = +80.0 C)
> Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> 

Have you tried writing the limits? On my Intel NVMe drive (SSDPEKKW512G7), writing
any minimum limit on the Composite temperature sensor results in a temperature
warning, and that warning is sticky until I reset the controller.
I don't see that problem on a Samsung SSD 970 EVO 500GB; I have not yet tried others.

root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
                        (crit = +80.0°C)

root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
                        (crit = +80.0°C)

It doesn't seem to matter which temperature I write; writing -273000 has
the same result.

[This is actually why I didn't use the features commands; not that I had observed
  the problem, but I was concerned that problems like this would show up.]

> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Jean Delvare <jdelvare@suse.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> This patch depends on the patch "nvme: Add hardware monitoring support" [1]
> [1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html
> 
>   drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
>   include/linux/nvme.h           |  6 +++
>   2 files changed, 90 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
> index 5480cbb..79323b2 100644
> --- a/drivers/nvme/host/nvme-hwmon.c
> +++ b/drivers/nvme/host/nvme-hwmon.c
> @@ -15,6 +15,46 @@ struct nvme_hwmon_data {
>   	struct mutex read_lock;
>   };
>   
> +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> +				long *temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (under)
> +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> +
> +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +	if (!ret)
> +		*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> +
> +	return ret <= 0 ? ret : -EIO;
> +}
> +
> +static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> +				long temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	temp = temp / 1000 + 273;
> +	if (temp > NVME_TEMP_THRESH_MASK)
> +		return -EINVAL;
> +

Traditionally we use clamp_val() in hwmon drivers to adjust value ranges
for limit attributes, the reason being that we can't expect userspace to dig
through per-sensor-type documentation to identify valid limits. Also, note
that the above does not handle negative values well (-274000 -> -274 -> -1;
the -1 passes the range check and sets every bit when ORed into threshold).
I would suggest something like

	temp = temp / 1000 + 273;
	temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

or, if you want to be fancy:

	temp = DIV_ROUND_CLOSEST(temp, 1000) + 273;
	temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);
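
To illustrate the clamping suggestion, here is a minimal userspace sketch, not
kernel code. clamp_long(), div_round_closest() and mcelsius_to_thresh() are
hypothetical stand-ins for clamp_val(), DIV_ROUND_CLOSEST() and the conversion
in nvme_set_temp_thresh(); THRESH_MASK mirrors NVME_TEMP_THRESH_MASK from the
patch. Clamping the input before dividing is my own addition to keep the
arithmetic away from overflow.

```c
/* Hypothetical userspace stand-ins; the names are not from the kernel. */
#define THRESH_MASK 0xffffL	/* mirrors NVME_TEMP_THRESH_MASK in the patch */

static long clamp_long(long v, long lo, long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Round-to-nearest division that also works for negative dividends. */
static long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* Convert millidegrees Celsius to a Kelvin value that fits the field. */
static long mcelsius_to_thresh(long mc)
{
	/* Clamp first so the rounding step below cannot overflow. */
	mc = clamp_long(mc, -273000L, (THRESH_MASK - 273) * 1000);

	return div_round_closest(mc, 1000) + 273;
}
```

With this, -274000 maps to 0 instead of -1, and any oversized value maps to
0xffff, which reads back as the +65262.0 C shown in the example output above.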

> +	threshold |= temp;
> +
> +	if (under)
> +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> +
> +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);

I am a bit baffled here. The last parameter of nvme_set_features() (and
nvme_get_features()) is a pointer to u32, but status is declared as int. I would
have assumed this generates a compiler warning, but it doesn't, at least not with
my version of gcc.

Either way, it might be better to declare status as u32 (unless I did not have
enough coffee and I am missing something).

Also, I assume that the returned status value is irrelevant. I don't find useful
information in the specification, but I may be missing it.
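
For reference, a rough userspace sketch of how the feature dword is put
together and how the threshold is extracted from the result, using a
fixed-width unsigned type throughout. The constants are copied from the enum
added to include/linux/nvme.h by this patch; pack_temp_thresh() and
unpack_kelvin() are hypothetical helper names, not existing kernel functions.

```c
#include <stdint.h>

/* Values copied from the enum this patch adds to include/linux/nvme.h. */
enum {
	THRESH_MASK		= 0xffff,
	THRESH_SELECT_SHIFT	= 16,
	THRESH_TYPE_UNDER	= 0x100000,
};

/* Build the dword: threshold in the low 16 bits, sensor select above it. */
static uint32_t pack_temp_thresh(unsigned int sensor, int under,
				 uint32_t kelvin)
{
	uint32_t dw = (uint32_t)sensor << THRESH_SELECT_SHIFT;

	dw |= kelvin & THRESH_MASK;
	if (under)
		dw |= THRESH_TYPE_UNDER;

	return dw;
}

/* Extract the Kelvin threshold from a 32-bit Get Features result. */
static uint32_t unpack_kelvin(uint32_t result)
{
	return result & THRESH_MASK;
}
```

Keeping the result unsigned avoids any sign questions when masking.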

> +
> +	return ret <= 0 ? ret : -EIO;
> +}
> +
>   static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
>   {
>   	int ret;
> @@ -39,8 +79,12 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
>   	 */
>   	switch (attr) {
>   	case hwmon_temp_max:
> -		*val = (data->ctrl->wctemp - 273) * 1000;
> +		err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> +		if (err)
> +			*val = (data->ctrl->wctemp - 273) * 1000;

This would report WCTEMP for all sensors on errors, including errors seen while
the controller is resetting. I think it should be something like

		int err = 0;
		...

		if (!channel)
			*val = (data->ctrl->wctemp - 273) * 1000;
		else
			err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
		return err;

assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
temperature sensor limit changes wctemp, and we don't update it, we should not
use it at all after registration and just report the error.
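
A self-contained mock of that control flow, only to show the intended
behaviour: channel 0 reports the cached WCTEMP, every other channel
propagates the error instead of falling back. struct mock_ctrl,
mock_get_temp_thresh() and read_temp_max() are made up for this sketch and
only loosely model the driver.

```c
#include <errno.h>

/* Made-up stand-in for the controller state used by the hwmon code. */
struct mock_ctrl {
	unsigned int wctemp;	/* cached WCTEMP, in Kelvin */
	int resetting;		/* non-zero: feature commands fail */
	unsigned int thresh[9];	/* per-channel over temp threshold, Kelvin */
};

/* Mock of nvme_get_temp_thresh(): fails while the controller resets. */
static int mock_get_temp_thresh(const struct mock_ctrl *c, int sensor,
				long *temp)
{
	if (c->resetting)
		return -EIO;

	*temp = ((long)c->thresh[sensor] - 273) * 1000;
	return 0;
}

/* hwmon_temp_max: WCTEMP for Composite, queried threshold otherwise. */
static int read_temp_max(const struct mock_ctrl *c, int channel, long *val)
{
	if (!channel) {
		*val = ((long)c->wctemp - 273) * 1000;
		return 0;
	}

	return mock_get_temp_thresh(c, channel, val);
}
```

The point is that a failed query on channel 1..8 never silently turns into
the WCTEMP value.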

>   		return 0;
> +	case hwmon_temp_min:
> +		return nvme_get_temp_thresh(data->ctrl, channel, true, val);
>   	case hwmon_temp_crit:
>   		*val = (data->ctrl->cctemp - 273) * 1000;
>   		return 0;
> @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
>   	return err;
>   }
>   
> +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> +			    u32 attr, int channel, long val)
> +{
> +	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> +
> +	switch (attr) {
> +	case hwmon_temp_max:
> +		return nvme_set_temp_thresh(data->ctrl, channel, false, val);

Does this change WCTEMP if written on channel 0? If so, we would have to update
the cached value of ctrl->wctemp (or never use it after registration).

> +	case hwmon_temp_min:
> +		return nvme_set_temp_thresh(data->ctrl, channel, true, val);
> +	default:
> +		break;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}
> +
>   static const char * const nvme_hwmon_sensor_names[] = {
>   	"Composite",
>   	"Sensor 1",
> @@ -105,13 +166,13 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
>   			return 0444;
>   		break;
>   	case hwmon_temp_max:
> +	case hwmon_temp_min:
>   		if (!channel && data->ctrl->wctemp)
> -			return 0444;
> +			return 0644;
> +		else if (data->log.temp_sensor[channel - 1])
> +			return 0644;

For channel 0, this ends up with a negative index into data->log.temp_sensor
if data->ctrl->wctemp == 0. It needs to be
		else if (channel && data->log.temp_sensor[channel - 1])
It can also be written as a single conditional since the return value is the same.
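
Roughly what the single conditional could look like, as a standalone sketch.
struct mock_data and limit_visible() are hypothetical and only model the two
fields the check depends on.

```c
/* Hypothetical model of the fields nvme_hwmon_is_visible() looks at. */
struct mock_data {
	unsigned int wctemp;		/* 0 means WCTEMP is not reported */
	unsigned short temp_sensor[8];	/* 0 means sensor not implemented */
};

/* Mode for the min/max attributes of the given channel, 0 if hidden. */
static unsigned int limit_visible(const struct mock_data *d, int channel)
{
	/* temp_sensor[] is only indexed when channel is non-zero. */
	if ((!channel && d->wctemp) ||
	    (channel && d->temp_sensor[channel - 1]))
		return 0644;

	return 0;
}
```

With wctemp == 0, channel 0 now simply returns 0 and never touches
temp_sensor[-1].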

>   		break;
>   	case hwmon_temp_alarm:
> -		if (!channel)
> -			return 0444;
> -		break;
>   	case hwmon_temp_input:
>   	case hwmon_temp_label:
>   		if (!channel || data->log.temp_sensor[channel - 1])
> @@ -126,16 +187,24 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
>   static const struct hwmon_channel_info *nvme_hwmon_info[] = {
>   	HWMON_CHANNEL_INFO(chip, HWMON_C_REGISTER_TZ),
>   	HWMON_CHANNEL_INFO(temp,
> -			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_CRIT | HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
>   				HWMON_T_LABEL | HWMON_T_ALARM,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL),
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM),
>   	NULL
>   };
>   
> @@ -143,6 +212,7 @@ static const struct hwmon_ops nvme_hwmon_ops = {
>   	.is_visible	= nvme_hwmon_is_visible,
>   	.read		= nvme_hwmon_read,
>   	.read_string	= nvme_hwmon_read_string,
> +	.write		= nvme_hwmon_write,
>   };
>   
>   static const struct hwmon_chip_info nvme_hwmon_chip_info = {
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index 269b2e8..347a44f 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -803,6 +803,12 @@ struct nvme_write_zeroes_cmd {
>   
>   /* Features */
>   
> +enum {
> +	NVME_TEMP_THRESH_MASK		= 0xffff,
> +	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
> +	NVME_TEMP_THRESH_TYPE_UNDER	= 0x100000,
> +};
> +
>   struct nvme_feat_auto_pst {
>   	__le64 entries[32];
>   };
> 


