Linux-NVME Archive on lore.kernel.org
From: Akinobu Mita <akinobu.mita@gmail.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org, Jean Delvare <jdelvare@suse.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] nvme: hwmon: provide temperature min and max values for each sensor
Date: Tue, 12 Nov 2019 00:56:21 +0900
Message-ID: <CAC5umyiju2Q2fdfVaFyX+Q=sMKr5Gsc_GDVYmSa0vB+w8acvAw@mail.gmail.com>
In-Reply-To: <d3c0c9a7-00b9-0465-16e1-6fd7ba97dfd0@roeck-us.net>

On Mon, Nov 11, 2019 at 1:30 Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/10/19 6:17 AM, Akinobu Mita wrote:
> > According to the NVMe specification, the over temperature threshold and
> > under temperature threshold features shall be implemented for Composite
> > Temperature if a non-zero WCTEMP field value is reported in the Identify
> > Controller data structure.  The features are also implemented for all
> > implemented temperature sensors (i.e., all Temperature Sensor fields that
> > report a non-zero value).
> >
> > This provides the over temperature threshold and under temperature
> > threshold for each sensor as temperature min and max values of hwmon
> > sysfs attributes.
> >
> > WCTEMP is already provided as the temperature max value for Composite
> > Temperature, but this change is compatible with that, because the
> > default value of the over temperature threshold for Composite
> > Temperature is WCTEMP.
> >
> > This also provides alarm attributes for each temperature sensor.  But all
> > alarm conditions are the same, because there is only a single bit in the
> > Critical Warning field that indicates one of the temperatures is outside
> > of a temperature threshold.
> >
>
> I think it would be more appropriate to report the alarm only for the
> composite temperature, reason being that we don't really know which individual
> sensor it is associated with.

OK.
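
A minimal sketch of what reporting the alarm only for the composite
temperature could look like, written as plain user-space C rather than
driver code. The helper name temp_alarm() is hypothetical; the bit position
(bit 1 of the Critical Warning field for the temperature condition) is
taken from the NVMe SMART log definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the SMART log Critical Warning field: a temperature is
 * above an over temperature threshold or below an under temperature
 * threshold. */
#define CRIT_WARN_TEMPERATURE (1u << 1)

/*
 * Hypothetical helper: report the alarm for channel 0 (Composite) only,
 * because the single bit does not say which sensor caused it.
 */
static bool temp_alarm(int channel, uint8_t critical_warning)
{
	if (channel != 0)
		return false;

	return (critical_warning & CRIT_WARN_TEMPERATURE) != 0;
}
```

With this shape, the per-sensor channels would simply not expose an alarm
attribute at all.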

> > Example output from the "sensors" command:
> >
> > nvme-pci-0100
> > Adapter: PCI adapter
> > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> >                         (crit = +80.0 C)
> > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> >
>
> Have you tried writing the limits? On my Intel NVMe drive (SSDPEKKW512G7), writing
> any minimum limit on the Composite temperature sensor results in a temperature
> warning, and that warning is sticky until I reset the controller.
> I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.

I have a Crucial CT500P1SSD8 and a WDC WDS512G1X0C-00ENX0, and I see no
such problem with either device.

> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
>                         (crit = +80.0°C)
>
> root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
>                         (crit = +80.0°C)
>
> It doesn't seem to matter which temperature I write; writing -273000 has
> the same result.
>
> [This is actually why I didn't use the features commands; not that I had observed
>   the problem, but I was concerned that problems like this would show up.]

Maybe we should introduce a new quirk so that we avoid changing the
temperature thresholds on such devices.  Could you tell me the
SSDPEKKW512G7's vendor and device ID?  A quick search suggests it is
8086:f1a5, but I want to make sure.
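
A rough sketch of the quirk idea, as self-contained user-space C. All names
here (QUIRK_NO_TEMP_THRESH_CHANGE, quirk_table, lookup_quirks, thresh_mode)
are hypothetical and are not existing kernel definitions, and the 8086:f1a5
entry is the unconfirmed ID mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk flag: do not let userspace change thresholds. */
#define QUIRK_NO_TEMP_THRESH_CHANGE (1u << 0)

struct quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned int quirks;
};

/* 8086:f1a5 is still unconfirmed for the SSDPEKKW512G7. */
static const struct quirk_entry quirk_table[] = {
	{ 0x8086, 0xf1a5, QUIRK_NO_TEMP_THRESH_CHANGE },
};

/* Return the quirk flags for a PCI vendor/device pair, or 0 if none. */
static unsigned int lookup_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].device == device)
			return quirk_table[i].quirks;
	}
	return 0;
}

/* File mode for the min/max attributes: read-only when quirked. */
static unsigned int thresh_mode(unsigned int quirks)
{
	return (quirks & QUIRK_NO_TEMP_THRESH_CHANGE) ? 0444 : 0644;
}
```

The point of returning a mode is that the quirk could be applied in the
visibility callback, so the write path never sees a quirked device.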

> > Cc: Keith Busch <kbusch@kernel.org>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Jean Delvare <jdelvare@suse.com>
> > Cc: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> > This patch depends on the patch "nvme: Add hardware monitoring support" [1]
> > [1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html
> >
> >   drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
> >   include/linux/nvme.h           |  6 +++
> >   2 files changed, 90 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
> > index 5480cbb..79323b2 100644
> > --- a/drivers/nvme/host/nvme-hwmon.c
> > +++ b/drivers/nvme/host/nvme-hwmon.c
> > @@ -15,6 +15,46 @@ struct nvme_hwmon_data {
> >       struct mutex read_lock;
> >   };
> >
> > +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +                             long *temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (under)
> > +             threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +     ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +     if (!ret)
> > +             *temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> > +
> > +     return ret <= 0 ? ret : -EIO;
> > +}
> > +
> > +static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +                             long temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     temp = temp / 1000 + 273;
> > +     if (temp > NVME_TEMP_THRESH_MASK)
> > +             return -EINVAL;
> > +
>
> Traditionally we use clamp_val() in hwmon drivers to adjust value ranges
> for limit attributes, reason being that we can't expect userspace to dig
> through per-sensor-type documentation to identify valid limits. Also, note
> that the above does not handle negative values well (-274000 -> -274 -> -1).
> I would suggest something like
>
>         temp = temp / 1000 + 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);
>
> or, if you want to be fancy:
>
>         temp = DIV_ROUND_CLOSEST(temp, 1000) + 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

Either way looks good.
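
To make the intended behavior concrete, here is a user-space sketch of the
rounding and clamping variant. It converts millidegrees Celsius to Kelvin
(adding 273) and clamps to the field range. The helper names are
hypothetical, div_round_closest() only imitates the kernel macro for a
positive divisor, and the 16-bit mask value is an assumption, since the
nvme.h hunk is not quoted in this mail.

```c
/* Assumed width of the threshold field: 16 bits. */
#define TEMP_THRESH_MASK 0xffffL

/* Round x / d to the nearest integer, halves away from zero (d > 0). */
static long div_round_closest(long x, long d)
{
	long q = x / d;
	long r = x % d;

	if (2 * r >= d)
		q++;
	else if (2 * r <= -d)
		q--;
	return q;
}

/* Convert millidegrees Celsius to a clamped Kelvin threshold value. */
static long mc_to_thresh(long mc)
{
	long kelvin = div_round_closest(mc, 1000) + 273;

	if (kelvin < 0)
		kelvin = 0;
	if (kelvin > TEMP_THRESH_MASK)
		kelvin = TEMP_THRESH_MASK;
	return kelvin;
}
```

With clamping, an input of -274000 becomes 0 instead of -1, so a negative
value can no longer be ORed into the feature dword.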

> > +     threshold |= temp;
> > +
> > +     if (under)
> > +             threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +     ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
>
> I am a bit baffled here. The last parameter of nvme_set_features() (and nvme_get_features)
> is a pointer to u32, but status is declared as int. I would have assumed this generates
> a compiler warning, but it doesn't, at least not with my version of gcc.
>
> Either way, it might be better to declare status as u32 (unless I did not have enough
> coffee and I am missing something).
>
> Also, I assume that the returned status value is irrelevant. I don't find useful
> information in the specification, but I may be missing it.

You are right.  I'll pass NULL as the last parameter of
nvme_set_features().
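
For the get side, where the returned value does matter, the handling with
an unsigned 32-bit type can be sketched in user space as follows. The field
layout (threshold in bits 15:0, sensor select in bits 19:16, under
threshold type at bit 20) is my reading of the Temperature Threshold
feature and is an assumption here, since the nvme.h hunk is not quoted in
this mail; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the Temperature Threshold feature dword. */
#define THRESH_MASK         0xffffu    /* bits 15:0, threshold in Kelvin */
#define THRESH_SELECT_SHIFT 16         /* bits 19:16, sensor select */
#define THRESH_TYPE_UNDER   0x100000u  /* bits 21:20 = 01b, under threshold */

/* Build the dword passed to Get/Set Features for one sensor. */
static uint32_t encode_thresh(unsigned int sensor, bool under, uint32_t kelvin)
{
	uint32_t dword = (uint32_t)sensor << THRESH_SELECT_SHIFT;

	dword |= kelvin & THRESH_MASK;
	if (under)
		dword |= THRESH_TYPE_UNDER;
	return dword;
}

/* Convert a Get Features result to millidegrees Celsius. */
static long decode_thresh_mc(uint32_t result)
{
	return ((long)(result & THRESH_MASK) - 273) * 1000;
}
```

Masking before the signed conversion keeps the select and type bits out of
the temperature. An all-ones threshold decodes to 65262000, which matches
the "high = +65262.0 C" lines in the example output.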

> > +
> > +     return ret <= 0 ? ret : -EIO;
> > +}
> > +
> >   static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
> >   {
> >       int ret;
> > @@ -39,8 +79,12 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >        */
> >       switch (attr) {
> >       case hwmon_temp_max:
> > -             *val = (data->ctrl->wctemp - 273) * 1000;
> > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > +             if (err)
> > +                     *val = (data->ctrl->wctemp - 273) * 1000;
>
> This would report WCTEMP for all sensors on errors, including errors seen while
> the controller is resetting. I think it should be something like
>
>                 int err = 0;
>                 ...
>
>                 if (!channel)
>                         *val = (data->ctrl->wctemp - 273) * 1000;
>                 else
>                         err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
>                 return err;
>
> assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> temperature sensor limit changes wctemp, and we don't update it, we should not
> use it at all after registration and just report the error.
>
> >               return 0;
> > +     case hwmon_temp_min:
> > +             return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> >       case hwmon_temp_crit:
> >               *val = (data->ctrl->cctemp - 273) * 1000;
> >               return 0;
> > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >       return err;
> >   }
> >
> > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > +                         u32 attr, int channel, long val)
> > +{
> > +     struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > +
> > +     switch (attr) {
> > +     case hwmon_temp_max:
> > +             return nvme_set_temp_thresh(data->ctrl, channel, false, val);
>
> Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> the cached value of ctrl->wctemp (or never use it after registration).

At least on the devices I have, setting the over temperature threshold
doesn't change WCTEMP.
I checked with 'nvme id-ctrl /dev/nvme0 | grep ctemp'.

> > +     case hwmon_temp_min:
> > +             return nvme_set_temp_thresh(data->ctrl, channel, true, val);
> > +     default:
> > +             break;
> > +     }
> > +
> > +     return -EOPNOTSUPP;
> > +}
> > +
> >   static const char * const nvme_hwmon_sensor_names[] = {
> >       "Composite",
> >       "Sensor 1",
> > @@ -105,13 +166,13 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
> >                       return 0444;
> >               break;
> >       case hwmon_temp_max:
> > +     case hwmon_temp_min:
> >               if (!channel && data->ctrl->wctemp)
> > -                     return 0444;
> > +                     return 0644;
> > +             else if (data->log.temp_sensor[channel - 1])
> > +                     return 0644;
>
> This ends up with a negative index into data->log.temp_sensor
> if data->ctrl->wctemp == 0. It needs to be

Oops.

>                 else if (channel && data->log.temp_sensor[channel - 1])
> It can also be written as a single conditional since the return value is the same.

Sounds good.
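
A sketch of the combined check, as stand-alone C rather than the actual
callback. The function name thresh_visible() and the flat parameters are
hypothetical stand-ins for the driver's data structures; channel is assumed
to be in the range 0 to 8, with 0 meaning Composite.

```c
#include <stdint.h>

#define NUM_TEMP_SENSORS 8

/*
 * Return the file mode for the min/max attributes of a channel, or 0
 * to hide them. The "channel &&" guard keeps channel 0 from indexing
 * temp_sensor[-1] when wctemp is zero.
 */
static unsigned int thresh_visible(int channel, uint16_t wctemp,
				   const uint16_t temp_sensor[NUM_TEMP_SENSORS])
{
	if ((!channel && wctemp) ||
	    (channel && temp_sensor[channel - 1]))
		return 0644;

	return 0;
}
```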
