Linux-ide Archive on lore.kernel.org
From: Gabriel C <nix.or.die@gmail.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Walleij <linus.walleij@linaro.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-hwmon@vger.kernel.org, Jean Delvare <jdelvare@suse.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	"open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers)" 
	<linux-ide@vger.kernel.org>, Chris Healy <cphealy@gmail.com>
Subject: Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives
Date: Sun, 12 Jan 2020 23:26:37 +0100
Message-ID: <CAEJqkgi1yDUmMcvfaQfuyukCBKjgnpY0n5BxvTTw0U_4+PoAHQ@mail.gmail.com>
In-Reply-To: <de8861e8-f21b-6a66-4f5b-25acc8ff40e2@roeck-us.net>

On Sun, Jan 12, 2020 at 9:08 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 1/12/20 10:37 AM, Gabriel C wrote:
> > On Sun, Jan 12, 2020 at 4:26 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On 1/12/20 5:45 AM, Gabriel C wrote:
> >>> On Sun, Jan 12, 2020 at 2:07 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>
> >>>> On 1/12/20 4:07 AM, Linus Walleij wrote:
> >>>>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote:
> >>>>>> On Sun, Jan 12, 2020 at 12:22 PM Linus Walleij
> >>>>>> <linus.walleij@linaro.org> wrote:
> >>>>>>>
> >>>>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> What I've noticed however is the nvme temperature low/high values on
> >>>>>>>> the Sensors X are strange here.
> >>>>>>> (...)
> >>>>>>>> Sensor 1:     +27.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>>>>>>> Sensor 2:     +29.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>>>>>> (...)
> >>>>>>>> Sensor 1:     +23.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>>>>>>> Sensor 2:     +25.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>>>>>>
> >>>>>>> That doesn't look strange to me. It seems like reasonable defaults
> >>>>>>> from the firmware if either it doesn't really log the min/max temperatures
> >>>>>>> or hasn't been through a cycle of updating these yet. Just set both
> >>>>>>> to absolute min/max temperatures possible.
> >>>>>>
> >>>>>> Ok I'll check that.
> >>>>>>
> >>>>>> Do you mean by setting the temperatures to use a lmsensors config?
> >>>>>> Or is there a way to set these with a nvme command?
> >>>>>
> >>>>> Not that I know of.
> >>>>>
> >>>>> The min/max are the minimum and maximum temperatures the
> >>>>> device has experienced during this power-on cycle.
> >>>>>
> >>>>
> >>>> No, that would be lowest/highest. The above are (or should be) per-sensor
> >>>> setpoints. The default for those is typically the absolute minimum /
> >>>> maximum of the supported range.
> >>>>
> >>>> Some SATA drives report the lowest/highest temperatures experienced
> >>>> since power cycle, like here.
> >>>>
> >>>> drivetemp-scsi-5-0
> >>>> Adapter: SCSI adapter
> >>>> temp1:        +23.0°C  (low  =  +0.0°C, high = +60.0°C)
> >>>>                           (crit low = -41.0°C, crit = +85.0°C)
> >>>>                           (lowest = +20.0°C, highest = +31.0°C)
> >>>>
> >>>
> >>> The SATA temperatures are fine and reported like this here too, just
> >>> the nvme ones are strange.
> >>>
> >>> drivetemp-scsi-4-0
> >>> Adapter: SCSI adapter
> >>> temp1:        +28.0°C  (low  =  +1.0°C, high = +61.0°C)
> >>>                         (crit low =  +2.0°C, crit = +60.0°C)
> >>>                         (lowest = +16.0°C, highest = +31.0°C)
> >>>
> >>> drivetemp-scsi-12-0
> >>> Adapter: SCSI adapter
> >>> temp1:        +29.0°C  (low  =  +1.0°C, high = +61.0°C)
> >>>                         (crit low =  +2.0°C, crit = +60.0°C)
> >>>                         (lowest = +18.0°C, highest = +32.0°C)
> >>>
> >>> and so on.
> >>>
> >>> Btw, where can I find the code that does these calculations?
> >>>
> >>
> >> Not sure if that is what you are looking for, but the nvme hardware
> >> monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware
> >> monitoring driver is at drivers/hwmon/drivetemp.c.
> >>
> >
> > I'll have a look, thanks.
> >
> > I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels.
> > This is probably why I find the way the temperatures are now reported
> > very strange.
> >
> > The ADATA XPG SX8200 Pro in my laptop seems to work better:
> >
> > nvme-pci-0200
> > Adapter: PCI adapter
> > Composite:    +37.9°C  (low  =  -0.1°C, high = +74.8°C)
> >                        (crit = +79.8°C)
> >
> > Low is 0° which is what the spec suggests.
> >
> >> The limits on nvme drives are configurable.
> >
> > Yes, I found this out already.
> >
> >> root@server:/sys/class/hwmon# sensors nvme-pci-0100
> >> nvme-pci-0100
> >> Adapter: PCI adapter
> >> Composite:    +40.9°C  (low  = -273.1°C, high = +84.8°C)
> >>                          (crit = +84.8°C)
> >> Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)
> >> Sensor 2:     +43.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>
> >> root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min
> >> root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max
> >
> > An lm-sensors configuration will work too.
> >
> Sure, the above was just an example.
>
> >> root@server:/sys/class/hwmon# sensors nvme-pci-0100
> >> nvme-pci-0100
> >> Adapter: PCI adapter
> >> Composite:    +38.9°C  (low  = -273.1°C, high = +84.8°C)
> >>                          (crit = +84.8°C)
> >> Sensor 1:     +38.9°C  (low  =  -0.1°C, high = +99.8°C)
> >> Sensor 2:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)
> >>
> >> If you dislike the defaults, just configure whatever you think is
> >> appropriate for your system.
> >
> > It's not about disliking the values. I want to find out whether these Samsung
> > models don't support that, or whether there is a bug somewhere in
> > writing/calculating the values.
> >
> No, this is not a bug. It is perfectly valid for individual sensors to have
> uninitialized limits. If I recall correctly, the NVME specification
> specifically states that the default settings for individual sensors
> shall be those values (0 and 65535 Kelvin, specifically).
>
> And, yes, I would agree that it is a bit odd that NVME drives report temperatures
> in Kelvin, but such is the world.
>
> > In case Samsung and others don't support such a thing, wouldn't it be
> > better to just ignore the bogus reading altogether?
>
> Again, you can set whatever limits you like. The default limits on many
> hardware sensor chips have odd values. Just looking at my system:
>
> nct6797-isa-0a20
> Adapter: ISA adapter
> in0:                    +0.48 V  (min =  +0.00 V, max =  +1.74 V)
> in1:                    +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in2:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in3:                    +3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in4:                    +1.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in6:                    +0.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in7:                    +3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in9:                    +1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in10:                   +0.00 V  (min =  +0.00 V, max =  +0.00 V)
> in11:                   +0.74 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in12:                   +1.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in13:                   +0.68 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in14:                   +1.50 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>
>
> Are you suggesting that we should not support setting min/max values for
> all drivers just because they are often not initialized to reasonable values
> by default ?

No, I'm not suggesting that. I'm aware of the strange values reported by I/O
monitoring chips and the lack of documentation, so in this case, something is
better than nothing.

In the nvme case, there are only two values, which are either supported
by the firmware or not, so I thought it would be reasonable to have
known-good values instead of 65261.8°C, which will probably lead a lot
of users to report it as a bug.
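
For reference, the odd-looking numbers appear to be nothing more than a unit
conversion. A minimal sketch, assuming the drive stores each limit in whole
Kelvin and the driver converts to the millidegrees Celsius used by hwmon with
the usual 273.15 offset and round-to-nearest (the function names are made up
for illustration):

```shell
#!/bin/sh
# Hypothetical helpers; they only illustrate the arithmetic.

# Whole Kelvin -> millidegrees Celsius (0 degrees C is 273.15 K).
kelvin_to_millicelsius() {
    echo $(( $1 * 1000 - 273150 ))
}

# Millidegrees Celsius -> whole Kelvin, rounded to the nearest Kelvin.
# Only valid for inputs at or above absolute zero (-273150).
millicelsius_to_kelvin() {
    echo $(( ($1 + 273150 + 500) / 1000 ))
}

# Defaults quoted in the thread for the per-sensor limits.
kelvin_to_millicelsius 0
kelvin_to_millicelsius 65535

# Round trip of a limit written as 100000 (100 degrees C).
kelvin_to_millicelsius "$(millicelsius_to_kelvin 100000)"
```

Under those assumptions, 0 K and 65535 K come out as -273150 and 65261850
millidegrees, which matches the -273.1°C and +65261.8°C shown by sensors. The
round trip prints 99850, which would be consistent with the +99.8°C read back
after writing 100000 to temp2_max.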

Can we at least have that documented and explain how the values can be
set/changed?
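
As a starting point for such documentation, here is a rough sketch of setting
the limits through sysfs, based on the echo commands quoted above. The helper
name set_temp_limits is hypothetical, the hwmon number differs between
systems, and values are in millidegrees Celsius:

```shell
#!/bin/sh
# set_temp_limits is a hypothetical helper, not part of any existing tool.
#   $1: hwmon device directory, e.g. /sys/class/hwmon/hwmon1 (number varies)
#   $2: sensor index N, as in tempN_min / tempN_max
#   $3: low limit in millidegrees Celsius
#   $4: high limit in millidegrees Celsius
set_temp_limits() {
    dir=$1 n=$2 low=$3 high=$4
    # Refuse sensors that do not expose writable limit attributes.
    if [ ! -w "$dir/temp${n}_min" ] || [ ! -w "$dir/temp${n}_max" ]; then
        echo "temp${n}: no writable limits in $dir" >&2
        return 1
    fi
    echo "$low" > "$dir/temp${n}_min"
    echo "$high" > "$dir/temp${n}_max"
}

# Example, run as root: 0 degrees C low and 100 degrees C high on temp2,
# which is "Sensor 1" in the output quoted above.
# set_temp_limits /sys/class/hwmon/hwmon1 2 0 100000
```

The same limits can also be expressed as set statements in an lm-sensors
configuration file and applied with sensors -s, which avoids depending on the
hwmon number.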
