linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* hwmon: drivetemp: bogus values after wake up from suspend
@ 2020-04-06 16:23 Holger Hoffstätte
  2020-04-06 19:17 ` Guenter Roeck
  2020-04-07  1:41 ` Guenter Roeck
  0 siblings, 2 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2020-04-06 16:23 UTC (permalink / raw)
  To: LKML, Guenter Roeck


I've been giving the drivetemp hwmon driver a try and am very happy
with it; works right away and - much to my surprise - doesn't wake up
HDDs that have gone to sleep. Nice!

I did notice one tiny thing though: after waking up from suspend, my SSD
(Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
which is definitely not the temperature in my office. While this is more
a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
Can't have that!

So I looked into the source and found that the values are (understandably)
passed on unfiltered/uncapped. Since it's unlikely any active device has
operating temperature below-zero, I figured the laziest way is to cap the
value to positive:

diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
--- a/drivers/hwmon/drivetemp.c	2020-04-02 08:02:32.000000000 +0200
+++ b/drivers/hwmon/drivetemp.c	2020-04-06 18:13:04.892554087 +0200
@@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
  #define INVALID_TEMP		0x80
  
  #define temp_is_valid(temp)	((temp) != INVALID_TEMP)
-#define temp_from_sct(temp)	(((s8)(temp)) * 1000)
+#define temp_from_sct(temp)	(max(0, ((s8)(temp)) * 1000))
  
  static inline bool ata_id_smart_supported(u16 *id)
  {

The assumption is of course *theoretically* wrong since some
equipment might indeed operate in negative C°. One way might be
to use the device's "low" operating point first, but then that
might not be available and we'd be back to capping to 0.
I'm open to other suggestions. :)

thanks,
Holger

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: hwmon: drivetemp: bogus values after wake up from suspend
  2020-04-06 16:23 hwmon: drivetemp: bogus values after wake up from suspend Holger Hoffstätte
@ 2020-04-06 19:17 ` Guenter Roeck
  2020-04-07  1:41 ` Guenter Roeck
  1 sibling, 0 replies; 4+ messages in thread
From: Guenter Roeck @ 2020-04-06 19:17 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: LKML

On Mon, Apr 06, 2020 at 06:23:01PM +0200, Holger Hoffstätte wrote:
> 
> I've been giving the drivetemp hwmon driver a try and am very happy
> with it; works right away and - much to my surprise - doesn't wake up
> HDDs that have gone to sleep. Nice!
> 
> I did notice one tiny thing though: after waking up from suspend, my SSD
> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
> which is definitely not the temperature in my office. While this is more
> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
> Can't have that!
> 
> So I looked into the source and found that the values are (understandably)
> passed on unfiltered/uncapped. Since it's unlikely any active device has
> operating temperature below-zero, I figured the laziest way is to cap the
> value to positive:
> 
> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> --- a/drivers/hwmon/drivetemp.c	2020-04-02 08:02:32.000000000 +0200
> +++ b/drivers/hwmon/drivetemp.c	2020-04-06 18:13:04.892554087 +0200
> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
>  #define INVALID_TEMP		0x80
>  #define temp_is_valid(temp)	((temp) != INVALID_TEMP)
> -#define temp_from_sct(temp)	(((s8)(temp)) * 1000)
> +#define temp_from_sct(temp)	(max(0, ((s8)(temp)) * 1000))
>  static inline bool ata_id_smart_supported(u16 *id)
>  {
> 
> The assumption is of course *theoretically* wrong since some
> equipment might indeed operate in negative C°. One way might be
> to use the device's "low" operating point first, but then that
> might not be available and we'd be back to capping to 0.
> I'm open to other suggestions. :)
> 

I think 0 is't much better than -128, unless your office is somewhere
in the Arctic. I'll have to loook up the spec, but I think -128 may mean
"no data". Maybe we can return something like -ENODATA in that case.

Guenter

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: hwmon: drivetemp: bogus values after wake up from suspend
  2020-04-06 16:23 hwmon: drivetemp: bogus values after wake up from suspend Holger Hoffstätte
  2020-04-06 19:17 ` Guenter Roeck
@ 2020-04-07  1:41 ` Guenter Roeck
  2020-04-08  3:59   ` Holger Hoffstätte
  1 sibling, 1 reply; 4+ messages in thread
From: Guenter Roeck @ 2020-04-07  1:41 UTC (permalink / raw)
  To: Holger Hoffstätte, LKML

On 4/6/20 9:23 AM, Holger Hoffstätte wrote:
> 
> I've been giving the drivetemp hwmon driver a try and am very happy
> with it; works right away and - much to my surprise - doesn't wake up
> HDDs that have gone to sleep. Nice!
> 
> I did notice one tiny thing though: after waking up from suspend, my SSD
> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
> which is definitely not the temperature in my office. While this is more
> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
> Can't have that!
> 
> So I looked into the source and found that the values are (understandably)
> passed on unfiltered/uncapped. Since it's unlikely any active device has
> operating temperature below-zero, I figured the laziest way is to cap the
> value to positive:
> 
> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> --- a/drivers/hwmon/drivetemp.c    2020-04-02 08:02:32.000000000 +0200
> +++ b/drivers/hwmon/drivetemp.c    2020-04-06 18:13:04.892554087 +0200
> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
>  #define INVALID_TEMP        0x80
>  
>  #define temp_is_valid(temp)    ((temp) != INVALID_TEMP)
> -#define temp_from_sct(temp)    (((s8)(temp)) * 1000)
> +#define temp_from_sct(temp)    (max(0, ((s8)(temp)) * 1000))
>  
>  static inline bool ata_id_smart_supported(u16 *id)
>  {
> 
> The assumption is of course *theoretically* wrong since some
> equipment might indeed operate in negative C°. One way might be
> to use the device's "low" operating point first, but then that
> might not be available and we'd be back to capping to 0.
> I'm open to other suggestions. :)
> 

Looking into the code, 0x80 or -128 indeed reflects an invalid temperature.
Any chance you can apply the following to see if it helps ?

diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
index 370d0c74eb01..c27239eb28cf 100644
--- a/drivers/hwmon/drivetemp.c
+++ b/drivers/hwmon/drivetemp.c
@@ -264,6 +264,8 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
                return err;
        switch (attr) {
        case hwmon_temp_input:
+               if (!temp_is_valid(buf[SCT_STATUS_TEMP]))
+                       return -ENODATA;
                *val = temp_from_sct(buf[SCT_STATUS_TEMP]);
                break;
        case hwmon_temp_lowest:

I am not sure what the best error code would be - suggestions welcome.

Thanks,
Guenter

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: hwmon: drivetemp: bogus values after wake up from suspend
  2020-04-07  1:41 ` Guenter Roeck
@ 2020-04-08  3:59   ` Holger Hoffstätte
  0 siblings, 0 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2020-04-08  3:59 UTC (permalink / raw)
  To: Guenter Roeck, LKML

On 4/7/20 3:41 AM, Guenter Roeck wrote:
> On 4/6/20 9:23 AM, Holger Hoffstätte wrote:
>>
>> I've been giving the drivetemp hwmon driver a try and am very happy
>> with it; works right away and - much to my surprise - doesn't wake up
>> HDDs that have gone to sleep. Nice!
>>
>> I did notice one tiny thing though: after waking up from suspend, my SSD
>> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
>> which is definitely not the temperature in my office. While this is more
>> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
>> Can't have that!
>>
>> So I looked into the source and found that the values are (understandably)
>> passed on unfiltered/uncapped. Since it's unlikely any active device has
>> operating temperature below-zero, I figured the laziest way is to cap the
>> value to positive:
>>
>> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>> --- a/drivers/hwmon/drivetemp.c    2020-04-02 08:02:32.000000000 +0200
>> +++ b/drivers/hwmon/drivetemp.c    2020-04-06 18:13:04.892554087 +0200
>> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
>>   #define INVALID_TEMP        0x80
>>   
>>   #define temp_is_valid(temp)    ((temp) != INVALID_TEMP)
>> -#define temp_from_sct(temp)    (((s8)(temp)) * 1000)
>> +#define temp_from_sct(temp)    (max(0, ((s8)(temp)) * 1000))
>>   
>>   static inline bool ata_id_smart_supported(u16 *id)
>>   {
>>
>> The assumption is of course *theoretically* wrong since some
>> equipment might indeed operate in negative C°. One way might be
>> to use the device's "low" operating point first, but then that
>> might not be available and we'd be back to capping to 0.
>> I'm open to other suggestions. :)
>>
> 
> Looking into the code, 0x80 or -128 indeed reflects an invalid temperature.

Excellent, that's of course much better than just capping to 0.

> Any chance you can apply the following to see if it helps ?
> 
> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> index 370d0c74eb01..c27239eb28cf 100644
> --- a/drivers/hwmon/drivetemp.c
> +++ b/drivers/hwmon/drivetemp.c
> @@ -264,6 +264,8 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>                  return err;
>          switch (attr) {
>          case hwmon_temp_input:
> +               if (!temp_is_valid(buf[SCT_STATUS_TEMP]))
> +                       return -ENODATA;
>                  *val = temp_from_sct(buf[SCT_STATUS_TEMP]);
>                  break;
>          case hwmon_temp_lowest:
> 
> I am not sure what the best error code would be - suggestions welcome.

Gave it a try and had to wait overnight for things to cool down
(just suspending for an hour wouldn't do it). Right after wakeup sensors
now shows "N/A" as expected, and no illegal values in drivetemp or my
monitoring; missing values are perfectly fine.
After a few minutes correct values show up and all is good.

In case you submit this as official patch feel free to add my
Reported-by/Tested-by. Thanks for looking into it!

cheers
Holger

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-04-08  3:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-06 16:23 hwmon: drivetemp: bogus values after wake up from suspend Holger Hoffstätte
2020-04-06 19:17 ` Guenter Roeck
2020-04-07  1:41 ` Guenter Roeck
2020-04-08  3:59   ` Holger Hoffstätte

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).