linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] i8k: Ignore temperature sensors which report invalid values
@ 2014-10-19 14:46 Pali Rohár
  2014-10-19 15:13 ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-10-19 14:46 UTC (permalink / raw)
  To: Guenter Roeck, Arnd Bergmann, Greg Kroah-Hartman
  Cc: linux-kernel, Steven Honeyman, Pali Rohár

On some machines some temperature sensors report too high values which are
invalid. When value is invalid function i8k_get_temp() returns previous
value and at next call it returns I8K_MAX_TEMP.

With this patch also firt i8k_get_temp() call returns I8K_MAX_TEMP and
i8k_init_hwmon() will ignore all sensor ids which report incorrect values.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
---
 drivers/char/i8k.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
index 7272b08..bc327fa 100644
--- a/drivers/char/i8k.c
+++ b/drivers/char/i8k.c
@@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
 	int temp;
 
 #ifdef I8K_TEMPERATURE_BUG
-	static int prev[4];
+	static int prev[4] = { I8K_MAX_TEMP, I8K_MAX_TEMP, I8K_MAX_TEMP, I8K_MAX_TEMP };
 #endif
 	regs.ebx = sensor & 0xff;
 	rc = i8k_smm(&regs);
@@ -610,17 +610,17 @@ static int __init i8k_init_hwmon(void)
 
 	/* CPU temperature attributes, if temperature reading is OK */
 	err = i8k_get_temp(0);
-	if (err >= 0)
+	if (err >= 0 && err < I8K_MAX_TEMP)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
 	/* check for additional temperature sensors */
 	err = i8k_get_temp(1);
-	if (err >= 0)
+	if (err >= 0 && err < I8K_MAX_TEMP)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
 	err = i8k_get_temp(2);
-	if (err >= 0)
+	if (err >= 0 && err < I8K_MAX_TEMP)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
 	err = i8k_get_temp(3);
-	if (err >= 0)
+	if (err >= 0 && err < I8K_MAX_TEMP)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
 
 	/* Left fan attributes, if left fan is present */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-19 14:46 [PATCH] i8k: Ignore temperature sensors which report invalid values Pali Rohár
@ 2014-10-19 15:13 ` Guenter Roeck
  2014-10-20 16:46   ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-10-19 15:13 UTC (permalink / raw)
  To: Pali Rohár, Arnd Bergmann, Greg Kroah-Hartman
  Cc: linux-kernel, Steven Honeyman

On 10/19/2014 07:46 AM, Pali Rohár wrote:
> On some machines some temperature sensors report too high values which are

What is "too high", and what is "some machines" ?
Would be great if you can be more specific.

> invalid. When value is invalid function i8k_get_temp() returns previous
> value and at next call it returns I8K_MAX_TEMP.
>
> With this patch also firt i8k_get_temp() call returns I8K_MAX_TEMP and

fix ? Also, I am not entirely sure I understand what exactly you are fixing here.

> i8k_init_hwmon() will ignore all sensor ids which report incorrect values.
>
> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> ---
>   drivers/char/i8k.c |   10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> index 7272b08..bc327fa 100644
> --- a/drivers/char/i8k.c
> +++ b/drivers/char/i8k.c
> @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
>   	int temp;
>
>   #ifdef I8K_TEMPERATURE_BUG
> -	static int prev[4];
> +	static int prev[4] = { I8K_MAX_TEMP, I8K_MAX_TEMP, I8K_MAX_TEMP, I8K_MAX_TEMP };

I am not sure I understand what this change is expected to accomplish.
Please explain.

>   #endif
>   	regs.ebx = sensor & 0xff;
>   	rc = i8k_smm(&regs);
> @@ -610,17 +610,17 @@ static int __init i8k_init_hwmon(void)
>
>   	/* CPU temperature attributes, if temperature reading is OK */
>   	err = i8k_get_temp(0);
> -	if (err >= 0)
> +	if (err >= 0 && err < I8K_MAX_TEMP)

I8K_MAX_TEMP is, at least in theory, a valid temperature, so this
should be "<=".

It would be important to understand what the "too high" temperature is
to possibly be able to distinguish it from the buggy temperature
that the code is trying to fix.

Thanks,
Guenter

>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
>   	/* check for additional temperature sensors */
>   	err = i8k_get_temp(1);
> -	if (err >= 0)
> +	if (err >= 0 && err < I8K_MAX_TEMP)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
>   	err = i8k_get_temp(2);
> -	if (err >= 0)
> +	if (err >= 0 && err < I8K_MAX_TEMP)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
>   	err = i8k_get_temp(3);
> -	if (err >= 0)
> +	if (err >= 0 && err < I8K_MAX_TEMP)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
>
>   	/* Left fan attributes, if left fan is present */
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-19 15:13 ` Guenter Roeck
@ 2014-10-20 16:46   ` Pali Rohár
  2014-10-21  4:27     ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-10-20 16:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 3785 bytes --]

Ok, I will describe my problem. Guenter, maybe you can find 
another solution/fix for it.

Calling i8k_get_temp(3) on my laptop without I8K_TEMPERATURE_BUG 
always returns value 193 (which is above I8K_MAX_TEMP).

When I8K_TEMPERATURE_BUG is enabled (by default) then 
i8k_get_temp(3) returns value from prev[3] and store new value 
I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3] is initialized 
to 0.

What I want to achieve is: when i8k_get_temp() for particular 
sensor id always returns invalid value (> I8K_MAX_TEMP) then we 
should totally ignore sensor with that id and do not export it 
via hwmon.

My solution is: initialize prev[id] to I8K_MAX_TEMP, so on 
invalid data first call to i8k_get_temp(id) returns I8K_MAX_TEMP. 
Then in i8k_init_hwmon check if value is < I8K_MAX_TEMP and if 
not ignore sensor id.

Guenter, it is clear now? Are you ok that we should ignore sensor 
if always report value above I8K_MAX_TEMP? If you do not like my 
solution/patch for it, can you specify how other can it be fixed?

On Sunday 19 October 2014 17:13:29 Guenter Roeck wrote:
> On 10/19/2014 07:46 AM, Pali Rohár wrote:
> > On some machines some temperature sensors report too high
> > values which are
> 
> What is "too high", and what is "some machines" ?
> Would be great if you can be more specific.
> 
> > invalid. When value is invalid function i8k_get_temp()
> > returns previous value and at next call it returns
> > I8K_MAX_TEMP.
> > 
> > With this patch also firt i8k_get_temp() call returns
> > I8K_MAX_TEMP and
> 
> fix ? Also, I am not entirely sure I understand what exactly
> you are fixing here.
> 
> > i8k_init_hwmon() will ignore all sensor ids which report
> > incorrect values.
> > 
> > Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> > ---
> > 
> >   drivers/char/i8k.c |   10 +++++-----
> >   1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> > index 7272b08..bc327fa 100644
> > --- a/drivers/char/i8k.c
> > +++ b/drivers/char/i8k.c
> > @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
> > 
> >   	int temp;
> >   
> >   #ifdef I8K_TEMPERATURE_BUG
> > 
> > -	static int prev[4];
> > +	static int prev[4] = { I8K_MAX_TEMP, I8K_MAX_TEMP,
> > I8K_MAX_TEMP, I8K_MAX_TEMP };
> 
> I am not sure I understand what this change is expected to
> accomplish. Please explain.
> 
> >   #endif
> >   
> >   	regs.ebx = sensor & 0xff;
> >   	rc = i8k_smm(&regs);
> > 
> > @@ -610,17 +610,17 @@ static int __init i8k_init_hwmon(void)
> > 
> >   	/* CPU temperature attributes, if temperature reading is
> >   	OK */ err = i8k_get_temp(0);
> > 
> > -	if (err >= 0)
> > +	if (err >= 0 && err < I8K_MAX_TEMP)
> 
> I8K_MAX_TEMP is, at least in theory, a valid temperature, so
> this should be "<=".
> 
> It would be important to understand what the "too high"
> temperature is to possibly be able to distinguish it from the
> buggy temperature that the code is trying to fix.
> 
> Thanks,
> Guenter
> 
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
> >   	
> >   	/* check for additional temperature sensors */
> >   	err = i8k_get_temp(1);
> > 
> > -	if (err >= 0)
> > +	if (err >= 0 && err < I8K_MAX_TEMP)
> > 
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
> >   	
> >   	err = i8k_get_temp(2);
> > 
> > -	if (err >= 0)
> > +	if (err >= 0 && err < I8K_MAX_TEMP)
> > 
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
> >   	
> >   	err = i8k_get_temp(3);
> > 
> > -	if (err >= 0)
> > +	if (err >= 0 && err < I8K_MAX_TEMP)
> > 
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
> >   	
> >   	/* Left fan attributes, if left fan is present */

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-20 16:46   ` Pali Rohár
@ 2014-10-21  4:27     ` Guenter Roeck
  2014-10-22 12:29       ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-10-21  4:27 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

On 10/20/2014 09:46 AM, Pali Rohár wrote:
> Ok, I will describe my problem. Guenter, maybe you can find
> another solution/fix for it.
>
> Calling i8k_get_temp(3) on my laptop without I8K_TEMPERATURE_BUG
> always returns value 193 (which is above I8K_MAX_TEMP).
>
> When I8K_TEMPERATURE_BUG is enabled (by default) then
> i8k_get_temp(3) returns value from prev[3] and store new value
> I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3] is initialized
> to 0.
>
> What I want to achieve is: when i8k_get_temp() for particular
> sensor id always returns invalid value (> I8K_MAX_TEMP) then we
> should totally ignore sensor with that id and do not export it
> via hwmon.
>
> My solution is: initialize prev[id] to I8K_MAX_TEMP, so on
> invalid data first call to i8k_get_temp(id) returns I8K_MAX_TEMP.
> Then in i8k_init_hwmon check if value is < I8K_MAX_TEMP and if
> not ignore sensor id.
>
> Guenter, it is clear now? Are you ok that we should ignore sensor
> if always report value above I8K_MAX_TEMP? If you do not like my
> solution/patch for it, can you specify how other can it be fixed?
>

I still don't see the point in initializing prev[].

Yes, I am ok with ignoring sensor values if the reported temperature
is above I8K_MAX_TEMP. I am just not sure if we should check against
I8K_MAX_TEMP or against, say, 192. Reason is that we do know that
the sensor can erroneously return 0x99 on some systems once in a
while. We would not want to ignore those sensors just because they
happen to report 0x99 during initialization.

So maybe make it
	if (err >= 0 && err < 192)
and add a note before the first if(), explaining that higher values
suggest that there is no sensor attached.

Thanks,
Guenter

> On Sunday 19 October 2014 17:13:29 Guenter Roeck wrote:
>> On 10/19/2014 07:46 AM, Pali Rohár wrote:
>>> On some machines some temperature sensors report too high
>>> values which are
>>
>> What is "too high", and what is "some machines" ?
>> Would be great if you can be more specific.
>>
>>> invalid. When value is invalid function i8k_get_temp()
>>> returns previous value and at next call it returns
>>> I8K_MAX_TEMP.
>>>
>>> With this patch also firt i8k_get_temp() call returns
>>> I8K_MAX_TEMP and
>>
>> fix ? Also, I am not entirely sure I understand what exactly
>> you are fixing here.
>>
>>> i8k_init_hwmon() will ignore all sensor ids which report
>>> incorrect values.
>>>
>>> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
>>> ---
>>>
>>>    drivers/char/i8k.c |   10 +++++-----
>>>    1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
>>> index 7272b08..bc327fa 100644
>>> --- a/drivers/char/i8k.c
>>> +++ b/drivers/char/i8k.c
>>> @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
>>>
>>>    	int temp;
>>>
>>>    #ifdef I8K_TEMPERATURE_BUG
>>>
>>> -	static int prev[4];
>>> +	static int prev[4] = { I8K_MAX_TEMP, I8K_MAX_TEMP,
>>> I8K_MAX_TEMP, I8K_MAX_TEMP };
>>
>> I am not sure I understand what this change is expected to
>> accomplish. Please explain.
>>
>>>    #endif
>>>
>>>    	regs.ebx = sensor & 0xff;
>>>    	rc = i8k_smm(&regs);
>>>
>>> @@ -610,17 +610,17 @@ static int __init i8k_init_hwmon(void)
>>>
>>>    	/* CPU temperature attributes, if temperature reading is
>>>    	OK */ err = i8k_get_temp(0);
>>>
>>> -	if (err >= 0)
>>> +	if (err >= 0 && err < I8K_MAX_TEMP)
>>
>> I8K_MAX_TEMP is, at least in theory, a valid temperature, so
>> this should be "<=".
>>
>> It would be important to understand what the "too high"
>> temperature is to possibly be able to distinguish it from the
>> buggy temperature that the code is trying to fix.
>>
>> Thanks,
>> Guenter
>>
>>>    		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
>>>    	
>>>    	/* check for additional temperature sensors */
>>>    	err = i8k_get_temp(1);
>>>
>>> -	if (err >= 0)
>>> +	if (err >= 0 && err < I8K_MAX_TEMP)
>>>
>>>    		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
>>>    	
>>>    	err = i8k_get_temp(2);
>>>
>>> -	if (err >= 0)
>>> +	if (err >= 0 && err < I8K_MAX_TEMP)
>>>
>>>    		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
>>>    	
>>>    	err = i8k_get_temp(3);
>>>
>>> -	if (err >= 0)
>>> +	if (err >= 0 && err < I8K_MAX_TEMP)
>>>
>>>    		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
>>>    	
>>>    	/* Left fan attributes, if left fan is present */
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-21  4:27     ` Guenter Roeck
@ 2014-10-22 12:29       ` Pali Rohár
  2014-10-22 16:19         ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-10-22 12:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 2969 bytes --]

On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > Ok, I will describe my problem. Guenter, maybe you can find
> > another solution/fix for it.
> > 
> > Calling i8k_get_temp(3) on my laptop without
> > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > above I8K_MAX_TEMP).
> > 
> > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > i8k_get_temp(3) returns value from prev[3] and store new
> > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3] is
> > initialized to 0.
> > 
> > What I want to achieve is: when i8k_get_temp() for
> > particular sensor id always returns invalid value (>
> > I8K_MAX_TEMP) then we should totally ignore sensor with
> > that id and do not export it via hwmon.
> > 
> > My solution is: initialize prev[id] to I8K_MAX_TEMP, so on
> > invalid data first call to i8k_get_temp(id) returns
> > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is <
> > I8K_MAX_TEMP and if not ignore sensor id.
> > 
> > Guenter, it is clear now? Are you ok that we should ignore
> > sensor if always report value above I8K_MAX_TEMP? If you do
> > not like my solution/patch for it, can you specify how
> > other can it be fixed?
> 
> I still don't see the point in initializing prev[].
> 

Now prev[] is initialized to 0. It means that first call 
i8k_get_temp() (with sensor id which return value > I8K_MAX_TEMP) 
returns 0. Second and other calls returns I8K_MAX_TEMP.

So point is to return same value for first and other calls.

> Yes, I am ok with ignoring sensor values if the reported
> temperature is above I8K_MAX_TEMP. I am just not sure if we
> should check against I8K_MAX_TEMP or against, say, 192.
> Reason is that we do know that the sensor can erroneously
> return 0x99 on some systems once in a while. We would not
> want to ignore those sensors just because they happen to
> report 0x99 during initialization.
> 
> So maybe make it
> 	if (err >= 0 && err < 192)
> and add a note before the first if(), explaining that higher
> values suggest that there is no sensor attached.
> 
> Thanks,
> Guenter
> 

Right, now we need to decide which magic constant to use...

And now I found another problem :-)

On my laptop i8k_get_temp(3) not always return value 193. It is 
only when AMD graphics card is turned off. When card is on 
i8k_get_temp(3) returns same value as temperature hwmon part from 
radeon DRM driver.

So it looks like that on my laptop i8k sensor with id 3 reports 
GPU temperature.

When card is turned off radeon driver reports -EINVAL for 
temperature hwmon sysnode.

So now I think i8k could not ignore sensor totally as it can be 
mapped to some HW which can be dynamically turned on/off (like my 
graphics card).

So what do you think about reporting -EINVAL instead I8K_MAX_TEMP 
when dell SMM returns value above I8K_MAX_TEMP?

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-22 12:29       ` Pali Rohár
@ 2014-10-22 16:19         ` Guenter Roeck
  2014-10-22 16:35           ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-10-22 16:19 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > Ok, I will describe my problem. Guenter, maybe you can find
> > > another solution/fix for it.
> > > 
> > > Calling i8k_get_temp(3) on my laptop without
> > > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > > above I8K_MAX_TEMP).
> > > 
> > > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > > i8k_get_temp(3) returns value from prev[3] and store new
> > > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3] is
> > > initialized to 0.
> > > 
> > > What I want to achieve is: when i8k_get_temp() for
> > > particular sensor id always returns invalid value (>
> > > I8K_MAX_TEMP) then we should totally ignore sensor with
> > > that id and do not export it via hwmon.
> > > 
> > > My solution is: initialize prev[id] to I8K_MAX_TEMP, so on
> > > invalid data first call to i8k_get_temp(id) returns
> > > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is <
> > > I8K_MAX_TEMP and if not ignore sensor id.
> > > 
> > > Guenter, it is clear now? Are you ok that we should ignore
> > > sensor if always report value above I8K_MAX_TEMP? If you do
> > > not like my solution/patch for it, can you specify how
> > > other can it be fixed?
> > 
> > I still don't see the point in initializing prev[].
> > 
> 
> Now prev[] is initialized to 0. It means that first call 
> i8k_get_temp() (with sensor id which return value > I8K_MAX_TEMP) 
> returns 0. Second and other calls returns I8K_MAX_TEMP.
> 
> So point is to return same value for first and other calls.
> 
Yes, I realized that after I sent my previous mail.

> > Yes, I am ok with ignoring sensor values if the reported
> > temperature is above I8K_MAX_TEMP. I am just not sure if we
> > should check against I8K_MAX_TEMP or against, say, 192.
> > Reason is that we do know that the sensor can erroneously
> > return 0x99 on some systems once in a while. We would not
> > want to ignore those sensors just because they happen to
> > report 0x99 during initialization.
> > 
> > So maybe make it
> > 	if (err >= 0 && err < 192)
> > and add a note before the first if(), explaining that higher
> > values suggest that there is no sensor attached.
> > 
> > Thanks,
> > Guenter
> > 
> 
> Right, now we need to decide which magic constant to use...
> 
> And now I found another problem :-)
> 
> On my laptop i8k_get_temp(3) not always return value 193. It is 
> only when AMD graphics card is turned off. When card is on 
> i8k_get_temp(3) returns same value as temperature hwmon part from 
> radeon DRM driver.
> 
Can you turn the GPU on or off during runtime ?
That would make it really tricky to handle the situation.

> So it looks like that on my laptop i8k sensor with id 3 reports 
> GPU temperature.
> 
> When card is turned off radeon driver reports -EINVAL for 
> temperature hwmon sysnode.
> 
> So now I think i8k could not ignore sensor totally as it can be 
> mapped to some HW which can be dynamically turned on/off (like my 
> graphics card).
> 
> So what do you think about reporting -EINVAL instead I8K_MAX_TEMP 
> when dell SMM returns value above I8K_MAX_TEMP?
> 

-EINVAL is supposed to mean "Invalid Argument", so it really has 
ia different semantics. We could use -ENXIO, "No such device or address",
which seems more appropriate.

Overall, I think the entire error handling is broken and should be
replaced. One option would be to explicitly check for 0x99 and, if
detected, go to sleep for, say, 100ms and try again. If it still fails,
and for all other bad values, return -ENXIO. Then the calling code can
either return the error to user space in the show function, or not
install the sensor in the probe function.

Does that make sense ?

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-22 16:19         ` Guenter Roeck
@ 2014-10-22 16:35           ` Pali Rohár
  2014-10-22 17:10             ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-10-22 16:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 5236 bytes --]

On Wednesday 22 October 2014 18:19:47 Guenter Roeck wrote:
> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> > On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > Ok, I will describe my problem. Guenter, maybe you can
> > > > find another solution/fix for it.
> > > > 
> > > > Calling i8k_get_temp(3) on my laptop without
> > > > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > > > above I8K_MAX_TEMP).
> > > > 
> > > > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > > > i8k_get_temp(3) returns value from prev[3] and store new
> > > > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3]
> > > > is initialized to 0.
> > > > 
> > > > What I want to achieve is: when i8k_get_temp() for
> > > > particular sensor id always returns invalid value (>
> > > > I8K_MAX_TEMP) then we should totally ignore sensor with
> > > > that id and do not export it via hwmon.
> > > > 
> > > > My solution is: initialize prev[id] to I8K_MAX_TEMP, so
> > > > on invalid data first call to i8k_get_temp(id) returns
> > > > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is
> > > > < I8K_MAX_TEMP and if not ignore sensor id.
> > > > 
> > > > Guenter, it is clear now? Are you ok that we should
> > > > ignore sensor if always report value above
> > > > I8K_MAX_TEMP? If you do not like my solution/patch for
> > > > it, can you specify how other can it be fixed?
> > > 
> > > I still don't see the point in initializing prev[].
> > 
> > Now prev[] is initialized to 0. It means that first call
> > i8k_get_temp() (with sensor id which return value >
> > I8K_MAX_TEMP) returns 0. Second and other calls returns
> > I8K_MAX_TEMP.
> > 
> > So point is to return same value for first and other calls.
> 
> Yes, I realized that after I sent my previous mail.
> 
> > > Yes, I am ok with ignoring sensor values if the reported
> > > temperature is above I8K_MAX_TEMP. I am just not sure if
> > > we should check against I8K_MAX_TEMP or against, say,
> > > 192. Reason is that we do know that the sensor can
> > > erroneously return 0x99 on some systems once in a while.
> > > We would not want to ignore those sensors just because
> > > they happen to report 0x99 during initialization.
> > > 
> > > So maybe make it
> > > 
> > > 	if (err >= 0 && err < 192)
> > > 
> > > and add a note before the first if(), explaining that
> > > higher values suggest that there is no sensor attached.
> > > 
> > > Thanks,
> > > Guenter
> > 
> > Right, now we need to decide which magic constant to use...
> > 
> > And now I found another problem :-)
> > 
> > On my laptop i8k_get_temp(3) not always return value 193. It
> > is only when AMD graphics card is turned off. When card is
> > on i8k_get_temp(3) returns same value as temperature hwmon
> > part from radeon DRM driver.
> 
> Can you turn the GPU on or off during runtime ?
> That would make it really tricky to handle the situation.
> 

Yes. New laptops with Nvidia Optimus or AMD PowerXpress or Enduro 
technology are designed to automatically turn off secondary GPU 
when is not in use. And nouveau/radeon drivers together with 
vga_switcheroo implements this dynamic power on/off.

> > So it looks like that on my laptop i8k sensor with id 3
> > reports GPU temperature.
> > 
> > When card is turned off radeon driver reports -EINVAL for
> > temperature hwmon sysnode.
> > 
> > So now I think i8k could not ignore sensor totally as it can
> > be mapped to some HW which can be dynamically turned on/off
> > (like my graphics card).
> > 
> > So what do you think about reporting -EINVAL instead
> > I8K_MAX_TEMP when dell SMM returns value above
> > I8K_MAX_TEMP?
> 
> -EINVAL is supposed to mean "Invalid Argument", so it really
> has ia different semantics. We could use -ENXIO, "No such
> device or address", which seems more appropriate.
> 

I prefer to use -EINVAL because other driver (radeon) is using it 
and userspace "sensors" programs handle EINVAL and show "N/A" in 
output instead reporting some error about reading value. If 
nothing else consistency (with other drivers) is my argument.

> Overall, I think the entire error handling is broken and
> should be replaced. One option would be to explicitly check
> for 0x99 and, if detected, go to sleep for, say, 100ms and
> try again. If it still fails, and for all other bad values,
> return -ENXIO. Then the calling code can either return the
> error to user space in the show function, or not install the
> sensor in the probe function.
> 
> Does that make sense ?
> 

Yes, replacing error handling with retry call (after some sleep) 
is better then current code (which returns previous value or 
returns I8K_MAX_TEMP).

But your solution not install the sensor if probe fails on bad 
value does not solve problem that i8k.ko is loading at time when 
GPU card is turned off.

I think current check for installing sensor (err < 0) is enough 
and when invalid value (> I8K_MAX_TEMP) is returned just do not 
show this value for userspace in hwmon sysnode.

> Thanks,
> Guenter

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-22 16:35           ` Pali Rohár
@ 2014-10-22 17:10             ` Guenter Roeck
  2014-10-23 10:37               ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-10-22 17:10 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote:
> On Wednesday 22 October 2014 18:19:47 Guenter Roeck wrote:
> > On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> > > On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > > Ok, I will describe my problem. Guenter, maybe you can
> > > > > find another solution/fix for it.
> > > > > 
> > > > > Calling i8k_get_temp(3) on my laptop without
> > > > > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > > > > above I8K_MAX_TEMP).
> > > > > 
> > > > > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > > > > i8k_get_temp(3) returns value from prev[3] and store new
> > > > > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3]
> > > > > is initialized to 0.
> > > > > 
> > > > > What I want to achieve is: when i8k_get_temp() for
> > > > > particular sensor id always returns invalid value (>
> > > > > I8K_MAX_TEMP) then we should totally ignore sensor with
> > > > > that id and do not export it via hwmon.
> > > > > 
> > > > > My solution is: initialize prev[id] to I8K_MAX_TEMP, so
> > > > > on invalid data first call to i8k_get_temp(id) returns
> > > > > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is
> > > > > < I8K_MAX_TEMP and if not ignore sensor id.
> > > > > 
> > > > > Guenter, it is clear now? Are you ok that we should
> > > > > ignore sensor if always report value above
> > > > > I8K_MAX_TEMP? If you do not like my solution/patch for
> > > > > it, can you specify how other can it be fixed?
> > > > 
> > > > I still don't see the point in initializing prev[].
> > > 
> > > Now prev[] is initialized to 0. It means that first call
> > > i8k_get_temp() (with sensor id which return value >
> > > I8K_MAX_TEMP) returns 0. Second and other calls returns
> > > I8K_MAX_TEMP.
> > > 
> > > So point is to return same value for first and other calls.
> > 
> > Yes, I realized that after I sent my previous mail.
> > 
> > > > Yes, I am ok with ignoring sensor values if the reported
> > > > temperature is above I8K_MAX_TEMP. I am just not sure if
> > > > we should check against I8K_MAX_TEMP or against, say,
> > > > 192. Reason is that we do know that the sensor can
> > > > erroneously return 0x99 on some systems once in a while.
> > > > We would not want to ignore those sensors just because
> > > > they happen to report 0x99 during initialization.
> > > > 
> > > > So maybe make it
> > > > 
> > > > 	if (err >= 0 && err < 192)
> > > > 
> > > > and add a note before the first if(), explaining that
> > > > higher values suggest that there is no sensor attached.
> > > > 
> > > > Thanks,
> > > > Guenter
> > > 
> > > Right, now we need to decide which magic constant to use...
> > > 
> > > And now I found another problem :-)
> > > 
> > > On my laptop i8k_get_temp(3) not always return value 193. It
> > > is only when AMD graphics card is turned off. When card is
> > > on i8k_get_temp(3) returns same value as temperature hwmon
> > > part from radeon DRM driver.
> > 
> > Can you turn the GPU on or off during runtime ?
> > That would make it really tricky to handle the situation.
> > 
> 
> Yes. New laptops with Nvidia Optimus or AMD PowerXpress or Enduro 
> technology are designed to automatically turn off secondary GPU 
> when is not in use. And nouveau/radeon drivers together with 
> vga_switcheroo implements this dynamic power on/off.
> 
> > > So it looks like that on my laptop i8k sensor with id 3
> > > reports GPU temperature.
> > > 
> > > When card is turned off radeon driver reports -EINVAL for
> > > temperature hwmon sysnode.
> > > 
> > > So now I think i8k could not ignore sensor totally as it can
> > > be mapped to some HW which can be dynamically turned on/off
> > > (like my graphics card).
> > > 
> > > So what do you think about reporting -EINVAL instead
> > > I8K_MAX_TEMP when dell SMM returns value above
> > > I8K_MAX_TEMP?
> > 
> > -EINVAL is supposed to mean "Invalid Argument", so it really
> > has ia different semantics. We could use -ENXIO, "No such
> > device or address", which seems more appropriate.
> > 
> 
> I prefer to use -EINVAL because other driver (radeon) is using it 
> and userspace "sensors" programs handle EINVAL and show "N/A" in 
> output instead reporting some error about reading value. If 
> nothing else consistency (with other drivers) is my argument.
> 
Ok, if sensors implements it that way then let's do it.

> > Overall, I think the entire error handling is broken and
> > should be replaced. One option would be to explicitly check
> > for 0x99 and, if detected, go to sleep for, say, 100ms and
> > try again. If it still fails, and for all other bad values,
> > return -ENXIO. Then the calling code can either return the
> > error to user space in the show function, or not install the
> > sensor in the probe function.
> > 
> > Does that make sense ?
> > 
> 
> Yes, replacing error handling with retry call (after some sleep) 
> is better then current code (which returns previous value or 
> returns I8K_MAX_TEMP).
> 
> But your solution not install the sensor if probe fails on bad 
> value does not solve problem that i8k.ko is loading at time when 
> GPU card is turned off.
> 
Yes, the dynamics in that situation makes it a bit difficult to
handle the situation.

> I think current check for installing sensor (err < 0) is enough 
> and when invalid value (> I8K_MAX_TEMP) is returned just do not 
> show this value for userspace in hwmon sysnode.
> 
Ok with me, and agreed.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-22 17:10             ` Guenter Roeck
@ 2014-10-23 10:37               ` Pali Rohár
  2014-10-23 16:45                 ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-10-23 10:37 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 6154 bytes --]

On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
> On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote:
> > On Wednesday 22 October 2014 18:19:47 Guenter Roeck wrote:
> > > On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> > > > On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > > > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > > > Ok, I will describe my problem. Guenter, maybe you
> > > > > > can find another solution/fix for it.
> > > > > > 
> > > > > > Calling i8k_get_temp(3) on my laptop without
> > > > > > I8K_TEMPERATURE_BUG always returns value 193 (which
> > > > > > is above I8K_MAX_TEMP).
> > > > > > 
> > > > > > When I8K_TEMPERATURE_BUG is enabled (by default)
> > > > > > then i8k_get_temp(3) returns value from prev[3] and
> > > > > > store new value I8K_TEMPERATURE_BUG to prev[3].
> > > > > > Value in prev[3] is initialized to 0.
> > > > > > 
> > > > > > What I want to achieve is: when i8k_get_temp() for
> > > > > > particular sensor id always returns invalid value (>
> > > > > > I8K_MAX_TEMP) then we should totally ignore sensor
> > > > > > with that id and do not export it via hwmon.
> > > > > > 
> > > > > > My solution is: initialize prev[id] to I8K_MAX_TEMP,
> > > > > > so on invalid data first call to i8k_get_temp(id)
> > > > > > returns I8K_MAX_TEMP. Then in i8k_init_hwmon check
> > > > > > if value is < I8K_MAX_TEMP and if not ignore sensor
> > > > > > id.
> > > > > > 
> > > > > > Guenter, it is clear now? Are you ok that we should
> > > > > > ignore sensor if always report value above
> > > > > > I8K_MAX_TEMP? If you do not like my solution/patch
> > > > > > for it, can you specify how other can it be fixed?
> > > > > 
> > > > > I still don't see the point in initializing prev[].
> > > > 
> > > > Now prev[] is initialized to 0. It means that first call
> > > > i8k_get_temp() (with sensor id which return value >
> > > > I8K_MAX_TEMP) returns 0. Second and other calls returns
> > > > I8K_MAX_TEMP.
> > > > 
> > > > So point is to return same value for first and other
> > > > calls.
> > > 
> > > Yes, I realized that after I sent my previous mail.
> > > 
> > > > > Yes, I am ok with ignoring sensor values if the
> > > > > reported temperature is above I8K_MAX_TEMP. I am just
> > > > > not sure if we should check against I8K_MAX_TEMP or
> > > > > against, say, 192. Reason is that we do know that the
> > > > > sensor can erroneously return 0x99 on some systems
> > > > > once in a while. We would not want to ignore those
> > > > > sensors just because they happen to report 0x99
> > > > > during initialization.
> > > > > 
> > > > > So maybe make it
> > > > > 
> > > > > 	if (err >= 0 && err < 192)
> > > > > 
> > > > > and add a note before the first if(), explaining that
> > > > > higher values suggest that there is no sensor
> > > > > attached.
> > > > > 
> > > > > Thanks,
> > > > > Guenter
> > > > 
> > > > Right, now we need to decide which magic constant to
> > > > use...
> > > > 
> > > > And now I found another problem :-)
> > > > 
> > > > On my laptop i8k_get_temp(3) not always return value
> > > > 193. It is only when AMD graphics card is turned off.
> > > > When card is on i8k_get_temp(3) returns same value as
> > > > temperature hwmon part from radeon DRM driver.
> > > 
> > > Can you turn the GPU on or off during runtime ?
> > > That would make it really tricky to handle the situation.
> > 
> > Yes. New laptops with Nvidia Optimus or AMD PowerXpress or
> > Enduro technology are designed to automatically turn off
> > secondary GPU when is not in use. And nouveau/radeon
> > drivers together with vga_switcheroo implements this
> > dynamic power on/off.
> > 
> > > > So it looks like that on my laptop i8k sensor with id 3
> > > > reports GPU temperature.
> > > > 
> > > > When card is turned off radeon driver reports -EINVAL
> > > > for temperature hwmon sysnode.
> > > > 
> > > > So now I think i8k could not ignore sensor totally as it
> > > > can be mapped to some HW which can be dynamically
> > > > turned on/off (like my graphics card).
> > > > 
> > > > So what do you think about reporting -EINVAL instead
> > > > I8K_MAX_TEMP when dell SMM returns value above
> > > > I8K_MAX_TEMP?
> > > 
> > > -EINVAL is supposed to mean "Invalid Argument", so it
> > > really has ia different semantics. We could use -ENXIO,
> > > "No such device or address", which seems more
> > > appropriate.
> > 
> > I prefer to use -EINVAL because other driver (radeon) is
> > using it and userspace "sensors" programs handle EINVAL and
> > show "N/A" in output instead reporting some error about
> > reading value. If nothing else consistency (with other
> > drivers) is my argument.
> 
> Ok, if sensors implements it that way then let's do it.
> 
> > > Overall, I think the entire error handling is broken and
> > > should be replaced. One option would be to explicitly
> > > check for 0x99 and, if detected, go to sleep for, say,
> > > 100ms and try again. If it still fails, and for all other
> > > bad values, return -ENXIO. Then the calling code can
> > > either return the error to user space in the show
> > > function, or not install the sensor in the probe
> > > function.
> > > 
> > > Does that make sense ?
> > 
> > Yes, replacing error handling with retry call (after some
> > sleep) is better then current code (which returns previous
> > value or returns I8K_MAX_TEMP).
> > 
> > But your solution not install the sensor if probe fails on
> > bad value does not solve problem that i8k.ko is loading at
> > time when GPU card is turned off.
> 
> Yes, the dynamics in that situation makes it a bit difficult
> to handle the situation.
> 
> > I think current check for installing sensor (err < 0) is
> > enough and when invalid value (> I8K_MAX_TEMP) is returned
> > just do not show this value for userspace in hwmon sysnode.
> 
> Ok with me, and agreed.
> 
> Thanks,
> Guenter

Ok, are you going to fix i8k_get_temp() function (with sleeping)?

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-23 10:37               ` Pali Rohár
@ 2014-10-23 16:45                 ` Guenter Roeck
  2014-11-17  8:35                   ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-10-23 16:45 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote:
> On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
> > On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote:
> > > On Wednesday 22 October 2014 18:19:47 Guenter Roeck wrote:
> > > > On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> > > > > On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > > > > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > > > > Ok, I will describe my problem. Guenter, maybe you
> > > > > > > can find another solution/fix for it.
> > > > > > > 
> > > > > > > Calling i8k_get_temp(3) on my laptop without
> > > > > > > I8K_TEMPERATURE_BUG always returns value 193 (which
> > > > > > > is above I8K_MAX_TEMP).
> > > > > > > 
> > > > > > > When I8K_TEMPERATURE_BUG is enabled (by default)
> > > > > > > then i8k_get_temp(3) returns value from prev[3] and
> > > > > > > store new value I8K_TEMPERATURE_BUG to prev[3].
> > > > > > > Value in prev[3] is initialized to 0.
> > > > > > > 
> > > > > > > What I want to achieve is: when i8k_get_temp() for
> > > > > > > particular sensor id always returns invalid value (>
> > > > > > > I8K_MAX_TEMP) then we should totally ignore sensor
> > > > > > > with that id and do not export it via hwmon.
> > > > > > > 
> > > > > > > My solution is: initialize prev[id] to I8K_MAX_TEMP,
> > > > > > > so on invalid data first call to i8k_get_temp(id)
> > > > > > > returns I8K_MAX_TEMP. Then in i8k_init_hwmon check
> > > > > > > if value is < I8K_MAX_TEMP and if not ignore sensor
> > > > > > > id.
> > > > > > > 
> > > > > > > Guenter, it is clear now? Are you ok that we should
> > > > > > > ignore sensor if always report value above
> > > > > > > I8K_MAX_TEMP? If you do not like my solution/patch
> > > > > > > for it, can you specify how other can it be fixed?
> > > > > > 
> > > > > > I still don't see the point in initializing prev[].
> > > > > 
> > > > > Now prev[] is initialized to 0. It means that first call
> > > > > i8k_get_temp() (with sensor id which return value >
> > > > > I8K_MAX_TEMP) returns 0. Second and other calls returns
> > > > > I8K_MAX_TEMP.
> > > > > 
> > > > > So point is to return same value for first and other
> > > > > calls.
> > > > 
> > > > Yes, I realized that after I sent my previous mail.
> > > > 
> > > > > > Yes, I am ok with ignoring sensor values if the
> > > > > > reported temperature is above I8K_MAX_TEMP. I am just
> > > > > > not sure if we should check against I8K_MAX_TEMP or
> > > > > > against, say, 192. Reason is that we do know that the
> > > > > > sensor can erroneously return 0x99 on some systems
> > > > > > once in a while. We would not want to ignore those
> > > > > > sensors just because they happen to report 0x99
> > > > > > during initialization.
> > > > > > 
> > > > > > So maybe make it
> > > > > > 
> > > > > > 	if (err >= 0 && err < 192)
> > > > > > 
> > > > > > and add a note before the first if(), explaining that
> > > > > > higher values suggest that there is no sensor
> > > > > > attached.
> > > > > > 
> > > > > > Thanks,
> > > > > > Guenter
> > > > > 
> > > > > Right, now we need to decide which magic constant to
> > > > > use...
> > > > > 
> > > > > And now I found another problem :-)
> > > > > 
> > > > > On my laptop i8k_get_temp(3) not always return value
> > > > > 193. It is only when AMD graphics card is turned off.
> > > > > When card is on i8k_get_temp(3) returns same value as
> > > > > temperature hwmon part from radeon DRM driver.
> > > > 
> > > > Can you turn the GPU on or off during runtime ?
> > > > That would make it really tricky to handle the situation.
> > > 
> > > Yes. New laptops with Nvidia Optimus or AMD PowerXpress or
> > > Enduro technology are designed to automatically turn off
> > > secondary GPU when is not in use. And nouveau/radeon
> > > drivers together with vga_switcheroo implements this
> > > dynamic power on/off.
> > > 
> > > > > So it looks like that on my laptop i8k sensor with id 3
> > > > > reports GPU temperature.
> > > > > 
> > > > > When card is turned off radeon driver reports -EINVAL
> > > > > for temperature hwmon sysnode.
> > > > > 
> > > > > So now I think i8k could not ignore sensor totally as it
> > > > > can be mapped to some HW which can be dynamically
> > > > > turned on/off (like my graphics card).
> > > > > 
> > > > > So what do you think about reporting -EINVAL instead
> > > > > I8K_MAX_TEMP when dell SMM returns value above
> > > > > I8K_MAX_TEMP?
> > > > 
> > > > -EINVAL is supposed to mean "Invalid Argument", so it
> > > > really has ia different semantics. We could use -ENXIO,
> > > > "No such device or address", which seems more
> > > > appropriate.
> > > 
> > > I prefer to use -EINVAL because other driver (radeon) is
> > > using it and userspace "sensors" programs handle EINVAL and
> > > show "N/A" in output instead reporting some error about
> > > reading value. If nothing else consistency (with other
> > > drivers) is my argument.
> > 
> > Ok, if sensors implements it that way then let's do it.
> > 
> > > > Overall, I think the entire error handling is broken and
> > > > should be replaced. One option would be to explicitly
> > > > check for 0x99 and, if detected, go to sleep for, say,
> > > > 100ms and try again. If it still fails, and for all other
> > > > bad values, return -ENXIO. Then the calling code can
> > > > either return the error to user space in the show
> > > > function, or not install the sensor in the probe
> > > > function.
> > > > 
> > > > Does that make sense ?
> > > 
> > > Yes, replacing error handling with retry call (after some
> > > sleep) is better then current code (which returns previous
> > > value or returns I8K_MAX_TEMP).
> > > 
> > > But your solution not install the sensor if probe fails on
> > > bad value does not solve problem that i8k.ko is loading at
> > > time when GPU card is turned off.
> > 
> > Yes, the dynamics in that situation makes it a bit difficult
> > to handle the situation.
> > 
> > > I think current check for installing sensor (err < 0) is
> > > enough and when invalid value (> I8K_MAX_TEMP) is returned
> > > just do not show this value for userspace in hwmon sysnode.
> > 
> > Ok with me, and agreed.
> > 
> > Thanks,
> > Guenter
> 
> Ok, are you going to fix i8k_get_temp() function (with sleeping)?
> 
I had hoped you would find the time to do it ;-).

Sure, I can do it, but I am kind of busy right now, so it will take
a while.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-10-23 16:45                 ` Guenter Roeck
@ 2014-11-17  8:35                   ` Pali Rohár
  2014-11-18  5:56                     ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-11-17  8:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 7305 bytes --]

On Thursday 23 October 2014 18:45:07 Guenter Roeck wrote:
> On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote:
> > On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
> > > On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote:
> > > > On Wednesday 22 October 2014 18:19:47 Guenter Roeck 
wrote:
> > > > > On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár 
wrote:
> > > > > > On Tuesday 21 October 2014 06:27:23 Guenter Roeck 
wrote:
> > > > > > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > > > > > Ok, I will describe my problem. Guenter, maybe
> > > > > > > > you can find another solution/fix for it.
> > > > > > > > 
> > > > > > > > Calling i8k_get_temp(3) on my laptop without
> > > > > > > > I8K_TEMPERATURE_BUG always returns value 193
> > > > > > > > (which is above I8K_MAX_TEMP).
> > > > > > > > 
> > > > > > > > When I8K_TEMPERATURE_BUG is enabled (by default)
> > > > > > > > then i8k_get_temp(3) returns value from prev[3]
> > > > > > > > and store new value I8K_TEMPERATURE_BUG to
> > > > > > > > prev[3]. Value in prev[3] is initialized to 0.
> > > > > > > > 
> > > > > > > > What I want to achieve is: when i8k_get_temp()
> > > > > > > > for particular sensor id always returns invalid
> > > > > > > > value (> I8K_MAX_TEMP) then we should totally
> > > > > > > > ignore sensor with that id and do not export it
> > > > > > > > via hwmon.
> > > > > > > > 
> > > > > > > > My solution is: initialize prev[id] to
> > > > > > > > I8K_MAX_TEMP, so on invalid data first call to
> > > > > > > > i8k_get_temp(id) returns I8K_MAX_TEMP. Then in
> > > > > > > > i8k_init_hwmon check if value is < I8K_MAX_TEMP
> > > > > > > > and if not ignore sensor id.
> > > > > > > > 
> > > > > > > > Guenter, it is clear now? Are you ok that we
> > > > > > > > should ignore sensor if always report value
> > > > > > > > above I8K_MAX_TEMP? If you do not like my
> > > > > > > > solution/patch for it, can you specify how
> > > > > > > > other can it be fixed?
> > > > > > > 
> > > > > > > I still don't see the point in initializing
> > > > > > > prev[].
> > > > > > 
> > > > > > Now prev[] is initialized to 0. It means that first
> > > > > > call i8k_get_temp() (with sensor id which return
> > > > > > value > I8K_MAX_TEMP) returns 0. Second and other
> > > > > > calls returns I8K_MAX_TEMP.
> > > > > > 
> > > > > > So point is to return same value for first and other
> > > > > > calls.
> > > > > 
> > > > > Yes, I realized that after I sent my previous mail.
> > > > > 
> > > > > > > Yes, I am ok with ignoring sensor values if the
> > > > > > > reported temperature is above I8K_MAX_TEMP. I am
> > > > > > > just not sure if we should check against
> > > > > > > I8K_MAX_TEMP or against, say, 192. Reason is that
> > > > > > > we do know that the sensor can erroneously return
> > > > > > > 0x99 on some systems once in a while. We would
> > > > > > > not want to ignore those sensors just because
> > > > > > > they happen to report 0x99 during initialization.
> > > > > > > 
> > > > > > > So maybe make it
> > > > > > > 
> > > > > > > 	if (err >= 0 && err < 192)
> > > > > > > 
> > > > > > > and add a note before the first if(), explaining
> > > > > > > that higher values suggest that there is no
> > > > > > > sensor attached.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Guenter
> > > > > > 
> > > > > > Right, now we need to decide which magic constant to
> > > > > > use...
> > > > > > 
> > > > > > And now I found another problem :-)
> > > > > > 
> > > > > > On my laptop i8k_get_temp(3) not always return value
> > > > > > 193. It is only when AMD graphics card is turned
> > > > > > off. When card is on i8k_get_temp(3) returns same
> > > > > > value as temperature hwmon part from radeon DRM
> > > > > > driver.
> > > > > 
> > > > > Can you turn the GPU on or off during runtime ?
> > > > > That would make it really tricky to handle the
> > > > > situation.
> > > > 
> > > > Yes. New laptops with Nvidia Optimus or AMD PowerXpress
> > > > or Enduro technology are designed to automatically turn
> > > > off secondary GPU when is not in use. And
> > > > nouveau/radeon drivers together with vga_switcheroo
> > > > implements this dynamic power on/off.
> > > > 
> > > > > > So it looks like that on my laptop i8k sensor with
> > > > > > id 3 reports GPU temperature.
> > > > > > 
> > > > > > When card is turned off radeon driver reports
> > > > > > -EINVAL for temperature hwmon sysnode.
> > > > > > 
> > > > > > So now I think i8k could not ignore sensor totally
> > > > > > as it can be mapped to some HW which can be
> > > > > > dynamically turned on/off (like my graphics card).
> > > > > > 
> > > > > > So what do you think about reporting -EINVAL instead
> > > > > > I8K_MAX_TEMP when dell SMM returns value above
> > > > > > I8K_MAX_TEMP?
> > > > > 
> > > > > -EINVAL is supposed to mean "Invalid Argument", so it
> > > > > really has ia different semantics. We could use
> > > > > -ENXIO, "No such device or address", which seems more
> > > > > appropriate.
> > > > 
> > > > I prefer to use -EINVAL because other driver (radeon) is
> > > > using it and userspace "sensors" programs handle EINVAL
> > > > and show "N/A" in output instead reporting some error
> > > > about reading value. If nothing else consistency (with
> > > > other drivers) is my argument.
> > > 
> > > Ok, if sensors implements it that way then let's do it.
> > > 
> > > > > Overall, I think the entire error handling is broken
> > > > > and should be replaced. One option would be to
> > > > > explicitly check for 0x99 and, if detected, go to
> > > > > sleep for, say, 100ms and try again. If it still
> > > > > fails, and for all other bad values, return -ENXIO.
> > > > > Then the calling code can either return the error to
> > > > > user space in the show function, or not install the
> > > > > sensor in the probe function.
> > > > > 
> > > > > Does that make sense ?
> > > > 
> > > > Yes, replacing error handling with retry call (after
> > > > some sleep) is better then current code (which returns
> > > > previous value or returns I8K_MAX_TEMP).
> > > > 
> > > > But your solution not install the sensor if probe fails
> > > > on bad value does not solve problem that i8k.ko is
> > > > loading at time when GPU card is turned off.
> > > 
> > > Yes, the dynamics in that situation makes it a bit
> > > difficult to handle the situation.
> > > 
> > > > I think current check for installing sensor (err < 0) is
> > > > enough and when invalid value (> I8K_MAX_TEMP) is
> > > > returned just do not show this value for userspace in
> > > > hwmon sysnode.
> > > 
> > > Ok with me, and agreed.
> > > 
> > > Thanks,
> > > Guenter
> > 
> > Ok, are you going to fix i8k_get_temp() function (with
> > sleeping)?
> 
> I had hoped you would find the time to do it ;-).
> 
> Sure, I can do it, but I am kind of busy right now, so it will
> take a while.
> 
> Thanks,
> Guenter

Ok. Will you do that for 3.19 kernel? Meanwhile I can prepare 
patch for temperature labels. I looked into NBSVC.MDM and there 
is something related for type of temperature sensors...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-11-17  8:35                   ` Pali Rohár
@ 2014-11-18  5:56                     ` Guenter Roeck
  2014-11-18 14:46                       ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-11-18  5:56 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

On 11/17/2014 12:35 AM, Pali Rohár wrote:
> On Thursday 23 October 2014 18:45:07 Guenter Roeck wrote:
>> On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote:
>>> On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
>>>> On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote:
>>>>> On Wednesday 22 October 2014 18:19:47 Guenter Roeck
> wrote:
>>>>>> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár
> wrote:
>>>>>>> On Tuesday 21 October 2014 06:27:23 Guenter Roeck
> wrote:
>>>>>>>> On 10/20/2014 09:46 AM, Pali Rohár wrote:
>>>>>>>>> Ok, I will describe my problem. Guenter, maybe
>>>>>>>>> you can find another solution/fix for it.
>>>>>>>>>
>>>>>>>>> Calling i8k_get_temp(3) on my laptop without
>>>>>>>>> I8K_TEMPERATURE_BUG always returns value 193
>>>>>>>>> (which is above I8K_MAX_TEMP).
>>>>>>>>>
>>>>>>>>> When I8K_TEMPERATURE_BUG is enabled (by default)
>>>>>>>>> then i8k_get_temp(3) returns value from prev[3]
>>>>>>>>> and store new value I8K_TEMPERATURE_BUG to
>>>>>>>>> prev[3]. Value in prev[3] is initialized to 0.
>>>>>>>>>
>>>>>>>>> What I want to achieve is: when i8k_get_temp()
>>>>>>>>> for particular sensor id always returns invalid
>>>>>>>>> value (> I8K_MAX_TEMP) then we should totally
>>>>>>>>> ignore sensor with that id and do not export it
>>>>>>>>> via hwmon.
>>>>>>>>>
>>>>>>>>> My solution is: initialize prev[id] to
>>>>>>>>> I8K_MAX_TEMP, so on invalid data first call to
>>>>>>>>> i8k_get_temp(id) returns I8K_MAX_TEMP. Then in
>>>>>>>>> i8k_init_hwmon check if value is < I8K_MAX_TEMP
>>>>>>>>> and if not ignore sensor id.
>>>>>>>>>
>>>>>>>>> Guenter, it is clear now? Are you ok that we
>>>>>>>>> should ignore sensor if always report value
>>>>>>>>> above I8K_MAX_TEMP? If you do not like my
>>>>>>>>> solution/patch for it, can you specify how
>>>>>>>>> other can it be fixed?
>>>>>>>>
>>>>>>>> I still don't see the point in initializing
>>>>>>>> prev[].
>>>>>>>
>>>>>>> Now prev[] is initialized to 0. It means that first
>>>>>>> call i8k_get_temp() (with sensor id which return
>>>>>>> value > I8K_MAX_TEMP) returns 0. Second and other
>>>>>>> calls returns I8K_MAX_TEMP.
>>>>>>>
>>>>>>> So point is to return same value for first and other
>>>>>>> calls.
>>>>>>
>>>>>> Yes, I realized that after I sent my previous mail.
>>>>>>
>>>>>>>> Yes, I am ok with ignoring sensor values if the
>>>>>>>> reported temperature is above I8K_MAX_TEMP. I am
>>>>>>>> just not sure if we should check against
>>>>>>>> I8K_MAX_TEMP or against, say, 192. Reason is that
>>>>>>>> we do know that the sensor can erroneously return
>>>>>>>> 0x99 on some systems once in a while. We would
>>>>>>>> not want to ignore those sensors just because
>>>>>>>> they happen to report 0x99 during initialization.
>>>>>>>>
>>>>>>>> So maybe make it
>>>>>>>>
>>>>>>>> 	if (err >= 0 && err < 192)
>>>>>>>>
>>>>>>>> and add a note before the first if(), explaining
>>>>>>>> that higher values suggest that there is no
>>>>>>>> sensor attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Guenter
>>>>>>>
>>>>>>> Right, now we need to decide which magic constant to
>>>>>>> use...
>>>>>>>
>>>>>>> And now I found another problem :-)
>>>>>>>
>>>>>>> On my laptop i8k_get_temp(3) not always return value
>>>>>>> 193. It is only when AMD graphics card is turned
>>>>>>> off. When card is on i8k_get_temp(3) returns same
>>>>>>> value as temperature hwmon part from radeon DRM
>>>>>>> driver.
>>>>>>
>>>>>> Can you turn the GPU on or off during runtime ?
>>>>>> That would make it really tricky to handle the
>>>>>> situation.
>>>>>
>>>>> Yes. New laptops with Nvidia Optimus or AMD PowerXpress
>>>>> or Enduro technology are designed to automatically turn
>>>>> off secondary GPU when is not in use. And
>>>>> nouveau/radeon drivers together with vga_switcheroo
>>>>> implements this dynamic power on/off.
>>>>>
>>>>>>> So it looks like that on my laptop i8k sensor with
>>>>>>> id 3 reports GPU temperature.
>>>>>>>
>>>>>>> When card is turned off radeon driver reports
>>>>>>> -EINVAL for temperature hwmon sysnode.
>>>>>>>
>>>>>>> So now I think i8k could not ignore sensor totally
>>>>>>> as it can be mapped to some HW which can be
>>>>>>> dynamically turned on/off (like my graphics card).
>>>>>>>
>>>>>>> So what do you think about reporting -EINVAL instead
>>>>>>> I8K_MAX_TEMP when dell SMM returns value above
>>>>>>> I8K_MAX_TEMP?
>>>>>>
>>>>>> -EINVAL is supposed to mean "Invalid Argument", so it
>>>>>> really has ia different semantics. We could use
>>>>>> -ENXIO, "No such device or address", which seems more
>>>>>> appropriate.
>>>>>
>>>>> I prefer to use -EINVAL because other driver (radeon) is
>>>>> using it and userspace "sensors" programs handle EINVAL
>>>>> and show "N/A" in output instead reporting some error
>>>>> about reading value. If nothing else consistency (with
>>>>> other drivers) is my argument.
>>>>
>>>> Ok, if sensors implements it that way then let's do it.
>>>>
>>>>>> Overall, I think the entire error handling is broken
>>>>>> and should be replaced. One option would be to
>>>>>> explicitly check for 0x99 and, if detected, go to
>>>>>> sleep for, say, 100ms and try again. If it still
>>>>>> fails, and for all other bad values, return -ENXIO.
>>>>>> Then the calling code can either return the error to
>>>>>> user space in the show function, or not install the
>>>>>> sensor in the probe function.
>>>>>>
>>>>>> Does that make sense ?
>>>>>
>>>>> Yes, replacing error handling with retry call (after
>>>>> some sleep) is better then current code (which returns
>>>>> previous value or returns I8K_MAX_TEMP).
>>>>>
>>>>> But your solution not install the sensor if probe fails
>>>>> on bad value does not solve problem that i8k.ko is
>>>>> loading at time when GPU card is turned off.
>>>>
>>>> Yes, the dynamics in that situation makes it a bit
>>>> difficult to handle the situation.
>>>>
>>>>> I think current check for installing sensor (err < 0) is
>>>>> enough and when invalid value (> I8K_MAX_TEMP) is
>>>>> returned just do not show this value for userspace in
>>>>> hwmon sysnode.
>>>>
>>>> Ok with me, and agreed.
>>>>
>>>> Thanks,
>>>> Guenter
>>>
>>> Ok, are you going to fix i8k_get_temp() function (with
>>> sleeping)?
>>
>> I had hoped you would find the time to do it ;-).
>>
>> Sure, I can do it, but I am kind of busy right now, so it will
>> take a while.
>>
>> Thanks,
>> Guenter
>
> Ok. Will you do that for 3.19 kernel? Meanwhile I can prepare
> patch for temperature labels. I looked into NBSVC.MDM and there
> is something related for type of temperature sensors...
>
Highly unlikely. I don't have the time right now.

Guenter



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
  2014-11-18  5:56                     ` Guenter Roeck
@ 2014-11-18 14:46                       ` Pali Rohár
  2014-11-18 14:56                         ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-11-18 14:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, Steven Honeyman

[-- Attachment #1: Type: Text/Plain, Size: 7473 bytes --]

On Tuesday 18 November 2014 06:56:11 Guenter Roeck wrote:
> On 11/17/2014 12:35 AM, Pali Rohár wrote:
> > On Thursday 23 October 2014 18:45:07 Guenter Roeck wrote:
> >> On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote:
> >>> On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
> >>>> On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár 
wrote:
> >>>>> On Wednesday 22 October 2014 18:19:47 Guenter Roeck
> > 
> > wrote:
> >>>>>> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár
> > 
> > wrote:
> >>>>>>> On Tuesday 21 October 2014 06:27:23 Guenter Roeck
> > 
> > wrote:
> >>>>>>>> On 10/20/2014 09:46 AM, Pali Rohár wrote:
> >>>>>>>>> Ok, I will describe my problem. Guenter, maybe
> >>>>>>>>> you can find another solution/fix for it.
> >>>>>>>>> 
> >>>>>>>>> Calling i8k_get_temp(3) on my laptop without
> >>>>>>>>> I8K_TEMPERATURE_BUG always returns value 193
> >>>>>>>>> (which is above I8K_MAX_TEMP).
> >>>>>>>>> 
> >>>>>>>>> When I8K_TEMPERATURE_BUG is enabled (by default)
> >>>>>>>>> then i8k_get_temp(3) returns value from prev[3]
> >>>>>>>>> and store new value I8K_TEMPERATURE_BUG to
> >>>>>>>>> prev[3]. Value in prev[3] is initialized to 0.
> >>>>>>>>> 
> >>>>>>>>> What I want to achieve is: when i8k_get_temp()
> >>>>>>>>> for particular sensor id always returns invalid
> >>>>>>>>> value (> I8K_MAX_TEMP) then we should totally
> >>>>>>>>> ignore sensor with that id and do not export it
> >>>>>>>>> via hwmon.
> >>>>>>>>> 
> >>>>>>>>> My solution is: initialize prev[id] to
> >>>>>>>>> I8K_MAX_TEMP, so on invalid data first call to
> >>>>>>>>> i8k_get_temp(id) returns I8K_MAX_TEMP. Then in
> >>>>>>>>> i8k_init_hwmon check if value is < I8K_MAX_TEMP
> >>>>>>>>> and if not ignore sensor id.
> >>>>>>>>> 
> >>>>>>>>> Guenter, it is clear now? Are you ok that we
> >>>>>>>>> should ignore sensor if always report value
> >>>>>>>>> above I8K_MAX_TEMP? If you do not like my
> >>>>>>>>> solution/patch for it, can you specify how
> >>>>>>>>> other can it be fixed?
> >>>>>>>> 
> >>>>>>>> I still don't see the point in initializing
> >>>>>>>> prev[].
> >>>>>>> 
> >>>>>>> Now prev[] is initialized to 0. It means that first
> >>>>>>> call i8k_get_temp() (with sensor id which return
> >>>>>>> value > I8K_MAX_TEMP) returns 0. Second and other
> >>>>>>> calls returns I8K_MAX_TEMP.
> >>>>>>> 
> >>>>>>> So point is to return same value for first and other
> >>>>>>> calls.
> >>>>>> 
> >>>>>> Yes, I realized that after I sent my previous mail.
> >>>>>> 
> >>>>>>>> Yes, I am ok with ignoring sensor values if the
> >>>>>>>> reported temperature is above I8K_MAX_TEMP. I am
> >>>>>>>> just not sure if we should check against
> >>>>>>>> I8K_MAX_TEMP or against, say, 192. Reason is that
> >>>>>>>> we do know that the sensor can erroneously return
> >>>>>>>> 0x99 on some systems once in a while. We would
> >>>>>>>> not want to ignore those sensors just because
> >>>>>>>> they happen to report 0x99 during initialization.
> >>>>>>>> 
> >>>>>>>> So maybe make it
> >>>>>>>> 
> >>>>>>>> 	if (err >= 0 && err < 192)
> >>>>>>>> 
> >>>>>>>> and add a note before the first if(), explaining
> >>>>>>>> that higher values suggest that there is no
> >>>>>>>> sensor attached.
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Guenter
> >>>>>>> 
> >>>>>>> Right, now we need to decide which magic constant to
> >>>>>>> use...
> >>>>>>> 
> >>>>>>> And now I found another problem :-)
> >>>>>>> 
> >>>>>>> On my laptop i8k_get_temp(3) not always return value
> >>>>>>> 193. It is only when AMD graphics card is turned
> >>>>>>> off. When card is on i8k_get_temp(3) returns same
> >>>>>>> value as temperature hwmon part from radeon DRM
> >>>>>>> driver.
> >>>>>> 
> >>>>>> Can you turn the GPU on or off during runtime ?
> >>>>>> That would make it really tricky to handle the
> >>>>>> situation.
> >>>>> 
> >>>>> Yes. New laptops with Nvidia Optimus or AMD PowerXpress
> >>>>> or Enduro technology are designed to automatically turn
> >>>>> off secondary GPU when is not in use. And
> >>>>> nouveau/radeon drivers together with vga_switcheroo
> >>>>> implements this dynamic power on/off.
> >>>>> 
> >>>>>>> So it looks like that on my laptop i8k sensor with
> >>>>>>> id 3 reports GPU temperature.
> >>>>>>> 
> >>>>>>> When card is turned off radeon driver reports
> >>>>>>> -EINVAL for temperature hwmon sysnode.
> >>>>>>> 
> >>>>>>> So now I think i8k could not ignore sensor totally
> >>>>>>> as it can be mapped to some HW which can be
> >>>>>>> dynamically turned on/off (like my graphics card).
> >>>>>>> 
> >>>>>>> So what do you think about reporting -EINVAL instead
> >>>>>>> I8K_MAX_TEMP when dell SMM returns value above
> >>>>>>> I8K_MAX_TEMP?
> >>>>>> 
> >>>>>> -EINVAL is supposed to mean "Invalid Argument", so it
> >>>>>> really has ia different semantics. We could use
> >>>>>> -ENXIO, "No such device or address", which seems more
> >>>>>> appropriate.
> >>>>> 
> >>>>> I prefer to use -EINVAL because other driver (radeon) is
> >>>>> using it and userspace "sensors" programs handle EINVAL
> >>>>> and show "N/A" in output instead reporting some error
> >>>>> about reading value. If nothing else consistency (with
> >>>>> other drivers) is my argument.
> >>>> 
> >>>> Ok, if sensors implements it that way then let's do it.
> >>>> 
> >>>>>> Overall, I think the entire error handling is broken
> >>>>>> and should be replaced. One option would be to
> >>>>>> explicitly check for 0x99 and, if detected, go to
> >>>>>> sleep for, say, 100ms and try again. If it still
> >>>>>> fails, and for all other bad values, return -ENXIO.
> >>>>>> Then the calling code can either return the error to
> >>>>>> user space in the show function, or not install the
> >>>>>> sensor in the probe function.
> >>>>>> 
> >>>>>> Does that make sense ?
> >>>>> 
> >>>>> Yes, replacing error handling with retry call (after
> >>>>> some sleep) is better then current code (which returns
> >>>>> previous value or returns I8K_MAX_TEMP).
> >>>>> 
> >>>>> But your solution not install the sensor if probe fails
> >>>>> on bad value does not solve problem that i8k.ko is
> >>>>> loading at time when GPU card is turned off.
> >>>> 
> >>>> Yes, the dynamics in that situation makes it a bit
> >>>> difficult to handle the situation.
> >>>> 
> >>>>> I think current check for installing sensor (err < 0) is
> >>>>> enough and when invalid value (> I8K_MAX_TEMP) is
> >>>>> returned just do not show this value for userspace in
> >>>>> hwmon sysnode.
> >>>> 
> >>>> Ok with me, and agreed.
> >>>> 
> >>>> Thanks,
> >>>> Guenter
> >>> 
> >>> Ok, are you going to fix i8k_get_temp() function (with
> >>> sleeping)?
> >> 
> >> I had hoped you would find the time to do it ;-).
> >> 
> >> Sure, I can do it, but I am kind of busy right now, so it
> >> will take a while.
> >> 
> >> Thanks,
> >> Guenter
> > 
> > Ok. Will you do that for 3.19 kernel? Meanwhile I can
> > prepare patch for temperature labels. I looked into
> > NBSVC.MDM and there is something related for type of
> > temperature sensors...
> 
> Highly unlikely. I don't have the time right now.
> 
> Guenter

Ok. I will try to at least fix patch from first post in this 
thread, so i8k_get_temp() does not return totally invalid value 
to userspace (or at first call).

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-18 14:46                       ` Pali Rohár
@ 2014-11-18 14:56                         ` Pali Rohár
  2014-11-30  0:12                           ` Guenter Roeck
  2014-11-30  9:00                           ` Guenter Roeck
  0 siblings, 2 replies; 20+ messages in thread
From: Pali Rohár @ 2014-11-18 14:56 UTC (permalink / raw)
  To: Guenter Roeck, Arnd Bergmann, Greg Kroah-Hartman
  Cc: Steven Honeyman, linux-kernel, Pali Rohár

Static array prev[] was incorrectly initialized. It should be initialized to
some "invalid" temperature value (above I8K_MAX_TEMP).

Next, function should store "invalid" value to prev[] (above I8K_MAX_TEMP),
not valid (= I8K_MAX_TEMP) because whole temperature bug handling will not
work.

And last part, to not break existing detection of temperature sensors, register
them also if i8k report too high temperature (above I8K_MAX_TEMP). This is
needed because some sensors are sometimes turned off (e.g sensor on GPU which
can be turned off/on) and in this case SMM report too high value.

To prevent reporting "invalid" values to userspace, return -EINVAL. In this case
sensors which are currently turned off (e.g optimus/powerexpress/enduro gpu)
are reported as "N/A" by lm-sensors package.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
---
 drivers/char/i8k.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
index 7272b08..e34a019 100644
--- a/drivers/char/i8k.c
+++ b/drivers/char/i8k.c
@@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
 	int temp;
 
 #ifdef I8K_TEMPERATURE_BUG
-	static int prev[4];
+	static int prev[4] = { I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1 };
 #endif
 	regs.ebx = sensor & 0xff;
 	rc = i8k_smm(&regs);
@@ -317,10 +317,12 @@ static int i8k_get_temp(int sensor)
 	 */
 	if (temp > I8K_MAX_TEMP) {
 		temp = prev[sensor];
-		prev[sensor] = I8K_MAX_TEMP;
+		prev[sensor] = I8K_MAX_TEMP+1;
 	} else {
 		prev[sensor] = temp;
 	}
+	if (temp > I8K_MAX_TEMP)
+		return -ERANGE;
 #endif
 
 	return temp;
@@ -499,6 +501,8 @@ static ssize_t i8k_hwmon_show_temp(struct device *dev,
 	int temp;
 
 	temp = i8k_get_temp(index);
+	if (temp == -ERANGE)
+		return -EINVAL;
 	if (temp < 0)
 		return temp;
 	return sprintf(buf, "%d\n", temp * 1000);
@@ -610,17 +614,17 @@ static int __init i8k_init_hwmon(void)
 
 	/* CPU temperature attributes, if temperature reading is OK */
 	err = i8k_get_temp(0);
-	if (err >= 0)
+	if (err >= 0 || err == -ERANGE)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
 	/* check for additional temperature sensors */
 	err = i8k_get_temp(1);
-	if (err >= 0)
+	if (err >= 0 || err == -ERANGE)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
 	err = i8k_get_temp(2);
-	if (err >= 0)
+	if (err >= 0 || err == -ERANGE)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
 	err = i8k_get_temp(3);
-	if (err >= 0)
+	if (err >= 0 || err == -ERANGE)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
 
 	/* Left fan attributes, if left fan is present */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-18 14:56                         ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
@ 2014-11-30  0:12                           ` Guenter Roeck
  2014-11-30 14:44                             ` Pali Rohár
  2014-11-30  9:00                           ` Guenter Roeck
  1 sibling, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-11-30  0:12 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Steven Honeyman, linux-kernel

On Tue, Nov 18, 2014 at 03:56:54PM +0100, Pali Rohár wrote:
> Static array prev[] was incorrectly initialized. It should be initialized to
> some "invalid" temperature value (above I8K_MAX_TEMP).
> 
> Next, function should store "invalid" value to prev[] (above I8K_MAX_TEMP),
> not valid (= I8K_MAX_TEMP) because whole temperature bug handling will not
> work.
> 
> And last part, to not break existing detection of temperature sensors, register
> them also if i8k report too high temperature (above I8K_MAX_TEMP). This is
> needed because some sensors are sometimes turned off (e.g sensor on GPU which
> can be turned off/on) and in this case SMM report too high value.
> 
> To prevent reporting "invalid" values to userspace, return -EINVAL. In this case
> sensors which are currently turned off (e.g optimus/powerexpress/enduro gpu)
> are reported as "N/A" by lm-sensors package.
> 
> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> ---
>  drivers/char/i8k.c |   16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> index 7272b08..e34a019 100644
> --- a/drivers/char/i8k.c
> +++ b/drivers/char/i8k.c
> @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
>  	int temp;
>  
>  #ifdef I8K_TEMPERATURE_BUG
> -	static int prev[4];
> +	static int prev[4] = { I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1 };

This results in a checkpatch warning. Please split, and add a space before
and after '+'. Actually, I would suggest to use something something like

#define I8K_INVALID_TEMP	(I8K_MAX_TEMP + 1)

and to use it instead of hard-coding "I8K_MAX_TEMP + 1" in the code.

>  #endif
>  	regs.ebx = sensor & 0xff;
>  	rc = i8k_smm(&regs);
> @@ -317,10 +317,12 @@ static int i8k_get_temp(int sensor)
>  	 */
>  	if (temp > I8K_MAX_TEMP) {
>  		temp = prev[sensor];
> -		prev[sensor] = I8K_MAX_TEMP;
> +		prev[sensor] = I8K_MAX_TEMP+1;
>  	} else {
>  		prev[sensor] = temp;
>  	}
> +	if (temp > I8K_MAX_TEMP)
> +		return -ERANGE;
>  #endif
>  
>  	return temp;
> @@ -499,6 +501,8 @@ static ssize_t i8k_hwmon_show_temp(struct device *dev,
>  	int temp;
>  
>  	temp = i8k_get_temp(index);
> +	if (temp == -ERANGE)
> +		return -EINVAL;
>  	if (temp < 0)
>  		return temp;
>  	return sprintf(buf, "%d\n", temp * 1000);
> @@ -610,17 +614,17 @@ static int __init i8k_init_hwmon(void)
>  
>  	/* CPU temperature attributes, if temperature reading is OK */
>  	err = i8k_get_temp(0);
> -	if (err >= 0)
> +	if (err >= 0 || err == -ERANGE)
>  		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP1;
>  	/* check for additional temperature sensors */
>  	err = i8k_get_temp(1);
> -	if (err >= 0)
> +	if (err >= 0 || err == -ERANGE)
>  		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP2;
>  	err = i8k_get_temp(2);
> -	if (err >= 0)
> +	if (err >= 0 || err == -ERANGE)
>  		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP3;
>  	err = i8k_get_temp(3);
> -	if (err >= 0)
> +	if (err >= 0 || err == -ERANGE)
>  		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
>  
>  	/* Left fan attributes, if left fan is present */
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-18 14:56                         ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
  2014-11-30  0:12                           ` Guenter Roeck
@ 2014-11-30  9:00                           ` Guenter Roeck
  2014-11-30 14:48                             ` Pali Rohár
  1 sibling, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-11-30  9:00 UTC (permalink / raw)
  To: Pali Rohár, Arnd Bergmann, Greg Kroah-Hartman
  Cc: Steven Honeyman, linux-kernel

On 11/18/2014 06:56 AM, Pali Rohár wrote:
> Static array prev[] was incorrectly initialized. It should be initialized to
> some "invalid" temperature value (above I8K_MAX_TEMP).
>
> Next, function should store "invalid" value to prev[] (above I8K_MAX_TEMP),
> not valid (= I8K_MAX_TEMP) because whole temperature bug handling will not
> work.
>
> And last part, to not break existing detection of temperature sensors, register
> them also if i8k report too high temperature (above I8K_MAX_TEMP). This is
> needed because some sensors are sometimes turned off (e.g sensor on GPU which
> can be turned off/on) and in this case SMM report too high value.
>
> To prevent reporting "invalid" values to userspace, return -EINVAL. In this case
> sensors which are currently turned off (e.g optimus/powerexpress/enduro gpu)
> are reported as "N/A" by lm-sensors package.
>
> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> ---
>   drivers/char/i8k.c |   16 ++++++++++------
>   1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> index 7272b08..e34a019 100644
> --- a/drivers/char/i8k.c
> +++ b/drivers/char/i8k.c
> @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
>   	int temp;
>
>   #ifdef I8K_TEMPERATURE_BUG
> -	static int prev[4];
> +	static int prev[4] = { I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1, I8K_MAX_TEMP+1 };
>   #endif
>   	regs.ebx = sensor & 0xff;
>   	rc = i8k_smm(&regs);
> @@ -317,10 +317,12 @@ static int i8k_get_temp(int sensor)
>   	 */
>   	if (temp > I8K_MAX_TEMP) {
>   		temp = prev[sensor];
> -		prev[sensor] = I8K_MAX_TEMP;
> +		prev[sensor] = I8K_MAX_TEMP+1;
>   	} else {
>   		prev[sensor] = temp;
>   	}
> +	if (temp > I8K_MAX_TEMP)
> +		return -ERANGE;

Can we return -ENODATA in this case ? I think that would be more appropriate.

>   #endif
>
>   	return temp;
> @@ -499,6 +501,8 @@ static ssize_t i8k_hwmon_show_temp(struct device *dev,
>   	int temp;
>
>   	temp = i8k_get_temp(index);
> +	if (temp == -ERANGE)
> +		return -EINVAL;

and can we also return -ENODATA to user space ?
This would make the code a bit cleaner.

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-30  0:12                           ` Guenter Roeck
@ 2014-11-30 14:44                             ` Pali Rohár
  0 siblings, 0 replies; 20+ messages in thread
From: Pali Rohár @ 2014-11-30 14:44 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Steven Honeyman, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 886 bytes --]

On Sunday 30 November 2014 01:12:07 Guenter Roeck wrote:
> > diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> > index 7272b08..e34a019 100644
> > --- a/drivers/char/i8k.c
> > +++ b/drivers/char/i8k.c
> > @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
> >
> >       int temp;
> >  
> >  #ifdef I8K_TEMPERATURE_BUG
> >
> > -     static int prev[4];
> > +     static int prev[4] = { I8K_MAX_TEMP+1, I8K_MAX_TEMP+1,
> > I8K_MAX_TEMP+1, I8K_MAX_TEMP+1 };
> 
> This results in a checkpatch warning. Please split, and add a
> space before and after '+'. Actually, I would suggest to use
> something something like
> 
> #define I8K_INVALID_TEMP        (I8K_MAX_TEMP + 1)
> 
> and to use it instead of hard-coding "I8K_MAX_TEMP + 1" in the
> code.

I will define I8K_INVALID_TEMP and use it instead I8K_MAX_TEMP.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-30  9:00                           ` Guenter Roeck
@ 2014-11-30 14:48                             ` Pali Rohár
  2014-11-30 15:56                               ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2014-11-30 14:48 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Steven Honeyman, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 2975 bytes --]

On Sunday 30 November 2014 10:00:00 Guenter Roeck wrote:
> On 11/18/2014 06:56 AM, Pali Rohár wrote:
> > Static array prev[] was incorrectly initialized. It should
> > be initialized to some "invalid" temperature value (above
> > I8K_MAX_TEMP).
> > 
> > Next, function should store "invalid" value to prev[] (above
> > I8K_MAX_TEMP), not valid (= I8K_MAX_TEMP) because whole
> > temperature bug handling will not work.
> > 
> > And last part, to not break existing detection of
> > temperature sensors, register them also if i8k report too
> > high temperature (above I8K_MAX_TEMP). This is needed
> > because some sensors are sometimes turned off (e.g sensor
> > on GPU which can be turned off/on) and in this case SMM
> > report too high value.
> > 
> > To prevent reporting "invalid" values to userspace, return
> > -EINVAL. In this case sensors which are currently turned
> > off (e.g optimus/powerexpress/enduro gpu) are reported as
> > "N/A" by lm-sensors package.
> > 
> > Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> > ---
> > 
> >   drivers/char/i8k.c |   16 ++++++++++------
> >   1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> > index 7272b08..e34a019 100644
> > --- a/drivers/char/i8k.c
> > +++ b/drivers/char/i8k.c
> > @@ -298,7 +298,7 @@ static int i8k_get_temp(int sensor)
> > 
> >   	int temp;
> >   
> >   #ifdef I8K_TEMPERATURE_BUG
> > 
> > -	static int prev[4];
> > +	static int prev[4] = { I8K_MAX_TEMP+1, I8K_MAX_TEMP+1,
> > I8K_MAX_TEMP+1, I8K_MAX_TEMP+1 };
> > 
> >   #endif
> >   
> >   	regs.ebx = sensor & 0xff;
> >   	rc = i8k_smm(&regs);
> > 
> > @@ -317,10 +317,12 @@ static int i8k_get_temp(int sensor)
> > 
> >   	 */
> >   	
> >   	if (temp > I8K_MAX_TEMP) {
> >   	
> >   		temp = prev[sensor];
> > 
> > -		prev[sensor] = I8K_MAX_TEMP;
> > +		prev[sensor] = I8K_MAX_TEMP+1;
> > 
> >   	} else {
> >   	
> >   		prev[sensor] = temp;
> >   	
> >   	}
> > 
> > +	if (temp > I8K_MAX_TEMP)
> > +		return -ERANGE;
> 
> Can we return -ENODATA in this case ? I think that would be
> more appropriate.
> 

This is internal kernel function, no problem. If you prefer 
NODATA instead RANGE I will change it.

> >   #endif
> >   
> >   	return temp;
> > 
> > @@ -499,6 +501,8 @@ static ssize_t
> > i8k_hwmon_show_temp(struct device *dev,
> > 
> >   	int temp;
> >   	
> >   	temp = i8k_get_temp(index);
> > 
> > +	if (temp == -ERANGE)
> > +		return -EINVAL;
> 
> and can we also return -ENODATA to user space ?
> This would make the code a bit cleaner.
> 
> Thanks,
> Guenter

There was some problems when I tested similar patch for radeon.ko 
(do not report temperature to userspace when card is turned off).

I can test lm-sensors which is in Ubuntu 12.04 LTS (there is 
probably some older version) what happens with -ENODATA from i8k.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-30 14:48                             ` Pali Rohár
@ 2014-11-30 15:56                               ` Guenter Roeck
  2014-11-30 16:00                                 ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2014-11-30 15:56 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Steven Honeyman, linux-kernel

On 11/30/2014 06:48 AM, Pali Rohár wrote:
[ ... ]
>>>
>>> +	if (temp > I8K_MAX_TEMP)
>>> +		return -ERANGE;
>>
>> Can we return -ENODATA in this case ? I think that would be
>> more appropriate.
>>
>
> This is internal kernel function, no problem. If you prefer
> NODATA instead RANGE I will change it.
>
The idea was to return ENODATA to user space (see below).

>>>    #endif
>>>
>>>    	return temp;
>>>
>>> @@ -499,6 +501,8 @@ static ssize_t
>>> i8k_hwmon_show_temp(struct device *dev,
>>>
>>>    	int temp;
>>>    	
>>>    	temp = i8k_get_temp(index);
>>>
>>> +	if (temp == -ERANGE)
>>> +		return -EINVAL;
>>
>> and can we also return -ENODATA to user space ?
>> This would make the code a bit cleaner.
>>
>> Thanks,
>> Guenter
>
> There was some problems when I tested similar patch for radeon.ko
> (do not report temperature to userspace when card is turned off).
>

You mean when returning -ENODATA to user space ?
I tested that; it worked for me and does what we want it to do
(the sensors command to displays N/A). This would avoid having
to change -ERANGE to a different error code above.

Guenter



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i8k: Fix temperature bug handling in i8k_get_temp()
  2014-11-30 15:56                               ` Guenter Roeck
@ 2014-11-30 16:00                                 ` Pali Rohár
  0 siblings, 0 replies; 20+ messages in thread
From: Pali Rohár @ 2014-11-30 16:00 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Steven Honeyman, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 1464 bytes --]

On Sunday 30 November 2014 16:56:32 Guenter Roeck wrote:
> On 11/30/2014 06:48 AM, Pali Rohár wrote:
> [ ... ]
> 
> >>> +	if (temp > I8K_MAX_TEMP)
> >>> +		return -ERANGE;
> >> 
> >> Can we return -ENODATA in this case ? I think that would be
> >> more appropriate.
> > 
> > This is internal kernel function, no problem. If you prefer
> > NODATA instead RANGE I will change it.
> 
> The idea was to return ENODATA to user space (see below).
> 
> >>>    #endif
> >>>    
> >>>    	return temp;
> >>> 
> >>> @@ -499,6 +501,8 @@ static ssize_t
> >>> i8k_hwmon_show_temp(struct device *dev,
> >>> 
> >>>    	int temp;
> >>>    	
> >>>    	temp = i8k_get_temp(index);
> >>> 
> >>> +	if (temp == -ERANGE)
> >>> +		return -EINVAL;
> >> 
> >> and can we also return -ENODATA to user space ?
> >> This would make the code a bit cleaner.
> >> 
> >> Thanks,
> >> Guenter
> > 
> > There was some problems when I tested similar patch for
> > radeon.ko (do not report temperature to userspace when card
> > is turned off).
> 
> You mean when returning -ENODATA to user space ?
> I tested that; it worked for me and does what we want it to do
> (the sensors command to displays N/A). This would avoid having
> to change -ERANGE to a different error code above.
> 
> Guenter

Now I tested it too on Ubuntu and it working. sensors display N/A 
without error message. So I will change code.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-11-30 16:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-19 14:46 [PATCH] i8k: Ignore temperature sensors which report invalid values Pali Rohár
2014-10-19 15:13 ` Guenter Roeck
2014-10-20 16:46   ` Pali Rohár
2014-10-21  4:27     ` Guenter Roeck
2014-10-22 12:29       ` Pali Rohár
2014-10-22 16:19         ` Guenter Roeck
2014-10-22 16:35           ` Pali Rohár
2014-10-22 17:10             ` Guenter Roeck
2014-10-23 10:37               ` Pali Rohár
2014-10-23 16:45                 ` Guenter Roeck
2014-11-17  8:35                   ` Pali Rohár
2014-11-18  5:56                     ` Guenter Roeck
2014-11-18 14:46                       ` Pali Rohár
2014-11-18 14:56                         ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
2014-11-30  0:12                           ` Guenter Roeck
2014-11-30 14:44                             ` Pali Rohár
2014-11-30  9:00                           ` Guenter Roeck
2014-11-30 14:48                             ` Pali Rohár
2014-11-30 15:56                               ` Guenter Roeck
2014-11-30 16:00                                 ` Pali Rohár

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).