linux-kernel.vger.kernel.org archive mirror
* [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
@ 2015-12-01  4:25 Nishanth Menon
  2015-12-01  5:50 ` Guenter Roeck
  2015-12-01 16:10 ` [PATCH V2] " Nishanth Menon
  0 siblings, 2 replies; 8+ messages in thread
From: Nishanth Menon @ 2015-12-01  4:25 UTC
  To: Guenter Roeck, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15,
	Nishanth Menon, Eduardo Valentin

TMP102 works based on conversions done periodically. However, as per
the TMP102 data sheet[1] the first conversion is triggered immediately
after we program the configuration register. The temperature data
registers do not reflect proper data until the first conversion is
complete (in our case HZ/4).

The driver currently sets last_update to jiffies - HZ just after the
configuration is complete. When the tmp102 driver registers with the
thermal framework, the framework immediately tries to read the sensor
temperature data. This takes place before the first conversion on the
TMP102 is complete and results in an invalid temperature read.

Depending on the value read, this may cause the thermal framework to
assume that a critical temperature event has occurred and attempt to
shut down the system.

Instead of allowing an invalid mid-conversion value to be read, we set
last_update to the current jiffies. This allows the
tmp102_update_device function to skip the update until the required
conversion time has elapsed. Further, we return -EAGAIN instead of
spurious temperature values (such as 0C) to the caller, to prevent any
wrong decisions being made with such values.

A simpler alternative approach could be to sleep in the probe for the
duration required, but that would add undesirable latency and delay the
boot sequence unnecessarily.

[1] http://www.ti.com/lit/ds/symlink/tmp102.pdf

Cc: Eduardo Valentin <edubezval@gmail.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Reported-by: Elvita Lobo <elvita@ti.com>
Reported-by: Yan Liu <yan-liu@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
---

Example case (from Beagleboard-x15 using an older kernel revision):
	http://pastebin.ubuntu.com/13591711/
Notice the thermal shutdown trigger:
	thermal thermal_zone3: critical temperature reached(108 C),shutting down

 drivers/hwmon/tmp102.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
index 65482624ea2c..145f69108f23 100644
--- a/drivers/hwmon/tmp102.c
+++ b/drivers/hwmon/tmp102.c
@@ -50,6 +50,9 @@
 #define	TMP102_TLOW_REG			0x02
 #define	TMP102_THIGH_REG		0x03
 
+/* TMP102 range is -55 to 150C -> we use -128 as a default invalid value */
+#define TMP102_NOTREADY			-128
+
 struct tmp102 {
 	struct i2c_client *client;
 	struct device *hwmon_dev;
@@ -102,6 +105,12 @@ static int tmp102_read_temp(void *dev, int *temp)
 {
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a conversion? */
+	if (tmp102->temp[0] == TMP102_NOTREADY) {
+		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
+		return -EAGAIN;
+	}
+
 	*temp = tmp102->temp[0];
 
 	return 0;
@@ -114,6 +123,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
 	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a read? */
+	if (tmp102->temp[sda->index] == TMP102_NOTREADY)
+		return -EAGAIN;
+
 	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
 }
 
@@ -207,7 +220,11 @@ static int tmp102_probe(struct i2c_client *client,
 		status = -ENODEV;
 		goto fail_restore_config;
 	}
-	tmp102->last_update = jiffies - HZ;
+	tmp102->last_update = jiffies;
+	/* Mark that we are not ready with data until conversion is complete */
+	tmp102->temp[0] = TMP102_NOTREADY;
+	tmp102->temp[1] = TMP102_NOTREADY;
+	tmp102->temp[2] = TMP102_NOTREADY;
 	mutex_init(&tmp102->lock);
 
 	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
-- 
2.6.2.402.g2635c2b



* Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01  4:25 [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data Nishanth Menon
@ 2015-12-01  5:50 ` Guenter Roeck
  2015-12-01 13:47   ` Nishanth Menon
  2015-12-01 16:10 ` [PATCH V2] " Nishanth Menon
  1 sibling, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2015-12-01  5:50 UTC
  To: Nishanth Menon, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15, Eduardo Valentin

On 11/30/2015 08:25 PM, Nishanth Menon wrote:
> TMP102 works based on conversions done periodically. However, as per
> the TMP102 data sheet[1] the first conversion is triggered immediately
> after we program the configuration register. The temperature data
> registers do not reflect proper data until the first conversion is
> complete (in our case HZ/4).
>
> The driver currently sets the last_update to be jiffies - HZ, just
> after the configuration is complete. When tmp102 driver registers
> with the thermal framework, it immediately tries to read the sensor
> temperature data. This takes place even before the conversion on the
> TMP102 is complete and results in an invalid temperature read.
>
> Depending on the value read, this may cause thermal framework to
> assume that a critical temperature event has occurred and attempts to
> shutdown the system.
>
> Instead of causing an invalid mid-conversion value to be read
> erroneously, we mark the last_update to be in-line with the current
> jiffies. This allows the tmp102_update_device function to skip update
> until the required conversion time is complete. Further, we ensure to
> return -EAGAIN result instead of returning spurious temperature (such
> as 0C) values to the caller to prevent any wrong decisions made with
> such values.
>
> A simpler alternative approach could be to sleep in the probe for the
> duration required, but that will result in latency that is undesirable
> that can delay boot sequence un-necessarily.
>
An even simpler solution would be to record in the probe function when the
device will be ready to be accessed, and to sleep for the remaining time in
the update function if necessary. This would not affect the probe function,
would avoid the somewhat awkward -EAGAIN, would avoid overloading the value
cache, and would sleep only if necessary and only as long as needed.
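
A minimal userspace sketch of the suggestion above, not the driver's code:
probe records the tick at which the first conversion should be done, and the
update path computes how long it would still have to sleep. The names
sim_sensor, sim_probe, remaining_wait and SIM_HZ are illustrative
assumptions, as is the HZ/4 conversion time taken from the patch description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's HZ and the conversion time. */
#define SIM_HZ            100UL
#define CONVERSION_TICKS  (SIM_HZ / 4)

struct sim_sensor {
	unsigned long ready_time;	/* tick at which the first conversion is done */
};

/* Called once, right after the configuration register is programmed. */
static void sim_probe(struct sim_sensor *s, unsigned long now)
{
	assert(s != NULL);
	s->ready_time = now + CONVERSION_TICKS;
}

/*
 * Ticks the update path would still have to sleep before the first
 * reading is valid; 0 once the conversion time has passed.
 */
static unsigned long remaining_wait(const struct sim_sensor *s, unsigned long now)
{
	/* Signed difference keeps the comparison safe across counter wrap. */
	long delta = (long)(s->ready_time - now);

	return delta > 0 ? (unsigned long)delta : 0;
}
```

With this shape, probe itself never sleeps; only a reader that arrives before
ready_time pays the remaining part of the conversion time.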

> [1] http://www.ti.com/lit/ds/symlink/tmp102.pdf
>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Reported-by: Aparna Balasubramanian <aparnab@ti.com>
> Reported-by: Elvita Lobo <elvita@ti.com>
> Reported-by: Yan Liu <yan-liu@ti.com>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> ---
>
> Example case (from Beagleboard-x15 using an older kernel revision):
> 	http://pastebin.ubuntu.com/13591711/
> Notice the thermal shutdown trigger:
> 	thermal thermal_zone3: critical temperature reached(108 C),shutting down
>
>   drivers/hwmon/tmp102.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
> index 65482624ea2c..145f69108f23 100644
> --- a/drivers/hwmon/tmp102.c
> +++ b/drivers/hwmon/tmp102.c
> @@ -50,6 +50,9 @@
>   #define	TMP102_TLOW_REG			0x02
>   #define	TMP102_THIGH_REG		0x03
>
> +/* TMP102 range is -55 to 150C -> we use -128 as a default invalid value */
> +#define TMP102_NOTREADY			-128
> +

This is a bit misleading, and also not correct, since the temperature is stored in
milli-degrees C, so a value of -128 reflects -0.128 degrees C. While that value
will not be seen in practice, it is still not a good idea to use it for this purpose.

Even though the chip temperature range is -55 .. 150 C, that doesn't mean
it never returns a value outside that range, for example if nothing is connected
to an external sensor or if something is broken.

You should use a value outside the value range, i.e. outside
[-128,000 .. 127,999], to detect the "not ready" condition.
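
A small sketch of the point above, under stated assumptions: reg_to_mC is an
assumed conversion for a left-adjusted 13-bit reading at 0.0625 C per LSB, and
NOT_READY_SENTINEL and is_reachable are hypothetical names. Because cached
values are in milli-degrees C, -128 would mean -0.128 C, whereas a marker such
as INT_MIN cannot be produced by converting any 16-bit register value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: no 16-bit register value converts to it. */
#define NOT_READY_SENTINEL INT_MIN

/*
 * Assumed conversion for a left-adjusted 13-bit reading with
 * 0.0625 C per LSB: clear the lowest bit, then scale to milli-degrees C.
 */
static int reg_to_mC(int16_t val)
{
	return ((val & ~0x01) * 1000) / 128;
}

/* True if some 16-bit register value converts to exactly mC. */
static bool is_reachable(int mC)
{
	for (int v = INT16_MIN; v <= INT16_MAX; v++)
		if (reg_to_mC((int16_t)v) == mC)
			return true;
	return false;
}
```

A separate flag avoids the question entirely, since it does not depend on
which values the conversion can produce.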

>   struct tmp102 {
>   	struct i2c_client *client;
>   	struct device *hwmon_dev;
> @@ -102,6 +105,12 @@ static int tmp102_read_temp(void *dev, int *temp)
>   {
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a conversion? */
> +	if (tmp102->temp[0] == TMP102_NOTREADY) {
> +		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
> +		return -EAGAIN;

Does this cause a hard loop in the calling code, or will the thermal code
delay before it reads again?

If it causes a hard loop, it may be better to go to sleep if needed
when reading the data, as suggested above.

> +	}
> +
>   	*temp = tmp102->temp[0];
>
>   	return 0;
> @@ -114,6 +123,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
>   	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a read? */
> +	if (tmp102->temp[sda->index] == TMP102_NOTREADY)
> +		return -EAGAIN;
> +
>   	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
>   }
>
> @@ -207,7 +220,11 @@ static int tmp102_probe(struct i2c_client *client,
>   		status = -ENODEV;
>   		goto fail_restore_config;
>   	}
> -	tmp102->last_update = jiffies - HZ;
> +	tmp102->last_update = jiffies;
> +	/* Mark that we are not ready with data until conversion is complete */
> +	tmp102->temp[0] = TMP102_NOTREADY;
> +	tmp102->temp[1] = TMP102_NOTREADY;
> +	tmp102->temp[2] = TMP102_NOTREADY;
>   	mutex_init(&tmp102->lock);
>
>   	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
>



* Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01  5:50 ` Guenter Roeck
@ 2015-12-01 13:47   ` Nishanth Menon
  2015-12-01 14:21     ` Nishanth Menon
  0 siblings, 1 reply; 8+ messages in thread
From: Nishanth Menon @ 2015-12-01 13:47 UTC
  To: Guenter Roeck, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15, Eduardo Valentin

Hi Guenter,

Thanks for the detailed review.

On 11/30/2015 11:50 PM, Guenter Roeck wrote:
> On 11/30/2015 08:25 PM, Nishanth Menon wrote:
[...]

>>
>> A simpler alternative approach could be to sleep in the probe for the
>> duration required, but that will result in latency that is undesirable
>> that can delay boot sequence un-necessarily.
>>
> A really simpler solution would be to mark when the device is ready
> to be accessed in the probe function, and go to sleep for the remaining
> time
> in the update function if necessary. This would not affect the probe
> function,
> avoid the somewhat awkward -EAGAIN, avoid overloading the value cache,
> and only
> sleep if necessary and as long as needed.

We already have that logic in a different form:
We use last_update to know when to read the temperature value. Until
the conversion time has elapsed, we keep providing the previously cached
value. The trouble is the first read, before the first conversion is complete.

On sleeping during update:
a) Unfortunately, forcing the delay in update the first time will also
add latency to thermal_zone_device_check, which triggers right after
tmp102_probe->thermal_zone_of_sensor_register
b) -EAGAIN is used by other hwmon drivers such as
drivers/hwmon/adt7470.c, drivers/hwmon/ltc4245.c, drivers/hwmon/sht15.c,
drivers/hwmon/tc74.c, drivers/hwmon/via-cputemp.c in similar ways when
data cannot be provided back.

Overriding the temp value to indicate first time read:
I can set up a bool in struct tmp102 instead. That serves the same
purpose as the override, at the cost of one extra char of footprint,
though I agree it might be a little more readable.

> 
>> [1] http://www.ti.com/lit/ds/symlink/tmp102.pdf
>>
>> Cc: Eduardo Valentin <edubezval@gmail.com>
>> Reported-by: Aparna Balasubramanian <aparnab@ti.com>
>> Reported-by: Elvita Lobo <elvita@ti.com>
>> Reported-by: Yan Liu <yan-liu@ti.com>
>> Signed-off-by: Nishanth Menon <nm@ti.com>
>> ---
>>
>> Example case (from Beagleboard-x15 using an older kernel revision):
>>     http://pastebin.ubuntu.com/13591711/
>> Notice the thermal shutdown trigger:
>>     thermal thermal_zone3: critical temperature reached(108
>> C),shutting down
>>
>>   drivers/hwmon/tmp102.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
>> index 65482624ea2c..145f69108f23 100644
>> --- a/drivers/hwmon/tmp102.c
>> +++ b/drivers/hwmon/tmp102.c
>> @@ -50,6 +50,9 @@
>>   #define    TMP102_TLOW_REG            0x02
>>   #define    TMP102_THIGH_REG        0x03
>>
>> +/* TMP102 range is -55 to 150C -> we use -128 as a default invalid
>> value */
>> +#define TMP102_NOTREADY            -128
>> +
> 
> This is a bit misleading, and also not correct, since the temperature is
> stored in
> milli-degrees C, so a value of -128 reflects -0.128 degrees C. While
> that value
> will not be seen in practice, it is still not a good idea to use it for
> this purpose.
> 
> Even though the chip temperature range is -55 .. 150 C, that doesn't mean
> it never returns a value outside that range, for example if nothing is
> connected
> to an external sensor or if something is broken.
> 
> You should use a value outside the value range, ie outside
> [-128,000 .. 127,999 ] to detect the "not ready" condition.


That is true. I will just drop this and introduce a bool in tmp102 instead.

>>   struct tmp102 {
>>       struct i2c_client *client;
>>       struct device *hwmon_dev;
>> @@ -102,6 +105,12 @@ static int tmp102_read_temp(void *dev, int *temp)
>>   {
>>       struct tmp102 *tmp102 = tmp102_update_device(dev);
>>
>> +    /* Is it too early even to return a conversion? */
>> +    if (tmp102->temp[0] == TMP102_NOTREADY) {
>> +        dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
>> +        return -EAGAIN;
> 
> Does this cause a hard loop in the calling code, or will the thermal code
> delay before it reads again ?
> 
> If it causes a hard loop, it may be better to go to sleep if needed
> when reading the data, as suggested above.

The thermal framework handles -EAGAIN without a hard loop (it seems to
reschedule at the polling interval and come back to check whether data
is ready).

If you are ok with the above, then I will send a v2 introducing a bool
first_time flag, but will leave the -EAGAIN alone.
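
A rough userspace sketch of the flag-plus--EAGAIN flow described above, not
the actual patch: the cache is refreshed only after an interval has elapsed,
and a read before the first refresh returns -EAGAIN instead of a stale value.
The names sim_tmp102, sim102_probe, sim_update, sim_read_temp and the
SIM_UPDATE_TICKS interval are hypothetical, and hw_mC stands in for the value
the hardware would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's refresh interval, in ticks. */
#define SIM_UPDATE_TICKS 33UL

struct sim_tmp102 {
	unsigned long last_update;	/* tick of the last cache refresh */
	int temp_mC;			/* cached reading, milli-degrees C */
	bool first_time;		/* true until the first refresh */
};

static void sim102_probe(struct sim_tmp102 *s, unsigned long now)
{
	s->last_update = now;	/* start the clock at configuration time */
	s->temp_mC = 0;
	s->first_time = true;	/* cache holds no valid data yet */
}

/* Refresh the cache only once the interval has elapsed (wrap-safe). */
static void sim_update(struct sim_tmp102 *s, unsigned long now, int hw_mC)
{
	if ((long)(now - (s->last_update + SIM_UPDATE_TICKS)) > 0) {
		s->temp_mC = hw_mC;
		s->last_update = now;
		s->first_time = false;
	}
}

/* Non-blocking read: 0 on success, -EAGAIN while no data is available. */
static int sim_read_temp(struct sim_tmp102 *s, unsigned long now,
			 int hw_mC, int *temp)
{
	sim_update(s, now, hw_mC);
	if (s->first_time)
		return -EAGAIN;
	*temp = s->temp_mC;
	return 0;
}
```

A caller that polls periodically can treat -EAGAIN as "no sample this round"
and simply try again on its next poll, so nothing in the read path blocks.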

-- 
Regards,
Nishanth Menon


* Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01 13:47   ` Nishanth Menon
@ 2015-12-01 14:21     ` Nishanth Menon
  2015-12-01 15:09       ` Guenter Roeck
  0 siblings, 1 reply; 8+ messages in thread
From: Nishanth Menon @ 2015-12-01 14:21 UTC
  To: Guenter Roeck, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15, Eduardo Valentin

On 07:47-20151201, Nishanth Menon wrote:
> [...]
>
> If you are ok with the above, then I will send a v2 introducing a bool
> to setup a flag for first_time read, but will leave the -EAGAIN alone.

Hint about how the patch will look:
diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
index 65482624ea2c..5289aa0980a8 100644
--- a/drivers/hwmon/tmp102.c
+++ b/drivers/hwmon/tmp102.c
@@ -58,6 +58,7 @@ struct tmp102 {
 	u16 config_orig;
 	unsigned long last_update;
 	int temp[3];
+	bool first_time;
 };
 
 /* convert left adjusted 13-bit TMP102 register value to milliCelsius */
@@ -93,6 +94,7 @@ static struct tmp102 *tmp102_update_device(struct device *dev)
 				tmp102->temp[i] = tmp102_reg_to_mC(status);
 		}
 		tmp102->last_update = jiffies;
+		tmp102->first_time = false;
 	}
 	mutex_unlock(&tmp102->lock);
 	return tmp102;
@@ -102,6 +104,12 @@ static int tmp102_read_temp(void *dev, int *temp)
 {
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a conversion? */
+	if (tmp102->first_time) {
+		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
+		return -EAGAIN;
+	}
+
 	*temp = tmp102->temp[0];
 
 	return 0;
@@ -114,6 +122,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
 	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a read? */
+	if (tmp102->first_time)
+		return -EAGAIN;
+
 	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
 }
 
@@ -207,7 +219,9 @@ static int tmp102_probe(struct i2c_client *client,
 		status = -ENODEV;
 		goto fail_restore_config;
 	}
-	tmp102->last_update = jiffies - HZ;
+	tmp102->last_update = jiffies;
+	/* Mark that we are not ready with data until conversion is complete */
+	tmp102->first_time = true;
 	mutex_init(&tmp102->lock);
 
 	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
-- 
Regards,
Nishanth Menon


* Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01 14:21     ` Nishanth Menon
@ 2015-12-01 15:09       ` Guenter Roeck
  2015-12-01 15:14         ` Nishanth Menon
  0 siblings, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2015-12-01 15:09 UTC
  To: Nishanth Menon, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15, Eduardo Valentin

On 12/01/2015 06:21 AM, Nishanth Menon wrote:
[ ... ]

>
> Hint about how the patch will look like:

Looks ok (and better).

Guenter

> diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
> index 65482624ea2c..5289aa0980a8 100644
> --- a/drivers/hwmon/tmp102.c
> +++ b/drivers/hwmon/tmp102.c
> @@ -58,6 +58,7 @@ struct tmp102 {
>   	u16 config_orig;
>   	unsigned long last_update;
>   	int temp[3];
> +	bool first_time;
>   };
>
>   /* convert left adjusted 13-bit TMP102 register value to milliCelsius */
> @@ -93,6 +94,7 @@ static struct tmp102 *tmp102_update_device(struct device *dev)
>   				tmp102->temp[i] = tmp102_reg_to_mC(status);
>   		}
>   		tmp102->last_update = jiffies;
> +		tmp102->first_time = false;
>   	}
>   	mutex_unlock(&tmp102->lock);
>   	return tmp102;
> @@ -102,6 +104,12 @@ static int tmp102_read_temp(void *dev, int *temp)
>   {
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a conversion? */
> +	if (tmp102->first_time) {
> +		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
> +		return -EAGAIN;
> +	}
> +
>   	*temp = tmp102->temp[0];
>
>   	return 0;
> @@ -114,6 +122,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
>   	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a read? */
> +	if (tmp102->first_time)
> +		return -EAGAIN;
> +
>   	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
>   }
>
> @@ -207,7 +219,9 @@ static int tmp102_probe(struct i2c_client *client,
>   		status = -ENODEV;
>   		goto fail_restore_config;
>   	}
> -	tmp102->last_update = jiffies - HZ;
> +	tmp102->last_update = jiffies;
> +	/* Mark that we are not ready with data until conversion is complete */
> +	tmp102->first_time = true;
>   	mutex_init(&tmp102->lock);
>
>   	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
>



* Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01 15:09       ` Guenter Roeck
@ 2015-12-01 15:14         ` Nishanth Menon
  0 siblings, 0 replies; 8+ messages in thread
From: Nishanth Menon @ 2015-12-01 15:14 UTC
  To: Guenter Roeck, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15, Eduardo Valentin

On 12/01/2015 09:09 AM, Guenter Roeck wrote:
> On 12/01/2015 06:21 AM, Nishanth Menon wrote:
> [ ... ]
> 
>>
>> Hint about how the patch will look like:
> 
> Looks ok (and better).
Thanks for the feedback. Will post the same.


-- 
Regards,
Nishanth Menon


* [PATCH V2] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01  4:25 [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data Nishanth Menon
  2015-12-01  5:50 ` Guenter Roeck
@ 2015-12-01 16:10 ` Nishanth Menon
  2015-12-01 21:06   ` Guenter Roeck
  1 sibling, 1 reply; 8+ messages in thread
From: Nishanth Menon @ 2015-12-01 16:10 UTC
  To: Guenter Roeck, Jean Delvare
  Cc: linux-kernel, lm-sensors, linux-omap, beagleboard-x15,
	Nishanth Menon, Eduardo Valentin

TMP102 works based on conversions done periodically. However, as per
the TMP102 data sheet[1] the first conversion is triggered immediately
after we program the configuration register. The temperature data
registers do not reflect proper data until the first conversion is
complete (in our case HZ/4).

The driver currently sets last_update to jiffies - HZ just after the
configuration is complete. When the TMP102 driver registers with the
thermal framework, the framework immediately tries to read the sensor
temperature data. This takes place before the first conversion on the
TMP102 is complete and results in an invalid temperature read.

Depending on the value read, this may cause the thermal framework to
assume that a critical temperature event has occurred and attempt to
shut down the system.

Instead of allowing an invalid mid-conversion value to be read, we set
last_update to the current jiffies. This allows the
tmp102_update_device function to skip the update until the required
conversion time has elapsed. Further, we return -EAGAIN instead of
spurious temperature values (such as 0C) to the caller, to prevent any
wrong decisions being made with such values. NOTE: this keeps the read
functions non-blocking and lets the callers decide whether to block or
try again later. At least the current user (thermal) seems to handle
this by retrying later.

A simpler alternative approach could be to sleep in the probe for the
duration required, but that would add undesirable latency and delay the
boot sequence unnecessarily.

[1] http://www.ti.com/lit/ds/symlink/tmp102.pdf

Cc: Eduardo Valentin <edubezval@gmail.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Reported-by: Elvita Lobo <elvita@ti.com>
Reported-by: Yan Liu <yan-liu@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
---
Changes in V2 since V1:
	- Dropped the out-of-range temperature used as a first-time marker;
	  we are using a bool now
	- Minor update in comments to explain the -EAGAIN return

V1: https://patchwork.kernel.org/patch/7732781/ https://patchwork.kernel.org/patch/7737771/

Example case (from Beagleboard-x15 using an older kernel revision):
http://pastebin.ubuntu.com/13591711/
Notice the thermal shutdown trigger:
thermal thermal_zone3: critical temperature reached(108 C),shutting down

 drivers/hwmon/tmp102.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
index 65482624ea2c..5289aa0980a8 100644
--- a/drivers/hwmon/tmp102.c
+++ b/drivers/hwmon/tmp102.c
@@ -58,6 +58,7 @@ struct tmp102 {
 	u16 config_orig;
 	unsigned long last_update;
 	int temp[3];
+	bool first_time;
 };
 
 /* convert left adjusted 13-bit TMP102 register value to milliCelsius */
@@ -93,6 +94,7 @@ static struct tmp102 *tmp102_update_device(struct device *dev)
 				tmp102->temp[i] = tmp102_reg_to_mC(status);
 		}
 		tmp102->last_update = jiffies;
+		tmp102->first_time = false;
 	}
 	mutex_unlock(&tmp102->lock);
 	return tmp102;
@@ -102,6 +104,12 @@ static int tmp102_read_temp(void *dev, int *temp)
 {
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a conversion? */
+	if (tmp102->first_time) {
+		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
+		return -EAGAIN;
+	}
+
 	*temp = tmp102->temp[0];
 
 	return 0;
@@ -114,6 +122,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
 	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
 	struct tmp102 *tmp102 = tmp102_update_device(dev);
 
+	/* Is it too early even to return a read? */
+	if (tmp102->first_time)
+		return -EAGAIN;
+
 	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
 }
 
@@ -207,7 +219,9 @@ static int tmp102_probe(struct i2c_client *client,
 		status = -ENODEV;
 		goto fail_restore_config;
 	}
-	tmp102->last_update = jiffies - HZ;
+	tmp102->last_update = jiffies;
+	/* Mark that we are not ready with data until conversion is complete */
+	tmp102->first_time = true;
 	mutex_init(&tmp102->lock);
 
 	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
-- 
2.6.2.402.g2635c2b



* Re: [PATCH V2] hwmon: (tmp102) Force wait for conversion time for the first valid data
  2015-12-01 16:10 ` [PATCH V2] " Nishanth Menon
@ 2015-12-01 21:06   ` Guenter Roeck
  0 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2015-12-01 21:06 UTC
  To: Nishanth Menon
  Cc: Jean Delvare, linux-kernel, lm-sensors, linux-omap,
	beagleboard-x15, Eduardo Valentin

On Tue, Dec 01, 2015 at 10:10:21AM -0600, Nishanth Menon wrote:
> TMP102 works based on conversions done periodically. However, as per
> the TMP102 data sheet[1] the first conversion is triggered immediately
> after we program the configuration register. The temperature data
> registers do not reflect proper data until the first conversion is
> complete (in our case HZ/4).
> 
> The driver currently sets the last_update to be jiffies - HZ, just
> after the configuration is complete. When TMP102 driver registers
> with the thermal framework, it immediately tries to read the sensor
> temperature data. This takes place even before the conversion on the
> TMP102 is complete and results in an invalid temperature read.
> 
> Depending on the value read, this may cause thermal framework to
> assume that a critical temperature event has occurred and attempts to
> shutdown the system.
> 
> Instead of causing an invalid mid-conversion value to be read
> erroneously, we mark the last_update to be in-line with the current
> jiffies. This allows the tmp102_update_device function to skip update
> until the required conversion time is complete. Further, we ensure to
> return -EAGAIN result instead of returning spurious temperature (such
> as 0C) values to the caller to prevent any wrong decisions made with
> such values. NOTE: this allows the read functions not to be blocking
> and allows the callers to make the decision if they would like to
> block or try again later. At least the current user(thermal) seems to
> handle this by retrying later.
> 
> A simpler alternative approach could be to sleep in the probe for the
> duration required, but that will result in latency that is undesirable
> and delay boot sequence un-necessarily.
> 
> [1] http://www.ti.com/lit/ds/symlink/tmp102.pdf
> 
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Reported-by: Aparna Balasubramanian <aparnab@ti.com>
> Reported-by: Elvita Lobo <elvita@ti.com>
> Reported-by: Yan Liu <yan-liu@ti.com>
> Signed-off-by: Nishanth Menon <nm@ti.com>

Applied.

Thanks,
Guenter


Thread overview: 8+ messages
2015-12-01  4:25 [PATCH] hwmon: (tmp102) Force wait for conversion time for the first valid data Nishanth Menon
2015-12-01  5:50 ` Guenter Roeck
2015-12-01 13:47   ` Nishanth Menon
2015-12-01 14:21     ` Nishanth Menon
2015-12-01 15:09       ` Guenter Roeck
2015-12-01 15:14         ` Nishanth Menon
2015-12-01 16:10 ` [PATCH V2] " Nishanth Menon
2015-12-01 21:06   ` Guenter Roeck
