linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
@ 2020-12-30  8:43 Tony Lindgren
  2020-12-30  8:43 ` [PATCH 2/3] thermal: ti-soc-thermal: Simplify polling with iopoll Tony Lindgren
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Tony Lindgren @ 2020-12-30  8:43 UTC (permalink / raw)
  To: Amit Kucheria, Daniel Lezcano, Zhang Rui
  Cc: Eduardo Valentin, Keerthy, linux-pm, linux-kernel, linux-omap,
	Adam Ford, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel

At least for 4430, trying to use the single conversion mode eventually
hangs the thermal sensor. This can be quite easily seen with errors:

thermal thermal_zone0: failed to read out thermal zone (-5)

Also, trying to read the temperature shows a stuck value with:

$ while true; do cat /sys/class/thermal/thermal_zone0/temp; done

Where the temperature is not rising at all with the busy loop.

Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
single conversion mode while it works fine in continuous conversion mode.
It is also possible that the hung temperature sensor can affect the
thermal shutdown alert too.

Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
use it for 4430.

Note that we also need to add udelay to for the EOCZ (end of conversion)
bit polling as otherwise we have it time out too early on 4430. We'll be
changing the loop to use iopoll in the following clean-up patch.

Cc: Adam Ford <aford173@gmail.com>
Cc: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
 drivers/thermal/ti-soc-thermal/ti-bandgap.c         | 9 +++++++--
 drivers/thermal/ti-soc-thermal/ti-bandgap.h         | 2 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
--- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
@@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
 const struct ti_bandgap_data omap4430_data = {
 	.features = TI_BANDGAP_FEATURE_MODE_CONFIG |
 			TI_BANDGAP_FEATURE_CLK_CTRL |
-			TI_BANDGAP_FEATURE_POWER_SWITCH,
+			TI_BANDGAP_FEATURE_POWER_SWITCH |
+			TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
 	.fclock_name = "bandgap_fclk",
 	.div_ck_name = "bandgap_fclk",
 	.conv_table = omap4430_adc_to_temp,
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -15,6 +15,7 @@
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/clk.h>
+#include <linux/delay.h>
 #include <linux/gpio/consumer.h>
 #include <linux/platform_device.h>
 #include <linux/err.h>
@@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
 	u32 counter = 1000;
 	struct temp_sensor_registers *tsr;
 
-	/* Select single conversion mode */
-	if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
+	/* Select continuous or single conversion mode */
+	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
+		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
+	else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
 		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
 
 	/* Start of Conversion = 1 */
@@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
 		if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
 		    tsr->bgap_eocz_mask)
 			break;
+		udelay(1);
 	}
 
 	/* Start of Conversion = 0 */
@@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
 		if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
 		      tsr->bgap_eocz_mask))
 			break;
+		udelay(1);
 	}
 
 	return 0;
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
@@ -280,6 +280,7 @@ struct ti_temp_sensor {
  *	has Errata 814
  * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
  *	inaccurate.
+ * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
  * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
  *      specific feature (above) or not. Return non-zero, if yes.
  */
@@ -295,6 +296,7 @@ struct ti_temp_sensor {
 #define TI_BANDGAP_FEATURE_HISTORY_BUFFER	BIT(9)
 #define TI_BANDGAP_FEATURE_ERRATA_814		BIT(10)
 #define TI_BANDGAP_FEATURE_UNRELIABLE		BIT(11)
+#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY	BIT(12)
 #define TI_BANDGAP_HAS(b, f)			\
 			((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
 
-- 
2.29.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/3] thermal: ti-soc-thermal: Simplify polling with iopoll
  2020-12-30  8:43 [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Tony Lindgren
@ 2020-12-30  8:43 ` Tony Lindgren
  2020-12-30  8:43 ` [PATCH 3/3] thermal: ti-soc-thermal: Use non-inverted define for omap4 Tony Lindgren
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Tony Lindgren @ 2020-12-30  8:43 UTC (permalink / raw)
  To: Amit Kucheria, Daniel Lezcano, Zhang Rui
  Cc: Eduardo Valentin, Keerthy, linux-pm, linux-kernel, linux-omap,
	Adam Ford, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel

We can use iopoll for checking the EOCZ (end of conversion) bit.

Cc: Adam Ford <aford173@gmail.com>
Cc: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/thermal/ti-soc-thermal/ti-bandgap.c | 30 ++++++++++-----------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -15,7 +15,6 @@
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/clk.h>
-#include <linux/delay.h>
 #include <linux/gpio/consumer.h>
 #include <linux/platform_device.h>
 #include <linux/err.h>
@@ -27,6 +26,7 @@
 #include <linux/of_platform.h>
 #include <linux/of_irq.h>
 #include <linux/io.h>
+#include <linux/iopoll.h>
 #include <linux/cpu_pm.h>
 #include <linux/device.h>
 #include <linux/pm_runtime.h>
@@ -603,8 +603,9 @@ void *ti_bandgap_get_sensor_data(struct ti_bandgap *bgp, int id)
 static int
 ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
 {
-	u32 counter = 1000;
 	struct temp_sensor_registers *tsr;
+	int error;
+	u32 val;
 
 	/* Select continuous or single conversion mode */
 	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
@@ -615,27 +616,24 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
 	/* Start of Conversion = 1 */
 	RMW_BITS(bgp, id, temp_sensor_ctrl, bgap_soc_mask, 1);
 
-	/* Wait for EOCZ going up */
 	tsr = bgp->conf->sensors[id].registers;
 
-	while (--counter) {
-		if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
-		    tsr->bgap_eocz_mask)
-			break;
-		udelay(1);
-	}
+	/* Wait for EOCZ going up */
+	error = readl_poll_timeout_atomic(bgp->base + tsr->temp_sensor_ctrl,
+					  val, val & tsr->bgap_eocz_mask,
+					  1, 1000);
+	if (error)
+		dev_warn(bgp->dev, "eocz timed out waiting high\n");
 
 	/* Start of Conversion = 0 */
 	RMW_BITS(bgp, id, temp_sensor_ctrl, bgap_soc_mask, 0);
 
 	/* Wait for EOCZ going down */
-	counter = 1000;
-	while (--counter) {
-		if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
-		      tsr->bgap_eocz_mask))
-			break;
-		udelay(1);
-	}
+	error = readl_poll_timeout_atomic(bgp->base + tsr->temp_sensor_ctrl,
+					  val, !(val & tsr->bgap_eocz_mask),
+					  1, 1000);
+	if (error)
+		dev_warn(bgp->dev, "eocz timed out waiting low\n");
 
 	return 0;
 }
-- 
2.29.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/3] thermal: ti-soc-thermal: Use non-inverted define for omap4
  2020-12-30  8:43 [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Tony Lindgren
  2020-12-30  8:43 ` [PATCH 2/3] thermal: ti-soc-thermal: Simplify polling with iopoll Tony Lindgren
@ 2020-12-30  8:43 ` Tony Lindgren
  2020-12-30 12:55 ` [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Adam Ford
  2020-12-31 12:55 ` Péter Ujfalusi
  3 siblings, 0 replies; 11+ messages in thread
From: Tony Lindgren @ 2020-12-30  8:43 UTC (permalink / raw)
  To: Amit Kucheria, Daniel Lezcano, Zhang Rui
  Cc: Eduardo Valentin, Keerthy, linux-pm, linux-kernel, linux-omap,
	Adam Ford, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel

When we set bit 10 high we use continuous mode and not single
mode. Let's correct this to avoid confusion. No functional
changes here, the code does the right thing with bit 10.

Cc: Adam Ford <aford173@gmail.com>
Cc: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 4 ++--
 drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
--- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
@@ -24,7 +24,7 @@ omap4430_mpu_temp_sensor_registers = {
 	.bgap_dtemp_mask = OMAP4430_BGAP_TEMP_SENSOR_DTEMP_MASK,
 
 	.bgap_mode_ctrl = OMAP4430_TEMP_SENSOR_CTRL_OFFSET,
-	.mode_ctrl_mask = OMAP4430_SINGLE_MODE_MASK,
+	.mode_ctrl_mask = OMAP4430_CONTINUOUS_MODE_MASK,
 
 	.bgap_efuse = OMAP4430_FUSE_OPP_BGAP,
 };
@@ -97,7 +97,7 @@ omap4460_mpu_temp_sensor_registers = {
 	.mask_cold_mask = OMAP4460_MASK_COLD_MASK,
 
 	.bgap_mode_ctrl = OMAP4460_BGAP_CTRL_OFFSET,
-	.mode_ctrl_mask = OMAP4460_SINGLE_MODE_MASK,
+	.mode_ctrl_mask = OMAP4460_CONTINUOUS_MODE_MASK,
 
 	.bgap_counter = OMAP4460_BGAP_COUNTER_OFFSET,
 	.counter_mask = OMAP4460_COUNTER_MASK,
diff --git a/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h b/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
--- a/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
@@ -40,7 +40,7 @@
 /* OMAP4430.TEMP_SENSOR bits */
 #define OMAP4430_BGAP_TEMPSOFF_MASK			BIT(12)
 #define OMAP4430_BGAP_TSHUT_MASK			BIT(11)
-#define OMAP4430_SINGLE_MODE_MASK			BIT(10)
+#define OMAP4430_CONTINUOUS_MODE_MASK			BIT(10)
 #define OMAP4430_BGAP_TEMP_SENSOR_SOC_MASK		BIT(9)
 #define OMAP4430_BGAP_TEMP_SENSOR_EOCZ_MASK		BIT(8)
 #define OMAP4430_BGAP_TEMP_SENSOR_DTEMP_MASK		(0xff << 0)
@@ -113,7 +113,7 @@
 #define OMAP4460_BGAP_TEMP_SENSOR_DTEMP_MASK		(0x3ff << 0)
 
 /* OMAP4460.BANDGAP_CTRL bits */
-#define OMAP4460_SINGLE_MODE_MASK			BIT(31)
+#define OMAP4460_CONTINUOUS_MODE_MASK			BIT(31)
 #define OMAP4460_MASK_HOT_MASK				BIT(1)
 #define OMAP4460_MASK_COLD_MASK				BIT(0)
 
-- 
2.29.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2020-12-30  8:43 [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Tony Lindgren
  2020-12-30  8:43 ` [PATCH 2/3] thermal: ti-soc-thermal: Simplify polling with iopoll Tony Lindgren
  2020-12-30  8:43 ` [PATCH 3/3] thermal: ti-soc-thermal: Use non-inverted define for omap4 Tony Lindgren
@ 2020-12-30 12:55 ` Adam Ford
  2020-12-30 13:26   ` H. Nikolaus Schaller
  2020-12-31 12:55 ` Péter Ujfalusi
  3 siblings, 1 reply; 11+ messages in thread
From: Adam Ford @ 2020-12-30 12:55 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Amit Kucheria, Daniel Lezcano, Zhang Rui, Eduardo Valentin,
	Keerthy, linux-pm, Linux Kernel Mailing List, Linux-OMAP,
	Carl Philipp Klemm, Merlijn Wajer, Pavel Machek, Peter Ujfalusi,
	Sebastian Reichel

On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
>
> At least for 4430, trying to use the single conversion mode eventually
> hangs the thermal sensor. This can be quite easily seen with errors:
>
> thermal thermal_zone0: failed to read out thermal zone (-5)
>
> Also, trying to read the temperature shows a stuck value with:
>
> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
>
> Where the temperature is not rising at all with the busy loop.
>
> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
> single conversion mode while it works fine in continuous conversion mode.
> It is also possible that the hung temperature sensor can affect the
> thermal shutdown alert too.
>
> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
> use it for 4430.
>
> Note that we also need to add udelay to for the EOCZ (end of conversion)
> bit polling as otherwise we have it time out too early on 4430. We'll be
> changing the loop to use iopoll in the following clean-up patch.
>
> Cc: Adam Ford <aford173@gmail.com>

I don't have an OMAP4, but if you want, I can test a DM3730.

adam

> Cc: Carl Philipp Klemm <philipp@uvos.xyz>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Cc: Merlijn Wajer <merlijn@wizzup.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
> Cc: Sebastian Reichel <sre@kernel.org>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
>  drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c         | 9 +++++++--
>  drivers/thermal/ti-soc-thermal/ti-bandgap.h         | 2 ++
>  3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
>  const struct ti_bandgap_data omap4430_data = {
>         .features = TI_BANDGAP_FEATURE_MODE_CONFIG |
>                         TI_BANDGAP_FEATURE_CLK_CTRL |
> -                       TI_BANDGAP_FEATURE_POWER_SWITCH,
> +                       TI_BANDGAP_FEATURE_POWER_SWITCH |
> +                       TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
>         .fclock_name = "bandgap_fclk",
>         .div_ck_name = "bandgap_fclk",
>         .conv_table = omap4430_adc_to_temp,
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> @@ -15,6 +15,7 @@
>  #include <linux/kernel.h>
>  #include <linux/interrupt.h>
>  #include <linux/clk.h>
> +#include <linux/delay.h>
>  #include <linux/gpio/consumer.h>
>  #include <linux/platform_device.h>
>  #include <linux/err.h>
> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>         u32 counter = 1000;
>         struct temp_sensor_registers *tsr;
>
> -       /* Select single conversion mode */
> -       if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> +       /* Select continuous or single conversion mode */
> +       if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> +               RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> +       else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>                 RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
>
>         /* Start of Conversion = 1 */
> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>                 if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>                     tsr->bgap_eocz_mask)
>                         break;
> +               udelay(1);
>         }
>
>         /* Start of Conversion = 0 */
> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>                 if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>                       tsr->bgap_eocz_mask))
>                         break;
> +               udelay(1);
>         }
>
>         return 0;
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> @@ -280,6 +280,7 @@ struct ti_temp_sensor {
>   *     has Errata 814
>   * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
>   *     inaccurate.
> + * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
>   * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
>   *      specific feature (above) or not. Return non-zero, if yes.
>   */
> @@ -295,6 +296,7 @@ struct ti_temp_sensor {
>  #define TI_BANDGAP_FEATURE_HISTORY_BUFFER      BIT(9)
>  #define TI_BANDGAP_FEATURE_ERRATA_814          BIT(10)
>  #define TI_BANDGAP_FEATURE_UNRELIABLE          BIT(11)
> +#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY      BIT(12)
>  #define TI_BANDGAP_HAS(b, f)                   \
>                         ((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
>
> --
> 2.29.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2020-12-30 12:55 ` [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Adam Ford
@ 2020-12-30 13:26   ` H. Nikolaus Schaller
  2021-01-08  7:22     ` Tony Lindgren
  0 siblings, 1 reply; 11+ messages in thread
From: H. Nikolaus Schaller @ 2020-12-30 13:26 UTC (permalink / raw)
  To: Adam Ford, Tony Lindgren
  Cc: Amit Kucheria, Daniel Lezcano, Zhang Rui, Eduardo Valentin,
	Keerthy, linux-pm, Linux Kernel Mailing List, Linux-OMAP,
	Carl Philipp Klemm, Merlijn Wajer, Pavel Machek, Peter Ujfalusi,
	Sebastian Reichel, Andreas Kemnade

Hi Adam and Tony,

> Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@gmail.com>:
> 
> On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
>> 
>> At least for 4430, trying to use the single conversion mode eventually
>> hangs the thermal sensor. This can be quite easily seen with errors:
>> 
>> thermal thermal_zone0: failed to read out thermal zone (-5)
>> 
>> Also, trying to read the temperature shows a stuck value with:
>> 
>> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
>> 
>> Where the temperature is not rising at all with the busy loop.
>> 
>> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
>> single conversion mode while it works fine in continuous conversion mode.
>> It is also possible that the hung temperature sensor can affect the
>> thermal shutdown alert too.
>> 
>> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
>> use it for 4430.
>> 
>> Note that we also need to add udelay to for the EOCZ (end of conversion)
>> bit polling as otherwise we have it time out too early on 4430. We'll be
>> changing the loop to use iopoll in the following clean-up patch.
>> 
>> Cc: Adam Ford <aford173@gmail.com>
> 
> I don't have an OMAP4, but if you want, I can test a DM3730.

Indeed I remember a similar discussion from the DM3730 [1]. temp values were
always those from the last measurement. E.g. the first one was done
during (cold) boot and the first request after 10 minutes did show a
quite cold system... The next one did show a hot system independent
of what had been between (suspend or high activity).

It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
So it is quite fundamental.

We tried to fix it but did not come to a solution [2]. So we opened an issue
in our tracker [3] and decided to stay with continuous conversion although this
raises idle mode processor load.

BR,
Nikolaus

[1]: https://lists.goldelico.com/pipermail/letux-kernel/2019-September/003958.html
[2]: https://lists.goldelico.com/pipermail/letux-kernel/2019-September/003975.html
[3]: https://projects.goldelico.com/p/gta04-kernel/issues/928/

> adam
> 
>> Cc: Carl Philipp Klemm <philipp@uvos.xyz>
>> Cc: Eduardo Valentin <edubezval@gmail.com>
>> Cc: Merlijn Wajer <merlijn@wizzup.org>
>> Cc: Pavel Machek <pavel@ucw.cz>
>> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
>> Cc: Sebastian Reichel <sre@kernel.org>
>> Signed-off-by: Tony Lindgren <tony@atomide.com>
>> ---
>> drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>> drivers/thermal/ti-soc-thermal/ti-bandgap.c         | 9 +++++++--
>> drivers/thermal/ti-soc-thermal/ti-bandgap.h         | 2 ++
>> 3 files changed, 11 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
>> const struct ti_bandgap_data omap4430_data = {
>>        .features = TI_BANDGAP_FEATURE_MODE_CONFIG |
>>                        TI_BANDGAP_FEATURE_CLK_CTRL |
>> -                       TI_BANDGAP_FEATURE_POWER_SWITCH,
>> +                       TI_BANDGAP_FEATURE_POWER_SWITCH |
>> +                       TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
>>        .fclock_name = "bandgap_fclk",
>>        .div_ck_name = "bandgap_fclk",
>>        .conv_table = omap4430_adc_to_temp,
>> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> @@ -15,6 +15,7 @@
>> #include <linux/kernel.h>
>> #include <linux/interrupt.h>
>> #include <linux/clk.h>
>> +#include <linux/delay.h>
>> #include <linux/gpio/consumer.h>
>> #include <linux/platform_device.h>
>> #include <linux/err.h>
>> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>>        u32 counter = 1000;
>>        struct temp_sensor_registers *tsr;
>> 
>> -       /* Select single conversion mode */
>> -       if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>> +       /* Select continuous or single conversion mode */
>> +       if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
>> +               RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
>> +       else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>>                RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
>> 
>>        /* Start of Conversion = 1 */
>> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>>                if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>>                    tsr->bgap_eocz_mask)
>>                        break;
>> +               udelay(1);
>>        }
>> 
>>        /* Start of Conversion = 0 */
>> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>>                if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>>                      tsr->bgap_eocz_mask))
>>                        break;
>> +               udelay(1);
>>        }
>> 
>>        return 0;
>> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
>> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
>> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
>> @@ -280,6 +280,7 @@ struct ti_temp_sensor {
>>  *     has Errata 814
>>  * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
>>  *     inaccurate.
>> + * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
>>  * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
>>  *      specific feature (above) or not. Return non-zero, if yes.
>>  */
>> @@ -295,6 +296,7 @@ struct ti_temp_sensor {
>> #define TI_BANDGAP_FEATURE_HISTORY_BUFFER      BIT(9)
>> #define TI_BANDGAP_FEATURE_ERRATA_814          BIT(10)
>> #define TI_BANDGAP_FEATURE_UNRELIABLE          BIT(11)
>> +#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY      BIT(12)
>> #define TI_BANDGAP_HAS(b, f)                   \
>>                        ((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
>> 
>> --
>> 2.29.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2020-12-30  8:43 [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Tony Lindgren
                   ` (2 preceding siblings ...)
  2020-12-30 12:55 ` [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Adam Ford
@ 2020-12-31 12:55 ` Péter Ujfalusi
  2021-01-08  7:19   ` Tony Lindgren
  3 siblings, 1 reply; 11+ messages in thread
From: Péter Ujfalusi @ 2020-12-31 12:55 UTC (permalink / raw)
  To: Tony Lindgren, Amit Kucheria, Daniel Lezcano, Zhang Rui
  Cc: Eduardo Valentin, Keerthy, linux-pm, linux-kernel, linux-omap,
	Adam Ford, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Sebastian Reichel

Hi Tony,

On 12/30/20 10:43 AM, Tony Lindgren wrote:
> At least for 4430, trying to use the single conversion mode eventually
> hangs the thermal sensor. This can be quite easily seen with errors:
> 
> thermal thermal_zone0: failed to read out thermal zone (-5)
> 
> Also, trying to read the temperature shows a stuck value with:
> 
> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
> 
> Where the temperature is not rising at all with the busy loop.
> 
> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
> single conversion mode while it works fine in continuous conversion mode.
> It is also possible that the hung temperature sensor can affect the
> thermal shutdown alert too.
> 
> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
> use it for 4430.
> 
> Note that we also need to add udelay to for the EOCZ (end of conversion)
> bit polling as otherwise we have it time out too early on 4430. We'll be
> changing the loop to use iopoll in the following clean-up patch.

I don't yet have my setup in working condition, so I can not test these.

> Cc: Adam Ford <aford173@gmail.com>
> Cc: Carl Philipp Klemm <philipp@uvos.xyz>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Cc: Merlijn Wajer <merlijn@wizzup.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
> Cc: Sebastian Reichel <sre@kernel.org>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
>  drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c         | 9 +++++++--
>  drivers/thermal/ti-soc-thermal/ti-bandgap.h         | 2 ++
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
>  const struct ti_bandgap_data omap4430_data = {
>  	.features = TI_BANDGAP_FEATURE_MODE_CONFIG |
>  			TI_BANDGAP_FEATURE_CLK_CTRL |
> -			TI_BANDGAP_FEATURE_POWER_SWITCH,
> +			TI_BANDGAP_FEATURE_POWER_SWITCH |
> +			TI_BANDGAP_FEATURE_CONT_MODE_ONLY,

Can we add a comment with the observations?

>  	.fclock_name = "bandgap_fclk",
>  	.div_ck_name = "bandgap_fclk",
>  	.conv_table = omap4430_adc_to_temp,
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> @@ -15,6 +15,7 @@
>  #include <linux/kernel.h>
>  #include <linux/interrupt.h>
>  #include <linux/clk.h>
> +#include <linux/delay.h>
>  #include <linux/gpio/consumer.h>
>  #include <linux/platform_device.h>
>  #include <linux/err.h>
> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>  	u32 counter = 1000;
>  	struct temp_sensor_registers *tsr;
>  
> -	/* Select single conversion mode */
> -	if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> +	/* Select continuous or single conversion mode */
> +	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> +		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> +	else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>  		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);

Would not be better to:
if (TI_BANDGAP_HAS(bgp, MODE_CONFIG)) {
	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
	else
		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
}

One can only switch to cont/single mode if the mode config is possible.

>  
>  	/* Start of Conversion = 1 */
> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>  		if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>  		    tsr->bgap_eocz_mask)
>  			break;
> +		udelay(1);
>  	}
>  
>  	/* Start of Conversion = 0 */
> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
>  		if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>  		      tsr->bgap_eocz_mask))
>  			break;
> +		udelay(1);
>  	}
>  
>  	return 0;
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> @@ -280,6 +280,7 @@ struct ti_temp_sensor {
>   *	has Errata 814
>   * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
>   *	inaccurate.
> + * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
>   * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
>   *      specific feature (above) or not. Return non-zero, if yes.
>   */
> @@ -295,6 +296,7 @@ struct ti_temp_sensor {
>  #define TI_BANDGAP_FEATURE_HISTORY_BUFFER	BIT(9)
>  #define TI_BANDGAP_FEATURE_ERRATA_814		BIT(10)
>  #define TI_BANDGAP_FEATURE_UNRELIABLE		BIT(11)
> +#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY	BIT(12)
>  #define TI_BANDGAP_HAS(b, f)			\
>  			((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
>  
> 

-- 
Péter

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2020-12-31 12:55 ` Péter Ujfalusi
@ 2021-01-08  7:19   ` Tony Lindgren
  0 siblings, 0 replies; 11+ messages in thread
From: Tony Lindgren @ 2021-01-08  7:19 UTC (permalink / raw)
  To: Péter Ujfalusi
  Cc: Amit Kucheria, Daniel Lezcano, Zhang Rui, Eduardo Valentin,
	Keerthy, linux-pm, linux-kernel, linux-omap, Adam Ford,
	Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Sebastian Reichel

* Péter Ujfalusi <peter.ujfalusi@gmail.com> [201231 12:55]:
> On 12/30/20 10:43 AM, Tony Lindgren wrote:
> > @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
> >  const struct ti_bandgap_data omap4430_data = {
> >  	.features = TI_BANDGAP_FEATURE_MODE_CONFIG |
> >  			TI_BANDGAP_FEATURE_CLK_CTRL |
> > -			TI_BANDGAP_FEATURE_POWER_SWITCH,
> > +			TI_BANDGAP_FEATURE_POWER_SWITCH |
> > +			TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
> 
> Can we add a comment with the observations?

Sure, and I also noticed that the timeout triggers also on dra7
too. I need to recheck what all are affected.. At least we now
see warnings on the SoCs affected.

> > @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
> >  	u32 counter = 1000;
> >  	struct temp_sensor_registers *tsr;
> >  
> > -	/* Select single conversion mode */
> > -	if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> > +	/* Select continuous or single conversion mode */
> > +	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> > +		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> > +	else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> >  		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
> 
> Would not be better to:
> if (TI_BANDGAP_HAS(bgp, MODE_CONFIG)) {
> 	if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> 		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> 	else
> 		RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
> }
> 
> One can only switch to cont/single mode if the mode config is possible.

Yup makes sense thanks for spotting that.

Regards,

Tony

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2020-12-30 13:26   ` H. Nikolaus Schaller
@ 2021-01-08  7:22     ` Tony Lindgren
  2021-01-08 13:45       ` Adam Ford
  0 siblings, 1 reply; 11+ messages in thread
From: Tony Lindgren @ 2021-01-08  7:22 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Adam Ford, Amit Kucheria, Daniel Lezcano, Zhang Rui,
	Eduardo Valentin, Keerthy, linux-pm, Linux Kernel Mailing List,
	Linux-OMAP, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel, Andreas Kemnade

* H. Nikolaus Schaller <hns@goldelico.com> [201230 13:29]:
> > Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@gmail.com>:
> > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
> >> 
> >> At least for 4430, trying to use the single conversion mode eventually
> >> hangs the thermal sensor. This can be quite easily seen with errors:
> >> 
> >> thermal thermal_zone0: failed to read out thermal zone (-5)
...

> > I don't have an OMAP4, but if you want, I can test a DM3730.
> 
> Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> always those from the last measurement. E.g. the first one was done
> during (cold) boot and the first request after 10 minutes did show a
> quite cold system... The next one did show a hot system independent
> of what had been between (suspend or high activity).
> 
> It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> So it is quite fundamental.
> 
> We tried to fix it but did not come to a solution [2]. So we opened an issue
> in our tracker [3] and decided to stay with continuous conversion although this
> raises idle mode processor load.

Hmm so maybe eocz high always times out in single mode since it also
triggers at least on dra7?

Yes it would be great if you guys can the $subject patch a try at
least on your omap36xx and omap5 boards and see if you see eocz
time out warnings in dmesg.

Regards,

Tony

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2021-01-08  7:22     ` Tony Lindgren
@ 2021-01-08 13:45       ` Adam Ford
  2021-01-08 18:31         ` Adam Ford
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Ford @ 2021-01-08 13:45 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: H. Nikolaus Schaller, Amit Kucheria, Daniel Lezcano, Zhang Rui,
	Eduardo Valentin, Keerthy, linux-pm, Linux Kernel Mailing List,
	Linux-OMAP, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel, Andreas Kemnade

On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * H. Nikolaus Schaller <hns@goldelico.com> [201230 13:29]:
> > > Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@gmail.com>:
> > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
> > >>
> > >> At least for 4430, trying to use the single conversion mode eventually
> > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > >>
> > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> ...
>
> > > I don't have an OMAP4, but if you want, I can test a DM3730.
> >
> > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > always those from the last measurement. E.g. the first one was done
> > during (cold) boot and the first request after 10 minutes did show a
> > quite cold system... The next one did show a hot system independent
> > of what had been between (suspend or high activity).
> >
> > It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> > So it is quite fundamental.
> >
> > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > in our tracker [3] and decided to stay with continuous conversion although this
> > raises idle mode processor load.
>
> Hmm so maybe eocz high always times out in single mode since it also
> triggers at least on dra7?
>
> Yes it would be great if you guys can the $subject patch a try at
> least on your omap36xx and omap5 boards and see if you see eocz
> time out warnings in dmesg.

I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.

adam
>
> Regards,
>
> Tony

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2021-01-08 13:45       ` Adam Ford
@ 2021-01-08 18:31         ` Adam Ford
  2021-01-08 19:41           ` Adam Ford
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Ford @ 2021-01-08 18:31 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: H. Nikolaus Schaller, Amit Kucheria, Daniel Lezcano, Zhang Rui,
	Eduardo Valentin, Keerthy, linux-pm, Linux Kernel Mailing List,
	Linux-OMAP, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel, Andreas Kemnade

On Fri, Jan 8, 2021 at 7:45 AM Adam Ford <aford173@gmail.com> wrote:
>
> On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * H. Nikolaus Schaller <hns@goldelico.com> [201230 13:29]:
> > > > Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@gmail.com>:
> > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >>
> > > >> At least for 4430, trying to use the single conversion mode eventually
> > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > >>
> > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > ...
> >
> > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > >
> > > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > > always those from the last measurement. E.g. the first one was done
> > > during (cold) boot and the first request after 10 minutes did show a
> > > quite cold system... The next one did show a hot system independent
> > > of what had been between (suspend or high activity).
> > >
> > > It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> > > So it is quite fundamental.
> > >
> > > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > > in our tracker [3] and decided to stay with continuous conversion although this
> > > raises idle mode processor load.
> >
> > Hmm so maybe eocz high always times out in single mode since it also
> > triggers at least on dra7?
> >
> > Yes it would be great if you guys can the $subject patch a try at
> > least on your omap36xx and omap5 boards and see if you see eocz
> > time out warnings in dmesg.
>
> I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.

I am going to be a bit delayed testing this.  I cannot boot omap2plus
using Linux version 5.11.0-rc2.

[    2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
[    2.673309] nand: Micron MT29F4G16ABBDA3W
[    2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 64
[    2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
[    2.693237] Invalid ECC layout
[    2.696350] omap2-nand 30000000.nand: unable to use BCH library
[    2.702575] omap2-nand: probe of 30000000.nand failed with error -22
[    2.716094] 8<--- cut here ---
[    2.719207] Unable to handle kernel NULL pointer dereference at
virtual address 00000018
[    2.727600] pgd = (ptrval)
...
[    3.050933] ---[ end trace 59640c7399a80a07 ]---
[    3.055603] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
[    3.063323] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b ]---

Once I get past this, I'll try to test the thermal stuff.

adam

>
> adam
> >
> > Regards,
> >
> > Tony

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
  2021-01-08 18:31         ` Adam Ford
@ 2021-01-08 19:41           ` Adam Ford
  0 siblings, 0 replies; 11+ messages in thread
From: Adam Ford @ 2021-01-08 19:41 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: H. Nikolaus Schaller, Amit Kucheria, Daniel Lezcano, Zhang Rui,
	Eduardo Valentin, Keerthy, linux-pm, Linux Kernel Mailing List,
	Linux-OMAP, Carl Philipp Klemm, Merlijn Wajer, Pavel Machek,
	Peter Ujfalusi, Sebastian Reichel, Andreas Kemnade

On Fri, Jan 8, 2021 at 12:31 PM Adam Ford <aford173@gmail.com> wrote:
>
> On Fri, Jan 8, 2021 at 7:45 AM Adam Ford <aford173@gmail.com> wrote:
> >
> > On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * H. Nikolaus Schaller <hns@goldelico.com> [201230 13:29]:
> > > > > Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@gmail.com>:
> > > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >>
> > > > >> At least for 4430, trying to use the single conversion mode eventually
> > > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > > >>
> > > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > > ...
> > >
> > > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > > >
> > > > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > > > always those from the last measurement. E.g. the first one was done
> > > > during (cold) boot and the first request after 10 minutes did show a
> > > > quite cold system... The next one did show a hot system independent
> > > > of what had been between (suspend or high activity).
> > > >
> > > > It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> > > > So it is quite fundamental.
> > > >
> > > > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > > > in our tracker [3] and decided to stay with continuous conversion although this
> > > > raises idle mode processor load.
> > >
> > > Hmm so maybe eocz high always times out in single mode since it also
> > > triggers at least on dra7?
> > >
> > > Yes it would be great if you guys can the $subject patch a try at
> > > least on your omap36xx and omap5 boards and see if you see eocz
> > > time out warnings in dmesg.


I do see chatter.

[   15.531005] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[   16.571075] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[   17.610961] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low

and it repeats quite often.

I would say this patch series would cause a regression on the DM3730.

adam


> >
> > I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.
>
> I am going to be a bit delayed testing this.  I cannot boot omap2plus
> using Linux version 5.11.0-rc2.
>
> [    2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [    2.673309] nand: Micron MT29F4G16ABBDA3W
> [    2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size:
> 2048, OOB size: 64
> [    2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> [    2.693237] Invalid ECC layout
> [    2.696350] omap2-nand 30000000.nand: unable to use BCH library
> [    2.702575] omap2-nand: probe of 30000000.nand failed with error -22
> [    2.716094] 8<--- cut here ---
> [    2.719207] Unable to handle kernel NULL pointer dereference at
> virtual address 00000018
> [    2.727600] pgd = (ptrval)
> ...
> [    3.050933] ---[ end trace 59640c7399a80a07 ]---
> [    3.055603] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [    3.063323] ---[ end Kernel panic - not syncing: Attempted to kill
> init! exitcode=0x0000000b ]---
>
> Once I get past this, I'll try to test the thermal stuff.
>
> adam
>
> >
> > adam
> > >
> > > Regards,
> > >
> > > Tony

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-08 19:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-30  8:43 [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Tony Lindgren
2020-12-30  8:43 ` [PATCH 2/3] thermal: ti-soc-thermal: Simplify polling with iopoll Tony Lindgren
2020-12-30  8:43 ` [PATCH 3/3] thermal: ti-soc-thermal: Use non-inverted define for omap4 Tony Lindgren
2020-12-30 12:55 ` [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430 Adam Ford
2020-12-30 13:26   ` H. Nikolaus Schaller
2021-01-08  7:22     ` Tony Lindgren
2021-01-08 13:45       ` Adam Ford
2021-01-08 18:31         ` Adam Ford
2021-01-08 19:41           ` Adam Ford
2020-12-31 12:55 ` Péter Ujfalusi
2021-01-08  7:19   ` Tony Lindgren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).