linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
@ 2022-01-14  3:17 Benjamin Li
  2022-01-20  0:35 ` Bjorn Andersson
  2022-01-20 11:40 ` Amit Kucheria
  0 siblings, 2 replies; 4+ messages in thread
From: Benjamin Li @ 2022-01-14  3:17 UTC (permalink / raw)
  To: Amit Kucheria, Thara Gopinath
  Cc: Benjamin Li, Andy Gross, Bjorn Andersson, Rafael J. Wysocki,
	Daniel Lezcano, Zhang Rui, linux-pm, linux-arm-msm, linux-kernel

'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's
polling mechanism to check for threshold trips. This is used sometimes to
run performance test cases.

However, tsens supports an interrupt mechanism to receive notification of
trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add
interrupt support").

Currently the thermal zone mode that's set by userspace is not checked
before propagating threshold trip events from IRQs. Let's fix this to
restore the abilty to disable thermal throttling at runtime.

====================

Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first
four thermal zones control cpu0-3 and the last zone is for the other four
CPUs together.

  for f in /sys/class/thermal/thermal_zone*; do
    echo "disabled" > $f/mode
    echo $f | paste - $f/type $f/mode
  done

/sys/class/thermal/thermal_zone0	cpu0-thermal	disabled
/sys/class/thermal/thermal_zone1	cpu1-thermal	disabled
/sys/class/thermal/thermal_zone2	cpu2-thermal	disabled
/sys/class/thermal/thermal_zone3	cpu3-thermal	disabled
/sys/class/thermal/thermal_zone4	cpu4567-thermal	disabled

With mitigation thresholds at 75 degC and load running, we can now cruise
past temp=75000 without CPU throttling kicking in.

  watch -n 1 "grep '' /sys/class/thermal/*/temp
      /sys/class/thermal/*/cur_state
      /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq"

/sys/class/thermal/thermal_zone0/temp:82000
/sys/class/thermal/thermal_zone1/temp:84000
/sys/class/thermal/thermal_zone2/temp:87000
/sys/class/thermal/thermal_zone3/temp:84000
/sys/class/thermal/thermal_zone4/temp:84000
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device1/cur_state:0
/sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000

Reported-by: Zac Crosby <zac@squareup.com>
Signed-off-by: Benjamin Li <benl@squareup.com>
---
Changes in v2:
- Reordered sentences in first part of commit message to make sense.

 drivers/thermal/qcom/tsens.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
index 99a8d9f3e03c..0b6299512e7c 100644
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -509,13 +509,16 @@ static irqreturn_t tsens_irq_thread(int irq, void *data)
 		spin_unlock_irqrestore(&priv->ul_lock, flags);
 
 		if (trigger) {
-			dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
-				hw_id, __func__, temp);
-			thermal_zone_device_update(s->tzd,
-						   THERMAL_EVENT_UNSPECIFIED);
+			if (s->tzd->mode == THERMAL_DEVICE_ENABLED) {
+				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
+					hw_id, __func__, temp);
+				thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
+			} else {
+				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped as zone disabled\n",
+					hw_id, __func__, temp);
+			}
 		} else {
-			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
-				hw_id, __func__, temp);
+			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n", hw_id, __func__, temp);
 		}
 
 		if (tsens_version(priv) < VER_0_1) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-01-14  3:17 [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting Benjamin Li
@ 2022-01-20  0:35 ` Bjorn Andersson
  2022-01-20 11:40 ` Amit Kucheria
  1 sibling, 0 replies; 4+ messages in thread
From: Bjorn Andersson @ 2022-01-20  0:35 UTC (permalink / raw)
  To: Benjamin Li
  Cc: Amit Kucheria, Thara Gopinath, Andy Gross, Rafael J. Wysocki,
	Daniel Lezcano, Zhang Rui, linux-pm, linux-arm-msm, linux-kernel

On Thu 13 Jan 19:17 PST 2022, Benjamin Li wrote:

> 'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's
> polling mechanism to check for threshold trips. This is used sometimes to
> run performance test cases.
> 
> However, tsens supports an interrupt mechanism to receive notification of
> trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add
> interrupt support").
> 
> Currently the thermal zone mode that's set by userspace is not checked
> before propagating threshold trip events from IRQs. Let's fix this to
> restore the abilty to disable thermal throttling at runtime.
> 
> ====================
> 
> Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first
> four thermal zones control cpu0-3 and the last zone is for the other four
> CPUs together.
> 
>   for f in /sys/class/thermal/thermal_zone*; do
>     echo "disabled" > $f/mode
>     echo $f | paste - $f/type $f/mode
>   done
> 
> /sys/class/thermal/thermal_zone0	cpu0-thermal	disabled
> /sys/class/thermal/thermal_zone1	cpu1-thermal	disabled
> /sys/class/thermal/thermal_zone2	cpu2-thermal	disabled
> /sys/class/thermal/thermal_zone3	cpu3-thermal	disabled
> /sys/class/thermal/thermal_zone4	cpu4567-thermal	disabled
> 
> With mitigation thresholds at 75 degC and load running, we can now cruise
> past temp=75000 without CPU throttling kicking in.
> 
>   watch -n 1 "grep '' /sys/class/thermal/*/temp
>       /sys/class/thermal/*/cur_state
>       /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq"
> 
> /sys/class/thermal/thermal_zone0/temp:82000
> /sys/class/thermal/thermal_zone1/temp:84000
> /sys/class/thermal/thermal_zone2/temp:87000
> /sys/class/thermal/thermal_zone3/temp:84000
> /sys/class/thermal/thermal_zone4/temp:84000
> /sys/class/thermal/cooling_device0/cur_state:0
> /sys/class/thermal/cooling_device1/cur_state:0
> /sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000
> 
> Reported-by: Zac Crosby <zac@squareup.com>
> Signed-off-by: Benjamin Li <benl@squareup.com>
> ---
> Changes in v2:
> - Reordered sentences in first part of commit message to make sense.

Didn't spot the v2 before replying to v1, still looks good :)

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> 
>  drivers/thermal/qcom/tsens.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
> index 99a8d9f3e03c..0b6299512e7c 100644
> --- a/drivers/thermal/qcom/tsens.c
> +++ b/drivers/thermal/qcom/tsens.c
> @@ -509,13 +509,16 @@ static irqreturn_t tsens_irq_thread(int irq, void *data)
>  		spin_unlock_irqrestore(&priv->ul_lock, flags);
>  
>  		if (trigger) {
> -			dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> -				hw_id, __func__, temp);
> -			thermal_zone_device_update(s->tzd,
> -						   THERMAL_EVENT_UNSPECIFIED);
> +			if (s->tzd->mode == THERMAL_DEVICE_ENABLED) {
> +				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> +					hw_id, __func__, temp);
> +				thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
> +			} else {
> +				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped as zone disabled\n",
> +					hw_id, __func__, temp);
> +			}
>  		} else {
> -			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
> -				hw_id, __func__, temp);
> +			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n", hw_id, __func__, temp);
>  		}
>  
>  		if (tsens_version(priv) < VER_0_1) {
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-01-14  3:17 [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting Benjamin Li
  2022-01-20  0:35 ` Bjorn Andersson
@ 2022-01-20 11:40 ` Amit Kucheria
  2022-01-20 19:32   ` Benjamin Li
  1 sibling, 1 reply; 4+ messages in thread
From: Amit Kucheria @ 2022-01-20 11:40 UTC (permalink / raw)
  To: Benjamin Li
  Cc: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, Linux PM list, linux-arm-msm, LKML

On Fri, Jan 14, 2022 at 8:47 AM Benjamin Li <benl@squareup.com> wrote:
>
> 'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's
> polling mechanism to check for threshold trips. This is used sometimes to
> run performance test cases.
>
> However, tsens supports an interrupt mechanism to receive notification of
> trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add
> interrupt support").
>
> Currently the thermal zone mode that's set by userspace is not checked
> before propagating threshold trip events from IRQs. Let's fix this to
> restore the abilty to disable thermal throttling at runtime.
>
> ====================
>
> Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first
> four thermal zones control cpu0-3 and the last zone is for the other four
> CPUs together.
>
>   for f in /sys/class/thermal/thermal_zone*; do
>     echo "disabled" > $f/mode
>     echo $f | paste - $f/type $f/mode
>   done
>
> /sys/class/thermal/thermal_zone0        cpu0-thermal    disabled
> /sys/class/thermal/thermal_zone1        cpu1-thermal    disabled
> /sys/class/thermal/thermal_zone2        cpu2-thermal    disabled
> /sys/class/thermal/thermal_zone3        cpu3-thermal    disabled
> /sys/class/thermal/thermal_zone4        cpu4567-thermal disabled
>
> With mitigation thresholds at 75 degC and load running, we can now cruise
> past temp=75000 without CPU throttling kicking in.
>
>   watch -n 1 "grep '' /sys/class/thermal/*/temp
>       /sys/class/thermal/*/cur_state
>       /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq"
>
> /sys/class/thermal/thermal_zone0/temp:82000
> /sys/class/thermal/thermal_zone1/temp:84000
> /sys/class/thermal/thermal_zone2/temp:87000
> /sys/class/thermal/thermal_zone3/temp:84000
> /sys/class/thermal/thermal_zone4/temp:84000
> /sys/class/thermal/cooling_device0/cur_state:0
> /sys/class/thermal/cooling_device1/cur_state:0
> /sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000
>
> Reported-by: Zac Crosby <zac@squareup.com>
> Signed-off-by: Benjamin Li <benl@squareup.com>
> ---
> Changes in v2:
> - Reordered sentences in first part of commit message to make sense.
>
>  drivers/thermal/qcom/tsens.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
> index 99a8d9f3e03c..0b6299512e7c 100644
> --- a/drivers/thermal/qcom/tsens.c
> +++ b/drivers/thermal/qcom/tsens.c
> @@ -509,13 +509,16 @@ static irqreturn_t tsens_irq_thread(int irq, void *data)
>                 spin_unlock_irqrestore(&priv->ul_lock, flags);
>
>                 if (trigger) {
> -                       dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> -                               hw_id, __func__, temp);
> -                       thermal_zone_device_update(s->tzd,
> -                                                  THERMAL_EVENT_UNSPECIFIED);
> +                       if (s->tzd->mode == THERMAL_DEVICE_ENABLED) {
> +                               dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> +                                       hw_id, __func__, temp);
> +                               thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
> +                       } else {
> +                               dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped as zone disabled\n",

Hmm. I don't like the fact that these messages won't be visible to
users in dmesg unless they're debugging. This change puts the SoC in a
potentially unsafe state. Perhaps we should print a ratelimited
message in the logs that we're operating outside safety limits?

> +                                       hw_id, __func__, temp);
> +                       }
>                 } else {
> -                       dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
> -                               hw_id, __func__, temp);
> +                       dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n", hw_id, __func__, temp);

Get rid of this hunk, it is unrelated to the above change.

>                 }
>
>                 if (tsens_version(priv) < VER_0_1) {
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-01-20 11:40 ` Amit Kucheria
@ 2022-01-20 19:32   ` Benjamin Li
  0 siblings, 0 replies; 4+ messages in thread
From: Benjamin Li @ 2022-01-20 19:32 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, Linux PM list, linux-arm-msm, LKML

On 1/19/22 4:33 PM, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Thanks!

On 1/20/22 3:40 AM, Amit Kucheria wrote:
>> +                               dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
>> +                                       hw_id, __func__, temp);
>> +                               thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
>> +                       } else {
>> +                               dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped as zone disabled\n",
> 
> Hmm. I don't like the fact that these messages won't be visible to
> users in dmesg unless they're debugging. This change puts the SoC in a
> potentially unsafe state. Perhaps we should print a ratelimited
> message in the logs that we're operating outside safety limits?

That seems fine, I'll change to dev_info_ratelimited and make the message
a bit scarier.

> 
>> +                                       hw_id, __func__, temp);
>> +                       }
>>                 } else {
>> -                       dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
>> -                               hw_id, __func__, temp);
>> +                       dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n", hw_id, __func__, temp);
> 
> Get rid of this hunk, it is unrelated to the above change.

Will do.


> 
>>                 }
>>
>>                 if (tsens_version(priv) < VER_0_1) {
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-01-20 19:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-14  3:17 [PATCH v2] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting Benjamin Li
2022-01-20  0:35 ` Bjorn Andersson
2022-01-20 11:40 ` Amit Kucheria
2022-01-20 19:32   ` Benjamin Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).