linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
@ 2022-01-20 20:01 Benjamin Li
  2022-02-25 14:02 ` Daniel Lezcano
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Li @ 2022-01-20 20:01 UTC (permalink / raw)
  To: Amit Kucheria, Thara Gopinath
  Cc: Bjorn Andersson, Zac Crosby, Benjamin Li, Andy Gross,
	Rafael J. Wysocki, Daniel Lezcano, Zhang Rui, linux-arm-msm,
	linux-pm, linux-kernel

'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's
polling mechanism to check for threshold trips. This is used sometimes to
run performance test cases.

However, tsens supports an interrupt mechanism to receive notification of
trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add
interrupt support").

Currently the thermal zone mode that's set by userspace is not checked
before propagating threshold trip events from IRQs. Let's fix this to
restore the abilty to disable thermal throttling at runtime.

====================

Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first
four thermal zones control cpu0-3 and the last zone is for the other four
CPUs together.

  for f in /sys/class/thermal/thermal_zone*; do
    echo "disabled" > $f/mode
    echo $f | paste - $f/type $f/mode
  done

/sys/class/thermal/thermal_zone0	cpu0-thermal	disabled
/sys/class/thermal/thermal_zone1	cpu1-thermal	disabled
/sys/class/thermal/thermal_zone2	cpu2-thermal	disabled
/sys/class/thermal/thermal_zone3	cpu3-thermal	disabled
/sys/class/thermal/thermal_zone4	cpu4567-thermal	disabled

With mitigation thresholds at 75 degC and load running, we can now cruise
past temp=75000 without CPU throttling kicking in.

  watch -n 1 "grep '' /sys/class/thermal/*/temp
      /sys/class/thermal/*/cur_state
      /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq"

/sys/class/thermal/thermal_zone0/temp:82000
/sys/class/thermal/thermal_zone1/temp:84000
/sys/class/thermal/thermal_zone2/temp:87000
/sys/class/thermal/thermal_zone3/temp:84000
/sys/class/thermal/thermal_zone4/temp:84000
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device1/cur_state:0
/sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600
/sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000
/sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000

Reported-by: Zac Crosby <zac@squareup.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Benjamin Li <benl@squareup.com>
---
Changes in v3:
- Upgraded logging to dev_info_ratelimited and revised log message.
- Remove unrelated hunk.

Some drivers that support thermal zone disabling implement a set_mode
operation and simply disable the sensor or the relevant IRQ(s), so they
actually don't log anything when zones are disabled. These drivers are
imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c.

For tsens.c, implementing a change_mode would require migrating the driver
from devm_thermal_zone_of_sensor_register to thermal_zone_device_register
(or updating thermal_of.c to add a change_mode operation in thermal_zone_
of_device_ops).

stm_thermal.c seems to use this patch's model of not disabling IRQs when
the zone is disabled (they still perform the thermal_zone_device_update
upon IRQ, but return -EAGAIN from their get_temp).

Changes in v2:
- Reordered sentences in first part of commit message to make sense.

 drivers/thermal/qcom/tsens.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
index 99a8d9f3e03c..dd0002829536 100644
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -509,10 +509,14 @@ static irqreturn_t tsens_irq_thread(int irq, void *data)
 		spin_unlock_irqrestore(&priv->ul_lock, flags);
 
 		if (trigger) {
-			dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
-				hw_id, __func__, temp);
-			thermal_zone_device_update(s->tzd,
-						   THERMAL_EVENT_UNSPECIFIED);
+			if (s->tzd->mode == THERMAL_DEVICE_ENABLED) {
+				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
+					hw_id, __func__, temp);
+				thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
+			} else {
+				dev_info_ratelimited(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped - zone disabled, operating outside of safety limits!\n",
+					hw_id, __func__, temp);
+			}
 		} else {
 			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
 				hw_id, __func__, temp);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-01-20 20:01 [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting Benjamin Li
@ 2022-02-25 14:02 ` Daniel Lezcano
  2022-02-25 16:46   ` Benjamin Li
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Lezcano @ 2022-02-25 14:02 UTC (permalink / raw)
  To: Benjamin Li, Amit Kucheria, Thara Gopinath
  Cc: Bjorn Andersson, Zac Crosby, Andy Gross, Rafael J. Wysocki,
	Zhang Rui, linux-arm-msm, linux-pm, linux-kernel

On 20/01/2022 21:01, Benjamin Li wrote:
> 'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's
> polling mechanism to check for threshold trips. This is used sometimes to
> run performance test cases.
> 
> However, tsens supports an interrupt mechanism to receive notification of
> trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add
> interrupt support").
> 
> Currently the thermal zone mode that's set by userspace is not checked
> before propagating threshold trip events from IRQs. Let's fix this to
> restore the abilty to disable thermal throttling at runtime.
> 
> ====================
> 
> Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first
> four thermal zones control cpu0-3 and the last zone is for the other four
> CPUs together.
> 
>    for f in /sys/class/thermal/thermal_zone*; do
>      echo "disabled" > $f/mode
>      echo $f | paste - $f/type $f/mode
>    done
> 
> /sys/class/thermal/thermal_zone0	cpu0-thermal	disabled
> /sys/class/thermal/thermal_zone1	cpu1-thermal	disabled
> /sys/class/thermal/thermal_zone2	cpu2-thermal	disabled
> /sys/class/thermal/thermal_zone3	cpu3-thermal	disabled
> /sys/class/thermal/thermal_zone4	cpu4567-thermal	disabled
> 
> With mitigation thresholds at 75 degC and load running, we can now cruise
> past temp=75000 without CPU throttling kicking in.
> 
>    watch -n 1 "grep '' /sys/class/thermal/*/temp
>        /sys/class/thermal/*/cur_state
>        /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq"
> 
> /sys/class/thermal/thermal_zone0/temp:82000
> /sys/class/thermal/thermal_zone1/temp:84000
> /sys/class/thermal/thermal_zone2/temp:87000
> /sys/class/thermal/thermal_zone3/temp:84000
> /sys/class/thermal/thermal_zone4/temp:84000
> /sys/class/thermal/cooling_device0/cur_state:0
> /sys/class/thermal/cooling_device1/cur_state:0
> /sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600
> /sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000
> /sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000
> 
> Reported-by: Zac Crosby <zac@squareup.com>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> Signed-off-by: Benjamin Li <benl@squareup.com>
> ---
> Changes in v3:
> - Upgraded logging to dev_info_ratelimited and revised log message.
> - Remove unrelated hunk.
> 
> Some drivers that support thermal zone disabling implement a set_mode
> operation and simply disable the sensor or the relevant IRQ(s), so they
> actually don't log anything when zones are disabled. These drivers are
> imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c.
> 
> For tsens.c, implementing a change_mode would require migrating the driver
> from devm_thermal_zone_of_sensor_register to thermal_zone_device_register
> (or updating thermal_of.c to add a change_mode operation in thermal_zone_
> of_device_ops).
> 
> stm_thermal.c seems to use this patch's model of not disabling IRQs when
> the zone is disabled (they still perform the thermal_zone_device_update
> upon IRQ, but return -EAGAIN from their get_temp).

What is the concern by changing the core code to have a correct handling 
of the disabled / enabled state in this driver ? (and by this way give 
the opportunity to other drivers to fix their code)


> Changes in v2:
> - Reordered sentences in first part of commit message to make sense.
> 
>   drivers/thermal/qcom/tsens.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
> index 99a8d9f3e03c..dd0002829536 100644
> --- a/drivers/thermal/qcom/tsens.c
> +++ b/drivers/thermal/qcom/tsens.c
> @@ -509,10 +509,14 @@ static irqreturn_t tsens_irq_thread(int irq, void *data)
>   		spin_unlock_irqrestore(&priv->ul_lock, flags);
>   
>   		if (trigger) {
> -			dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> -				hw_id, __func__, temp);
> -			thermal_zone_device_update(s->tzd,
> -						   THERMAL_EVENT_UNSPECIFIED);
> +			if (s->tzd->mode == THERMAL_DEVICE_ENABLED) {
> +				dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n",
> +					hw_id, __func__, temp);
> +				thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED);
> +			} else {
> +				dev_info_ratelimited(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped - zone disabled, operating outside of safety limits!\n",
> +					hw_id, __func__, temp);
> +			}
>   		} else {
>   			dev_dbg(priv->dev, "[%u] %s: no violation:  %d\n",
>   				hw_id, __func__, temp);


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-02-25 14:02 ` Daniel Lezcano
@ 2022-02-25 16:46   ` Benjamin Li
  2022-02-25 17:41     ` Daniel Lezcano
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Li @ 2022-02-25 16:46 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Amit Kucheria, Thara Gopinath, Bjorn Andersson, Zac Crosby,
	Andy Gross, Rafael J. Wysocki, Zhang Rui, linux-arm-msm,
	linux-pm, linux-kernel

On Fri, Feb 25, 2022 at 6:02 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> > Some drivers that support thermal zone disabling implement a set_mode
> > operation and simply disable the sensor or the relevant IRQ(s), so they
> > actually don't log anything when zones are disabled. These drivers are
> > imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c.
> >
> > For tsens.c, implementing a change_mode would require migrating the driver
> > from devm_thermal_zone_of_sensor_register to thermal_zone_device_register
> > (or updating thermal_of.c to add a change_mode operation in thermal_zone_
> > of_device_ops).
> >
> > stm_thermal.c seems to use this patch's model of not disabling IRQs when
> > the zone is disabled (they still perform the thermal_zone_device_update
> > upon IRQ, but return -EAGAIN from their get_temp).
>
> What is the concern by changing the core code to have a correct handling
> of the disabled / enabled state in this driver ? (and by this way give
> the opportunity to other drivers to fix their code)'

It seems fine, is that the preference? Updating thermal_of.c to add a
change_mode
operation in thermal_zone_of_device_ops?

Ben

>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting
  2022-02-25 16:46   ` Benjamin Li
@ 2022-02-25 17:41     ` Daniel Lezcano
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Lezcano @ 2022-02-25 17:41 UTC (permalink / raw)
  To: Benjamin Li
  Cc: Amit Kucheria, Thara Gopinath, Bjorn Andersson, Zac Crosby,
	Andy Gross, Rafael J. Wysocki, Zhang Rui, linux-arm-msm,
	linux-pm, linux-kernel

On 25/02/2022 17:46, Benjamin Li wrote:
> On Fri, Feb 25, 2022 at 6:02 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>> Some drivers that support thermal zone disabling implement a set_mode
>>> operation and simply disable the sensor or the relevant IRQ(s), so they
>>> actually don't log anything when zones are disabled. These drivers are
>>> imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c.
>>>
>>> For tsens.c, implementing a change_mode would require migrating the driver
>>> from devm_thermal_zone_of_sensor_register to thermal_zone_device_register
>>> (or updating thermal_of.c to add a change_mode operation in thermal_zone_
>>> of_device_ops).
>>>
>>> stm_thermal.c seems to use this patch's model of not disabling IRQs when
>>> the zone is disabled (they still perform the thermal_zone_device_update
>>> upon IRQ, but return -EAGAIN from their get_temp).
>>
>> What is the concern by changing the core code to have a correct handling
>> of the disabled / enabled state in this driver ? (and by this way give
>> the opportunity to other drivers to fix their code)'
> 
> It seems fine, is that the preference? Updating thermal_of.c to add a
> change_mode
> operation in thermal_zone_of_device_ops?

I'm not a big fan of this duplicated ops structure but preferably it 
would be better to put it there (except if you see a better way to do it)



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-02-25 17:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-20 20:01 [PATCH v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting Benjamin Li
2022-02-25 14:02 ` Daniel Lezcano
2022-02-25 16:46   ` Benjamin Li
2022-02-25 17:41     ` Daniel Lezcano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).