linux-pm.vger.kernel.org archive mirror
* [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17  5:48 Viresh Kumar
From: Viresh Kumar @ 2021-02-17  5:48 UTC
  To: Amit Daniel Kachhap, Daniel Lezcano, Viresh Kumar, Javi Merino,
	Zhang Rui, Amit Kucheria, Peter Zijlstra (Intel),
	Ingo Molnar, Thara Gopinath
  Cc: Vincent Guittot, v5 . 7+, linux-pm, linux-kernel

freq_qos_update_request() returns 1 if the effective constraint value
has changed, 0 if the effective constraint value has not changed, or a
negative error code on failures.

The frequency constraints for CPUs can be set by different parts of the
kernel. If the maximum frequency constraint set by other parts of the
kernel is set at a lower value than the one corresponding to cooling
state 0, then we will never be able to cool down the system as
freq_qos_update_request() will keep on returning 0 and we will skip
updating cpufreq_state and thermal pressure.

Fix that by doing the updates even in the case where
freq_qos_update_request() returns 0, as we have effectively set the
constraint to a new value even if the consolidated value of the
actual constraint is unchanged because of external factors.

Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Hi Guys,

This needs to go in 5.12-rc.

Thara, please give this a try and give your tested-by :).

 drivers/thermal/cpufreq_cooling.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index f5af2571f9b7..10af3341e5ea 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -485,7 +485,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 	frequency = get_state_freq(cpufreq_cdev, state);
 
 	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
-	if (ret > 0) {
+	if (ret >= 0) {
 		cpufreq_cdev->cpufreq_state = state;
 		cpus = cpufreq_cdev->policy->cpus;
 		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
-- 
2.25.0.rc1.19.g042ed3e048af



* Re: [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17 10:29 ` Lukasz Luba
From: Lukasz Luba @ 2021-02-17 10:29 UTC
  To: Viresh Kumar, Amit Kucheria
  Cc: Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Peter Zijlstra (Intel),
	Ingo Molnar, Thara Gopinath, Vincent Guittot, v5 . 7+,
	linux-pm, linux-kernel

Hi Viresh,

On 2/17/21 5:48 AM, Viresh Kumar wrote:
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
> 
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel is set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.

To be precise, the thermal pressure signal is not so important in this
mechanism and the 'cpufreq_state' has changed recently:

236761f19a4f373354  thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed

> 
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
> 
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")

I'm not sure if that f12e4f is the root cause.

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Guys,
> 
> This needs to go in 5.12-rc.
> 
> Thara, please give this a try and give your tested-by :).
> 
>   drivers/thermal/cpufreq_cooling.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)


Anyway, the fix LGTM. I will have to make sure that I'm CC'ed on this
topic, so I can have a look (somehow I missed 236761f19).

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz


* Re: [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17 10:39   ` Viresh Kumar
From: Viresh Kumar @ 2021-02-17 10:39 UTC
  To: Lukasz Luba
  Cc: Amit Kucheria, Amit Daniel Kachhap, Daniel Lezcano, Javi Merino,
	Zhang Rui, Peter Zijlstra (Intel),
	Ingo Molnar, Thara Gopinath, Vincent Guittot, v5 . 7+,
	linux-pm, linux-kernel

On 17-02-21, 10:29, Lukasz Luba wrote:
> On 2/17/21 5:48 AM, Viresh Kumar wrote:
> > freq_qos_update_request() returns 1 if the effective constraint value
> > has changed, 0 if the effective constraint value has not changed, or a
> > negative error code on failures.
> > 
> > The frequency constraints for CPUs can be set by different parts of the
> > kernel. If the maximum frequency constraint set by other parts of the
> > kernel is set at a lower value than the one corresponding to cooling
> > state 0, then we will never be able to cool down the system as
> > freq_qos_update_request() will keep on returning 0 and we will skip
> > updating cpufreq_state and thermal pressure.
> 
> To be precise, the thermal pressure signal is not so important in this
> mechanism and the 'cpufreq_state' has changed recently:

Right, I wasn't concerned only about the missing thermal cooling, but
about both thermal cooling and thermal pressure.

> 236761f19a4f373354  thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed

This moved the assignment to what is, to me, a more logical place, i.e.
it is no longer done on errors; it's just that the block in which it
landed may not get called at all :(

> > Fix that by doing the updates even in the case where
> > freq_qos_update_request() returns 0, as we have effectively set the
> > constraint to a new value even if the consolidated value of the
> > actual constraint is unchanged because of external factors.
> > 
> > Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> > Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> > Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
> 
> I'm not sure if that f12e4f is the root cause.

Hmm, depends on how we define the problem :)

If this was just about thermal cooling not happening, then maybe yes,
but to me it is rather about the mishandled return value of
freq_qos_update_request(), which has more than one side effect, and so
I went for the main commit.

This is also important as f12e4f66ab6a was merged in 5.7 and 236761f19
in 5.11, and this patch needs to be applied to stable kernels since 5.7
to fix it all.

> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> > Hi Guys,
> > 
> > This needs to go in 5.12-rc.
> > 
> > Thara, please give this a try and give your tested-by :).
> > 
> >   drivers/thermal/cpufreq_cooling.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> Anyway, the fix LGTM. I will have to make sure that I'm CC'ed on this
> topic, so I can have a look (somehow I missed 236761f19).
> 
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
> Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Thanks.

-- 
viresh


* Re: [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17 10:45     ` Lukasz Luba
From: Lukasz Luba @ 2021-02-17 10:45 UTC
  To: Viresh Kumar
  Cc: Amit Kucheria, Amit Daniel Kachhap, Daniel Lezcano, Javi Merino,
	Zhang Rui, Peter Zijlstra (Intel),
	Ingo Molnar, Thara Gopinath, Vincent Guittot, v5 . 7+,
	linux-pm, linux-kernel



On 2/17/21 10:39 AM, Viresh Kumar wrote:
> On 17-02-21, 10:29, Lukasz Luba wrote:
>> On 2/17/21 5:48 AM, Viresh Kumar wrote:
>>> freq_qos_update_request() returns 1 if the effective constraint value
>>> has changed, 0 if the effective constraint value has not changed, or a
>>> negative error code on failures.
>>>
>>> The frequency constraints for CPUs can be set by different parts of the
>>> kernel. If the maximum frequency constraint set by other parts of the
>>> kernel is set at a lower value than the one corresponding to cooling
>>> state 0, then we will never be able to cool down the system as
>>> freq_qos_update_request() will keep on returning 0 and we will skip
>>> updating cpufreq_state and thermal pressure.
>>
>> To be precise, the thermal pressure signal is not so important in this
>> mechanism and the 'cpufreq_state' has changed recently:
> 
> Right, I wasn't concerned only about the missing thermal cooling, but
> about both thermal cooling and thermal pressure.
> 
>> 236761f19a4f373354  thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed
> 
> This moved the assignment to what is, to me, a more logical place, i.e.
> it is no longer done on errors; it's just that the block in which it
> landed may not get called at all :(
> 
>>> Fix that by doing the updates even in the case where
>>> freq_qos_update_request() returns 0, as we have effectively set the
>>> constraint to a new value even if the consolidated value of the
>>> actual constraint is unchanged because of external factors.
>>>
>>> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
>>> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
>>> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
>>
>> I'm not sure if that f12e4f is the root cause.
> 
> Hmm, depends on how we define the problem :)
> 
> If this was just about thermal cooling not happening, then maybe yes,
> but to me it is rather about the mishandled return value of
> freq_qos_update_request(), which has more than one side effect, and so
> I went for the main commit.
> 
> This is also important as f12e4f66ab6a was merged in 5.7 and 236761f19
> in 5.11, and this patch needs to be applied to stable kernels since 5.7
> to fix it all.
> 

'to fix it all' - I agree


* Re: [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17 14:21 ` Rafael J. Wysocki
From: Rafael J. Wysocki @ 2021-02-17 14:21 UTC
  To: Viresh Kumar
  Cc: Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, Peter Zijlstra (Intel),
	Ingo Molnar, Thara Gopinath, Vincent Guittot, v5 . 7+,
	Linux PM, Linux Kernel Mailing List

On Wed, Feb 17, 2021 at 6:50 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
>
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel is set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.
>
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
>
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> Hi Guys,
>
> This needs to go in 5.12-rc.
>
> Thara, please give this a try and give your tested-by :).
>
>  drivers/thermal/cpufreq_cooling.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index f5af2571f9b7..10af3341e5ea 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -485,7 +485,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>         frequency = get_state_freq(cpufreq_cdev, state);
>
>         ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
> -       if (ret > 0) {
> +       if (ret >= 0) {
>                 cpufreq_cdev->cpufreq_state = state;
>                 cpus = cpufreq_cdev->policy->cpus;
>                 max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
> --
> 2.25.0.rc1.19.g042ed3e048af
>


* Re: [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
@ 2021-02-17 15:38 ` Thara Gopinath
From: Thara Gopinath @ 2021-02-17 15:38 UTC
  To: Viresh Kumar, Amit Daniel Kachhap, Daniel Lezcano, Javi Merino,
	Zhang Rui, Amit Kucheria, Peter Zijlstra (Intel),
	Ingo Molnar
  Cc: Vincent Guittot, v5 . 7+, linux-pm, linux-kernel



On 2/17/21 12:48 AM, Viresh Kumar wrote:
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
> 
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel is set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.
> 
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
> 
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Guys,
> 
> This needs to go in 5.12-rc.
> 
> Thara, please give this a try and give your tested-by :).

It fixes the thermal runaway issue on sdm845 that I had reported. So,

Tested-by: Thara Gopinath <thara.gopinath@linaro.org>

> 
>   drivers/thermal/cpufreq_cooling.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index f5af2571f9b7..10af3341e5ea 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -485,7 +485,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>   	frequency = get_state_freq(cpufreq_cdev, state);
>   
>   	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
> -	if (ret > 0) {
> +	if (ret >= 0) {
>   		cpufreq_cdev->cpufreq_state = state;
>   		cpus = cpufreq_cdev->policy->cpus;
>   		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
> 

-- 
Warm Regards
Thara


Thread overview: 6+ messages
2021-02-17  5:48 [PATCH] thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error Viresh Kumar
2021-02-17 10:29 ` Lukasz Luba
2021-02-17 10:39   ` Viresh Kumar
2021-02-17 10:45     ` Lukasz Luba
2021-02-17 14:21 ` Rafael J. Wysocki
2021-02-17 15:38 ` Thara Gopinath
