linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukasz Luba <lukasz.luba@arm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	amitk@kernel.org, rui.zhang@intel.com
Subject: Re: [PATCH 1/2] thermal: power_allocator: maintain the device statistics from going stale
Date: Tue, 6 Apr 2021 11:39:08 +0100	[thread overview]
Message-ID: <d74b8e8e-64b0-d724-d572-f98eb597a60e@arm.com> (raw)
In-Reply-To: <1a0b6e4a-1717-91d6-a664-d50e6aa8a809@linaro.org>



On 4/6/21 11:16 AM, Daniel Lezcano wrote:
> On 06/04/2021 10:44, Lukasz Luba wrote:
>>
>>
>> On 4/2/21 4:54 PM, Daniel Lezcano wrote:
>>> On 31/03/2021 18:33, Lukasz Luba wrote:
>>>> When the temperature is below the first activation trip point the
>>>> cooling
>>>> devices are not checked, so they cannot maintain fresh statistics. It
>>>> leads into the situation, when temperature crosses first trip point, the
>>>> statistics are stale and show state for very long period.
>>>
>>> Can you elaborate the statistics you are referring to ?
>>>
>>> I can understand the pid controller needs temperature but I don't
>>> understand the statistics with the cooling device.
>>>
>>>
>>
>> The allocate_power() calls cooling_ops->get_requested_power(),
>> which is for CPUs cpufreq_get_requested_power() function.
>> In that cpufreq implementation for !SMP we still has the
>> issue of stale statistics. Viresh, when he introduced the usage
>> of sched_cpu_util(), he fixed that 'long non-meaningful period'
>> of the broken statistics and it can be found since v5.12-rc1.
>>
>> The bug is still there for the !SMP. Look at the way how idle time
>> is calculated in get_load() [1]. It relies on 'idle_time->timestamp'
>> for calculating the period. But when this function is not called,
>> the value can be very far away in time, e.g. a few seconds back,
>> when the last allocate_power() was called.
>>
>> The bug is there for both SMP and !SMP [2] for older kernels, which can
>> be used in Android or ChromeOS. I've been considering to put this simple
>> IPA fix also to some other kernels, because Viresh's change is more
>> a 'feature' and does not cover both platforms.
> 
> Ok, so IIUC, the temperature is needed as well as the power to do the
> connection for the pid loop (temp vs power).
> 
> I'm trying to figure out how to delegate the mitigation switch on/off to
> the core code instead of the governor (and kill tz->passive) but how
> things are implemented for the IPA makes this very difficult.
> 
> AFAICT, this fix is not correct.
> 
> If the temperature is below the 'switch_on_temp' the passive is set to
> zero and the throttle function is not called anymore (assuming it is
> interrupt mode not polling mode), so the power number is not updated also.

IPA doesn't work well in asynchronous mode, because it needs this fixed
length for the period. I have been experimenting with tsens IRQs and
also both fixed polling + sporadic asynchronous IRQs, trying to fix it
and have 'predictable' IPA, but without a luck.
IPA needs synchronous polling events like we have for high temp e.g.
100ms and low temp e.g. 1000ms. The asynchronous events are root of
undesirable states (too low or to high) calculated and set for cooling
devices. It's also harder to escape these states with asynchronous
events. I don't recommend using IPA with asynchronous events from IRQs,
for now. It might change in future, though.

The patch 2/2 should calm down the unnecessary updates/notifications so
your request.
The longer polling, which we have for temperature below 'switch_on_temp'
(e.g. every 1sec) shouldn't harm the performance of the system, but
definitely makes IPA more predictable and stable.


  reply	other threads:[~2021-04-06 10:39 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-31 16:33 [PATCH 0/2] Improve IPA mechanisms in low temperature state Lukasz Luba
2021-03-31 16:33 ` [PATCH 1/2] thermal: power_allocator: maintain the device statistics from going stale Lukasz Luba
2021-04-02 15:54   ` Daniel Lezcano
2021-04-06  8:44     ` Lukasz Luba
2021-04-06 10:16       ` Daniel Lezcano
2021-04-06 10:39         ` Lukasz Luba [this message]
2021-04-06 11:24           ` Daniel Lezcano
2021-04-06 12:25             ` Lukasz Luba
2021-04-06 14:32               ` Daniel Lezcano
2021-04-06 18:38                 ` Lukasz Luba
2021-04-06 19:45                   ` Daniel Lezcano
2021-03-31 16:33 ` [PATCH 2/2] thermal: power_allocator: update once cooling devices when temp is low Lukasz Luba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d74b8e8e-64b0-d724-d572-f98eb597a60e@arm.com \
    --to=lukasz.luba@arm.com \
    --cc=amitk@kernel.org \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rui.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).