All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ionela Voinescu <ionela.voinescu@arm.com>
To: Quentin Perret <quentin.perret@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Thara Gopinath <thara.gopinath@linaro.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Zhang Rui <rui.zhang@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Amit Kachhap <amit.kachhap@gmail.com>,
	viresh kumar <viresh.kumar@linaro.org>,
	Javi Merino <javi.merino@kernel.org>,
	Eduardo Valentin <edubezval@gmail.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	"open list:THERMAL" <linux-pm@vger.kernel.org>
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure
Date: Wed, 10 Oct 2018 17:15:01 +0100	[thread overview]
Message-ID: <ad3492f2-f5a9-97ea-8068-d4df2d9288ce@arm.com> (raw)
In-Reply-To: <20181010134755.jrigtogbxwaz2izb@queper01-lin>

Hi guys,

On 10/10/18 14:47, Quentin Perret wrote:
> On Wednesday 10 Oct 2018 at 15:27:57 (+0200), Vincent Guittot wrote:
>> On Wed, 10 Oct 2018 at 15:05, Quentin Perret <quentin.perret@arm.com> wrote:
>>>
>>> On Wednesday 10 Oct 2018 at 14:04:40 (+0200), Vincent Guittot wrote:
>>>> This patchset doesn't touch cpu_capacity_orig and doesn't need to  as
>>>> it assume that the max capacity is unchanged but some capacity is
>>>> momentary stolen by thermal.
>>>>  If you want to reflect immediately all thermal capping change, you
>>>> have to update this field and all related fields and struct around
>>>
>>> I don't follow you here. I never said I wanted to change
>>> cpu_capacity_orig. I don't think we should do that actually. Changing
>>> capacity_of (which is updated during LB IIRC) is just fine. The question
>>> is about what you want to do there: reflect an averaged value or the
>>> instantaneous one.
>>
>> Sorry I though your were speaking about updating this cpu_capacity_orig.
> 
> N/p, communication via email can easily become confusing :-)
> 
>> With using instantaneous max  value in capacity_of(), we are back to
>> the problem raised by Thara that  the value will most probably not
>> reflect the current capping value when it is used in LB, because LB
>> period can quite long on busy CPU (default max value is 32*sd_weight
>> ms)
> 
> But averaging the capping value over time doesn't make LB happen more
> often ... That will help you account for capping that happened in the
> past, but it's not obvious this is actually a good thing. Probably not
> all the time anyway.
> 
> Say a CPU was capped at 50% of it's capacity, then the cap is removed.
> At that point it'll take 100ms+ for the thermal signal to decay and let
> the scheduler know about the newly available capacity. That can probably
> be a performance hit in some use cases ... And the other way around, it
> can also take forever for the scheduler to notice that a CPU has a
> reduced capacity before reacting to it.
> 
> If you want to filter out very short transient capping events to avoid
> over-reacting in the scheduler (is this actually happening ?), then
> maybe the average should be done on the cooling device side or something
> like that ?
> 

I think there isn't just the issue of the *occasional* overreaction of a
thermal governor due to noise in the temperature sensors or some spike
in environmental temperature that determines a delayed reaction in the
scheduler due to when capacity is updated.

I'm seeing a bigger issue for *sustained* high temperatures that are not
treated effectively by governors. Depending on the platform, heat can
be dissipated over longer or shorter periods of time. This can determine
a seesaw effect on the maximum frequency: it determines the temperature
is over a threshold and it starts capping, but heat is not dissipated
quickly enough for that to reflect in the value of the temperature sensor,
so it continues to cap; when the temperature gets to normal, capping
is lifted, which in turn results access to higher OPPs and a return to
high temperatures, etc.

What will happen is that, *depending on platform* and the moment when
capacity is updated, you can see either a CPU with a capacity of 1024, or
let's say 800, or (on hikey960 :)) around 500, and back and forth
between them.

Because of these I tend to think that a regulated (averaged) value of
thermal pressure is better than an instantaneous one. Thermal mitigation
measures are there for the well-being and safety of a device, not for
optimizations so it can and should be allowed to overreact, or have a
delayed reaction. But ping-pong-ing tasks back and forth between CPUs
due to changes in CPU capacity is harmful for performance. What would be
awesome to achieve with this is (close to) optimal use of restricted
capacities of CPUs, and I tend to believe instantaneous and most probably
out of date capacity values would not lead to this.

But this is almost a gut feeling and of course it should be validated on
devices with different thermal characteristics. Given the high variation
between devices with regards to this I'd be reluctant to tie it to the
PELT half life.

Regards,
Ionela.

> Thanks,
> Quentin
> 

  parent reply	other threads:[~2018-10-10 16:15 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20181009162509epcas1p4fdd2e23039caa24586a4a52c6d2e7336@epcas1p4.samsung.com>
2018-10-09 16:24 ` [RFC PATCH 0/7] Introduce thermal pressure Thara Gopinath
2018-10-09 16:24   ` [RFC PATCH 1/7] sched/pelt.c: Add option to make load and util calculations frequency invariant Thara Gopinath
2018-10-09 16:24   ` [RFC PATCH 2/7] sched/pelt.c: Add support to track thermal pressure Thara Gopinath
2018-10-09 16:24   ` [RFC PATCH 3/7] sched: Add infrastructure to store and update instantaneous " Thara Gopinath
2018-10-09 16:24   ` [RFC PATCH 4/7] sched: Initialize per cpu thermal pressure structure Thara Gopinath
2018-10-09 16:25   ` [RFC PATCH 5/7] sched/fair: Enable CFS periodic tick to update thermal pressure Thara Gopinath
2018-12-04 15:43     ` Vincent Guittot
2018-10-09 16:25   ` [RFC PATCH 6/7] sched/fair: update cpu_capcity to reflect " Thara Gopinath
2018-10-10  5:57     ` Javi Merino
2018-10-10 14:22       ` Thara Gopinath
2018-10-09 16:25   ` [RFC PATCH 7/7] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
2018-10-10  5:44   ` [RFC PATCH 0/7] Introduce thermal pressure Javi Merino
2018-10-10 14:15     ` Thara Gopinath
2018-10-10  6:17   ` Ingo Molnar
2018-10-10  8:29     ` Quentin Perret
2018-10-10  8:50       ` Vincent Guittot
2018-10-10  9:55         ` Quentin Perret
2018-10-10 10:14           ` Vincent Guittot
2018-10-10 10:36             ` Quentin Perret
2018-10-10 12:04               ` Vincent Guittot
2018-10-10 12:23                 ` Juri Lelli
2018-10-10 12:34                   ` Vincent Guittot
2018-10-10 12:50                     ` Juri Lelli
2018-10-10 13:08                       ` Vincent Guittot
2018-10-10 13:34                         ` Juri Lelli
2018-10-10 13:38                           ` Vincent Guittot
2018-10-10 17:08                           ` Thara Gopinath
2018-10-10 13:11                       ` Quentin Perret
2018-10-10 13:05                 ` Quentin Perret
2018-10-10 13:27                   ` Vincent Guittot
2018-10-10 13:47                     ` Quentin Perret
2018-10-10 15:19                       ` Vincent Guittot
2018-10-10 16:15                       ` Ionela Voinescu [this message]
2018-10-10 17:03           ` Thara Gopinath
2018-10-10 15:43     ` Thara Gopinath
2018-10-16  7:33       ` Ingo Molnar
2018-10-16  9:28         ` Lukasz Luba
2018-10-17 16:21         ` Thara Gopinath
2018-10-18  6:48           ` Ingo Molnar
2018-10-18  7:08             ` Rafael J. Wysocki
2018-10-18  7:50               ` Ingo Molnar
2018-10-18  8:14                 ` Rafael J. Wysocki
2018-10-18  9:35                   ` [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files Daniel Lezcano
2018-10-18  9:35                     ` [PATCH 2/2] sched/cpufreq: Add the SPDX tags Daniel Lezcano
2018-10-18  9:42                     ` [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files Rafael J. Wysocki
2018-10-18  9:42                       ` Rafael J. Wysocki
2018-10-18  9:54                       ` Daniel Lezcano
2018-10-18 10:06                         ` Rafael J. Wysocki
2018-10-18 10:06                           ` Rafael J. Wysocki
2018-10-18 10:13                           ` Daniel Lezcano
2018-10-18  9:45                     ` Daniel Lezcano
2018-10-19  5:24                     ` kbuild test robot
2018-10-19  5:52                     ` kbuild test robot
2018-10-18  9:44                   ` [PATCH V2 " Daniel Lezcano
2018-10-18  9:44                     ` [PATCH V2 2/2] sched/cpufreq: Add the SPDX tags Daniel Lezcano
2018-10-18 16:17             ` [RFC PATCH 0/7] Introduce thermal pressure Thara Gopinath
2018-10-19  8:02               ` Ingo Molnar
2018-10-19 11:29                 ` Valentin Schneider
2018-10-10 15:35   ` Lukasz Luba
2018-10-10 16:54     ` Daniel Lezcano
2018-10-11  7:35       ` Lukasz Luba
2018-10-11  8:23         ` Daniel Lezcano
2018-10-12  9:37           ` Lukasz Luba
2018-10-10 17:30     ` Thara Gopinath
2018-10-11 11:10       ` Lukasz Luba
2018-10-16 17:11         ` Vincent Guittot
2018-10-17 16:24           ` Thara Gopinath
2018-10-18  8:00             ` Lukasz Luba
2018-10-18  8:12           ` Lukasz Luba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad3492f2-f5a9-97ea-8068-d4df2d9288ce@arm.com \
    --to=ionela.voinescu@arm.com \
    --cc=amit.kachhap@gmail.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=edubezval@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=javi.merino@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=quentin.perret@arm.com \
    --cc=rafael@kernel.org \
    --cc=rui.zhang@intel.com \
    --cc=thara.gopinath@linaro.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.