From: Thara Gopinath <thara.gopinath@linaro.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	peterz@infradead.org, rui.zhang@intel.com,
	gregkh@linuxfoundation.org, rafael@kernel.org,
	amit.kachhap@gmail.com, viresh.kumar@linaro.org,
	javi.merino@kernel.org, edubezval@gmail.com,
	daniel.lezcano@linaro.org, linux-pm@vger.kernel.org,
	quentin.perret@arm.com, ionela.voinescu@arm.com,
	vincent.guittot@linaro.org
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure
Date: Wed, 10 Oct 2018 11:43:27 -0400
Message-ID: <5BBE1E1F.3030308@linaro.org>
In-Reply-To: <20181010061751.GA37224@gmail.com>

Hello Ingo,
Thank you for the review.

On 10/10/2018 02:17 AM, Ingo Molnar wrote:
> 
> * Thara Gopinath <thara.gopinath@linaro.org> wrote:
> 
>> Thermal governors can respond to an overheat event for a cpu by
>> capping the cpu's maximum possible frequency. This in turn
>> means that the maximum available compute capacity of the
>> cpu is restricted. But today in the Linux kernel, in the event of maximum
>> frequency capping of a cpu, the maximum available compute
>> capacity of the cpu is not adjusted at all. In other words, the scheduler
>> is unaware of maximum cpu capacity restrictions placed due to thermal
>> activity. This patch series attempts to address this issue.
>> The benefits identified are better task placement among available
>> cpus in the event of overheating, which in turn leads to better
>> performance numbers.
>>
>> The delta between the maximum possible capacity of a cpu and
>> maximum available capacity of a cpu due to thermal event can
>> be considered as thermal pressure. Instantaneous thermal pressure
>> is hard to record and can sometimes be erroneous as there can be a mismatch
>> between the actual capping of capacity and the scheduler recording it.
>> Thus the solution is to have a weighted average per cpu value for thermal
>> pressure over time. The weight reflects the amount of time the cpu has
>> spent at a capped maximum frequency. To accumulate, average and
>> appropriately decay thermal pressure, this patch series uses pelt
>> signals and reuses the available framework that does a similar
>> bookkeeping of rt/dl task utilization.
>>
>> Regarding testing, basic build, boot and sanity testing have been
>> performed on hikey960 with a mainline kernel and a debian file system.
>> Further, aobench (an occlusion renderer for benchmarking real-world
>> floating point performance) showed the following results on hikey960
>> with debian.
>>
>>                                         Result          Standard        Standard
>>                                         (Time secs)     Error           Deviation
>> Hikey 960 - no thermal pressure applied 138.67          6.52            11.52%
>> Hikey 960 -  thermal pressure applied   122.37          5.78            11.57%
> 
> Wow, +13% speedup, impressive! We definitely want this outcome.
> 
> I'm wondering what happens if we do not track and decay the thermal load at all at the PELT 
> level, but instantaneously decrease/increase effective CPU capacity in reaction to thermal 
> events we receive from the CPU.

The problem with instantaneous update is that sometimes thermal events
happen at a much faster pace than cpu_capacity is updated in
the scheduler. This means that by the time the scheduler uses the
value, it might not be correct anymore.

Having said that, the Android common kernel today has a solution which
instantaneously updates cpu_capacity in case of a thermal event.
To give a bit of background on the evolution of the solution I have
proposed, below is a timeline of the analysis I have done.

1. I started this activity by analyzing the existing framework in the
Android common kernel. I ran Android benchmark tests (Jankbench,
Vellamo, Geekbench) with and without the existing instantaneous update
mechanism. I found no real performance difference with an
instantaneous update of cpu_capacity, at least in my
test scenarios.
2. Then I developed an algorithm to track, accumulate and decay the
capacity capping, i.e. an algorithm without using the pelt signals (this
was prior to the new pelt framework in mainline). With this, Android
benchmarks showed performance improvements. At this point I also ported
the solution to the mainline kernel and ran the aobench analysis, which again
showed a performance improvement.
3. Finally, with the new pelt framework in place, I replaced my algorithm
with the one used for rt and dl utilization tracking which is the
current patch series. I have not been able to run tests with this on
Android yet.

All tests were performed on hikey960.
I have a Google spreadsheet documenting results at various stages of
analysis. I am not sure how to share it with the group here.


> 
> You describe the averaging as:
> 
>> Instantaneous thermal pressure is hard to record and can sometime be erroneous as there can 
>> be mismatch between the actual capping of capacity and scheduler recording it.
> 
> Not sure I follow the argument here: are there bogus thermal throttling events? If so then
> they are hopefully not frequent enough and should average out over time even if we follow
> it instantly.

No bogus events. Rather, capping sometimes happens at a much
faster rate than cpu_capacity is updated, and the scheduler loses these
events.

> 
> I.e. what is 'can sometimes be erroneous', exactly?
> 
> Thanks,
> 
> 	Ingo
> 


-- 
Regards
Thara
