From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@suse.de>,
	Len Brown <lenb@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Mel Gorman <mgorman@techsingularity.net>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC/RFT] [PATCH 02/10] cpufreq: intel_pstate: Conditional frequency invariant accounting
Date: Fri, 18 May 2018 11:57:42 +0100	[thread overview]
Message-ID: <20180518105742.GN30654@e110439-lin> (raw)
In-Reply-To: <20180517182803.GY12217@hirez.programming.kicks-ass.net>

On 17-May 20:28, Peter Zijlstra wrote:
> On Thu, May 17, 2018 at 06:56:37PM +0200, Rafael J. Wysocki wrote:
> > On Thu, May 17, 2018 at 6:42 PM, Srinivas Pandruvada
> 
> > > What will happen if we look at all core turbo as max and cap any
> > > utilization above this to 1024?
> > 
> > I was going to suggest that.
> 
> So the basic premise behind all our frequency scaling is that there's a
> linear relation between utilization and frequency, where u=1 gets us the
> fastest.
> 
> Now, we all know this is fairly crude, but it is what we work with.
> 
> OTOH, the whole premise of turbo is that you don't in fact know what the
> fastest is, and in that respect setting u=1 at the guaranteed or
> sustainable frequency makes sense.

Looking at it from the FAIR class standpoint, we can also argue that,
although you know the maximum possible utilization is 1024, you are not
always guaranteed to reach it because of RT and interrupt pressure,
or, on big.LITTLE systems, because of the arch scaling factor.

Isn't this quite similar to the problem of having
   "some not always available OPPs"?

To track these "capacity limitations" we already have the two
different concepts of cpu_capacity_orig and cpu_capacity.
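
To make the parallel concrete, this is the relation I have in mind (a
minimal, compile-able sketch of the idea only, not the mainline code;
the field names mirror the two concepts above, everything else is made
up for illustration):

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL

struct cpu_cap {
	unsigned long capacity_orig;	/* arch maximum, fixed at boot    */
	unsigned long capacity;		/* what is left for CFS right now */
};

/* Fold the "transient" RT/IRQ pressure (same 0..1024 scale) into the
 * currently usable capacity, keeping the boot-time maximum untouched. */
static void update_capacity(struct cpu_cap *c, unsigned long pressure)
{
	unsigned long used = pressure > c->capacity_orig ?
			     c->capacity_orig : pressure;

	c->capacity = c->capacity_orig - used;
}

int main(void)
{
	struct cpu_cap c = { .capacity_orig = SCHED_CAPACITY_SCALE };

	update_capacity(&c, 410);	/* ~40% of the CPU eaten by RT/IRQ */
	printf("orig=%lu cur=%lu\n", c.capacity_orig, c.capacity);
	return 0;
}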

Are not "thermal constraints" and "some not always available OPPs"
just another form of "capacity limitations".
They are:
 - transient
   exactly like RT and Interrupt pressure
 - HW related
   which is the main different wrt RT and Interrupt pressure

But, apart from this last point (i.e. their HW related "nature"),
IMHO they are quite similar concepts... and ones we already address,
although perhaps only within the FAIR class.

Thus, my simple (maybe dumb) questions are:
- why can't we just fold the turbo boost frequency into the existing concepts?
- what are the limitations of such a "simple" approach?

IOW: utilization is always measured wrt the maximum possible capacity
(i.e. max turbo mode), and then there is a way to know, on each CPU and
at every decision point, the actual "transient maximum" we can expect
to reach over a "reasonable" future time window.
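
Just to sketch the idea (the helper and its inputs below are invented
for illustration): keep utilization scaled to the boot-time absolute
maximum and derive a per-CPU "transient maximum" from whatever
frequency is currently sustainable:

/* Purely illustrative: scale the constant maximum capacity by the
 * currently sustainable frequency (all-core turbo, AVX or thermally
 * limited) over the single-core max turbo frequency; both frequencies
 * are assumed to be expressed in the same unit. */
static unsigned long transient_max_capacity(unsigned long capacity_orig,
					    unsigned long sustainable_freq,
					    unsigned long max_turbo_freq)
{
	return capacity_orig * sustainable_freq / max_turbo_freq;
}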

> The direct consequence of allowing clipping is that u=1 doesn't select
> the highest frequency, but since we don't select anything anyway
> (p-code does that for us) all we really need is to have u=1 above that
> turbo activation point you mentioned.

If clipping means that we can also have >1024 values which are just
clamped at read/get time, couldn't this have side effects on the math
(signal propagation across task groups) and on integer range control?
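
To be explicit, by "clamped at read/get time" I mean something like
this (illustrative only):

/* The internal signal is allowed to grow past SCHED_CAPACITY_SCALE and
 * only the getter clamps it; the question above is what those >1024
 * values do to signal propagation and integer ranges before they reach
 * the getter. */
static inline unsigned long cpu_util_clamped(unsigned long raw_util)
{
	return raw_util < SCHED_CAPACITY_SCALE ? raw_util
					       : SCHED_CAPACITY_SCALE;
}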

> For parts where we have to directly select frequency this obviously
> comes apart.

Moreover, utilization is not (and will not be) just for frequency driving.
We should also take the task placement perspective into account.

On that side, I personally like the definition _I think_ we have now:

  utilization is the amount of maximum capacity used

where the maximum is a constant defined at boot time, representing the
absolute max you can expect to get...
... apart from "transient capacity limitations".

Scaling the maximum depending on these transient conditions reads to me
like "changing the scale", which I fear will make it more difficult to
compare, in space (different CPUs) or in time (different scheduler
events), what a utilization value actually means.

For example, if you have a busy loop running on a CPU which is subject
to RT pressure, you will read a <100% utilization (let's say 60%). Still,
it's interesting to know that maybe we can move that task to an IDLE CPU
to run it faster.

Shouldn't the same hold for turbo boost?

If the same task is generating only 60% utilization because the turbo
boost OPPs are not available, wouldn't it still be useful to see that
there is, for example, another CPU (maybe on a different NUMA node)
which is IDLE and cold, where we could move the task to exploit the
100% capacity provided by the topmost turbo boost mode?
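
For that kind of decision, a (purely hypothetical) check could be as
simple as comparing the spare capacity the task would see on each
candidate CPU:

/* Hypothetical placement check for the example above: the 60% task may
 * fit better on an idle, cold CPU whose full capacity_orig is available
 * than on its current CPU, whose "transient maximum" is depressed by RT
 * pressure or by turbo OPPs being out of reach. */
static int prefer_idle_cpu(unsigned long task_util,
			   unsigned long cur_cpu_capacity,
			   unsigned long idle_cpu_capacity_orig)
{
	unsigned long spare_cur  = cur_cpu_capacity > task_util ?
				   cur_cpu_capacity - task_util : 0;
	unsigned long spare_idle = idle_cpu_capacity_orig > task_util ?
				   idle_cpu_capacity_orig - task_util : 0;

	return spare_idle > spare_cur;
}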

> However; what happens when the sustainable freq drops below our initial
> 'max'? Imagine us dropping below the all-core-turbo because of AVX. Then
> we're back to running at u<1 at full tilt.
> 
> Or for mobile parts, the sustainable frequency could drop because of
> severe thermal limits. Now I _think_ we have the possibility for getting
> interrupts and reading the new guaranteed frequency, so we could
> re-guage.
> 
> So in theory I think it works, in practise we need to always be able to
> find the actual max -- be it all-core turbo, AVX or thermal constrained
> frequency. Can we do that in all cases?
> 
> 
> I need to go back to see what the complaints against Vincent's proposal
> were, because I really liked the fact that it did away with all this.

AFAIR Vincent's proposal was mainly addressing a different issue: fast
ramp-up... I don't recall any specific intent to cover the issue of
"transient maximum capacities".

And still, based on my (maybe bogus) reasoning above, I think we are
discussing a slightly different problem here, one which already has a
(maybe partial) solution.

-- 
#include <best/regards.h>

Patrick Bellasi
