From: Peter Zijlstra <firstname.lastname@example.org>
To: Patrick Bellasi <email@example.com>
Cc: Juri Lelli <firstname.lastname@example.org>, email@example.com,
	firstname.lastname@example.org, Ingo Molnar <email@example.com>,
	Tejun Heo <firstname.lastname@example.org>,
	"Rafael J . Wysocki" <email@example.com>,
	Viresh Kumar <firstname.lastname@example.org>,
	Vincent Guittot <email@example.com>,
	Paul Turner <firstname.lastname@example.org>,
	Quentin Perret <email@example.com>,
	Dietmar Eggemann <firstname.lastname@example.org>,
	Morten Rasmussen <email@example.com>,
	Todd Kjos <firstname.lastname@example.org>,
	Joel Fernandes <email@example.com>,
	Steve Muckle <firstname.lastname@example.org>,
	Suren Baghdasaryan <email@example.com>
Subject: Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default
Date: Fri, 21 Sep 2018 11:13:08 +0200
Message-ID: <20180921091308.GD24082@hirez.programming.kicks-ass.net>
In-Reply-To: <20180917122723.GS1413@e110439-lin>

On Mon, Sep 17, 2018 at 01:27:23PM +0100, Patrick Bellasi wrote:
> On 14-Sep 16:28, Peter Zijlstra wrote:
> > The thing is, the values you'd want to use are for example the capacity
> > of the little CPUs, or the capacity of the most energy efficient OPP
> > (the knee).
>
> I don't think so.
>
> On the knee topic, we had some thinking and on most platforms it seems
> to be a rather arbitrary decision.
>
> On sane platforms, the Energy Efficiency (EE) is monotonically
> decreasing with frequency increase. Maybe we can define a threshold
> for an "EE derivative ratio", but it will still be quite arbitrary.
> Moreover, it could be that in certain use-cases we want to push for
> higher energy efficiency (i.e. lower derivatives) than others.

I remember IBM-power folks asking for knee related features a number of
years ago (Dusseldorf IIRC) because after some point their chips start
to _really_ suck power. Sure, the curve is monotonic, but the perf/watt
takes a nose dive.
And given that: P = CfV^2, that seems like a fairly generic observation.

However, maybe, due to the very limited thermal capacity of these
mobile things, the issue doesn't really arise in them. Laptops with
active cooling however...

> > Similarly for boosting, how are we 'easily' going to find the values
> > that correspond to the various available OPPs.
>
> In our experience with SchedTune on Android, we found that we
> generally focus on a small set of representative use-cases and then
> run an exploration, by tuning the percentage of boost, to identify the
> optimal trade-off between Performance and Energy.

So you basically do an automated optimization for a benchmark?

> The value you get could be something which does not match exactly an
> OPP but still, since we (will) bias not only OPP selection but also
> task placement, it's the one which makes most sense.

*groan*, so how exactly does that work? By limiting the task capacity,
we allow some stacking on the CPUs before we switch to regular
load-balancing?

> Thus, the capacity of little CPUs, or the exact capacity of an OPP, is
> something we don't care to specify exactly, since:
>
>  - schedutil will top the util request to the next frequency anyway
>
>  - capacity by itself is a loosely defined metric, since it's usually
>    measured considering a specific kind of instruction mix, which
>    can be very different from the actual instruction mix (e.g. integer
>    vs floating point)

Sure, things like pure SIMD workloads can skew things pretty badly, but
on average it should not drastically change the overall shape of the
curve and the knee point should not move around a lot.

>  - certain platforms don't even expose OPPs, but just "performance
>    levels"... which ultimately are a "percentage"

Well, the whole capacity thing is a 'percentage', it's just that 1024
is much nicer to work with (for computers) than 100 is (also it
provides a wee bit more resolution).
But even the platforms with hidden OPPs (can) have knee points, and if
you measure their power to capacity curve you can place a workload
around the knee by capping capacity.

But yes, this gets tricky real fast :/

>  - there are so many rounding errors around on utilization tracking
>    and its aggregation that being exact on an OPP is of "relative"
>    importance

I'm not sure I understand that argument; sure the measurement is
subject to 'issues', but if we hard clip the result, that will exactly
match the fixed points for OPP selection. Any issues on the measurement
are lost after clipping.

> Do you see specific use-cases where an exact OPP capacity is much
> better than a percentage value ?

If I don't have algorithmic optimization available, hand selecting an
OPP is the 'obvious' thing to do.

> Of course there can be scenarios in which we want to clamp to a
> specific OPP. But still, why should it be difficult for a platform
> integrator to express it as a close enough percentage value ?

But why put him through the trouble of finding the capacity value in
the EAS exposed data, converting that to a percentage that will work
and then feeding it back in.

I don't see the point or benefit of percentages, there's nothing
magical about 1/100, _any_ other fraction works exactly the same. So
why bother changing it around?

> > The EAS thing might have these around; but I forgot if/how they're
> > exposed to userspace (I'll have to soon look at the latest posting).
>
> The new "Energy Model Management" framework can certainly be used to
> get the list of OPPs for each frequency domain. IMO this could be
> used to identify the maximum number of clamp groups we can have.
> In this case, the discretization patch can translate a generic
> percentage clamp into the closest OPP capacity...
>
> ... but to me that's an internal detail which I'm not convinced we
> need to expose to user-space.
> IMHO we should instead focus just on defining a usable and generic
> userspace interface. Then, platform specific tuning is something
> user-space can do, either offline or on-line.

The thing I worry about is how we determine the value to put in in the
first place.

How are we expecting people to determine what to put into the
interface? Knee points, little capacity, those things make 'obvious'
sense.

> > But changing the clamp metric to something different than these
> > values is going to be pain.
>
> Maybe I don't completely get what you mean here... are you saying
> that not using exact capacity values to define clamps is difficult ?
> If that's the case why? Can you elaborate with an example ?

I meant changing the unit around; 1/1024 is what we use throughout and
is what EAS is also exposing IIRC, so why make things complicated again
and use 1/100 (which is a shit fraction for computers).