All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Paul Turner <pjt@google.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	John Stultz <john.stultz@linaro.org>,
	Todd Kjos <tkjos@android.com>, Tim Murray <timmurray@google.com>,
	Andres Oportus <andresoportus@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Juri Lelli <juri.lelli@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>
Subject: Re: [RFC v3 0/5] Add capacity capping support to the CPU controller
Date: Mon, 10 Apr 2017 09:36:22 +0200	[thread overview]
Message-ID: <20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20170320172233.GA28391@e110439-lin>

On Mon, Mar 20, 2017 at 05:22:33PM +0000, Patrick Bellasi wrote:

> a) Bias OPP selection.
>    Thus granting that certain critical tasks always run at least at a
>    specified frequency.
> 
> b) Bias TASKS placement, which requires an additional extension not
>    yet posted to keep things simple.
>    This allows heterogeneous systems, where different CPUs have
>    different capacities, to schedule important tasks in more capable
>    CPUs.

So I dislike the capacity min/max knobs because:

 1) capacity is more an implementation detail than a primary metric;
    illustrated per your above points in that it affects both, while in
    fact it actually modifies another metric, namely util_avg.

 2) they look like an absolute clamp for a best effort class; while it
    very much is not. This also leads to very confusing nesting
    properties re cgroup.

 3) they have absolutely unacceptable overhead in implementation. Two
    more RB tree operations per enqueue/dequeue is just not going to
    happen.

 4) they have muddled semantics, because while its presented as a task
    property, it very much is not. The max(min) and min(max) of all
    runnable tasks is applied to the aggregate util signal. Also see 2.


So 'nice' is a fairly well understood knob; it controls relative time
received between competing tasks (and we really should replace the cpu
controllers 'shares' file with a 'nice' file, see below).


I would much rather introduce something like nice-opp, which only
affects the OPP selection in a relative fashion. This immediately also
has a natural and obvious extension to cgroup hierarchies (just like
regular nice).


And could be folded as a factor in util_avg (just as we do with weight
in load_avg), although this will mess up everything else we use util for
:/. Or, alternatively, decompose the aggregate value upon usage and only
apply the factor for the current running task or something.... Blergh..
difficult..




---
 kernel/sched/core.c  | 29 +++++++++++++++++++++--------
 kernel/sched/fair.c  |  2 +-
 kernel/sched/sched.h |  1 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 27b4dd55b0c7..20ed17369cb1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6963,18 +6963,31 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
-				struct cftype *cftype, u64 shareval)
+static int cpu_nice_write_s64(struct cgroup_subsys_state *css,
+				struct cftype *cftype, s64 val)
 {
-	return sched_group_set_shares(css_tg(css), scale_load(shareval));
+	struct task_group *tg = css_tg(css);
+	unsigned long weight;
+
+	int ret;
+
+	if (val < MIN_NICE || val > MAX_NICE)
+		return -EINVAL;
+
+	weight = sched_prio_to_weight[val - MIN_NICE];
+
+	ret = sched_group_set_shares(tg, scale_load(weight));
+
+	if (!ret)
+		tg->nice = val;
 }
 
-static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
+static u64 cpu_shares_read_s64(struct cgroup_subsys_state *css,
 			       struct cftype *cft)
 {
 	struct task_group *tg = css_tg(css);
 
-	return (u64) scale_load_down(tg->shares);
+	return (s64)tg->nice;
 }
 
 #ifdef CONFIG_CFS_BANDWIDTH
@@ -7252,9 +7265,9 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 static struct cftype cpu_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
-		.name = "shares",
-		.read_u64 = cpu_shares_read_u64,
-		.write_u64 = cpu_shares_write_u64,
+		.name = "nice",
+		.read_s64 = cpu_nice_read_s64,
+		.write_s64 = cpu_nice_write_s64,
 	},
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 76f67b3e34d6..8088043f46d1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9471,7 +9471,7 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 	if (!tg->se[0])
 		return -EINVAL;
 
-	shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
+	shares = clamp(shares, MIN_SHARES, scale_load(MAX_SHARES));
 
 	mutex_lock(&shares_mutex);
 	if (tg->shares == shares)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 57caf3606114..27f1ffe763bc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -283,6 +283,7 @@ struct task_group {
 	/* runqueue "owned" by this group on each cpu */
 	struct cfs_rq **cfs_rq;
 	unsigned long shares;
+	int nice;
 
 #ifdef	CONFIG_SMP
 	/*

  reply	other threads:[~2017-04-10  7:36 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-28 14:38 [RFC v3 0/5] Add capacity capping support to the CPU controller Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 1/5] sched/core: add capacity constraints to " Patrick Bellasi
2017-03-13 10:46   ` Joel Fernandes (Google)
2017-03-15 11:20     ` Patrick Bellasi
2017-03-15 13:20       ` Joel Fernandes
2017-03-15 16:10         ` Paul E. McKenney
2017-03-15 16:44           ` Patrick Bellasi
2017-03-15 17:24             ` Paul E. McKenney
2017-03-15 17:57               ` Patrick Bellasi
2017-03-20 17:15   ` Tejun Heo
2017-03-20 17:36     ` Tejun Heo
2017-03-20 18:08     ` Patrick Bellasi
2017-03-23  0:28       ` Joel Fernandes (Google)
2017-03-23 10:32         ` Patrick Bellasi
2017-03-23 16:01           ` Tejun Heo
2017-03-23 18:15             ` Patrick Bellasi
2017-03-23 18:39               ` Tejun Heo
2017-03-24  6:37                 ` Joel Fernandes (Google)
2017-03-24 15:00                   ` Tejun Heo
2017-03-30 21:13                 ` Paul Turner
2017-03-24  7:02           ` Joel Fernandes (Google)
2017-03-30 21:15       ` Paul Turner
2017-04-01 16:25         ` Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 2/5] sched/core: track CPU's capacity_{min,max} Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 3/5] sched/core: sync capacity_{min,max} between slow and fast paths Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 4/5] sched/{core,cpufreq_schedutil}: add capacity clamping for FAIR tasks Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks Patrick Bellasi
2017-03-13 10:08   ` Joel Fernandes (Google)
2017-03-15 11:40     ` Patrick Bellasi
2017-03-15 12:59       ` Joel Fernandes
2017-03-15 14:44         ` Juri Lelli
2017-03-15 16:13           ` Joel Fernandes
2017-03-15 16:24             ` Juri Lelli
2017-03-15 23:40               ` Joel Fernandes
2017-03-16 11:16                 ` Juri Lelli
2017-03-16 12:27                   ` Patrick Bellasi
2017-03-16 12:44                     ` Juri Lelli
2017-03-16 16:58                       ` Joel Fernandes
2017-03-16 17:17                         ` Juri Lelli
2017-03-15 11:41 ` [RFC v3 0/5] Add capacity capping support to the CPU controller Rafael J. Wysocki
2017-03-15 12:59   ` Patrick Bellasi
2017-03-16  1:04     ` Rafael J. Wysocki
2017-03-16  3:15       ` Joel Fernandes
2017-03-20 22:51         ` Rafael J. Wysocki
2017-03-21 11:01           ` Patrick Bellasi
2017-03-24 23:52             ` Rafael J. Wysocki
2017-03-16 12:23       ` Patrick Bellasi
2017-03-20 14:51 ` Tejun Heo
2017-03-20 17:22   ` Patrick Bellasi
2017-04-10  7:36     ` Peter Zijlstra [this message]
2017-04-11 17:58       ` Patrick Bellasi
2017-04-12 12:10         ` Peter Zijlstra
2017-04-12 13:55           ` Patrick Bellasi
2017-04-12 15:37             ` Peter Zijlstra
2017-04-13 11:33               ` Patrick Bellasi
2017-04-12 12:15         ` Peter Zijlstra
2017-04-12 13:34           ` Patrick Bellasi
2017-04-12 14:41             ` Peter Zijlstra
2017-04-12 12:22         ` Peter Zijlstra
2017-04-12 13:24           ` Patrick Bellasi
2017-04-12 12:48         ` Peter Zijlstra
2017-04-12 13:27           ` Patrick Bellasi
2017-04-12 14:34             ` Peter Zijlstra
2017-04-12 14:43               ` Patrick Bellasi
2017-04-12 16:14                 ` Peter Zijlstra
2017-04-13 10:34                   ` Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=andresoportus@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=john.stultz@linaro.org \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=patrick.bellasi@arm.com \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=timmurray@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@android.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.