From: Morten Rasmussen <morten.rasmussen@arm.com>
To: peterz@infradead.org, mingo@redhat.com
Cc: vincent.guittot@linaro.org,
	Dietmar Eggemann <Dietmar.Eggemann@arm.com>,
	yuyang.du@intel.com, preeti@linux.vnet.ibm.com,
	mturquette@linaro.org, rjw@rjwysocki.net,
	Juri Lelli <Juri.Lelli@arm.com>,
	sgurrappadi@nvidia.com, pang.xunlei@zte.com.cn,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	morten.rasmussen@arm.com
Subject: [RFCv4 PATCH 30/34] sched: Add cpu capacity awareness to wakeup balancing
Date: Tue, 12 May 2015 20:39:05 +0100
Message-ID: <1431459549-18343-31-git-send-email-morten.rasmussen@arm.com>
In-Reply-To: <1431459549-18343-1-git-send-email-morten.rasmussen@arm.com>

Wakeup balancing is completely unaware of cpu capacity, cpu usage and
task utilization. The task is preferably placed on a cpu which is idle
at the instant the wakeup happens. New tasks (SD_BALANCE_{FORK,EXEC}) are
placed on an idle cpu in the idlest group if such can be found, otherwise
they go on the least loaded one. Existing tasks (SD_BALANCE_WAKE) are
placed on the previous cpu or on an idle cpu sharing the same last level
cache. Hence existing tasks don't get a chance to migrate to a different
group at wakeup in case the current one has reduced cpu capacity (due to
RT/IRQ pressure or a different uarch, e.g. ARM big.LITTLE). They may
eventually get pulled by other cpus doing periodic/idle/nohz_idle
balance, but it may take quite a while before that happens.

This patch adds capacity awareness to find_idlest_{group,queue} (used by
SD_BALANCE_{FORK,EXEC}) such that groups/cpus that can accommodate the
waking task based on task utilization are preferred. In addition, wakeup
of existing tasks (SD_BALANCE_WAKE) is sent through
find_idlest_{group,queue} if the task doesn't fit the capacity of the
previous cpu, to allow it to escape (override wake_affine) when
necessary instead of relying on periodic/idle/nohz_idle balance to
eventually sort it out.
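
For illustration, here is a minimal standalone sketch of the fit
criterion used by the new helpers below; the capacity_margin value of
1280 (roughly 20% headroom) is only an assumed example, it is not
defined by this patch:

	/*
	 * Sketch only: a task "fits" a cpu if the cpu's current usage plus
	 * the task's utilization stays below the cpu capacity scaled by the
	 * margin. capacity_margin = 1280 is an assumed example value.
	 */
	static bool fits_example(unsigned long capacity, unsigned long usage,
				 unsigned long task_util)
	{
		unsigned long capacity_margin = 1280;	/* assumed value */

		usage += task_util;
		return (capacity * 1024) > (usage * capacity_margin);
	}

	/*
	 * E.g. capacity 430 (little cpu), usage 200, task utilization 150:
	 * 430 * 1024 = 440320 vs 350 * 1280 = 448000, so the task does not
	 * fit and the wakeup is sent through find_idlest_{group,queue}
	 * instead of wake_affine/select_idle_sibling.
	 */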

The patch doesn't depend on any energy model infrastructure, but it is
kept behind the energy_aware() static key, despite being primarily a
performance optimization, as it may increase scheduler overhead slightly.
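
For reference, energy_aware() is introduced earlier in this series
(patch 14/34, "Make energy awareness a sched feature"); a plausible
sketch, assuming the gate is a plain sched_feat() test, is:

	/* Sketch only, assuming a sched_feat() based gate as in 14/34 */
	static inline bool energy_aware(void)
	{
		return sched_feat(ENERGY_AWARE);
	}

When the feature is off, the new capacity checks above are skipped
entirely on the wakeup path.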

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/core.c |  2 +-
 kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98a83e4..ec46248 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6315,7 +6315,7 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 					| 1*SD_BALANCE_NEWIDLE
 					| 1*SD_BALANCE_EXEC
 					| 1*SD_BALANCE_FORK
-					| 0*SD_BALANCE_WAKE
+					| 1*SD_BALANCE_WAKE
 					| 1*SD_WAKE_AFFINE
 					| 0*SD_SHARE_CPUCAPACITY
 					| 0*SD_SHARE_PKG_RESOURCES
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 19601c9..bb44646 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5190,6 +5190,39 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	return 1;
 }
 
+static inline unsigned long task_utilization(struct task_struct *p)
+{
+	return p->se.avg.utilization_avg_contrib;
+}
+
+static inline bool __task_fits(struct task_struct *p, int cpu, int usage)
+{
+	unsigned long capacity = capacity_of(cpu);
+
+	usage += task_utilization(p);
+
+	return (capacity * 1024) > (usage * capacity_margin);
+}
+
+static inline bool task_fits_capacity(struct task_struct *p, int cpu)
+{
+	unsigned long capacity = capacity_of(cpu);
+	unsigned long max_capacity = cpu_rq(cpu)->rd->max_cpu_capacity;
+
+	if (capacity == max_capacity)
+		return true;
+
+	if (capacity * capacity_margin > max_capacity * 1024)
+		return true;
+
+	return __task_fits(p, cpu, 0);
+}
+
+static inline bool task_fits_cpu(struct task_struct *p, int cpu)
+{
+	return __task_fits(p, cpu, get_cpu_usage(cpu));
+}
+
 /*
  * find_idlest_group finds and returns the least busy CPU group within the
  * domain.
@@ -5199,7 +5232,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int sd_flag)
 {
 	struct sched_group *idlest = NULL, *group = sd->groups;
+	struct sched_group *fit_group = NULL;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
+	unsigned long fit_capacity = ULONG_MAX;
 	int load_idx = sd->forkexec_idx;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
 
@@ -5230,6 +5265,12 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 				load = target_load(i, load_idx);
 
 			avg_load += load;
+
+			if (energy_aware() && capacity_of(i) < fit_capacity &&
+			    task_fits_cpu(p, i)) {
+				fit_capacity = capacity_of(i);
+				fit_group = group;
+			}
 		}
 
 		/* Adjust by relative CPU capacity of the group */
@@ -5243,6 +5284,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		}
 	} while (group = group->next, group != sd->groups);
 
+	if (fit_group)
+		return fit_group;
+
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
 	return idlest;
@@ -5263,7 +5307,7 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		if (idle_cpu(i)) {
+		if (task_fits_cpu(p, i)) {
 			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
 			if (idle && idle->exit_latency < min_exit_latency) {
@@ -5275,7 +5319,8 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 				min_exit_latency = idle->exit_latency;
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
-			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+			} else if (idle_cpu(i) &&
+				   (!idle || idle->exit_latency == min_exit_latency) &&
 				   rq->idle_stamp > latest_idle_timestamp) {
 				/*
 				 * If equal or no active idle state, then
@@ -5284,6 +5329,13 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 				 */
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
+			} else if (shallowest_idle_cpu == -1) {
+				/*
+				 * If we haven't found an idle CPU yet
+				 * pick a non-idle one that can fit the task as
+				 * fallback.
+				 */
+				shallowest_idle_cpu = i;
 			}
 		} else if (shallowest_idle_cpu == -1) {
 			load = weighted_cpuload(i);
@@ -5361,9 +5413,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	int cpu = smp_processor_id();
 	int new_cpu = cpu;
 	int want_affine = 0;
+	int want_sibling = true;
 	int sync = wake_flags & WF_SYNC;
 
-	if (sd_flag & SD_BALANCE_WAKE)
+	/* Check if prev_cpu can fit us ignoring its current usage */
+	if (energy_aware() && !task_fits_capacity(p, prev_cpu))
+		want_sibling = false;
+
+	if (sd_flag & SD_BALANCE_WAKE && want_sibling)
 		want_affine = cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
 
 	rcu_read_lock();
@@ -5388,7 +5445,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	if (affine_sd && cpu != prev_cpu && wake_affine(affine_sd, p, sync))
 		prev_cpu = cpu;
 
-	if (sd_flag & SD_BALANCE_WAKE) {
+	if (sd_flag & SD_BALANCE_WAKE && want_sibling) {
 		new_cpu = select_idle_sibling(p, prev_cpu);
 		goto unlock;
 	}
-- 
1.9.1


Thread overview: 56+ messages
2015-05-12 19:38 [RFCv4 PATCH 00/34] sched: Energy cost model for energy-aware scheduling Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 01/34] arm: Frequency invariant scheduler load-tracking support Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 02/34] sched: Make load tracking frequency scale-invariant Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 03/34] arm: vexpress: Add CPU clock-frequencies to TC2 device-tree Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 04/34] sched: Convert arch_scale_cpu_capacity() from weak function to #define Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 05/34] arm: Update arch_scale_cpu_capacity() to reflect change to define Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 06/34] sched: Make usage tracking cpu scale-invariant Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 07/34] arm: Cpu invariant scheduler load-tracking support Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 08/34] sched: Get rid of scaling usage by cpu_capacity_orig Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 09/34] sched: Track blocked utilization contributions Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 10/34] sched: Include blocked utilization in usage tracking Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 11/34] sched: Remove blocked load and utilization contributions of dying tasks Morten Rasmussen
2015-05-13  0:33   ` Sai Gurrappadi
2015-05-13  0:33     ` Sai Gurrappadi
2015-05-13 13:49     ` Morten Rasmussen
2015-05-19 14:22       ` Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 12/34] sched: Initialize CFS task load and usage before placing task on rq Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 13/34] sched: Documentation for scheduler energy cost model Morten Rasmussen
2015-05-20  4:04   ` Kamalesh Babulal
2015-05-20  9:27     ` Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 14/34] sched: Make energy awareness a sched feature Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 15/34] sched: Introduce energy data structures Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 16/34] sched: Allocate and initialize " Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 17/34] sched: Introduce SD_SHARE_CAP_STATES sched_domain flag Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 18/34] arm: topology: Define TC2 energy and provide it to the scheduler Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 19/34] sched: Compute cpu capacity available at current frequency Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 20/34] sched: Relocated get_cpu_usage() and change return type Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 21/34] sched: Highest energy aware balancing sched_domain level pointer Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 22/34] sched: Calculate energy consumption of sched_group Morten Rasmussen
2015-05-21  7:57   ` Kamalesh Babulal
2015-05-22 15:38     ` Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 23/34] sched: Extend sched_group_energy to test load-balancing decisions Morten Rasmussen
2015-05-12 19:38 ` [RFCv4 PATCH 24/34] sched: Estimate energy impact of scheduling decisions Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 25/34] sched: Add over-utilization/tipping point indicator Morten Rasmussen
2015-05-22 19:48   ` [PATCH] sched: Fix compiler errors for NO_SMP machines Abel Vesa
2015-05-23 14:52     ` Ingo Molnar
2015-05-23 19:22       ` Abel Vesa
2015-06-30  9:35   ` [RFCv4 PATCH 25/34] sched: Add over-utilization/tipping point indicator pang.xunlei
2015-06-30  9:35     ` pang.xunlei
2015-05-12 19:39 ` [RFCv4 PATCH 26/34] sched: Store system-wide maximum cpu capacity in root domain Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 27/34] sched, cpuidle: Track cpuidle state index in the scheduler Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 28/34] sched: Count number of shallower idle-states in struct sched_group_energy Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 29/34] sched: Determine the current sched_group idle-state Morten Rasmussen
2015-05-12 19:39 ` Morten Rasmussen [this message]
2015-05-12 19:39 ` [RFCv4 PATCH 31/34] sched: Energy-aware wake-up task placement Morten Rasmussen
2015-05-14 14:03   ` Dietmar Eggemann
     [not found]   ` <OF168B7415.9556008C-ON48257E45.003388D7-48257E45.00349D8D@zte.com.cn>
2015-05-14 15:10     ` Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 32/34] sched: Consider a not over-utilized energy-aware system as balanced Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 33/34] sched: Enable idle balance to pull single task towards cpu with higher capacity Morten Rasmussen
2015-05-12 19:39 ` [RFCv4 PATCH 34/34] sched: Disable energy-unfriendly nohz kicks Morten Rasmussen
2015-05-12 22:07 ` [RFCv4 PATCH 00/34] sched: Energy cost model for energy-aware scheduling Sai Gurrappadi
2015-05-12 22:07   ` Sai Gurrappadi
2015-05-13 13:47   ` Morten Rasmussen
2015-06-28 20:26     ` Abel Vesa
2015-06-29  9:06       ` pang.xunlei
2015-06-29 10:19         ` Dietmar Eggemann
