linux-kernel.vger.kernel.org archive mirror
From: Valentin Schneider <valentin.schneider@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Quentin Perret <qperret@google.com>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	Rik van Riel <riel@surriel.com>,
	Lingutla Chandrasekhar <clingutla@codeaurora.org>
Subject: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
Date: Thu, 15 Apr 2021 18:58:46 +0100
Message-ID: <20210415175846.494385-3-valentin.schneider@arm.com>
In-Reply-To: <20210415175846.494385-1-valentin.schneider@arm.com>

Consider the following topology:

  DIE [          ]
  MC  [    ][    ]
       0  1  2  3

  capacity_orig_of(x \in {0-1}) < capacity_orig_of(x \in {2-3})

w/ CPUs 2-3 idle and CPUs 0-1 running CPU hogs (util_avg=1024).

When CPU2 goes through load_balance() (via periodic / NOHZ balance), it
should pull one CPU hog from either CPU0 or CPU1 (this is misfit task
upmigration). However, should e.g. a pcpu kworker wake up on CPU0 just
before this load_balance() happens and preempt the CPU hog running there,
we would have, for the [0-1] group at CPU2's DIE level:

o sgs->sum_nr_running > sgs->group_weight
o sgs->group_capacity * 100 < sgs->group_util * imbalance_pct

IOW, this group is group_overloaded.
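
As a rough illustration of the two conditions above, here is a user-space
sketch of the overload check. It is not the kernel's code: struct
group_stats, the little-CPU capacity of 446 and the imbalance_pct of 117 used
in the note below are assumptions picked for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-group statistics. */
struct group_stats {
	unsigned int  sum_nr_running;	/* runnable tasks in the group */
	unsigned int  group_weight;	/* number of CPUs in the group */
	unsigned long group_capacity;	/* summed CPU capacity */
	unsigned long group_util;	/* summed utilization */
};

/*
 * Model of the two conditions listed above: more runnable tasks than CPUs,
 * and utilization exceeding capacity once imbalance_pct is factored in.
 */
static bool group_overloaded(const struct group_stats *sgs,
			     unsigned int imbalance_pct)
{
	/* Assumption for this sketch: imbalance_pct is a percentage >= 100. */
	assert(imbalance_pct >= 100);

	if (sgs->sum_nr_running <= sgs->group_weight)
		return false;

	return sgs->group_capacity * 100 < sgs->group_util * imbalance_pct;
}
```

With two little CPUs of (assumed) capacity 446 each, group_capacity and
group_util are both 892. Two runnable hogs give sum_nr_running ==
group_weight, so the check returns false. Once the kworker wakes up,
sum_nr_running is 3 and 892 * 100 < 892 * 117, so the check returns true.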

Considering CPU0 is picked by find_busiest_queue(), we would then visit the
preempted CPU hog in detach_tasks(). However, given it has just been
preempted by this pcpu kworker, task_hot() will prevent it from being
detached. We then leave load_balance() without having done anything.
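
For reference, the time-based part of the task_hot() check can be modelled
as below. This is a simplified sketch: the function and parameter names are
made up for the example, and the real task_hot() has further early exits
that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A task counts as cache-hot if it last started executing less than
 * migration_cost_ns ago (cf. "delta < sysctl_sched_migration_cost").
 */
static bool task_is_cache_hot(uint64_t now_ns, uint64_t exec_start_ns,
			      int64_t migration_cost_ns)
{
	int64_t delta;

	/* Assumption for this sketch: timestamps are monotonic. */
	assert(now_ns >= exec_start_ns);

	delta = (int64_t)(now_ns - exec_start_ns);
	return delta < migration_cost_ns;
}
```

A CPU hog that was preempted 100us ago is hot for a migration cost of 500us
(task_is_cache_hot(1000000, 900000, 500000) is true), which is the situation
described above.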

Long story short, preempted misfit tasks are held back by task_hot(), while
currently running misfit tasks are deliberately preempted by the stopper
task (active balance) to migrate them over to a higher-capacity CPU.

Align detach_tasks() with the active-balance logic and let it pick a
cache-hot misfit task when the destination CPU can provide a capacity
uplift.
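
The intended decision can be sketched in user space as follows. This is an
illustration rather than the patch itself: the enum, the helper names and
both margins (fits() at roughly 80% of capacity, cap_greater() at roughly 5%)
are assumptions of this sketch, and utilization and capacities are passed in
as plain numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Same convention as the patch: -1 not impacted, 0 uplift, 1 degrades. */
enum cap_effect {
	CAP_NEUTRAL  = -1,
	CAP_UPLIFT   =  0,
	CAP_DEGRADES =  1,
};

/* Assumed margin: a task fits if its util is below ~80% of the capacity. */
static bool fits(unsigned long util, unsigned long cap)
{
	return util * 1280 < cap * 1024;
}

/* Assumed margin: a is "greater" only if it exceeds b by roughly 5%. */
static bool cap_greater(unsigned long a, unsigned long b)
{
	return a * 1024 > b * 1078;
}

static enum cap_effect capacity_effect(bool asym, unsigned long util,
				       unsigned long src_cap,
				       unsigned long dst_cap)
{
	assert(src_cap > 0 && dst_cap > 0);

	/* Symmetric capacities: the migration changes nothing capacity-wise. */
	if (!asym)
		return CAP_NEUTRAL;

	/* Misfit on the source CPU: compare the two capacities. */
	if (!fits(util, src_cap)) {
		if (cap_greater(dst_cap, src_cap))
			return CAP_UPLIFT;
		if (cap_greater(src_cap, dst_cap))
			return CAP_DEGRADES;
		return CAP_NEUTRAL;
	}

	/* Fits on the source: only a problem if it would not fit on dst. */
	return fits(util, dst_cap) ? CAP_NEUTRAL : CAP_DEGRADES;
}

/* Cache hotness is ignored only for an uplift onto a group with spare capacity. */
static bool still_cache_hot(bool cache_hot, bool dst_has_spare,
			    enum cap_effect effect)
{
	if (dst_has_spare && effect == CAP_UPLIFT)
		return false;
	return cache_hot;
}
```

For a hog with util 446 on a CPU of capacity 446 and a destination of
capacity 1024, capacity_effect() yields CAP_UPLIFT, so a cache-hot task is
still eligible for detaching as long as the destination group has spare
capacity.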

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d2d1a69d7aa7..43fc98d34276 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7493,6 +7493,7 @@ struct lb_env {
 	enum fbq_type		fbq_type;
 	enum migration_type	migration_type;
 	enum group_type         src_grp_type;
+	enum group_type         dst_grp_type;
 	struct list_head	tasks;
 };
 
@@ -7533,6 +7534,31 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	return delta < (s64)sysctl_sched_migration_cost;
 }
 
+
+/*
+ * What does migrating this task do to our capacity-aware scheduling criterion?
+ *
+ * Returns 1, if the task needs more capacity than the dst CPU can provide.
+ * Returns 0, if the task needs the extra capacity provided by the dst CPU.
+ * Returns -1, if the task isn't impacted by the migration wrt capacity.
+ */
+static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
+{
+	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
+		return -1;
+
+	if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
+		if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
+			return 0;
+		else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
+			return 1;
+		else
+			return -1;
+	}
+
+	return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * Returns 1, if task migration degrades locality
@@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (tsk_cache_hot == -1)
 		tsk_cache_hot = task_hot(p, env);
 
+	/*
+	 * On a (sane) asymmetric CPU capacity system, the increase in compute
+	 * capacity should offset any potential performance hit caused by a
+	 * migration.
+	 */
+	if ((env->dst_grp_type == group_has_spare) &&
+	    !migrate_degrades_capacity(p, env))
+		tsk_cache_hot = 0;
+
 	if (tsk_cache_hot <= 0 ||
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot == 1) {
@@ -9310,6 +9345,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (!sds.busiest)
 		goto out_balanced;
 
+	env->dst_grp_type = local->group_type;
 	env->src_grp_type = busiest->group_type;
 
 	/* Misfit tasks should be dealt with regardless of the avg load */
-- 
2.25.1

