* [PATCH] sched/fair: Force balancing on nohz balance if local group has capacity
From: Brendan Jackman @ 2017-08-07 16:39 UTC
  To: linux-kernel
  Cc: Ingo Molnar, Morten Rasmussen, Peter Zijlstra, Mike Galbraith,
	Paul Turner

The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical. From the original commit (fab476228ba3
"sched: Force balancing on newidle balance if local group has
capacity") that adds it:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.
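
To put rough numbers on that (task weights per the kernel's
sched_prio_to_weight[] table; the task mix is illustrative):

    nice   0 task: load weight  1024
    nice -15 task: load weight 29154  (~28x a NICE0 task)

With the niced task in the local group, this_load is dominated by its
weight, so this_load >> max_load holds even while sibling CPUs in that
group sit idle, and f_b_g() sees no group worth pulling from.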

A similar but inverted issue also affects ARM
big.LITTLE (asymmetrical CPU capacity) systems - consider 8
always-running, same-priority tasks on a system with 4 "big" and 4
"little" CPUs. Suppose that 5 of them end up on the "big" CPUs (which
will be represented by one sched_group in the DIE sched_domain) and 3
on the "little" (the other sched_group in DIE), leaving one CPU
unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".
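
Rough numbers make this concrete (capacities are illustrative: 1024
per "big" CPU, 512 per "little"; each always-running task contributes
~1024 of load; avg_load = group_load * SCHED_CAPACITY_SCALE /
group_capacity):

    big:    avg_load = (5 * 1024 * 1024) / 4096 = 1280
    little: avg_load = (3 * 1024 * 1024) / 2048 = 1536

The "little" group reports the *higher* avg_load despite its idle CPU,
so a plain avg_load comparison never favours pulling into it.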

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed, the
next nohz balance kick will help us out.
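
For reference, these are the idle types env->idle can take (as
declared in the scheduler headers of this era; reproduced here for
illustration):

    enum cpu_idle_type {
            CPU_IDLE,               /* periodic balance, CPU was idle */
            CPU_NOT_IDLE,           /* periodic balance, CPU was busy */
            CPU_NEWLY_IDLE,         /* balance from idle_balance(), CPU just went idle */
            CPU_MAX_IDLE_TYPES
    };

so "env->idle != CPU_NOT_IDLE" covers both CPU_NEWLY_IDLE and
CPU_IDLE, i.e. every balance attempt where dst_cpu is idle.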

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Turner <pjt@google.com>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e216f6..63eff3e881a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7801,8 +7801,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+	/*
+	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+	 * capacities from resulting in underutilization due to avg_load.
+	 */
+	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
 	    busiest->group_no_capacity)
 		goto force_balance;
 
-- 
2.13.0


* Re: [PATCH] sched/fair: Force balancing on nohz balance if local group has capacity
From: Brendan Jackman @ 2017-09-20 17:21 UTC
  To: linux-kernel, Josef Bacik, Peter Zijlstra
  Cc: Ingo Molnar, Morten Rasmussen, Mike Galbraith, Paul Turner

Hi Peter, Josef,

Do you have any thoughts on this one?

On Mon, Aug 07 2017 at 16:39, Brendan Jackman wrote:
> The "goto force_balance" here is intended to mitigate the fact that
> avg_load calculations can result in bad placement decisions when
> priority is asymmetrical. From the original commit (fab476228ba3
> "sched: Force balancing on newidle balance if local group has
> capacity") that adds it:
>
>     Under certain situations, such as a niced down task (i.e. nice =
>     -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
>     on a sched group and kicks away other tasks because of its large
>     weight. This leads to sub-optimal utilization of the
>     machine. Even though the sched group has capacity, it does not
>     pull tasks because sds.this_load >> sds.max_load, and f_b_g()
>     returns NULL.
>
> A similar but inverted issue also affects ARM
> big.LITTLE (asymmetrical CPU capacity) systems - consider 8
> always-running, same-priority tasks on a system with 4 "big" and 4
> "little" CPUs. Suppose that 5 of them end up on the "big" CPUs (which
> will be represented by one sched_group in the DIE sched_domain) and 3
> on the "little" (the other sched_group in DIE), leaving one CPU
> unused. Because the "big" group has a higher group_capacity its
> avg_load may not present an imbalance that would cause migrating a
> task to the idle "little".
>
> The force_balance case here solves the problem but currently only for
> CPU_NEWLY_IDLE balances, which in theory might never happen on the
> unused CPU. Including CPU_IDLE in the force_balance case means
> there's an upper bound on the time before we can attempt to solve the
> underutilization: after DIE's sd->balance_interval has passed, the
> next nohz balance kick will help us out.
>
> Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Turner <pjt@google.com>
> ---
>  kernel/sched/fair.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c95880e216f6..63eff3e881a0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7801,8 +7801,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	if (busiest->group_type == group_imbalanced)
>  		goto force_balance;
>
> -	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
> -	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
> +	/*
> +	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
> +	 * capacities from resulting in underutilization due to avg_load.
> +	 */
> +	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
>  	    busiest->group_no_capacity)
>  		goto force_balance;


* [tip:sched/core] sched/fair: Force balancing on NOHZ balance if local group has capacity
From: tip-bot for Brendan Jackman @ 2017-10-10 10:59 UTC
  To: linux-tip-commits
  Cc: efault, mingo, pjt, tglx, morten.rasmussen, torvalds,
	linux-kernel, brendan.jackman, hpa, peterz

Commit-ID:  583ffd99d7657755736d831bbc182612d1d2697d
Gitweb:     https://git.kernel.org/tip/583ffd99d7657755736d831bbc182612d1d2697d
Author:     Brendan Jackman <brendan.jackman@arm.com>
AuthorDate: Thu, 5 Oct 2017 11:58:54 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:32 +0200

sched/fair: Force balancing on NOHZ balance if local group has capacity

The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228ba3 ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed, the
next nohz balance kick will help us out.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c04a425..cf3e816 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8186,8 +8186,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+	/*
+	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+	 * capacities from resulting in underutilization due to avg_load.
+	 */
+	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
 	    busiest->group_no_capacity)
 		goto force_balance;
 

