[PATCH] sched/fair: Force balancing on nohz balance if local group has capacity
From: Brendan Jackman @ 2017-08-07 16:39 UTC
  To: linux-kernel
  Cc: Ingo Molnar, Morten Rasmussen, Peter Zijlstra, Mike Galbraith,
	Paul Turner

The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical. From the original commit (fab476228ba3
"sched: Force balancing on newidle balance if local group has
capacity") that adds it:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.
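
To make this concrete, here is a standalone back-of-the-envelope
sketch (plain userspace C, not kernel code; the weights 1024 for
nice 0 and 29154 for nice -15 come from the kernel's prio-to-weight
table, while the task layout is invented for illustration):

    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024UL

    int main(void)
    {
            /* Local group: one nice -15 task on 4 CPUs, 3 CPUs idle. */
            unsigned long this_load = 29154 * SCHED_CAPACITY_SCALE / (4 * 1024);

            /* Busiest group: 7 nice-0 tasks crammed onto its 4 CPUs. */
            unsigned long max_load = (7 * 1024) * SCHED_CAPACITY_SCALE / (4 * 1024);

            /* ~7288 vs 1792: this_load >> max_load, so nothing is pulled
             * even though the local group has 3 idle CPUs. */
            printf("this_load=%lu max_load=%lu\n", this_load, max_load);
            return 0;
    }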

A similar but inverted issue also affects ARM
big.LITTLE (asymmetrical CPU capacity) systems - consider 8
always-running, same-priority tasks on a system with 4 "big" and 4
"little" CPUs. Suppose that 5 of them end up on the "big" CPUs (which
will be represented by one sched_group in the DIE sched_domain) and 3
on the "little" (the other sched_group in DIE), leaving one CPU
unused. Because the "big" group has a higher group_capacity, its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".
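
Rough numbers to illustrate (the per-CPU capacities are invented:
1024 for a "big", ~440 for a "little"; same sketch style as above):

    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024UL

    int main(void)
    {
            /* "big" group: 5 always-running tasks on 4 big CPUs. */
            unsigned long big = (5 * 1024) * SCHED_CAPACITY_SCALE / (4 * 1024);

            /* "little" group: 3 tasks on 4 little CPUs, one CPU idle. */
            unsigned long little = (3 * 1024) * SCHED_CAPACITY_SCALE / (4 * 440);

            /* 1280 vs ~1787: the group with the idle CPU already looks
             * *more* loaded, so the avg_load comparison won't migrate
             * the fifth "big" task over. */
            printf("big avg_load=%lu little avg_load=%lu\n", big, little);
            return 0;
    }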

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed, the
next nohz balance kick will help us out.
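
For reference, the idle types involved (paraphrasing enum
cpu_idle_type as of this era; the comments are mine):

    enum cpu_idle_type {
            CPU_IDLE,           /* periodic/nohz balance on an already-idle CPU */
            CPU_NOT_IDLE,       /* periodic balance on a busy CPU */
            CPU_NEWLY_IDLE,     /* balance run as a CPU is about to go idle */
            CPU_MAX_IDLE_TYPES
    };

so the new check env->idle != CPU_NOT_IDLE covers CPU_IDLE as well
as CPU_NEWLY_IDLE, where the old check covered only the latter.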

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Turner <pjt@google.com>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e216f6..63eff3e881a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7801,8 +7801,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+	/*
+	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+	 * capacities from resulting in underutilization due to avg_load.
+	 */
+	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
 	    busiest->group_no_capacity)
 		goto force_balance;
 
-- 
2.13.0


Re: [PATCH] sched/fair: Force balancing on nohz balance if local group has capacity
From: Brendan Jackman @ 2017-09-20 17:21 UTC
  To: linux-kernel, Josef Bacik, Peter Zijlstra
  Cc: Ingo Molnar, Morten Rasmussen, Mike Galbraith, Paul Turner

Hi Peter, Josef,

Do you have any thoughts on this one?

On Mon, Aug 07 2017 at 16:39, Brendan Jackman wrote:
> The "goto force_balance" here is intended to mitigate the fact that
> avg_load calculations can result in bad placement decisions when
> priority is asymmetrical. From the original commit (fab476228ba3
> "sched: Force balancing on newidle balance if local group has
> capacity") that adds it:
>
>     Under certain situations, such as a niced down task (i.e. nice =
>     -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
>     on a sched group and kicks away other tasks because of its large
>     weight. This leads to sub-optimal utilization of the
>     machine. Even though the sched group has capacity, it does not
>     pull tasks because sds.this_load >> sds.max_load, and f_b_g()
>     returns NULL.
>
> A similar but inverted issue also affects ARM
> big.LITTLE (asymmetrical CPU capacity) systems - consider 8
> always-running, same-priority tasks on a system with 4 "big" and 4
> "little" CPUs. Suppose that 5 of them end up on the "big" CPUs (which
> will be represented by one sched_group in the DIE sched_domain) and 3
> on the "little" (the other sched_group in DIE), leaving one CPU
> unused. Because the "big" group has a higher group_capacity, its
> avg_load may not present an imbalance that would cause migrating a
> task to the idle "little".
>
> The force_balance case here solves the problem but currently only for
> CPU_NEWLY_IDLE balances, which in theory might never happen on the
> unused CPU. Including CPU_IDLE in the force_balance case means
> there's an upper bound on the time before we can attempt to solve the
> underutilization: after DIE's sd->balance_interval has passed, the
> next nohz balance kick will help us out.
>
> Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Turner <pjt@google.com>
> ---
>  kernel/sched/fair.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c95880e216f6..63eff3e881a0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7801,8 +7801,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	if (busiest->group_type == group_imbalanced)
>  		goto force_balance;
>
> -	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
> -	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
> +	/*
> +	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
> +	 * capacities from resulting in underutilization due to avg_load.
> +	 */
> +	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
>  	    busiest->group_no_capacity)
>  		goto force_balance;


[tip:sched/core] sched/fair: Force balancing on NOHZ balance if local group has capacity
From: tip-bot for Brendan Jackman @ 2017-10-10 10:59 UTC
  To: linux-tip-commits
  Cc: efault, mingo, pjt, tglx, morten.rasmussen, torvalds,
	linux-kernel, brendan.jackman, hpa, peterz

Commit-ID:  583ffd99d7657755736d831bbc182612d1d2697d
Gitweb:     https://git.kernel.org/tip/583ffd99d7657755736d831bbc182612d1d2697d
Author:     Brendan Jackman <brendan.jackman@arm.com>
AuthorDate: Thu, 5 Oct 2017 11:58:54 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:32 +0200

sched/fair: Force balancing on NOHZ balance if local group has capacity

The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228ba3 ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity, its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed, the
next nohz balance kick will help us out.
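
(For the curious: the CPU_IDLE case reaches find_busiest_group()
roughly via the path below - sketched from a v4.13-era tree, so
treat the exact names as approximate.)

    /*
     * tick on a busy CPU:
     *   trigger_load_balance()
     *     -> nohz_balancer_kick()            - kick an idle CPU
     *
     * softirq on the kicked idle CPU:
     *   run_rebalance_domains()
     *     -> nohz_idle_balance()
     *       -> rebalance_domains(rq, CPU_IDLE)
     *         -> load_balance(..., CPU_IDLE, ...)
     *           -> find_busiest_group(env)   - env->idle == CPU_IDLE
     */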

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c04a425..cf3e816 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8186,8 +8186,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+	/*
+	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+	 * capacities from resulting in underutilization due to avg_load.
+	 */
+	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
 	    busiest->group_no_capacity)
 		goto force_balance;
 

