* [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks
@ 2020-09-21  7:24 Vincent Guittot
  2020-09-21  7:24 ` [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance Vincent Guittot
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-09-21  7:24 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, pauld, hdanton, Vincent Guittot

When the system doesn't have enough cycles for all tasks, the scheduler
must ensure a fair split of those CPU cycles between CFS tasks. The
fairness of some use cases can't be solved with a static distribution of
the tasks on the system; it requires periodic rebalancing. But this
dynamic behavior is not always optimal, and a fair distribution of CPU
time is not always ensured.

The patchset improves fairness by relaxing the constraint for selecting
migratable tasks as the number of failed load balances grows. This then
makes it possible to decrease the imbalance threshold, because the 1st
LB will try to migrate tasks that fully match the imbalance.

Some test results:

- small 2 x 4 cores arm64 system

hackbench -l (256000/#grp) -g #grp

grp    tip/sched/core         +patchset             improvement
1      1.420(+/- 11.72 %)     1.382(+/-10.50 %)     2.72 %
4      1.295(+/-  2.72 %)     1.218(+/- 2.97 %)     0.76 %
8      1.220(+/-  2.17 %)     1.218(+/- 1.60 %)     0.17 %
16     1.258(+/-  1.88 %)     1.250(+/- 1.78 %)     0.58 %


fairness tests: run always-running rt-app threads and monitor the
ratio between the min and max work done by the threads

                  v5.9-rc1             w/ patchset
9 threads  avg     78.3% (+/- 6.60%)   91.20% (+/- 2.44%)
           worst   68.6%               85.67%

11 threads avg     65.91% (+/- 8.26%)  91.34% (+/- 1.87%)
           worst   53.52%              87.26%

- large 2 nodes x 28 cores x 4 threads arm64 system

The hackbench tests that I usually run, as well as the sp.C.x and lu.C.x
tests with 224 threads, have not shown any difference, with a mix of
improvements and regressions below 0.5%.

Changes for v2:
- rebased on tip/sched/core
- added comment for patch 3
- added acked and reviewed tags

Vincent Guittot (4):
  sched/fair: relax constraint on task's load during load balance
  sched/fair: reduce minimal imbalance threshold
  sched/fair: minimize concurrent LBs between domain level
  sched/fair: reduce busy load balance interval

 kernel/sched/fair.c     | 13 +++++++++++--
 kernel/sched/topology.c |  4 ++--
 2 files changed, 13 insertions(+), 4 deletions(-)

-- 
2.17.1



* [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance
  2020-09-21  7:24 [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Vincent Guittot
@ 2020-09-21  7:24 ` Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Relax " tip-bot2 for Vincent Guittot
  2020-09-21  7:24 ` [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold Vincent Guittot
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-09-21  7:24 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, pauld, hdanton, Vincent Guittot

Some use cases, like 9 always-running tasks on 8 CPUs, can't be
balanced, and the load balancer currently migrates the waiting task
between the CPUs in an almost random manner. Whether a rq succeeds in
pulling a task depends on the value of nr_balance_failed of its domains
and on its ability to be faster than others at detaching it. This
behavior results in an unfair distribution of the running time between
tasks, because some CPUs will run the same task most of the time, if
not always, whereas others will share their time between several tasks.

Instead of using nr_balance_failed as a boolean to relax the condition
for detaching a task, the LB will use nr_balance_failed to relax the
threshold between the task's load and the imbalance. This mechanism
prevents the same rq or domain from always winning the load balance
fight.
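
As an illustration, here is a standalone userspace sketch (with made-up
load and imbalance values; this is not the kernel code, only the same
condition the patch adds to detach_tasks()) of how the threshold
relaxes with each failed balance:

	#include <stdio.h>

	int main(void)
	{
		unsigned long load = 1024;	/* hypothetical task load */
		unsigned long imbalance = 300;	/* hypothetical remaining imbalance */
		int failed;

		for (failed = 0; failed <= 3; failed++) {
			/*
			 * With 0 failures the task must fully fit within
			 * the imbalance; each failure halves the effective
			 * load, so migration gets progressively easier.
			 */
			int skip = (load >> failed) > imbalance;

			printf("nr_balance_failed=%d effective=%lu -> %s\n",
			       failed, load >> failed,
			       skip ? "goto next" : "detach");
		}
		return 0;
	}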

Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33699db27ed5..d8320dc9d014 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7674,8 +7674,8 @@ static int detach_tasks(struct lb_env *env)
 			 * scheduler fails to find a good waiting task to
 			 * migrate.
 			 */
-			if (load/2 > env->imbalance &&
-			    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
+
+			if ((load >> env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= load;
-- 
2.17.1



* [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold
  2020-09-21  7:24 [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Vincent Guittot
  2020-09-21  7:24 ` [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance Vincent Guittot
@ 2020-09-21  7:24 ` Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Reduce " tip-bot2 for Vincent Guittot
  2020-09-21  7:24 ` [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level Vincent Guittot
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-09-21  7:24 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, pauld, hdanton, Vincent Guittot

The 25% default imbalance threshold for the DIE and NUMA domains is
large enough to generate significant unfairness between threads. A
typical example is the case of 11 threads running on 2x4 CPUs. The
imbalance of 20% between the 2 groups of 4 cores is just low enough not
to trigger the load balance between the 2 groups. We will always have
the same 6 threads on one group of 4 CPUs and the other 5 threads on
the other group of CPUs. With fair time sharing in each group, we end
up with +20% running time for the group of 5 threads.

Decrease the imbalance threshold for the overloaded case, where we use
the load to balance tasks and to ensure fair time sharing.
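
As an illustration, a simplified standalone model of the check (this is
not the exact find_busiest_group() logic, just the threshold applied to
the 6-vs-5 threads example above):

	#include <stdio.h>

	/*
	 * Simplified model: balance only when the busiest group's load
	 * exceeds the local group's load by more than imbalance_pct
	 * percent.
	 */
	static int needs_balance(unsigned long busiest, unsigned long local,
				 unsigned int imbalance_pct)
	{
		return busiest * 100 > local * imbalance_pct;
	}

	int main(void)
	{
		unsigned long busiest = 6 * 1024;	/* group running 6 threads */
		unsigned long local = 5 * 1024;		/* group running 5 threads */

		/* 6/5 = 120%: under the old 125% threshold, over the new 117% */
		printf("imbalance_pct=125 -> balance=%d\n",
		       needs_balance(busiest, local, 125));	/* prints 0 */
		printf("imbalance_pct=117 -> balance=%d\n",
		       needs_balance(busiest, local, 117));	/* prints 1 */
		return 0;
	}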

Acked-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 249bec7b0a4c..41df62884cea 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1349,7 +1349,7 @@ sd_init(struct sched_domain_topology_level *tl,
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
 		.busy_factor		= 32,
-		.imbalance_pct		= 125,
+		.imbalance_pct		= 117,
 
 		.cache_nice_tries	= 0,
 
-- 
2.17.1



* [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level
  2020-09-21  7:24 [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Vincent Guittot
  2020-09-21  7:24 ` [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance Vincent Guittot
  2020-09-21  7:24 ` [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold Vincent Guittot
@ 2020-09-21  7:24 ` Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Minimize " tip-bot2 for Vincent Guittot
  2020-09-21  7:24 ` [PATCH 4/4 v2] sched/fair: reduce busy load balance interval Vincent Guittot
  2020-09-23 15:47 ` [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Mel Gorman
  4 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-09-21  7:24 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, pauld, hdanton, Vincent Guittot

Sched domains tend to trigger the load balance loop simultaneously, but
the larger domains often need more time to collect statistics. This
slowness makes the larger domain try to detach tasks from a rq whose
tasks have already been migrated somewhere else at a sub-domain level.
This is not a real problem for the idle LB, because the period of the
smaller domains will increase as their CPUs become busy, which leaves
time for the higher ones to pull tasks. But this becomes a problem when
all CPUs are already busy, because all domains stay synced when they
trigger their LB.

A simple way to minimize simultaneous LBs of all domains is to
decrement the busy interval by 1 jiffy. Because of the busy_factor, the
interval of a larger domain will no longer be a multiple of the smaller
ones.
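
For illustration, a standalone sketch of the effect, assuming HZ=1000
(so ms == jiffies) and hypothetical 4-CPU and 16-CPU domains:

	#include <stdio.h>

	int main(void)
	{
		unsigned int busy_factor = 32;
		unsigned int small = 4 * busy_factor;	/* 4-CPU domain: 128 jiffies */
		unsigned int large = 16 * busy_factor;	/* 16-CPU domain: 512 jiffies */

		/* 512 is a multiple of 128: both domains keep firing together */
		printf("before: %u %% %u = %u\n", large, small, large % small);

		small -= 1;	/* 127 */
		large -= 1;	/* 511 */

		/* 511 = 4 * 127 + 3: the balancing points now drift apart */
		printf("after:  %u %% %u = %u\n", large, small, large % small);
		return 0;
	}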

Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d8320dc9d014..458702062d3b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9785,6 +9785,15 @@ get_sd_balance_interval(struct sched_domain *sd, int cpu_busy)
 
 	/* scale ms to jiffies */
 	interval = msecs_to_jiffies(interval);
+
+	/*
+	 * Reduce likelihood of busy balancing at higher domains racing with
+	 * balancing at lower domains by preventing their balancing periods
+	 * from being multiples of each other.
+	 */
+	if (cpu_busy)
+		interval -= 1;
+
 	interval = clamp(interval, 1UL, max_load_balance_interval);
 
 	return interval;
-- 
2.17.1



* [PATCH 4/4 v2] sched/fair: reduce busy load balance interval
  2020-09-21  7:24 [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Vincent Guittot
                   ` (2 preceding siblings ...)
  2020-09-21  7:24 ` [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level Vincent Guittot
@ 2020-09-21  7:24 ` Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Reduce " tip-bot2 for Vincent Guittot
  2020-09-23 15:47 ` [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Mel Gorman
  4 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-09-21  7:24 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, pauld, hdanton, Vincent Guittot

The busy_factor, which increases the load balance interval when a CPU
is busy, is set to 32 by default. This value generates huge LB
intervals on a large system like the THX2, made of 2 nodes x 28 cores x
4 threads. On such a system, the interval increases from 112ms to
3584ms at the MC level, and from 224ms to 7168ms at the NUMA level.

Even on smaller systems, a lower busy_factor has shown an improvement
in the fair distribution of running time, so reduce it for all.
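
The arithmetic behind those numbers, as a standalone sketch
(min_interval is the domain weight in ms: 28 cores x 4 threads = 112
CPUs per node at the MC level, 224 CPUs at the NUMA level):

	#include <stdio.h>

	int main(void)
	{
		unsigned int mc_weight = 112;	/* CPUs per node sharing MC */
		unsigned int numa_weight = 224;	/* CPUs in the NUMA domain */

		/* busy interval = min_interval * busy_factor */
		printf("MC:   factor 32 -> %u ms, factor 16 -> %u ms\n",
		       mc_weight * 32, mc_weight * 16);
		printf("NUMA: factor 32 -> %u ms, factor 16 -> %u ms\n",
		       numa_weight * 32, numa_weight * 16);
		return 0;
	}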

Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 41df62884cea..a3a2417fec54 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1348,7 +1348,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	*sd = (struct sched_domain){
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
-		.busy_factor		= 32,
+		.busy_factor		= 16,
 		.imbalance_pct		= 117,
 
 		.cache_nice_tries	= 0,
-- 
2.17.1



* Re: [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance
  2020-09-21  7:24 ` [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance Vincent Guittot
@ 2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Relax " tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2020-09-23 14:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, pauld, hdanton


On 21/09/20 08:24, Vincent Guittot wrote:
> Some use cases, like 9 always-running tasks on 8 CPUs, can't be
> balanced, and the load balancer currently migrates the waiting task
> between the CPUs in an almost random manner. Whether a rq succeeds in
> pulling a task depends on the value of nr_balance_failed of its
> domains and on its ability to be faster than others at detaching it.
> This behavior results in an unfair distribution of the running time
> between tasks, because some CPUs will run the same task most of the
> time, if not always, whereas others will share their time between
> several tasks.
>
> Instead of using nr_balance_failed as a boolean to relax the condition
> for detaching a task, the LB will use nr_balance_failed to relax the
> threshold between the task's load and the imbalance. This mechanism
> prevents the same rq or domain from always winning the load balance
> fight.
>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>


* Re: [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold
  2020-09-21  7:24 ` [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold Vincent Guittot
@ 2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Reduce " tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2020-09-23 14:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, pauld, hdanton


On 21/09/20 08:24, Vincent Guittot wrote:
> The 25% default imbalance threshold for the DIE and NUMA domains is
> large enough to generate significant unfairness between threads. A
> typical example is the case of 11 threads running on 2x4 CPUs. The
> imbalance of 20% between the 2 groups of 4 cores is just low enough
> not to trigger the load balance between the 2 groups. We will always
> have the same 6 threads on one group of 4 CPUs and the other 5 threads
> on the other group of CPUs. With fair time sharing in each group, we
> end up with +20% running time for the group of 5 threads.
>
> Decrease the imbalance threshold for the overloaded case, where we use
> the load to balance tasks and to ensure fair time sharing.
>
> Acked-by: Hillf Danton <hdanton@sina.com>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>


* Re: [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level
  2020-09-21  7:24 ` [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level Vincent Guittot
@ 2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Minimize " tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2020-09-23 14:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, pauld, hdanton


On 21/09/20 08:24, Vincent Guittot wrote:
> Sched domains tend to trigger the load balance loop simultaneously,
> but the larger domains often need more time to collect statistics.
> This slowness makes the larger domain try to detach tasks from a rq
> whose tasks have already been migrated somewhere else at a sub-domain
> level. This is not a real problem for the idle LB, because the period
> of the smaller domains will increase as their CPUs become busy, which
> leaves time for the higher ones to pull tasks. But this becomes a
> problem when all CPUs are already busy, because all domains stay
> synced when they trigger their LB.
>
> A simple way to minimize simultaneous LBs of all domains is to
> decrement the busy interval by 1 jiffy. Because of the busy_factor,
> the interval of a larger domain will no longer be a multiple of the
> smaller ones.
>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>


* Re: [PATCH 4/4 v2] sched/fair: reduce busy load balance interval
  2020-09-21  7:24 ` [PATCH 4/4 v2] sched/fair: reduce busy load balance interval Vincent Guittot
@ 2020-09-23 14:43   ` Valentin Schneider
  2020-09-29  7:56   ` [tip: sched/core] sched/fair: Reduce " tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2020-09-23 14:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, pauld, hdanton


On 21/09/20 08:24, Vincent Guittot wrote:
> The busy_factor, which increases the load balance interval when a CPU
> is busy, is set to 32 by default. This value generates huge LB
> intervals on a large system like the THX2, made of 2 nodes x 28 cores
> x 4 threads. On such a system, the interval increases from 112ms to
> 3584ms at the MC level, and from 224ms to 7168ms at the NUMA level.
>
> Even on smaller systems, a lower busy_factor has shown an improvement
> in the fair distribution of running time, so reduce it for all.
>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>


* Re: [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks
  2020-09-21  7:24 [PATCH 0/4 v2] sched/fair: Improve fairness between cfs tasks Vincent Guittot
                   ` (3 preceding siblings ...)
  2020-09-21  7:24 ` [PATCH 4/4 v2] sched/fair: reduce busy load balance interval Vincent Guittot
@ 2020-09-23 15:47 ` Mel Gorman
  4 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2020-09-23 15:47 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	linux-kernel, valentin.schneider, pauld, hdanton

On Mon, Sep 21, 2020 at 09:24:20AM +0200, Vincent Guittot wrote:
> When the system doesn't have enough cycles for all tasks, the scheduler
> must ensure a fair split of those CPU cycles between CFS tasks. The
> fairness of some use cases can't be solved with a static distribution
> of the tasks on the system; it requires periodic rebalancing. But this
> dynamic behavior is not always optimal, and a fair distribution of CPU
> time is not always ensured.
> 

FWIW, nothing bad fell out of the series from a battery of scheduler
tests across various machines. Headline-wise, EPYC 1 looked very bad
for hackbench, but a detailed look showed that it was great until the
very highest group count, where it looked bad. Otherwise EPYC 1 looked
good, as did EPYC 2. Various generations of Intel boxes showed marginal
gains or losses, nothing dramatic. will-it-scale for various test loads
looked fractionally worse across some machines, which may show up in
the 0-day bot, but it will probably be marginal.

As the patches are partly magic numbers which you could reason about
either way, I'm not going to say that it's universally better. However,
it's slightly better in normal cases, your tests indicate it's good for
a specific corner case, and it does not look like anything obvious
falls apart.

-- 
Mel Gorman
SUSE Labs


* [tip: sched/core] sched/fair: Minimize concurrent LBs between domain level
  2020-09-21  7:24 ` [PATCH 3/4 v2] sched/fair: minimize concurrent LBs between domain level Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
@ 2020-09-29  7:56   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-09-29  7:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), Phil Auld, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e4d32e4d5444977d8dc25fa98b3ce0a65544db8c
Gitweb:        https://git.kernel.org/tip/e4d32e4d5444977d8dc25fa98b3ce0a65544db8c
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Mon, 21 Sep 2020 09:24:23 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 25 Sep 2020 14:23:26 +02:00

sched/fair: Minimize concurrent LBs between domain level

Sched domains tend to trigger the load balance loop simultaneously, but
the larger domains often need more time to collect statistics. This
slowness makes the larger domain try to detach tasks from a rq whose
tasks have already been migrated somewhere else at a sub-domain level.
This is not a real problem for the idle LB, because the period of the
smaller domains will increase as their CPUs become busy, which leaves
time for the higher ones to pull tasks. But this becomes a problem when
all CPUs are already busy, because all domains stay synced when they
trigger their LB.

A simple way to minimize simultaneous LBs of all domains is to
decrement the busy interval by 1 jiffy. Because of the busy_factor, the
interval of a larger domain will no longer be a multiple of the smaller
ones.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-4-vincent.guittot@linaro.org
---
 kernel/sched/fair.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5e3add3..24a5ee6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9790,6 +9790,15 @@ get_sd_balance_interval(struct sched_domain *sd, int cpu_busy)
 
 	/* scale ms to jiffies */
 	interval = msecs_to_jiffies(interval);
+
+	/*
+	 * Reduce likelihood of busy balancing at higher domains racing with
+	 * balancing at lower domains by preventing their balancing periods
+	 * from being multiples of each other.
+	 */
+	if (cpu_busy)
+		interval -= 1;
+
 	interval = clamp(interval, 1UL, max_load_balance_interval);
 
 	return interval;


* [tip: sched/core] sched/fair: Reduce busy load balance interval
  2020-09-21  7:24 ` [PATCH 4/4 v2] sched/fair: reduce busy load balance interval Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
@ 2020-09-29  7:56   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-09-29  7:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), Phil Auld, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     6e7499135db724539ca887b3aa64122502875c71
Gitweb:        https://git.kernel.org/tip/6e7499135db724539ca887b3aa64122502875c71
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Mon, 21 Sep 2020 09:24:24 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 25 Sep 2020 14:23:26 +02:00

sched/fair: Reduce busy load balance interval

The busy_factor, which increases the load balance interval when a CPU
is busy, is set to 32 by default. This value generates huge LB
intervals on a large system like the THX2, made of 2 nodes x 28 cores x
4 threads. On such a system, the interval increases from 112ms to
3584ms at the MC level, and from 224ms to 7168ms at the NUMA level.

Even on smaller systems, a lower busy_factor has shown an improvement
in the fair distribution of running time, so reduce it for all.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-5-vincent.guittot@linaro.org
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 41df628..a3a2417 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1348,7 +1348,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	*sd = (struct sched_domain){
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
-		.busy_factor		= 32,
+		.busy_factor		= 16,
 		.imbalance_pct		= 117,
 
 		.cache_nice_tries	= 0,


* [tip: sched/core] sched/fair: Reduce minimal imbalance threshold
  2020-09-21  7:24 ` [PATCH 2/4 v2] sched/fair: reduce minimal imbalance threshold Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
@ 2020-09-29  7:56   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-09-29  7:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Phil Auld, Hillf Danton, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2208cdaa56c957e20d8e16f28819aeb47851cb1e
Gitweb:        https://git.kernel.org/tip/2208cdaa56c957e20d8e16f28819aeb47851cb1e
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Mon, 21 Sep 2020 09:24:22 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 25 Sep 2020 14:23:26 +02:00

sched/fair: Reduce minimal imbalance threshold

The 25% default imbalance threshold for the DIE and NUMA domains is
large enough to generate significant unfairness between threads. A
typical example is the case of 11 threads running on 2x4 CPUs. The
imbalance of 20% between the 2 groups of 4 cores is just low enough not
to trigger the load balance between the 2 groups. We will always have
the same 6 threads on one group of 4 CPUs and the other 5 threads on
the other group of CPUs. With fair time sharing in each group, we end
up with +20% running time for the group of 5 threads.

Decrease the imbalance threshold for the overloaded case, where we use
the load to balance tasks and to ensure fair time sharing.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Hillf Danton <hdanton@sina.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-3-vincent.guittot@linaro.org
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 249bec7..41df628 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1349,7 +1349,7 @@ sd_init(struct sched_domain_topology_level *tl,
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
 		.busy_factor		= 32,
-		.imbalance_pct		= 125,
+		.imbalance_pct		= 117,
 
 		.cache_nice_tries	= 0,
 


* [tip: sched/core] sched/fair: Relax constraint on task's load during load balance
  2020-09-21  7:24 ` [PATCH 1/4 v2] sched/fair: relax constraint on task's load during load balance Vincent Guittot
  2020-09-23 14:43   ` Valentin Schneider
@ 2020-09-29  7:56   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-09-29  7:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), Phil Auld, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5a7f555904671c0737819fe4d19bd6143de3f6c0
Gitweb:        https://git.kernel.org/tip/5a7f555904671c0737819fe4d19bd6143de3f6c0
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Mon, 21 Sep 2020 09:24:21 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 25 Sep 2020 14:23:25 +02:00

sched/fair: Relax constraint on task's load during load balance

Some use cases, like 9 always-running tasks on 8 CPUs, can't be
balanced, and the load balancer currently migrates the waiting task
between the CPUs in an almost random manner. Whether a rq succeeds in
pulling a task depends on the value of nr_balance_failed of its domains
and on its ability to be faster than others at detaching it. This
behavior results in an unfair distribution of the running time between
tasks, because some CPUs will run the same task most of the time, if
not always, whereas others will share their time between several tasks.

Instead of using nr_balance_failed as a boolean to relax the condition
for detaching a task, the LB will use nr_balance_failed to relax the
threshold between the task's load and the imbalance. This mechanism
prevents the same rq or domain from always winning the load balance
fight.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-2-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b56276a..5e3add3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7679,8 +7679,8 @@ static int detach_tasks(struct lb_env *env)
 			 * scheduler fails to find a good waiting task to
 			 * migrate.
 			 */
-			if (load/2 > env->imbalance &&
-			    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
+
+			if ((load >> env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= load;

