[PATCH 0/4] sched/fair: Improve fairness between cfs tasks

* [PATCH 0/4] sched/fair: Improve fairness between cfs tasks
@ 2020-09-14 10:03 Vincent Guittot
  2020-09-14 10:03 ` [PATCH 1/4] sched/fair: relax constraint on task's load during load balance Vincent Guittot
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Vincent Guittot @ 2020-09-14 10:03 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, Vincent Guittot

When the system doesn't have enough cycles for all tasks, the scheduler
must ensure a fair split of those CPUs cycles between CFS tasks. The
fairness of some use cases can't be solved with a static distribution of
the tasks on the system and requires a periodic rebalancing of the system
but this dynamic behavior is not always optimal and the fair distribution
of the CPU's time is not always ensured.

The patchset improves the fairness by decreasing  the constraint for
selecting migratable tasks with the number of failed load balance. This
change enables then to decrease the imbalance threshold because 1st LB
will try to migrate tasks that fully match the imbalance.

Some tests results:

- small 2 x 4 cores arm64 system

hackbench -l (256000/#grp) -g #grp

grp    tip/sched/core         +patchset             improvement
1      1.420(+/- 11.72 %)     1.382(+/-10.50 %)     2.72 %
4      1.295(+/-  2.72 %)     1.218(+/- 2.97 %)     0.76 %
8      1.220(+/-  2.17 %)     1.218(+/- 1.60 %)     0.17 %
16     1.258(+/-  1.88 %)     1.250(+/- 1,78 %)     0.58 %

fairness tests: run always running rt-app threads
monitor the ratio between min/max work done by threads

                  v5.9-rc1             w/ patchset
9 threads  avg     78.3% (+/- 6.60%)   91.20% (+/- 2.44%)
           worst   68.6%               85.67%

11 threads avg     65.91% (+/- 8.26%)  91.34% (+/- 1.87%)
           worst   53.52%              87.26%

- large 2 nodes x 28 cores x 4 threads arm64 system

The hackbench tests that I usually run as well as the sp.C.x and lu.C.x
tests with 224 threads have not shown any difference with a mix of less
than 0.5% of improvements or regressions.

Vincent Guittot (4):
  sched/fair: relax constraint on task's load during load balance
  sched/fair: reduce minimal imbalance threshold
  sched/fair: minimize concurrent LBs between domain level
  sched/fair: reduce busy load balance interval

 kernel/sched/fair.c     | 7 +++++--
 kernel/sched/topology.c | 4 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 26+ messages in thread