[PATCH v8 00/10] sched: consolidation of CPU capacity and usage

* [PATCH v8 00/10] sched: consolidation of CPU capacity and usage
@ 2014-10-31  8:47 Vincent Guittot
From: Vincent Guittot @ 2014-10-31  8:47 UTC
  To: peterz, mingo, linux-kernel, preeti, Morten.Rasmussen, kamalesh,
	linux, linux-arm-kernel
  Cc: riel, efault, nicolas.pitre, linaro-kernel, Vincent Guittot

This patchset consolidates several changes in the capacity and the usage
tracking of the CPU. It provides a frequency invariant metric of the usage of
CPUs and generally improves the accuracy of load/usage tracking in the
scheduler. The frequency invariant metric is the foundation required for the
consolidation of cpufreq and the implementation of fully invariant load
tracking. These are currently WIP and require several changes to the load
balancer (including how it uses and interprets load and capacity metrics) and
extensive validation. The frequency invariance is done with
arch_scale_freq_capacity; this patchset doesn't provide the backends of
that function, which are architecture dependent.
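
As a rough illustration of what frequency invariance means for usage
tracking, the sketch below scales a running-time delta by the current
frequency capacity. It is not taken from the patches: the helper name
scale_running_time() is hypothetical, and scale_freq stands for the kind of
value an arch_scale_freq_capacity backend would be expected to return, in the
range [0..1024].

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10	/* full capacity is 1 << 10 = 1024 */

/*
 * Hypothetical helper, for illustration only: scale a running-time delta by
 * the current frequency capacity so that time spent running at a lower
 * frequency contributes proportionally less to the tracked usage.
 */
static uint64_t scale_running_time(uint64_t delta, unsigned long scale_freq)
{
	return (delta * scale_freq) >> SCHED_CAPACITY_SHIFT;
}
```

With these assumptions, 1000us of running time at half the maximum frequency
(scale_freq = 512) counts as 500us, while the same time at full frequency
(scale_freq = 1024) counts as 1000us.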

As discussed at LPC14, Morten and I have consolidated our changes into a single
patchset to make it easier to review and merge.

During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption leads to wrong decisions by creating ghost cores or by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. With this patch set, we no longer try to
evaluate the number of available cores based on the group_capacity but instead
we evaluate the usage of a group and compare it with its capacity.
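
The difference between the two approaches can be sketched as follows. This is
a simplified illustration under stated assumptions, not the code from the
patches: capacity_factor() here is reduced to a plain rounded division, and
the margin_pct parameter is a hypothetical stand-in for whatever margin the
load balancer applies when comparing usage with capacity.

```c
#define SCHED_CAPACITY_SCALE	1024UL

/*
 * Old style (simplified): estimate how many "default-sized" CPUs a group
 * holds by rounding group_capacity / SCHED_CAPACITY_SCALE to the closest
 * integer.
 */
static unsigned long capacity_factor(unsigned long group_capacity)
{
	return (group_capacity + SCHED_CAPACITY_SCALE / 2) / SCHED_CAPACITY_SCALE;
}

/*
 * New style (sketch): compare the usage of the group directly with its
 * capacity. margin_pct is a hypothetical percentage margin, e.g. 125.
 */
static int group_has_capacity(unsigned long group_capacity,
			      unsigned long group_usage,
			      unsigned int margin_pct)
{
	return group_capacity * 100 > group_usage * margin_pct;
}
```

For example, a group of 4 CPUs with an original capacity of 1178 each sums to
4712, which the rounded division turns into 5 cores (one ghost core). A group
of 4 CPUs with a capacity of 600 each sums to 2400 and is counted as 2 cores
(two real cores removed). Comparing usage with capacity does not depend on
such rounding.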

This patchset mainly replaces the old capacity_factor method with a new one and
keeps the general policy almost unchanged. These new metrics will also be used
in later patches.

The CPU usage is based on a running time tracking version of the current
implementation of the load average tracking. I also have a version that is
based on the new implementation proposal [1], but I haven't provided the
patches and results as [1] is still under review. I can provide changes on top
of [1] to change how CPU usage is computed and to adapt to the new mechanism.
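
As a sketch of how a running-time based usage could be expressed in the same
unit as capacity, the example below maps a running-time ratio in the range
[0..SCHED_LOAD_SCALE] onto the original capacity of a CPU. The helper name
usage_in_capacity_units() and its parameters are hypothetical and only
illustrate the idea; see the patches themselves for the actual computation.

```c
#define SCHED_LOAD_SHIFT	10
#define SCHED_LOAD_SCALE	(1UL << SCHED_LOAD_SHIFT)

/*
 * Hypothetical helper, for illustration only: convert a running-time ratio
 * (0 = idle, SCHED_LOAD_SCALE = always running) into capacity units, capped
 * at the original capacity of the CPU.
 */
static unsigned long usage_in_capacity_units(unsigned long running_ratio,
					     unsigned long capacity_orig)
{
	if (running_ratio >= SCHED_LOAD_SCALE)
		return capacity_orig;

	return (running_ratio * capacity_orig) >> SCHED_LOAD_SHIFT;
}
```

Under these assumptions, a CPU with an original capacity of 600 that runs half
of the time (ratio 512) has a usage of 300, which can be compared directly
with its capacity.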

Changes since V7:
 - add freq invariance for usage tracking
 - add freq invariance for scale_rt
 - update comments and commit messages
 - fix init of utilization_avg_contrib
 - fix prefer_sibling

Changes since V6:
 - add group usage tracking
 - fix some commit messages
 - minor fixes such as comments and argument order

Changes since V5:
 - remove patches that have been merged since v5: patches 01, 02, 03, 04, 05, 07
 - update commit log and add more details on the purpose of the patches
 - fix/remove useless code with the rebase on patchset [2]
 - remove capacity_orig in sched_group_capacity as it is not used
 - move code into the right patch
 - add some helper functions to factor out common code

Changes since V4:
 - rebase to manage conflicts with changes in selection of busiest group

Changes since V3:
 - add usage_avg_contrib statistic which sums the running time of tasks on a rq
 - use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
 - fix replacement of power by capacity
 - update some comments

Changes since V2:
 - rebase on top of capacity renaming
 - fix wake_affine statistic update
 - rework nohz_kick_needed
 - optimize the active migration of a task from CPU with reduced capacity
 - rename group_activity to group_utilization and remove unused total_utilization
 - repair SD_PREFER_SIBLING and use it for SMT level
 - reorder patchset to gather patches with same topics

Changes since V1:
 - add 3 fixes
 - correct some commit messages
 - replace capacity computation with activity
 - take into account current cpu capacity

[1] https://lkml.org/lkml/2014/10/10/131
[2] https://lkml.org/lkml/2014/7/25/589

Morten Rasmussen (2):
  sched: Track group sched_entity usage contributions
  sched: Make sched entity usage tracking scale-invariant

Vincent Guittot (8):
  sched: add per rq cpu_capacity_orig
  sched: remove frequency scaling from cpu_capacity
  sched: move cfs task on a CPU with higher capacity
  sched: add utilization_avg_contrib
  sched: get CPU's usage statistic
  sched: replace capacity_factor by usage
  sched: add SD_PREFER_SIBLING for SMT level
  sched: make scale_rt invariant with frequency

 include/linux/sched.h |  21 ++-
 kernel/sched/core.c   |  15 +-
 kernel/sched/debug.c  |  12 +-
 kernel/sched/fair.c   | 369 ++++++++++++++++++++++++++++++++------------------
 kernel/sched/sched.h  |  15 +-
 5 files changed, 276 insertions(+), 156 deletions(-)

-- 
1.9.1
