linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC v4 0/15] sched,fair: flatten CPU controller runqueues
@ 2019-08-22  2:17 Rik van Riel
From: Rik van Riel @ 2019-08-22  2:17 UTC
  To: linux-kernel
  Cc: kernel-team, pjt, dietmar.eggemann, peterz, mingo,
	morten.rasmussen, tglx, mgorman, vincent.guittot

The current implementation of the CPU controller uses hierarchical
runqueues, where on wakeup a task is enqueued on its group's runqueue,
the group is enqueued on the runqueue of the group above it, etc.

This adds a fairly large amount of overhead for workloads that
do a lot of wakeups per second, especially given that the default systemd
hierarchy is 2 or 3 levels deep.
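
To make the cost concrete, here is a rough userspace sketch (not kernel
code) of how a wakeup walks the hierarchy. The names toy_rq,
toy_enqueue_hierarchical and toy_enqueue_flat are made up for
illustration, and the sketch assumes that every level from the task's
group up to the root is touched once per wakeup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a per-group runqueue. */
struct toy_rq {
	struct toy_rq *parent;		/* NULL for the root (per-CPU) runqueue */
	unsigned int h_nr_running;	/* tasks queued at or below this level */
};

/*
 * Hierarchical enqueue: account the waking task on its own group's
 * runqueue and on every ancestor. Returns the number of runqueues touched.
 */
static unsigned int toy_enqueue_hierarchical(struct toy_rq *rq)
{
	unsigned int levels = 0;

	assert(rq != NULL);
	for (; rq; rq = rq->parent) {
		rq->h_nr_running++;
		levels++;
	}
	return levels;
}

/* Flat enqueue: the task goes straight onto the root runqueue. */
static unsigned int toy_enqueue_flat(struct toy_rq *root)
{
	assert(root != NULL && root->parent == NULL);
	root->h_nr_running++;
	return 1;
}
```

With three cgroup levels below the root, the hierarchical walk touches
four runqueues per wakeup, while the flat variant touches one.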

This patch series is an attempt at reducing that overhead, by placing
all the tasks on the same runqueue, and scaling the task priority by
the priority of the group, which is calculated periodically.
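
As a minimal sketch of the scaling idea, assume a task's effective
weight on the flat runqueue is its own weight multiplied by each
enclosing group's share of its parent. The struct toy_group and
toy_task_h_weight names are hypothetical, and this version recomputes
the product on every call instead of caching a periodically updated
group value as the series does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one cgroup level; not the kernel's data structures. */
struct toy_group {
	const struct toy_group *parent;	/* NULL for the root group */
	uint64_t weight;		/* weight of this group's entity in its parent */
	uint64_t load;			/* sum of weights queued inside this group */
};

/*
 * Scale a task's weight by weight/load of every group between the task
 * and the root, so tasks from different groups can be compared on a
 * single runqueue.
 */
static uint64_t toy_task_h_weight(uint64_t task_weight,
				  const struct toy_group *g)
{
	uint64_t w = task_weight;

	for (; g && g->parent; g = g->parent) {
		assert(g->load > 0);
		w = w * g->weight / g->load;
	}
	return w;
}
```

For example, a weight 1024 task sharing a weight 1024 group with one
equal sibling ends up with an effective weight of 512; nesting it one
level deeper among four equal tasks reduces that to 128.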

My main TODO items for the next period of time are likely going to
be testing, testing, and testing. I hope to flush out any corner
cases I can find, make sure performance does not regress with any
workload, and hopefully improve it for some.

Other TODO items:
- More code cleanups.
- Remove some more now unused code.
- Reimplement CONFIG_CFS_BANDWIDTH.

Plan for the CONFIG_CFS_BANDWIDTH reimplementation:
- When a cgroup gets throttled, mark the cgroup and its children
  as throttled.
- When pick_next_entity finds a task that is in a throttled cgroup,
  stash it on the cgroup runqueue (which is not used for runnable
  tasks any more). Leave the vruntime unchanged, and adjust that
  runqueue's vruntime to be that of the left-most task.
- When a cgroup gets unthrottled, and has tasks on it, place it on
  a vruntime ordered heap separate from the main runqueue.
- Have pick_next_task_fair grab one task off that heap every time it
  is called, and the min vruntime of that heap is lower than the
  vruntime of the CPU's cfs_rq (or the CPU has no other runnable tasks).
- Place that selected task on the CPU's cfs_rq, renormalizing its
  vruntime with the GENTLE_FAIR_SLEEPERS logic. That should help
  interleave the already runnable tasks with the recently unthrottled
  group, and prevent thundering herd issues.
- If the group gets throttled again before all of its tasks have had a
  chance to run, vruntime sorting ensures all the tasks in the throttled
  cgroup get a chance to run over time.
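
The unthrottle part of the plan above could look roughly like the
following userspace sketch. Everything here (toy_task, toy_heap,
toy_pull_unthrottled) is a hypothetical name, the heap is a plain
fixed-size binary min-heap, vruntime wraparound is ignored, and the
GENTLE_FAIR_SLEEPERS renormalization step is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEAP_MAX 64

struct toy_task { uint64_t vruntime; };

/* Min-heap of recently unthrottled tasks, ordered by vruntime. */
struct toy_heap {
	struct toy_task *slot[TOY_HEAP_MAX];
	size_t len;
};

static void toy_heap_push(struct toy_heap *h, struct toy_task *t)
{
	size_t i;

	assert(h->len < TOY_HEAP_MAX);
	i = h->len++;
	while (i > 0) {			/* sift up */
		size_t p = (i - 1) / 2;

		if (h->slot[p]->vruntime <= t->vruntime)
			break;
		h->slot[i] = h->slot[p];
		i = p;
	}
	h->slot[i] = t;
}

static struct toy_task *toy_heap_pop(struct toy_heap *h)
{
	struct toy_task *min, *last;
	size_t i = 0;

	assert(h->len > 0);
	min = h->slot[0];
	last = h->slot[--h->len];
	for (;;) {			/* sift down */
		size_t c = 2 * i + 1;

		if (c >= h->len)
			break;
		if (c + 1 < h->len &&
		    h->slot[c + 1]->vruntime < h->slot[c]->vruntime)
			c++;
		if (last->vruntime <= h->slot[c]->vruntime)
			break;
		h->slot[i] = h->slot[c];
		i = c;
	}
	h->slot[i] = last;
	return min;
}

/*
 * Called once per pick: take at most one task off the heap, and only if
 * its vruntime is below the CPU runqueue's, or the CPU is otherwise idle.
 * Returns NULL when nothing should be moved.
 */
static struct toy_task *toy_pull_unthrottled(struct toy_heap *h,
					     uint64_t rq_min_vruntime,
					     bool rq_has_runnable)
{
	if (h->len == 0)
		return NULL;
	if (rq_has_runnable && h->slot[0]->vruntime >= rq_min_vruntime)
		return NULL;
	return toy_heap_pop(h);
}
```

Pulling a single task per pick is what spreads the unthrottled group's
tasks out among the already runnable ones instead of releasing them all
at once.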

Changes from v3:
- replace max_h_load with another hacky idea to ramp up the
  task_se_h_weight; I believe this new idea is wrong as well, but
  it will hopefully inspire a better solution (thanks to Peter Zijlstra)
- fix the ordering inside enqueue_task_fair to get task weights set up right
  (thanks to Peter Zijlstra)
- change wakeup_preempt_entity to reduce the number of task preemptions,
  hopefully resulting in behavior closer to what people configure in sysctl
- various other small cleanups and fixes

Changes from v2:
- fixed the web server performance regression, in a way vaguely similar
  to what Josef Bacik suggested (blame me for the implementation)
- removed some code duplication so the diffstat is redder than before
- propagate sum_exec_runtime up the tree, in preparation for CFS_BANDWIDTH
- small cleanups left and right

Changes from v1:
- use task_se_h_weight instead of task_se_h_load in calc_delta_fair
  and sched_slice, this seems to improve performance a little, but
  I still have some remaining regression to chase with our web server
  workload
- implement a number of the changes suggested by Dietmar Eggemann
  (still holding out for a better name for group_cfs_rq_of_parent)
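
For readers unfamiliar with calc_delta_fair, the effect of feeding it a
hierarchical weight can be sketched as below. This is a simplified
userspace approximation under the assumption that vruntime advances by
delta_exec * NICE_0_LOAD / weight; toy_calc_delta_fair and
TOY_NICE_0_LOAD are illustrative names, and the kernel's fixed-point
inverse-weight arithmetic is not reproduced:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_LOAD 1024u	/* assumed weight of a nice 0 task */

/*
 * Convert executed wall time into a vruntime delta. A task with a
 * smaller hierarchical weight accrues vruntime faster and therefore
 * gets picked less often on the flat runqueue. Overflow of the
 * intermediate product is ignored in this sketch.
 */
static uint64_t toy_calc_delta_fair(uint64_t delta_exec_ns, uint64_t h_weight)
{
	assert(h_weight > 0);
	return delta_exec_ns * TOY_NICE_0_LOAD / h_weight;
}
```

Under these assumptions, 1 ms of runtime costs a task with hierarchical
weight 128 eight times as much vruntime as a task with weight 1024.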

This series applies on top of 5.2.

 include/linux/sched.h |    7 
 kernel/sched/core.c   |    3 
 kernel/sched/debug.c  |   15 
 kernel/sched/fair.c   |  803 +++++++++++++++++++++-----------------------------
 kernel/sched/pelt.c   |   68 +---
 kernel/sched/pelt.h   |    2 
 kernel/sched/sched.h  |    9 
 7 files changed, 372 insertions(+), 535 deletions(-)



Thread overview: 46+ messages
2019-08-22  2:17 [PATCH RFC v4 0/15] sched,fair: flatten CPU controller runqueues Rik van Riel
2019-08-22  2:17 ` [PATCH 01/15] sched: introduce task_se_h_load helper Rik van Riel
2019-08-23 18:13   ` Dietmar Eggemann
2019-08-24  0:05     ` Rik van Riel
2019-08-22  2:17 ` [PATCH 02/15] sched: change /proc/sched_debug fields Rik van Riel
2019-08-22  2:17 ` [PATCH 03/15] sched,fair: redefine runnable_load_avg as the sum of task_h_load Rik van Riel
2019-08-28 13:50   ` Vincent Guittot
2019-08-28 14:47     ` Rik van Riel
2019-08-28 15:02       ` Vincent Guittot
2019-08-22  2:17 ` [PATCH 04/15] sched,fair: move runnable_load_avg to cfs_rq Rik van Riel
2019-08-22  2:17 ` [PATCH 05/15] sched,fair: remove cfs_rqs from leaf_cfs_rq_list bottom up Rik van Riel
2019-08-28 14:09   ` Vincent Guittot
2019-08-22  2:17 ` [PATCH 06/15] sched,cfs: use explicit cfs_rq of parent se helper Rik van Riel
2019-08-28 13:53   ` Vincent Guittot
2019-08-28 15:28     ` Rik van Riel
2019-08-28 16:34       ` Vincent Guittot
2019-08-22  2:17 ` [PATCH 07/15] sched,cfs: fix zero length timeslice calculation Rik van Riel
2019-08-28 16:59   ` Vincent Guittot
2019-08-22  2:17 ` [PATCH 08/15] sched,fair: simplify timeslice length code Rik van Riel
2019-08-28 17:32   ` Vincent Guittot
2019-08-28 23:18     ` Rik van Riel
2019-08-29 14:02       ` Vincent Guittot
2019-08-29 16:00         ` Rik van Riel
2019-08-30  6:41           ` Vincent Guittot
2019-08-30 15:01             ` Rik van Riel
2019-09-02  7:51               ` Vincent Guittot
2019-09-02 17:47                 ` Rik van Riel
2019-08-22  2:17 ` [PATCH 09/15] sched,fair: refactor enqueue/dequeue_entity Rik van Riel
2019-09-03 15:38   ` Vincent Guittot
2019-09-03 20:27     ` Rik van Riel
2019-09-04  6:44       ` Vincent Guittot
2019-08-22  2:17 ` [PATCH 10/15] sched,fair: add helper functions for flattened runqueue Rik van Riel
2019-08-22  2:17 ` [PATCH 11/15] sched,fair: flatten hierarchical runqueues Rik van Riel
2019-08-23 18:14   ` Dietmar Eggemann
2019-08-24  1:16     ` Rik van Riel
2019-08-22  2:17 ` [PATCH 12/15] sched,fair: flatten update_curr functionality Rik van Riel
2019-08-27 10:37   ` Dietmar Eggemann
2019-08-22  2:17 ` [PATCH 13/15] sched,fair: propagate sum_exec_runtime up the hierarchy Rik van Riel
2019-08-28  7:51   ` Dietmar Eggemann
2019-08-28 13:14     ` Rik van Riel
2019-08-29 17:20       ` Dietmar Eggemann
2019-08-29 18:06         ` Rik van Riel
2019-08-22  2:17 ` [PATCH 14/15] sched,fair: ramp up task_se_h_weight quickly Rik van Riel
2019-08-22  2:17 ` [PATCH 15/15] sched,fair: scale vdiff in wakeup_preempt_entity Rik van Riel
2019-09-02 10:53 ` [PATCH RFC v4 0/15] sched,fair: flatten CPU controller runqueues Dietmar Eggemann
2019-09-03  1:44   ` Rik van Riel
