* [RFC v2 PATCH 0/8] CFS Hard limits - v2
From: Bharata B Rao @ 2009-09-30 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

Hi,

Here is the v2 post of the hard limits feature for the CFS group scheduler.
This RFC post mainly adds the runtime borrowing feature and introduces a new
locking scheme to protect the CFS runtime-related fields.

It would be nice to have some comments on this set!

Changes
-------

RFC v2:
- Upgraded to 2.6.31.
- Added CFS runtime borrowing.
- New locking scheme
    The hard limit specific fields of cfs_rq (cfs_runtime, cfs_time and
    cfs_throttled) were protected by rq->lock. That simple scheme doesn't
    work once runtime rebalancing is introduced, because we then need to
    look at these fields on other CPUs, which would require acquiring the
    rq->lock of those CPUs; that is not feasible from update_curr(). Hence
    introduce a separate lock (rq->runtime_lock) to protect these fields
    of all cfs_rqs under a given rq (a small illustrative sketch follows
    this changelist).
- Handle the task wakeup in a throttled group correctly.
- Make CFS_HARD_LIMITS dependent on CGROUP_SCHED (Thanks to Andrea Righi)
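
To make the new locking scheme concrete, here is a minimal user-space sketch
(pthread spinlocks standing in for the kernel's spinlock_t). The cfs_* field
names mirror the patches; everything else (rq_sketch, charge_runtime, the
numbers) is purely illustrative and not part of the series:

#include <pthread.h>
#include <stdio.h>

/* Illustrative stand-ins for the kernel structures touched by this series */
struct cfs_rq_sketch {
	unsigned long long cfs_time;	/* runtime consumed in this period */
	unsigned long long cfs_runtime;	/* runtime allowed in this period */
	int cfs_throttled;
};

struct rq_sketch {
	pthread_spinlock_t lock;		/* the usual rq->lock */
	pthread_spinlock_t runtime_lock;	/* new: protects the cfs_* fields */
	struct cfs_rq_sketch cfs;
};

/*
 * The runtime fields of every cfs_rq hanging off an rq are touched only
 * under that rq's runtime_lock, so another CPU can inspect or borrow
 * runtime without taking the remote rq->lock (which update_curr() can't do).
 */
static void charge_runtime(struct rq_sketch *rq, unsigned long long delta)
{
	pthread_spin_lock(&rq->runtime_lock);
	rq->cfs.cfs_time += delta;
	if (rq->cfs.cfs_time > rq->cfs.cfs_runtime)
		rq->cfs.cfs_throttled = 1;
	pthread_spin_unlock(&rq->runtime_lock);
}

int main(void)
{
	struct rq_sketch rq = { .cfs = { 0, 450000000ULL, 0 } };

	pthread_spin_init(&rq.lock, PTHREAD_PROCESS_PRIVATE);
	pthread_spin_init(&rq.runtime_lock, PTHREAD_PROCESS_PRIVATE);
	charge_runtime(&rq, 500000000ULL);	/* exceed 450ms of runtime */
	printf("throttled=%d\n", rq.cfs.cfs_throttled);
	return 0;
}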

RFC v1:
- First version of the patches with minimal features was posted at
  http://lkml.org/lkml/2009/8/25/128

RFC v0:
- The CFS hard limits proposal was first posted at
  http://lkml.org/lkml/2009/6/4/24

Testing and Benchmark numbers
-----------------------------
- This patchset has seen only minimal testing on a 24-way machine and is
  expected to have bugs. I need to test it under more scenarios.
- I have run a few common benchmarks to see if my patches introduce any visible
  overhead. I am aware that the number of runs and the combinations I have
  used may not be ideal, but the intention at this early stage is to catch any
  serious regressions that the patches might have introduced.
- I plan to get numbers from more benchmarks in future releases. Any inputs
  on specific benchmarks to try would be helpful.

- hackbench (hackbench -pipe N)
  (hackbench was run in a task group under the root group)
  -----------------------------------------------------------------------
				Time
	-----------------------------------------------------------------
  N	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
  -----------------------------------------------------------------------
  10	0.475			0.384			0.253
  20	0.610			0.670			0.692
  50	1.250			1.201			1.295
  100	1.981			2.174			1.583
  -----------------------------------------------------------------------
  - BW = Bandwidth = runtime/period
  - Infinite runtime means no hard limiting
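
  The BW=450000/500000 case above corresponds to cfs_runtime_us=450000 and
  cfs_period_us=500000, i.e. roughly 90% of each CPU per period (patch 3/8
  replicates the group runtime to every per-cpu cfs_rq). Below is a rough C
  sketch of how such a group could be configured; the /cgroup mount point,
  the "bench" group name and the "cpu." file prefix are assumptions, only
  the file names themselves come from patch 3/8:

#include <stdio.h>

/*
 * Sketch only: sets up the assumed group "bench" with 450000us of runtime
 * per 500000us period and turns hard limiting on. Paths are assumptions.
 */
static int write_val(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	write_val("/cgroup/bench/cpu.cfs_period_us", "500000");
	write_val("/cgroup/bench/cpu.cfs_runtime_us", "450000");
	write_val("/cgroup/bench/cpu.cfs_hard_limit", "1");	/* enable throttling */
	return 0;
}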

- lmbench (lat_ctx -N 5 -s <size_in_kb> N)

  (i) size_in_kb = 1024
  -----------------------------------------------------------------------
				Context switch time (us)
	-----------------------------------------------------------------
  N	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
  -----------------------------------------------------------------------
  10	315.87			330.19			317.04
  100	675.52			699.90			698.50
  500	775.01			772.86			772.30
  -----------------------------------------------------------------------

  (ii) size_in_kb = 2048
  -----------------------------------------------------------------------
				Context switch time (us)
	-----------------------------------------------------------------
  N	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
  -----------------------------------------------------------------------
  10	1319.01			1332.16			1328.09
  100	1400.77			1372.67			1382.27
  500	1479.40			1524.57			1615.84
  -----------------------------------------------------------------------

- kernbench

Average Half load -j 12 Run (std deviation):
------------------------------------------------------------------------------
	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
------------------------------------------------------------------------------
Elapsd	5.716 (0.278711)	6.06 (0.479322)		5.41 (0.360694)
User	20.464 (2.22087)	22.978 (3.43738)	18.486 (2.60754)
System	14.82 (1.52086)		16.68 (2.3438)		13.514 (1.77074)
% CPU	615.2 (41.1667)		651.6 (43.397)		588.4 (42.0214)
CtxSwt	2727.8 (243.19)		3030.6 (425.338)	2536 (302.498)
Sleeps	4981.4 (442.337)	5532.2 (847.27)		4554.6 (510.532)
------------------------------------------------------------------------------

Average Optimal load -j 96 Run (std deviation):
------------------------------------------------------------------------------
	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
------------------------------------------------------------------------------
Elapsd	4.826 (0.276641)	4.776 (0.291599)	5.13 (0.50448)
User	21.278 (2.67999)	22.138 (3.2045)		21.988 (5.63116)
System	19.213 (5.38314)	19.796 (4.32574)	20.407 (8.53682)
% CPU	778.3 (184.522)		786.1 (154.295)		803.1 (244.865)
CtxSwt	2906.5 (387.799)	3052.1 (397.15)		3030.6 (765.418)
Sleeps	4576.6 (565.383)	4796 (990.278)		4576.9 (625.933)
------------------------------------------------------------------------------

Average Maximal load -j Run (std deviation):
------------------------------------------------------------------------------
	CFS_HARD_LIMITS=n	CFS_HARD_LIMITS=y	CFS_HARD_LIMITS=y
				(infinite runtime)	(BW=450000/500000)
------------------------------------------------------------------------------
Elapsd	5.13 (0.530236)		5.062 (0.0408656)	4.94 (0.229891)
User	22.7293 (4.37921)	22.9973 (2.86311)	22.5507 (4.78016)
System	21.966 (6.81872)	21.9713 (4.72952)	22.0287 (7.39655)
% CPU	860 (202.295)		859.8 (164.415)		864.467 (218.721)
CtxSwt	3154.27 (659.933)	3172.93 (370.439)	3127.2 (657.224)
Sleeps	4602.6 (662.155)	4676.67 (813.274)	4489.2 (542.859)
------------------------------------------------------------------------------

Features TODO
-------------
- CFS runtime borrowing still needs some work, especially handling runtime
  redistribution when a CPU goes offline.
- Bandwidth inheritance support (long term, not under consideration currently)
- This implementation doesn't work for the user group scheduler. Since the
  user group scheduler will eventually go away, I don't plan to work on this.

Implementation TODO
-------------------
- It is possible to share some of the bandwidth handling code with RT, but
  the intention of this post is to show the changes associated with hard
  limits. The sharing/cleanup will be done later, once this patchset itself
  becomes more acceptable.
- When a dequeued entity is enqueued back, I don't change its vruntime. The
  entity might get an undue advantage due to its old (lower) vruntime. This
  needs to be addressed.

Patches description
-------------------
This post has the following patches:

1/8 sched: Rename sched_rt_period_mask() and use it in CFS also
2/8 sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level
3/8 sched: Bandwidth initialization for fair task groups
4/8 sched: Enforce hard limits by throttling
5/8 sched: Unthrottle the throttled tasks
6/8 sched: Add throttle time statistics to /proc/sched_debug
7/8 sched: CFS runtime borrowing
8/8 sched: Hard limits documentation

 Documentation/scheduler/sched-cfs-hard-limits.txt |   52 ++
 include/linux/sched.h                               |    9 
 init/Kconfig                                        |   13 
 kernel/sched.c                                      |  427 +++++++++++++++++++
 kernel/sched_debug.c                                |   21 
 kernel/sched_fair.c                                 |  432 +++++++++++++++++++-
 kernel/sched_rt.c                                   |   22 -
 7 files changed, 932 insertions(+), 44 deletions(-)

Regards,
Bharata.


* [RFC v2 PATCH 1/8] sched: Rename sched_rt_period_mask() and use it in CFS also
From: Bharata B Rao @ 2009-09-30 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Rename sched_rt_period_mask() and use it in CFS also.

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

sched_rt_period_mask() is needed in CFS also. Rename it to a generic name
and move it to kernel/sched.c. No functionality change in this patch.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 kernel/sched.c    |   23 +++++++++++++++++++++++
 kernel/sched_rt.c |   19 +------------------
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 1b59e26..c802dcb 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1732,6 +1732,29 @@ static void cfs_rq_set_shares(struct cfs_rq *cfs_rq, unsigned long shares)
 
 static void calc_load_account_active(struct rq *this_rq);
 
+
+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_FAIR_GROUP_SCHED)
+
+#ifdef CONFIG_SMP
+static inline const struct cpumask *sched_bw_period_mask(void)
+{
+	return cpu_rq(smp_processor_id())->rd->span;
+}
+#else /* !CONFIG_SMP */
+static inline const struct cpumask *sched_bw_period_mask(void)
+{
+	return cpu_online_mask;
+}
+#endif /* CONFIG_SMP */
+
+#else
+static inline const struct cpumask *sched_bw_period_mask(void)
+{
+	return cpu_online_mask;
+}
+
+#endif
+
 #include "sched_stats.h"
 #include "sched_idletask.c"
 #include "sched_fair.c"
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 3918e01..478fff9 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -222,18 +222,6 @@ static int rt_se_boosted(struct sched_rt_entity *rt_se)
 	return p->prio != p->normal_prio;
 }
 
-#ifdef CONFIG_SMP
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return cpu_rq(smp_processor_id())->rd->span;
-}
-#else
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return cpu_online_mask;
-}
-#endif
-
 static inline
 struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
 {
@@ -283,11 +271,6 @@ static inline int rt_rq_throttled(struct rt_rq *rt_rq)
 	return rt_rq->rt_throttled;
 }
 
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return cpu_online_mask;
-}
-
 static inline
 struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
 {
@@ -505,7 +488,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
 		return 1;
 
-	span = sched_rt_period_mask();
+	span = sched_bw_period_mask();
 	for_each_cpu(i, span) {
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);


* [RFC v2 PATCH 2/8] sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level
From: Bharata B Rao @ 2009-09-30 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

This patch adds a counter to cfs_rq (->nr_tasks_running) to record the
aggregated tasks count at each level in the task group hierarchy.
This is needed by later hard limit patches where it is required to
know how many tasks go off the rq when a throttled group entity
is dequeued.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 kernel/sched.c       |    4 ++++
 kernel/sched_debug.c |    2 ++
 kernel/sched_fair.c  |   23 +++++++++++++++++++++++
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index c802dcb..c283d0f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -477,6 +477,10 @@ struct cfs_rq {
 	unsigned long rq_weight;
 #endif
 #endif
+	/*
+	 * Number of tasks at this hierarchy level.
+	 */
+	unsigned long nr_tasks_running;
 };
 
 /* Real-Time classes' related field in a runqueue: */
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 70c7e0b..f4c30bc 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -214,6 +214,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 #ifdef CONFIG_SMP
 	SEQ_printf(m, "  .%-30s: %lu\n", "shares", cfs_rq->shares);
 #endif
+	SEQ_printf(m, "  .%-30s: %ld\n", "nr_tasks_running",
+			cfs_rq->nr_tasks_running);
 	print_cfs_group_stats(m, cpu, cfs_rq->tg);
 #endif
 }
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 652e8bd..eeeddb8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -243,6 +243,27 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
+static void add_cfs_rq_tasks_running(struct sched_entity *se,
+		unsigned long count)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+		cfs_rq->nr_tasks_running += count;
+	}
+}
+
+static void sub_cfs_rq_tasks_running(struct sched_entity *se,
+		unsigned long count)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+		cfs_rq->nr_tasks_running -= count;
+	}
+}
 
 /**************************************************************
  * Scheduling class tree data structure manipulation methods:
@@ -969,6 +990,7 @@ static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
 		wakeup = 1;
 	}
 
+	add_cfs_rq_tasks_running(&p->se, 1);
 	hrtick_update(rq);
 }
 
@@ -991,6 +1013,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
 		sleep = 1;
 	}
 
+	sub_cfs_rq_tasks_running(&p->se, 1);
 	hrtick_update(rq);
 }
 


* [RFC v2 PATCH 3/8] sched: Bandwidth initialization for fair task groups
From: Bharata B Rao @ 2009-09-30 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Bandwidth initialization for fair task groups.

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Introduce the notion of hard limiting for CFS groups by bringing in
the concept of runtime and period for them. Add cgroup files to control
runtime and period.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 init/Kconfig   |   13 ++
 kernel/sched.c |  317 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 330 insertions(+), 0 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 3f7e609..e93282f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -492,6 +492,19 @@ config CGROUP_SCHED
 
 endchoice
 
+config CFS_HARD_LIMITS
+	bool "Hard Limits for CFS Group Scheduler"
+	depends on EXPERIMENTAL
+	depends on FAIR_GROUP_SCHED && CGROUP_SCHED
+	default n
+	help
+	  This option enables hard limiting of CPU time obtained by
+	  a fair task group. Use this if you want to throttle a group of tasks
+	  based on its CPU usage. For more details refer to
+	  Documentation/scheduler/sched-cfs-hard-limits.txt
+
+	  Say N if unsure.
+
 menuconfig CGROUPS
 	boolean "Control Group support"
 	help
diff --git a/kernel/sched.c b/kernel/sched.c
index c283d0f..0147f6f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -262,6 +262,15 @@ static DEFINE_MUTEX(sched_domains_mutex);
 
 #include <linux/cgroup.h>
 
+#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_CFS_HARD_LIMITS)
+struct cfs_bandwidth {
+	spinlock_t		cfs_runtime_lock;
+	ktime_t			cfs_period;
+	u64			cfs_runtime;
+	struct hrtimer		cfs_period_timer;
+};
+#endif
+
 struct cfs_rq;
 
 static LIST_HEAD(task_groups);
@@ -282,6 +291,11 @@ struct task_group {
 	/* runqueue "owned" by this group on each cpu */
 	struct cfs_rq **cfs_rq;
 	unsigned long shares;
+#ifdef CONFIG_CFS_HARD_LIMITS
+	struct cfs_bandwidth cfs_bandwidth;
+	/* If set, throttle when the group exceeds its bandwidth */
+	int hard_limit_enabled;
+#endif
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -477,6 +491,16 @@ struct cfs_rq {
 	unsigned long rq_weight;
 #endif
 #endif
+#ifdef CONFIG_CFS_HARD_LIMITS
+	/* set when the group is throttled  on this cpu */
+	int cfs_throttled;
+
+	/* runtime currently consumed by the group on this rq */
+	u64 cfs_time;
+
+	/* runtime available to the group on this rq */
+	u64 cfs_runtime;
+#endif
 	/*
 	 * Number of tasks at this hierarchy level.
 	 */
@@ -665,6 +689,11 @@ struct rq {
 	/* BKL stats */
 	unsigned int bkl_count;
 #endif
+	/*
+	 * Protects the cfs runtime related fields of all cfs_rqs under
+	 * this rq
+	 */
+	spinlock_t runtime_lock;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -1759,6 +1788,150 @@ static inline const struct cpumask *sched_bw_period_mask(void)
 
 #endif
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+#ifdef CONFIG_CFS_HARD_LIMITS
+
+/*
+ * Runtime allowed for a cfs group before it is hard limited.
+ * default: Infinite which means no hard limiting.
+ */
+u64 sched_cfs_runtime = RUNTIME_INF;
+
+/*
+ * period over which we hard limit the cfs group's bandwidth.
+ * default: 0.5s
+ */
+u64 sched_cfs_period = 500000;
+
+static inline u64 global_cfs_period(void)
+{
+	return sched_cfs_period * NSEC_PER_USEC;
+}
+
+static inline u64 global_cfs_runtime(void)
+{
+	return RUNTIME_INF;
+}
+
+static inline int cfs_bandwidth_enabled(struct task_group *tg)
+{
+	return tg->hard_limit_enabled;
+}
+
+static inline void rq_runtime_lock(struct rq *rq)
+{
+	spin_lock(&rq->runtime_lock);
+}
+
+static inline void rq_runtime_unlock(struct rq *rq)
+{
+	spin_unlock(&rq->runtime_lock);
+}
+
+/*
+ * Refresh the runtimes of the throttled groups.
+ * Nothing much to do here yet; this will be filled in by later patches.
+ */
+static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
+{
+	struct cfs_bandwidth *cfs_b =
+		container_of(timer, struct cfs_bandwidth, cfs_period_timer);
+
+	hrtimer_add_expires_ns(timer, ktime_to_ns(cfs_b->cfs_period));
+	return HRTIMER_RESTART;
+}
+
+/*
+ * TODO: Check if this kind of timer setup is sufficient for cfs or
+ * should we do what rt is doing.
+ */
+static void start_cfs_bandwidth(struct task_group *tg)
+{
+	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+
+	/*
+	 * Timer isn't setup for groups with infinite runtime or for groups
+	 * for which hard limiting isn't enabled.
+	 */
+	if (!cfs_bandwidth_enabled(tg) || (cfs_b->cfs_runtime == RUNTIME_INF))
+		return;
+
+	if (hrtimer_active(&cfs_b->cfs_period_timer))
+		return;
+
+	hrtimer_start_range_ns(&cfs_b->cfs_period_timer, cfs_b->cfs_period,
+			0, HRTIMER_MODE_REL);
+}
+
+static void init_cfs_bandwidth(struct task_group *tg)
+{
+	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+
+	cfs_b->cfs_period = ns_to_ktime(global_cfs_period());
+	cfs_b->cfs_runtime = global_cfs_runtime();
+
+	spin_lock_init(&cfs_b->cfs_runtime_lock);
+
+	hrtimer_init(&cfs_b->cfs_period_timer,
+			CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	cfs_b->cfs_period_timer.function = &sched_cfs_period_timer;
+}
+
+static inline void destroy_cfs_bandwidth(struct task_group *tg)
+{
+	hrtimer_cancel(&tg->cfs_bandwidth.cfs_period_timer);
+}
+
+static void init_cfs_hard_limits(struct cfs_rq *cfs_rq, struct task_group *tg)
+{
+	cfs_rq->cfs_time = 0;
+	cfs_rq->cfs_throttled = 0;
+	cfs_rq->cfs_runtime = tg->cfs_bandwidth.cfs_runtime;
+	tg->hard_limit_enabled = 0;
+}
+
+#else /* !CONFIG_CFS_HARD_LIMITS */
+
+static void init_cfs_bandwidth(struct task_group *tg)
+{
+	return;
+}
+
+static inline void destroy_cfs_bandwidth(struct task_group *tg)
+{
+	return;
+}
+
+static void init_cfs_hard_limits(struct cfs_rq *cfs_rq, struct task_group *tg)
+{
+	return;
+}
+
+static inline void rq_runtime_lock(struct rq *rq)
+{
+	return;
+}
+
+static inline void rq_runtime_unlock(struct rq *rq)
+{
+	return;
+}
+
+#endif /* CONFIG_CFS_HARD_LIMITS */
+#else /* !CONFIG_FAIR_GROUP_SCHED */
+
+static inline void rq_runtime_lock(struct rq *rq)
+{
+	return;
+}
+
+static inline void rq_runtime_unlock(struct rq *rq)
+{
+	return;
+}
+
+#endif /* CONFIG_FAIR_GROUP_SCHED */
+
 #include "sched_stats.h"
 #include "sched_idletask.c"
 #include "sched_fair.c"
@@ -9146,6 +9319,7 @@ static void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 	struct rq *rq = cpu_rq(cpu);
 	tg->cfs_rq[cpu] = cfs_rq;
 	init_cfs_rq(cfs_rq, rq);
+	init_cfs_hard_limits(cfs_rq, tg);
 	cfs_rq->tg = tg;
 	if (add)
 		list_add(&cfs_rq->leaf_cfs_rq_list, &rq->leaf_cfs_rq_list);
@@ -9275,6 +9449,10 @@ void __init sched_init(void)
 #endif /* CONFIG_USER_SCHED */
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	init_cfs_bandwidth(&init_task_group);
+#endif
+
 #ifdef CONFIG_GROUP_SCHED
 	list_add(&init_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&init_task_group.children);
@@ -9291,6 +9469,7 @@ void __init sched_init(void)
 
 		rq = cpu_rq(i);
 		spin_lock_init(&rq->lock);
+		spin_lock_init(&rq->runtime_lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
 		rq->calc_load_update = jiffies + LOAD_FREQ;
@@ -9564,6 +9743,7 @@ static void free_fair_sched_group(struct task_group *tg)
 {
 	int i;
 
+	destroy_cfs_bandwidth(tg);
 	for_each_possible_cpu(i) {
 		if (tg->cfs_rq)
 			kfree(tg->cfs_rq[i]);
@@ -9590,6 +9770,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	if (!tg->se)
 		goto err;
 
+	init_cfs_bandwidth(tg);
 	tg->shares = NICE_0_LOAD;
 
 	for_each_possible_cpu(i) {
@@ -10284,6 +10465,125 @@ static u64 cpu_shares_read_u64(struct cgroup *cgrp, struct cftype *cft)
 
 	return (u64) tg->shares;
 }
+
+#ifdef CONFIG_CFS_HARD_LIMITS
+
+static int tg_set_cfs_bandwidth(struct task_group *tg,
+		u64 cfs_period, u64 cfs_runtime)
+{
+	int i, err = 0;
+
+	spin_lock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	tg->cfs_bandwidth.cfs_period = ns_to_ktime(cfs_period);
+	tg->cfs_bandwidth.cfs_runtime = cfs_runtime;
+
+	for_each_possible_cpu(i) {
+		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
+
+		rq_runtime_lock(rq_of(cfs_rq));
+		cfs_rq->cfs_runtime = cfs_runtime;
+		rq_runtime_unlock(rq_of(cfs_rq));
+	}
+
+	start_cfs_bandwidth(tg);
+	spin_unlock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	return err;
+}
+
+int tg_set_cfs_runtime(struct task_group *tg, long cfs_runtime_us)
+{
+	u64 cfs_runtime, cfs_period;
+
+	cfs_period = ktime_to_ns(tg->cfs_bandwidth.cfs_period);
+	cfs_runtime = (u64)cfs_runtime_us * NSEC_PER_USEC;
+	if (cfs_runtime_us < 0)
+		cfs_runtime = RUNTIME_INF;
+
+	return tg_set_cfs_bandwidth(tg, cfs_period, cfs_runtime);
+}
+
+long tg_get_cfs_runtime(struct task_group *tg)
+{
+	u64 cfs_runtime_us;
+
+	if (tg->cfs_bandwidth.cfs_runtime == RUNTIME_INF)
+		return -1;
+
+	cfs_runtime_us = tg->cfs_bandwidth.cfs_runtime;
+	do_div(cfs_runtime_us, NSEC_PER_USEC);
+	return cfs_runtime_us;
+}
+
+int tg_set_cfs_period(struct task_group *tg, long cfs_period_us)
+{
+	u64 cfs_runtime, cfs_period;
+
+	cfs_period = (u64)cfs_period_us * NSEC_PER_USEC;
+	cfs_runtime = tg->cfs_bandwidth.cfs_runtime;
+
+	if (cfs_period == 0)
+		return -EINVAL;
+
+	return tg_set_cfs_bandwidth(tg, cfs_period, cfs_runtime);
+}
+
+long tg_get_cfs_period(struct task_group *tg)
+{
+	u64 cfs_period_us;
+
+	cfs_period_us = ktime_to_ns(tg->cfs_bandwidth.cfs_period);
+	do_div(cfs_period_us, NSEC_PER_USEC);
+	return cfs_period_us;
+}
+
+int tg_set_hard_limit_enabled(struct task_group *tg, u64 val)
+{
+	spin_lock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	if (val > 0) {
+		tg->hard_limit_enabled = 1;
+		start_cfs_bandwidth(tg);
+	} else {
+		destroy_cfs_bandwidth(tg);
+		tg->hard_limit_enabled = 0;
+	}
+	spin_unlock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	return 0;
+}
+
+static s64 cpu_cfs_runtime_read_s64(struct cgroup *cgrp, struct cftype *cft)
+{
+	return tg_get_cfs_runtime(cgroup_tg(cgrp));
+}
+
+static int cpu_cfs_runtime_write_s64(struct cgroup *cgrp, struct cftype *cftype,
+				s64 cfs_runtime_us)
+{
+	return tg_set_cfs_runtime(cgroup_tg(cgrp), cfs_runtime_us);
+}
+
+static u64 cpu_cfs_period_read_u64(struct cgroup *cgrp, struct cftype *cft)
+{
+	return tg_get_cfs_period(cgroup_tg(cgrp));
+}
+
+static int cpu_cfs_period_write_u64(struct cgroup *cgrp, struct cftype *cftype,
+				u64 cfs_period_us)
+{
+	return tg_set_cfs_period(cgroup_tg(cgrp), cfs_period_us);
+}
+
+static u64 cpu_cfs_hard_limit_read_u64(struct cgroup *cgrp, struct cftype *cft)
+{
+	return cfs_bandwidth_enabled(cgroup_tg(cgrp));
+}
+
+static int cpu_cfs_hard_limit_write_u64(struct cgroup *cgrp,
+		struct cftype *cftype, u64 val)
+{
+	return tg_set_hard_limit_enabled(cgroup_tg(cgrp), val);
+}
+
+#endif /* CONFIG_CFS_HARD_LIMITS */
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -10317,6 +10617,23 @@ static struct cftype cpu_files[] = {
 		.read_u64 = cpu_shares_read_u64,
 		.write_u64 = cpu_shares_write_u64,
 	},
+#ifdef CONFIG_CFS_HARD_LIMITS
+	{
+		.name = "cfs_runtime_us",
+		.read_s64 = cpu_cfs_runtime_read_s64,
+		.write_s64 = cpu_cfs_runtime_write_s64,
+	},
+	{
+		.name = "cfs_period_us",
+		.read_u64 = cpu_cfs_period_read_u64,
+		.write_u64 = cpu_cfs_period_write_u64,
+	},
+	{
+		.name = "cfs_hard_limit",
+		.read_u64 = cpu_cfs_hard_limit_read_u64,
+		.write_u64 = cpu_cfs_hard_limit_write_u64,
+	},
+#endif /* CONFIG_CFS_HARD_LIMITS */
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
 	{


* [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
From: Bharata B Rao @ 2009-09-30 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Enforce hard limits by throttling.

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Throttle the task-groups which exceed the runtime allocated to them.
Throttled group entities are removed from the run queue.
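
One structural change worth calling out: enqueue_task() now returns whether
the task ended up in a throttled hierarchy, and rq->nr_running is bumped only
when the task really went onto the rq. A toy user-space model of that
contract (all names here are illustrative, not the kernel API):

#include <stdio.h>

struct toy_rq {
	unsigned long nr_running;
	int group_throttled;	/* stands in for entity_throttled() */
};

/* Returns non-zero when the task only reached a throttled group */
static int toy_enqueue_task(struct toy_rq *rq)
{
	return rq->group_throttled;
}

static void toy_activate_task(struct toy_rq *rq)
{
	/* Mirror of the activate_task() change: count only real enqueues */
	if (!toy_enqueue_task(rq))
		rq->nr_running++;
}

int main(void)
{
	struct toy_rq rq = { .nr_running = 0, .group_throttled = 1 };

	toy_activate_task(&rq);
	printf("nr_running=%lu (task parked in throttled group)\n",
	       rq.nr_running);
	return 0;
}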

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 include/linux/sched.h |    3 -
 kernel/sched.c        |   72 ++++++++++++++---
 kernel/sched_debug.c  |    2 
 kernel/sched_fair.c   |  210 ++++++++++++++++++++++++++++++++++++++++++++++---
 kernel/sched_rt.c     |    3 -
 5 files changed, 265 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0f1ea4a..77ace43 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1024,7 +1024,7 @@ struct sched_domain;
 struct sched_class {
 	const struct sched_class *next;
 
-	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
+	int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
 	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
 	void (*yield_task) (struct rq *rq);
 
@@ -1124,6 +1124,7 @@ struct sched_entity {
 	u64			nr_failed_migrations_affine;
 	u64			nr_failed_migrations_running;
 	u64			nr_failed_migrations_hot;
+	u64			nr_failed_migrations_throttled;
 	u64			nr_forced_migrations;
 	u64			nr_forced2_migrations;
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 0147f6f..04c505f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1585,6 +1585,7 @@ update_group_shares_cpu(struct task_group *tg, int cpu,
 	}
 }
 
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
 /*
  * Re-compute the task group their per cpu shares over the given domain.
  * This needs to be done in a bottom-up fashion because the rq weight of a
@@ -1602,9 +1603,11 @@ static int tg_shares_up(struct task_group *tg, void *data)
 		 * If there are currently no tasks on the cpu pretend there
 		 * is one of average load so that when a new task gets to
 		 * run here it will not get delayed by group starvation.
+		 * Also if the group is throttled on this cpu, pretend that
+		 * it has no tasks.
 		 */
 		weight = tg->cfs_rq[i]->load.weight;
-		if (!weight)
+		if (!weight || cfs_rq_throttled(tg->cfs_rq[i]))
 			weight = NICE_0_LOAD;
 
 		tg->cfs_rq[i]->rq_weight = weight;
@@ -1628,6 +1631,7 @@ static int tg_shares_up(struct task_group *tg, void *data)
  * Compute the cpu's hierarchical load factor for each task group.
  * This needs to be done in a top-down fashion because the load of a child
  * group is a fraction of its parents load.
+ * A throttled group's h_load is set to 0.
  */
 static int tg_load_down(struct task_group *tg, void *data)
 {
@@ -1636,6 +1640,8 @@ static int tg_load_down(struct task_group *tg, void *data)
 
 	if (!tg->parent) {
 		load = cpu_rq(cpu)->load.weight;
+	} else if (cfs_rq_throttled(tg->cfs_rq[cpu])) {
+		load = 0;
 	} else {
 		load = tg->parent->cfs_rq[cpu]->h_load;
 		load *= tg->cfs_rq[cpu]->shares;
@@ -1813,6 +1819,8 @@ static inline u64 global_cfs_runtime(void)
 	return RUNTIME_INF;
 }
 
+int task_group_throttled(struct task_group *tg, int cpu);
+
 static inline int cfs_bandwidth_enabled(struct task_group *tg)
 {
 	return tg->hard_limit_enabled;
@@ -1930,6 +1938,16 @@ static inline void rq_runtime_unlock(struct rq *rq)
 	return;
 }
 
+int task_group_throttled(struct task_group *tg, int cpu)
+{
+	return 0;
+}
+
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 #include "sched_stats.h"
@@ -1981,14 +1999,17 @@ static void update_avg(u64 *avg, u64 sample)
 	*avg += diff >> 3;
 }
 
-static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
+static int enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 {
+	int ret;
+
 	if (wakeup)
 		p->se.start_runtime = p->se.sum_exec_runtime;
 
 	sched_info_queued(p);
-	p->sched_class->enqueue_task(rq, p, wakeup);
+	ret = p->sched_class->enqueue_task(rq, p, wakeup);
 	p->se.on_rq = 1;
+	return ret;
 }
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
@@ -2063,8 +2084,15 @@ static void activate_task(struct rq *rq, struct task_struct *p, int wakeup)
 	if (task_contributes_to_load(p))
 		rq->nr_uninterruptible--;
 
-	enqueue_task(rq, p, wakeup);
-	inc_nr_running(rq);
+	/*
+	 * Increment rq->nr_running only if enqueue_task() succeeds.
+	 * enqueue_task() can fail when the task being activated belongs
+	 * to a throttled group. In this case, the task gets enqueued to the
+	 * throttled group and the group itself will be enqueued later when it
+	 * gets unthrottled. rq->nr_running gets incremented at that time.
+	 */
+	if (!enqueue_task(rq, p, wakeup))
+		inc_nr_running(rq);
 }
 
 /*
@@ -3401,6 +3429,7 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
 	 * 1) running (obviously), or
 	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
 	 * 3) are cache-hot on their current CPU.
+	 * 4) end up in throttled task groups on this CPU.
 	 */
 	if (!cpumask_test_cpu(this_cpu, &p->cpus_allowed)) {
 		schedstat_inc(p, se.nr_failed_migrations_affine);
@@ -3414,6 +3443,18 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
 	}
 
 	/*
+	 * Don't migrate the task if it belongs to a
+	 * - throttled group on its current cpu
+	 * - throttled group on this_cpu
+	 * - group whose hierarchy is throttled on this_cpu
+	 */
+	if (cfs_rq_throttled(cfs_rq_of(&p->se)) ||
+		task_group_throttled(task_group(p), this_cpu)) {
+		schedstat_inc(p, se.nr_failed_migrations_throttled);
+		return 0;
+	}
+
+	/*
 	 * Aggressive migration if:
 	 * 1) task is cache cold, or
 	 * 2) too many balance attempts have failed.
@@ -6096,8 +6137,10 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	oldprio = p->prio;
 	on_rq = p->se.on_rq;
 	running = task_current(rq, p);
-	if (on_rq)
+	if (on_rq) {
 		dequeue_task(rq, p, 0);
+		dec_nr_running(rq);
+	}
 	if (running)
 		p->sched_class->put_prev_task(rq, p);
 
@@ -6111,7 +6154,8 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	if (running)
 		p->sched_class->set_curr_task(rq);
 	if (on_rq) {
-		enqueue_task(rq, p, 0);
+		if (!enqueue_task(rq, p, 0))
+			inc_nr_running(rq);
 
 		check_class_changed(rq, p, prev_class, oldprio, running);
 	}
@@ -6145,8 +6189,10 @@ void set_user_nice(struct task_struct *p, long nice)
 		goto out_unlock;
 	}
 	on_rq = p->se.on_rq;
-	if (on_rq)
+	if (on_rq) {
 		dequeue_task(rq, p, 0);
+		dec_nr_running(rq);
+	}
 
 	p->static_prio = NICE_TO_PRIO(nice);
 	set_load_weight(p);
@@ -6155,7 +6201,8 @@ void set_user_nice(struct task_struct *p, long nice)
 	delta = p->prio - old_prio;
 
 	if (on_rq) {
-		enqueue_task(rq, p, 0);
+		if (!enqueue_task(rq, p, 0))
+			inc_nr_running(rq);
 		/*
 		 * If the task increased its priority or is running and
 		 * lowered its priority, then reschedule its CPU:
@@ -10003,8 +10050,10 @@ void sched_move_task(struct task_struct *tsk)
 	running = task_current(rq, tsk);
 	on_rq = tsk->se.on_rq;
 
-	if (on_rq)
+	if (on_rq) {
 		dequeue_task(rq, tsk, 0);
+		dec_nr_running(rq);
+	}
 	if (unlikely(running))
 		tsk->sched_class->put_prev_task(rq, tsk);
 
@@ -10018,7 +10067,8 @@ void sched_move_task(struct task_struct *tsk)
 	if (unlikely(running))
 		tsk->sched_class->set_curr_task(rq);
 	if (on_rq)
-		enqueue_task(rq, tsk, 0);
+		if (!enqueue_task(rq, tsk, 0))
+			inc_nr_running(rq);
 
 	task_rq_unlock(rq, &flags);
 }
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index f4c30bc..8ce525f 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -417,6 +417,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 	P(se.nr_failed_migrations_affine);
 	P(se.nr_failed_migrations_running);
 	P(se.nr_failed_migrations_hot);
+	P(se.nr_failed_migrations_throttled);
 	P(se.nr_forced_migrations);
 	P(se.nr_forced2_migrations);
 	P(se.nr_wakeups);
@@ -491,6 +492,7 @@ void proc_sched_set_task(struct task_struct *p)
 	p->se.nr_failed_migrations_affine	= 0;
 	p->se.nr_failed_migrations_running	= 0;
 	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_failed_migrations_throttled	= 0;
 	p->se.nr_forced_migrations		= 0;
 	p->se.nr_forced2_migrations		= 0;
 	p->se.nr_wakeups			= 0;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index eeeddb8..f98c1c8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -186,6 +186,94 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 	}
 }
 
+#ifdef CONFIG_CFS_HARD_LIMITS
+
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq->cfs_throttled;
+}
+
+/*
+ * Check if the group entity exceeded its runtime. If so, mark the cfs_rq as
+ * throttled and mark the current task for rescheduling.
+ */
+static void sched_cfs_runtime_exceeded(struct sched_entity *se,
+	struct task_struct *tsk_curr, unsigned long delta_exec)
+{
+	struct cfs_rq *cfs_rq;
+
+	cfs_rq = group_cfs_rq(se);
+
+	if (!cfs_bandwidth_enabled(cfs_rq->tg))
+		return;
+
+	if (cfs_rq->cfs_runtime == RUNTIME_INF)
+		return;
+
+	cfs_rq->cfs_time += delta_exec;
+
+	if (cfs_rq_throttled(cfs_rq))
+		return;
+
+	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime) {
+		cfs_rq->cfs_throttled = 1;
+		resched_task(tsk_curr);
+	}
+}
+
+/*
+ * Check if the entity is throttled.
+ */
+static int entity_throttled(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+
+	/* Only group entities can be throttled */
+	if (entity_is_task(se))
+		return 0;
+
+	cfs_rq = group_cfs_rq(se);
+	if (cfs_rq_throttled(cfs_rq))
+		return 1;
+	return 0;
+}
+
+int task_group_throttled(struct task_group *tg, int cpu)
+{
+	struct sched_entity *se = tg->se[cpu];
+
+	for_each_sched_entity(se) {
+		if (entity_throttled(se))
+			return 1;
+	}
+	return 0;
+}
+
+#else
+
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+	return 0;
+}
+
+int task_group_throttled(struct task_group *tg, int cpu)
+{
+	return 0;
+}
+
+static void sched_cfs_runtime_exceeded(struct sched_entity *se,
+	struct task_struct *tsk_curr, unsigned long delta_exec)
+{
+	return;
+}
+
+static int entity_throttled(struct sched_entity *se)
+{
+	return 0;
+}
+
+#endif /* CONFIG_CFS_HARD_LIMITS */
+
 #else	/* CONFIG_FAIR_GROUP_SCHED */
 
 static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
@@ -241,6 +329,17 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 {
 }
 
+static void sched_cfs_runtime_exceeded(struct sched_entity *se,
+	struct task_struct *tsk_curr, unsigned long delta_exec)
+{
+	return;
+}
+
+static int entity_throttled(struct sched_entity *se)
+{
+	return 0;
+}
+
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
 static void add_cfs_rq_tasks_running(struct sched_entity *se,
@@ -502,10 +601,12 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
 	update_min_vruntime(cfs_rq);
 }
 
-static void update_curr(struct cfs_rq *cfs_rq)
+static void update_curr_common(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
-	u64 now = rq_of(cfs_rq)->clock;
+	struct rq *rq = rq_of(cfs_rq);
+	struct task_struct *tsk_curr = rq->curr;
+	u64 now = rq->clock;
 	unsigned long delta_exec;
 
 	if (unlikely(!curr))
@@ -528,9 +629,23 @@ static void update_curr(struct cfs_rq *cfs_rq)
 
 		cpuacct_charge(curtask, delta_exec);
 		account_group_exec_runtime(curtask, delta_exec);
+	} else {
+		sched_cfs_runtime_exceeded(curr, tsk_curr, delta_exec);
 	}
 }
 
+static void update_curr(struct cfs_rq *cfs_rq)
+{
+	rq_runtime_lock(rq_of(cfs_rq));
+	update_curr_common(cfs_rq);
+	rq_runtime_unlock(rq_of(cfs_rq));
+}
+
+static inline void update_curr_locked(struct cfs_rq *cfs_rq)
+{
+	update_curr_common(cfs_rq);
+}
+
 static inline void
 update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -734,13 +849,9 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	se->vruntime = vruntime;
 }
 
-static void
-enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
+static void enqueue_entity_common(struct cfs_rq *cfs_rq,
+		struct sched_entity *se, int wakeup)
 {
-	/*
-	 * Update run-time statistics of the 'current'.
-	 */
-	update_curr(cfs_rq);
 	account_entity_enqueue(cfs_rq, se);
 
 	if (wakeup) {
@@ -754,6 +865,26 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
 		__enqueue_entity(cfs_rq, se);
 }
 
+static void enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
+		int wakeup)
+{
+	/*
+	 * Update run-time statistics of the 'current'.
+	 */
+	update_curr(cfs_rq);
+	enqueue_entity_common(cfs_rq, se, wakeup);
+}
+
+static void enqueue_entity_locked(struct cfs_rq *cfs_rq,
+		struct sched_entity *se, int wakeup)
+{
+	/*
+	 * Update run-time statistics of the 'current'.
+	 */
+	update_curr_locked(cfs_rq);
+	enqueue_entity_common(cfs_rq, se, wakeup);
+}
+
 static void __clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if (cfs_rq->last == se)
@@ -865,8 +996,40 @@ static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
 	return se;
 }
 
+/*
+ * Called from put_prev_entity()
+ * If a group entity (@se) is found to be throttled, it will not be put back
+ * on @cfs_rq, which is equivalent to dequeuing it.
+ */
+static void dequeue_throttled_entity(struct cfs_rq *cfs_rq,
+		struct sched_entity *se)
+{
+	unsigned long nr_tasks = group_cfs_rq(se)->nr_tasks_running;
+
+	__clear_buddies(cfs_rq, se);
+	account_entity_dequeue(cfs_rq, se);
+	cfs_rq->curr = NULL;
+
+	if (!nr_tasks)
+		return;
+
+	/*
+	 * Decrement the number of tasks this entity has from
+	 * all of its parent entities.
+	 */
+	sub_cfs_rq_tasks_running(se, nr_tasks);
+
+	/*
+	 * Decrement the number of tasks this entity has from
+	 * this cpu's rq.
+	 */
+	rq_of(cfs_rq)->nr_running -= nr_tasks;
+}
+
 static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 {
+	struct cfs_rq *gcfs_rq = group_cfs_rq(prev);
+
 	/*
 	 * If still on the runqueue then deactivate_task()
 	 * was not called and update_curr() has to be done:
@@ -876,6 +1039,18 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 
 	check_spread(cfs_rq, prev);
 	if (prev->on_rq) {
+		/*
+		 * If the group entity is throttled or if it has
+		 * no child entities, then don't enqueue it back.
+		 */
+		rq_runtime_lock(rq_of(cfs_rq));
+		if (entity_throttled(prev) ||
+			(gcfs_rq && !gcfs_rq->nr_running)) {
+			dequeue_throttled_entity(cfs_rq, prev);
+			rq_runtime_unlock(rq_of(cfs_rq));
+			return;
+		}
+		rq_runtime_unlock(rq_of(cfs_rq));
 		update_stats_wait_start(cfs_rq, prev);
 		/* Put 'current' back into the tree. */
 		__enqueue_entity(cfs_rq, prev);
@@ -976,22 +1151,32 @@ static inline void hrtick_update(struct rq *rq)
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
  * then put the task into the rbtree:
+ * Don't enqueue a throttled entity further into the hierarchy.
  */
-static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
+static int enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int throttled = 0;
 
+	rq_runtime_lock(rq);
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			break;
+		if (entity_throttled(se)) {
+			throttled = 1;
+			break;
+		}
 		cfs_rq = cfs_rq_of(se);
-		enqueue_entity(cfs_rq, se, wakeup);
+		enqueue_entity_locked(cfs_rq, se, wakeup);
 		wakeup = 1;
 	}
 
 	add_cfs_rq_tasks_running(&p->se, 1);
+	rq_runtime_unlock(rq);
+
 	hrtick_update(rq);
+	return throttled;
 }
 
 /*
@@ -1541,6 +1726,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
 		/*
 		 * If se was a buddy, clear it so that it will have to earn
 		 * the favour again.
@@ -1650,9 +1836,9 @@ load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
 		u64 rem_load, moved_load;
 
 		/*
-		 * empty group
+		 * empty group or a group with no h_load (throttled)
 		 */
-		if (!busiest_cfs_rq->task_weight)
+		if (!busiest_cfs_rq->task_weight || !busiest_h_load)
 			continue;
 
 		rem_load = (u64)rem_load_move * busiest_weight;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 478fff9..477d3b7 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -846,7 +846,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se)
 /*
  * Adding/removing a task to/from a priority array:
  */
-static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup)
+static int enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup)
 {
 	struct sched_rt_entity *rt_se = &p->rt;
 
@@ -859,6 +859,7 @@ static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup)
 		enqueue_pushable_task(rq, p);
 
 	inc_cpu_load(rq, p->se.load.weight);
+	return 0;
 }
 
 static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep)


* [RFC v2 PATCH 5/8] sched: Unthrottle the throttled tasks
From: Bharata B Rao @ 2009-09-30 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Unthrottle the throttled tasks.

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Refresh runtimes when the group's bandwidth period expires and unthrottle any
throttled groups at that time. The refresh is driven by a periodic timer.
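
The refresh uses the per-group period hrtimer set up in patch 3/8. A trivial
user-space model of one refresh cycle, showing what do_sched_cfs_period_timer()
does for each cpu (no kernel APIs here, all names and numbers illustrative):

#include <stdio.h>
#include <time.h>

/* Illustrative per-cpu state, mirroring the cfs_rq fields in this series */
struct period_state {
	unsigned long long cfs_time;	/* runtime used this period */
	int cfs_throttled;
	unsigned long nr_queued;	/* tasks parked on the throttled group */
	unsigned long nr_running;	/* rq->nr_running stand-in */
};

/* Per-cpu refresh: zero the usage, unthrottle, put the tasks back on the rq */
static void refresh_one(struct period_state *s)
{
	s->cfs_time = 0;
	if (s->cfs_throttled) {
		s->cfs_throttled = 0;
		s->nr_running += s->nr_queued;
	}
}

int main(void)
{
	struct period_state s = { 123456, 1, 3, 0 };
	struct timespec period = { 0, 500000000L };	/* 0.5s default period */

	nanosleep(&period, NULL);	/* stand-in for the period hrtimer */
	refresh_one(&s);
	printf("throttled=%d nr_running=%lu\n", s.cfs_throttled, s.nr_running);
	return 0;
}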

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 kernel/sched.c      |   15 ++++++++-
 kernel/sched_fair.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 04c505f..ec302ac 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1820,6 +1820,7 @@ static inline u64 global_cfs_runtime(void)
 }
 
 int task_group_throttled(struct task_group *tg, int cpu);
+void do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b);
 
 static inline int cfs_bandwidth_enabled(struct task_group *tg)
 {
@@ -1845,6 +1846,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, cfs_period_timer);
 
+	do_sched_cfs_period_timer(cfs_b);
 	hrtimer_add_expires_ns(timer, ktime_to_ns(cfs_b->cfs_period));
 	return HRTIMER_RESTART;
 }
@@ -10588,15 +10590,24 @@ long tg_get_cfs_period(struct task_group *tg)
 
 int tg_set_hard_limit_enabled(struct task_group *tg, u64 val)
 {
-	spin_lock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	local_irq_disable();
+	spin_lock(&tg->cfs_bandwidth.cfs_runtime_lock);
 	if (val > 0) {
 		tg->hard_limit_enabled = 1;
 		start_cfs_bandwidth(tg);
+		spin_unlock(&tg->cfs_bandwidth.cfs_runtime_lock);
 	} else {
 		destroy_cfs_bandwidth(tg);
 		tg->hard_limit_enabled = 0;
+		spin_unlock(&tg->cfs_bandwidth.cfs_runtime_lock);
+		/*
+		 * Hard limiting is being disabled for this group.
+		 * Refresh runtimes and put the throttled entities
+		 * of the group back onto runqueue.
+		 */
+		do_sched_cfs_period_timer(&tg->cfs_bandwidth);
 	}
-	spin_unlock_irq(&tg->cfs_bandwidth.cfs_runtime_lock);
+	local_irq_enable();
 	return 0;
 }
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index f98c1c8..8c8b602 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -249,6 +249,80 @@ int task_group_throttled(struct task_group *tg, int cpu)
 	return 0;
 }
 
+static void enqueue_entity_locked(struct cfs_rq *cfs_rq,
+		struct sched_entity *se, int wakeup);
+static void add_cfs_rq_tasks_running(struct sched_entity *se,
+		unsigned long count);
+static void sub_cfs_rq_tasks_running(struct sched_entity *se,
+		unsigned long count);
+
+static void enqueue_throttled_entity(struct rq *rq, struct sched_entity *se)
+{
+	unsigned long nr_tasks = 0;
+	struct sched_entity *se_tmp = se;
+	int throttled = 0;
+
+	for_each_sched_entity(se) {
+		if (se->on_rq)
+			break;
+
+		if (entity_throttled(se)) {
+			throttled = 1;
+			break;
+		}
+
+		enqueue_entity_locked(cfs_rq_of(se), se, 0);
+		nr_tasks += group_cfs_rq(se)->nr_tasks_running;
+	}
+
+	if (!nr_tasks)
+		return;
+
+	/*
+	 * Add the number of tasks this entity has to
+	 * all of its parent entities.
+	 */
+	add_cfs_rq_tasks_running(se_tmp, nr_tasks);
+
+	/*
+	 * Add the number of tasks this entity has to
+	 * this cpu's rq only if the entity got enqueued all the
+	 * way up without any throttled entity in the hierarchy.
+	 */
+	if (!throttled)
+		rq->nr_running += nr_tasks;
+}
+
+/*
+ * Refresh runtimes of all cfs_rqs in this group, i.e.,
+ * refresh runtimes of the representative cfs_rq of this
+ * tg on all cpus. Enqueue any throttled entity back.
+ */
+void do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b)
+{
+	int i;
+	const struct cpumask *span = sched_bw_period_mask();
+	struct task_group *tg = container_of(cfs_b, struct task_group,
+					cfs_bandwidth);
+	unsigned long flags;
+
+	for_each_cpu(i, span) {
+		struct rq *rq = cpu_rq(i);
+		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
+		struct sched_entity *se = tg->se[i];
+
+		spin_lock_irqsave(&rq->lock, flags);
+		rq_runtime_lock(rq);
+		cfs_rq->cfs_time = 0;
+		if (cfs_rq_throttled(cfs_rq)) {
+			cfs_rq->cfs_throttled = 0;
+			enqueue_throttled_entity(rq, se);
+		}
+		rq_runtime_unlock(rq);
+		spin_unlock_irqrestore(&rq->lock, flags);
+	}
+}
+
 #else
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
@@ -348,6 +422,13 @@ static void add_cfs_rq_tasks_running(struct sched_entity *se,
 	struct cfs_rq *cfs_rq;
 
 	for_each_sched_entity(se) {
+		/*
+		 * If any entity in the hierarchy is throttled, don't
+		 * propagate the tasks count up since this entity isn't
+		 * on rq yet.
+		 */
+		if (entity_throttled(se))
+			break;
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->nr_tasks_running += count;
 	}


* [RFC v2 PATCH 6/8] sched: Add throttle time statistics to /proc/sched_debug
From: Bharata B Rao @ 2009-09-30 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Add throttle time statistics to /proc/sched_debug

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

With hard limits, provide statistics about throttle time, throttle count
and max throttle time for group sched entities in /proc/sched_debug.
Throttle stats are collected only for group entities.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 include/linux/sched.h |    6 ++++++
 kernel/sched_debug.c  |   17 ++++++++++++++++-
 kernel/sched_fair.c   |   20 ++++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77ace43..8c83a39 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1137,6 +1137,12 @@ struct sched_entity {
 	u64			nr_wakeups_affine_attempts;
 	u64			nr_wakeups_passive;
 	u64			nr_wakeups_idle;
+#ifdef CONFIG_CFS_HARD_LIMITS
+	u64			throttle_start;
+	u64			throttle_max;
+	u64			throttle_count;
+	u64			throttle_sum;
+#endif
 #endif
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 8ce525f..b15712d 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -80,6 +80,11 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu,
 	PN(se->wait_max);
 	PN(se->wait_sum);
 	P(se->wait_count);
+#ifdef CONFIG_CFS_HARD_LIMITS
+	PN(se->throttle_max);
+	PN(se->throttle_sum);
+	P(se->throttle_count);
+#endif
 #endif
 	P(se->load.weight);
 #undef PN
@@ -216,6 +221,16 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 #endif
 	SEQ_printf(m, "  .%-30s: %ld\n", "nr_tasks_running",
 			cfs_rq->nr_tasks_running);
+#ifdef CONFIG_CFS_HARD_LIMITS
+	spin_lock_irqsave(&rq->lock, flags);
+	SEQ_printf(m, "  .%-30s: %d\n", "cfs_throttled",
+			cfs_rq->cfs_throttled);
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "cfs_time",
+			SPLIT_NS(cfs_rq->cfs_time));
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "cfs_runtime",
+			SPLIT_NS(cfs_rq->cfs_runtime));
+	spin_unlock_irqrestore(&rq->lock, flags);
+#endif
 	print_cfs_group_stats(m, cpu, cfs_rq->tg);
 #endif
 }
@@ -312,7 +327,7 @@ static int sched_debug_show(struct seq_file *m, void *v)
 	u64 now = ktime_to_ns(ktime_get());
 	int cpu;
 
-	SEQ_printf(m, "Sched Debug Version: v0.09, %s %.*s\n",
+	SEQ_printf(m, "Sched Debug Version: v0.10, %s %.*s\n",
 		init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8c8b602..f4dec63 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -188,6 +188,23 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #ifdef CONFIG_CFS_HARD_LIMITS
 
+static inline void update_stats_throttle_start(struct cfs_rq *cfs_rq,
+			struct sched_entity *se)
+{
+	schedstat_set(se->throttle_start, rq_of(cfs_rq)->clock);
+}
+
+static inline void update_stats_throttle_end(struct cfs_rq *cfs_rq,
+			struct sched_entity *se)
+{
+	schedstat_set(se->throttle_max, max(se->throttle_max,
+			rq_of(cfs_rq)->clock - se->throttle_start));
+	schedstat_set(se->throttle_count, se->throttle_count + 1);
+	schedstat_set(se->throttle_sum, se->throttle_sum +
+			rq_of(cfs_rq)->clock - se->throttle_start);
+	schedstat_set(se->throttle_start, 0);
+}
+
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
 	return cfs_rq->cfs_throttled;
@@ -217,6 +234,7 @@ static void sched_cfs_runtime_exceeded(struct sched_entity *se,
 
 	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime) {
 		cfs_rq->cfs_throttled = 1;
+		update_stats_throttle_start(cfs_rq, se);
 		resched_task(tsk_curr);
 	}
 }
@@ -315,6 +333,8 @@ void do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b)
 		rq_runtime_lock(rq);
 		cfs_rq->cfs_time = 0;
 		if (cfs_rq_throttled(cfs_rq)) {
+			update_rq_clock(rq);
+			update_stats_throttle_end(cfs_rq, se);
 			cfs_rq->cfs_throttled = 0;
 			enqueue_throttled_entity(rq, se);
 		}


* [RFC v2 PATCH 7/8] sched: Rebalance cfs runtimes
From: Bharata B Rao @ 2009-09-30 12:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: CFS runtime borrowing

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

To start with, a group gets the same runtime on every cpu. If the group
doesn't have tasks on all cpus, it might get throttled on some cpus while it
still has runtime left on other cpus with no tasks to consume that runtime.
Hence there is a chance to borrow runtime from such cpus/cfs_rqs for the
cpus/cfs_rqs where it is needed.
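
To make the redistribution concrete, here is a worked example of the split
done in do_cfs_balance_runtime() below, with made-up numbers and units shown
in microseconds for readability (the kernel keeps these values in nanoseconds):

#include <stdio.h>

int main(void)
{
	const long long period  = 500000;	/* group period, us */
	const int weight        = 4;		/* cpumask_weight(span) */
	long long my_runtime    = 450000;	/* borrower: quota fully consumed */
	long long donor_runtime = 450000;
	long long donor_time    = 0;		/* donor cpu ran nothing */

	long long diff = donor_runtime - donor_time;	/* 450000 us unused */
	if (diff > 0) {
		diff /= weight;				/* take 1/4: 112500 us */
		if (my_runtime + diff > period)		/* cap at the period */
			diff = period - my_runtime;	/* -> 50000 us */
		donor_runtime -= diff;
		my_runtime += diff;
	}
	printf("borrower now has %lld us, donor left with %lld us\n",
	       my_runtime, donor_runtime);
	return 0;
}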
---
 kernel/sched_fair.c |   98 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index f4dec63..8b43f4f 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -205,12 +205,107 @@ static inline void update_stats_throttle_end(struct cfs_rq *cfs_rq,
 	schedstat_set(se->throttle_start, 0);
 }
 
+static void double_rq_runtime_lock(struct rq *rq1, struct rq *rq2)
+	__acquires(rq1->runtime_lock)
+	__acquires(rq2->runtime_lock)
+{
+	BUG_ON(!irqs_disabled());
+	if (rq1 == rq2) {
+		spin_lock(&rq1->runtime_lock);
+		__acquire(rq2->runtime_lock);	/* Fake it out ;) */
+	} else {
+		if (rq1 < rq2) {
+			spin_lock(&rq1->runtime_lock);
+			spin_lock_nested(&rq2->runtime_lock,
+					SINGLE_DEPTH_NESTING);
+		} else {
+			spin_lock(&rq2->runtime_lock);
+			spin_lock_nested(&rq1->runtime_lock,
+					SINGLE_DEPTH_NESTING);
+		}
+	}
+	update_rq_clock(rq1);
+	update_rq_clock(rq2);
+}
+
+static void double_rq_runtime_unlock(struct rq *rq1, struct rq *rq2)
+	__releases(rq1->runtime_lock)
+	__releases(rq2->runtime_lock)
+{
+	spin_unlock(&rq1->runtime_lock);
+	if (rq1 != rq2)
+		spin_unlock(&rq2->runtime_lock);
+	else
+		__release(rq2->runtime_lock);
+}
+
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
 	return cfs_rq->cfs_throttled;
 }
 
 /*
+ * Ran out of runtime, check if we can borrow some from others
+ * instead of getting throttled right away.
+ */
+static void do_cfs_balance_runtime(struct cfs_rq *cfs_rq)
+{
+	struct rq *rq = rq_of(cfs_rq);
+	struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+	const struct cpumask *span = sched_bw_period_mask();
+	int i, weight;
+	u64 cfs_period;
+	struct task_group *tg = container_of(cfs_b, struct task_group,
+				cfs_bandwidth);
+
+	weight = cpumask_weight(span);
+	spin_lock(&cfs_b->cfs_runtime_lock);
+	cfs_period = ktime_to_ns(cfs_b->cfs_period);
+
+	for_each_cpu(i, span) {
+		struct cfs_rq *borrow_cfs_rq = tg->cfs_rq[i];
+		struct rq *borrow_rq = rq_of(borrow_cfs_rq);
+		s64 diff;
+
+		if (borrow_cfs_rq == cfs_rq)
+			continue;
+
+		double_rq_runtime_lock(rq, borrow_rq);
+		if (borrow_cfs_rq->cfs_runtime == RUNTIME_INF) {
+			double_rq_runtime_unlock(rq, borrow_rq);
+			continue;
+		}
+
+		diff = borrow_cfs_rq->cfs_runtime - borrow_cfs_rq->cfs_time;
+		if (diff > 0) {
+			diff = div_u64((u64)diff, weight);
+			if (cfs_rq->cfs_runtime + diff > cfs_period)
+				diff = cfs_period - cfs_rq->cfs_runtime;
+			borrow_cfs_rq->cfs_runtime -= diff;
+			cfs_rq->cfs_runtime += diff;
+			if (cfs_rq->cfs_runtime == cfs_period) {
+				double_rq_runtime_unlock(rq, borrow_rq);
+				break;
+			}
+		}
+		double_rq_runtime_unlock(rq, borrow_rq);
+	}
+	spin_unlock(&cfs_b->cfs_runtime_lock);
+}
+
+/*
+ * Called with rq->runtime_lock held.
+ */
+static void cfs_balance_runtime(struct cfs_rq *cfs_rq)
+{
+	struct rq *rq = rq_of(cfs_rq);
+
+	rq_runtime_unlock(rq);
+	do_cfs_balance_runtime(cfs_rq);
+	rq_runtime_lock(rq);
+}
+
+/*
 * Check if group entity exceeded its runtime. If so, mark the cfs_rq as
 * throttled and mark the current task for rescheduling.
  */
@@ -232,6 +327,9 @@ static void sched_cfs_runtime_exceeded(struct sched_entity *se,
 	if (cfs_rq_throttled(cfs_rq))
 		return;
 
+	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime)
+		cfs_balance_runtime(cfs_rq);
+
 	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime) {
 		cfs_rq->cfs_throttled = 1;
 		update_stats_throttle_start(cfs_rq, se);


* [RFC v2 PATCH 8/8] sched: Hard limits documentation
  2009-09-30 12:49 [RFC v2 PATCH 0/8] CFS Hard limits - v2 Bharata B Rao
                   ` (6 preceding siblings ...)
  2009-09-30 12:55 ` [RFC v2 PATCH 7/8] sched: Rebalance cfs runtimes Bharata B Rao
@ 2009-09-30 12:55 ` Bharata B Rao
  2009-09-30 13:36 ` [RFC v2 PATCH 0/8] CFS Hard limits - v2 Pavel Emelyanov
  8 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2009-09-30 12:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

sched: Hard limits documentation

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Documentation for hard limits feature.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 Documentation/scheduler/sched-cfs-hard-limits.txt |   52 +++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/scheduler/sched-cfs-hard-limits.txt

diff --git a/Documentation/scheduler/sched-cfs-hard-limits.txt b/Documentation/scheduler/sched-cfs-hard-limits.txt
new file mode 100644
index 0000000..bf3859f
--- /dev/null
+++ b/Documentation/scheduler/sched-cfs-hard-limits.txt
@@ -0,0 +1,52 @@
+CPU HARD LIMITS FOR CFS GROUPS
+==============================
+
+1. Overview
+2. Interface
+3. Examples
+
+1. Overview
+-----------
+
+CFS is a proportional-share scheduler which tries to divide CPU time
+proportionately between tasks or groups of tasks (task group/cgroup),
+depending on the priority/weight of the task or the shares assigned to the
+group. In CFS, a task/task group can get more than its share of CPU if there
+are enough idle CPU cycles available in the system, due to the work-conserving
+nature of the scheduler. However, in certain scenarios (like pay-per-use),
+it is desirable not to provide extra time to a group even in the presence
+of idle CPU cycles. This is where hard limiting can be of use.
+
+Hard limits for task groups can be set by specifying how much CPU runtime a
+group can consume within a given period. If the group consumes more CPU time
+than the runtime in a given period, it gets throttled. None of the tasks of
+the throttled group gets to run until the runtime of the group gets refreshed
+at the beginning of the next period.
+
+2. Interface
+------------
+
+The hard limit feature adds 3 cgroup files for the CFS group scheduler:
+
+cfs_runtime_us: Hard limit for the group in microseconds.
+
+cfs_period_us: Time period in microseconds within which the hard limit is
+enforced.
+
+cfs_hard_limit: The control file to enable or disable hard limiting for the
+group.
+
+A group is created with default values for runtime and period and with
+hard limiting disabled. Each group can set its own values for runtime and
+period, independent of other groups in the system.
+
+3. Examples
+-----------
+
+# mount -t cgroup -ocpu none /cgroups/
+# cd /cgroups
+# mkdir 1
+# cd 1/
+# echo 250000 > cfs_runtime_us /* set a 250ms runtime or limit */
+# echo 500000 > cfs_period_us /* set a 500ms period */
+# echo 1 > cfs_hard_limit /* enable hard limiting for group 1/ */


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 12:49 [RFC v2 PATCH 0/8] CFS Hard limits - v2 Bharata B Rao
                   ` (7 preceding siblings ...)
  2009-09-30 12:55 ` [RFC v2 PATCH 8/8] sched: Hard limits documentation Bharata B Rao
@ 2009-09-30 13:36 ` Pavel Emelyanov
  2009-09-30 14:25   ` Bharata B Rao
  2009-09-30 14:38   ` Balbir Singh
  8 siblings, 2 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-09-30 13:36 UTC (permalink / raw)
  To: bharata
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Peter Zijlstra, Pavel Emelyanov, Herbert Poetzl,
	Avi Kivity, Chris Friesen, Paul Menage, Mike Waychison

Bharata B Rao wrote:
> Hi,
> 
> Here is the v2 post of hard limits feature for CFS group scheduler. This
> RFC post mainly adds runtime borrowing feature and has a new locking scheme
> to protect CFS runtime related fields.
> 
> It would be nice to have some comments on this set!

I have a question I'd like to ask before diving into the code.
Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
to create a container with 2CPUs 1GHz each. Can I achieve this
after your patches?


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 13:36 ` [RFC v2 PATCH 0/8] CFS Hard limits - v2 Pavel Emelyanov
@ 2009-09-30 14:25   ` Bharata B Rao
  2009-09-30 14:39     ` Srivatsa Vaddagiri
  2009-09-30 14:38   ` Balbir Singh
  1 sibling, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2009-09-30 14:25 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Peter Zijlstra, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, Sep 30, 2009 at 05:36:29PM +0400, Pavel Emelyanov wrote:
> Bharata B Rao wrote:
> > Hi,
> > 
> > Here is the v2 post of hard limits feature for CFS group scheduler. This
> > RFC post mainly adds runtime borrowing feature and has a new locking scheme
> > to protect CFS runtime related fields.
> > 
> > It would be nice to have some comments on this set!
> 
> I have a question I'd like to ask before diving into the code.
> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
> to create a container with 2CPUs 1GHz each. Can I achieve this
> after your patches?

I am not sure if I understand the GHz specification you mention here.
Are you saying that you want to run a container with 2 CPUs, with each of
them running at half their (frequency) capacity?

This hard limits scheme is about time based rate limiting where you can
specify a runtime(=hard limit) and a period for the container and the
container will not be allowed to consume more than the specified CPU time
within a given period.

Regards,
Bharata.


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 13:36 ` [RFC v2 PATCH 0/8] CFS Hard limits - v2 Pavel Emelyanov
  2009-09-30 14:25   ` Bharata B Rao
@ 2009-09-30 14:38   ` Balbir Singh
  2009-09-30 15:10     ` Pavel Emelyanov
  1 sibling, 1 reply; 39+ messages in thread
From: Balbir Singh @ 2009-09-30 14:38 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: bharata, linux-kernel, Dhaval Giani, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

* Pavel Emelyanov <xemul@openvz.org> [2009-09-30 17:36:29]:

> Bharata B Rao wrote:
> > Hi,
> > 
> > Here is the v2 post of hard limits feature for CFS group scheduler. This
> > RFC post mainly adds runtime borrowing feature and has a new locking scheme
> > to protect CFS runtime related fields.
> > 
> > It would be nice to have some comments on this set!
> 
> I have a question I'd like to ask before diving into the code.
> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
> to create a container with 2CPUs 1GHz each. Can I achieve this
> after your patches?

I don't think the GHz makes any sense, consider CPUs with frequency
scaling. If I can scale from 1.6GHz to say 2.6GHz or 2GHz to 4GHz,
what does it mean for hard limit control? Hard limits define control
over existing bandwidth; anything else would be superficial and hard
to get right for both developers and users.

-- 
	Balbir


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 14:25   ` Bharata B Rao
@ 2009-09-30 14:39     ` Srivatsa Vaddagiri
  2009-09-30 15:09       ` Pavel Emelyanov
  2009-10-13 11:39       ` Pavel Emelyanov
  0 siblings, 2 replies; 39+ messages in thread
From: Srivatsa Vaddagiri @ 2009-09-30 14:39 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Pavel Emelyanov, linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Wed, Sep 30, 2009 at 07:55:37PM +0530, Bharata B Rao wrote:
> On Wed, Sep 30, 2009 at 05:36:29PM +0400, Pavel Emelyanov wrote:
> > Bharata B Rao wrote:
> > > Hi,
> > > 
> > > Here is the v2 post of hard limits feature for CFS group scheduler. This
> > > RFC post mainly adds runtime borrowing feature and has a new locking scheme
> > > to protect CFS runtime related fields.
> > > 
> > > It would be nice to have some comments on this set!
> > 
> > I have a question I'd like to ask before diving into the code.
> > Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
> > to create a container with 2CPUs 1GHz each. Can I achieve this
> > after your patches?
> 
> I am not sure if I understand the GHz specification you mention here.
> Are you saying that you want run a container with 2 CPUS with each of
> them running at half their (frequency)capacity ?
> 
> This hard limits scheme is about time based rate limiting where you can
> specify a runtime(=hard limit) and a period for the container and the
> container will not be allowed to consume more than the specified CPU time
> within a given period.

IMO Pavel's requirement can be met with a hard limit of 25%

	2 CPU of 1GHz = (1/2 x 4) (1/2 x 2) GHz CPUs
		      = 1/4 x 4 2GHz CPU
		      = 25% of (4 2GHz CPU)

IOW by hard-limiting a container thread to run just 0.5sec every sec on a 2GHz 
cpu, it is effectively making progress at the rate of 1GHz?

- vatsa


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 14:39     ` Srivatsa Vaddagiri
@ 2009-09-30 15:09       ` Pavel Emelyanov
  2009-10-13 11:39       ` Pavel Emelyanov
  1 sibling, 0 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-09-30 15:09 UTC (permalink / raw)
  To: vatsa
  Cc: Bharata B Rao, linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

Srivatsa Vaddagiri wrote:
> On Wed, Sep 30, 2009 at 07:55:37PM +0530, Bharata B Rao wrote:
>> On Wed, Sep 30, 2009 at 05:36:29PM +0400, Pavel Emelyanov wrote:
>>> Bharata B Rao wrote:
>>>> Hi,
>>>>
>>>> Here is the v2 post of hard limits feature for CFS group scheduler. This
>>>> RFC post mainly adds runtime borrowing feature and has a new locking scheme
>>>> to protect CFS runtime related fields.
>>>>
>>>> It would be nice to have some comments on this set!
>>> I have a question I'd like to ask before diving into the code.
>>> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
>>> to create a container with 2CPUs 1GHz each. Can I achieve this
>>> after your patches?
>> I am not sure if I understand the GHz specification you mention here.
>> Are you saying that you want run a container with 2 CPUS with each of
>> them running at half their (frequency)capacity ?
>>
>> This hard limits scheme is about time based rate limiting where you can
>> specify a runtime(=hard limit) and a period for the container and the
>> container will not be allowed to consume more than the specified CPU time
>> within a given period.
> 
> IMO Pavel's requirement can be met with a hard limit of 25%
> 
> 	2 CPU of 1GHz = (1/2 x 4) (1/2 x 2) GHz CPUs
> 		      = 1/4 x 4 2GHz CPU
> 		      = 25% of (4 2GHz CPU)
> 
> IOW by hard-limiting a container thread to run just 0.5sec every sec on a 2GHz 
> cpu, it is effectively making progress at the rate of 1GHz?

4CPUS 25% each is not the same as 2CPUS 50% each.
OTOH making a 2-CPU container and setting 50% for both would work, but please
look at the problem from the end user's point of view. He wants to set 50% of
the CPU power. Which setup is better: 0.5/1, 1/2 or 0.25/0.5?

If you look at how tc works, it lets the user select bandwidth in
human-readable values like kbps or mbps.

> - vatsa
> 



* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 14:38   ` Balbir Singh
@ 2009-09-30 15:10     ` Pavel Emelyanov
  2009-09-30 15:30       ` Balbir Singh
  0 siblings, 1 reply; 39+ messages in thread
From: Pavel Emelyanov @ 2009-09-30 15:10 UTC (permalink / raw)
  To: balbir
  Cc: bharata, linux-kernel, Dhaval Giani, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

Balbir Singh wrote:
> * Pavel Emelyanov <xemul@openvz.org> [2009-09-30 17:36:29]:
> 
>> Bharata B Rao wrote:
>>> Hi,
>>>
>>> Here is the v2 post of hard limits feature for CFS group scheduler. This
>>> RFC post mainly adds runtime borrowing feature and has a new locking scheme
>>> to protect CFS runtime related fields.
>>>
>>> It would be nice to have some comments on this set!
>> I have a question I'd like to ask before diving into the code.
>> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
>> to create a container with 2CPUs 1GHz each. Can I achieve this
>> after your patches?
> 
> I don't think the GHz makes any sense, consider CPUs with frequency
> scaling. If I can scale from 1.6GHz to say 2.6GHz or 2GHz to 4GHz,
> what does it mean for hard limit control? Hard limits define control
> over existing bandwidth, anything else would be superficial and hard
> hard to get right for both developers and users.

Two numbers for configuring limits make even less sense OTOH ;)
By assigning 2GHz for 4GHz CPU I obviously want half of its power ;)
Please, see my reply to vatsa@ in this thread.


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 15:10     ` Pavel Emelyanov
@ 2009-09-30 15:30       ` Balbir Singh
  2009-09-30 22:30         ` Herbert Poetzl
  0 siblings, 1 reply; 39+ messages in thread
From: Balbir Singh @ 2009-09-30 15:30 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: bharata, linux-kernel, Dhaval Giani, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Srivatsa Vaddagiri, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

* Pavel Emelyanov <xemul@openvz.org> [2009-09-30 19:10:27]:

> Balbir Singh wrote:
> > * Pavel Emelyanov <xemul@openvz.org> [2009-09-30 17:36:29]:
> > 
> >> Bharata B Rao wrote:
> >>> Hi,
> >>>
> >>> Here is the v2 post of hard limits feature for CFS group scheduler. This
> >>> RFC post mainly adds runtime borrowing feature and has a new locking scheme
> >>> to protect CFS runtime related fields.
> >>>
> >>> It would be nice to have some comments on this set!
> >> I have a question I'd like to ask before diving into the code.
> >> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
> >> to create a container with 2CPUs 1GHz each. Can I achieve this
> >> after your patches?
> > 
> > I don't think the GHz makes any sense, consider CPUs with frequency
> > scaling. If I can scale from 1.6GHz to say 2.6GHz or 2GHz to 4GHz,
> > what does it mean for hard limit control? Hard limits define control
> > over existing bandwidth, anything else would be superficial and hard
> > hard to get right for both developers and users.
> 
> Two numbers for configuring limits make even less sense OTOH ;)
> By assigning 2GHz for 4GHz CPU I obviously want half of its power ;)
> Please, see my reply to vatsa@ in this thread.

But it makes life more difficult for the administrator to think in
terms of GHz -- no? Specifically with different heterogeneous systems.
I think it would be chaotic in a data center to configure GHz for
every partition. Not to mention that it gets even more confusing when
running on top of KVM. Let's say I create two vCPUs and I specify GHz
outside, do I expect to see it in /proc/cpuinfo?

I'd like to hear what others think about GHz as well.

-- 
	Balbir


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 15:30       ` Balbir Singh
@ 2009-09-30 22:30         ` Herbert Poetzl
  2009-10-01  5:12           ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Herbert Poetzl @ 2009-09-30 22:30 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Pavel Emelyanov, bharata, linux-kernel, Dhaval Giani,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Peter Zijlstra, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Wed, Sep 30, 2009 at 09:00:53PM +0530, Balbir Singh wrote:
> * Pavel Emelyanov <xemul@openvz.org> [2009-09-30 19:10:27]:
> > Balbir Singh wrote:
> > > * Pavel Emelyanov <xemul@openvz.org> [2009-09-30 17:36:29]:
> > >> Bharata B Rao wrote:
> > >>> Hi,
> > >>>
> > >>> Here is the v2 post of hard limits feature for CFS group scheduler. This
> > >>> RFC post mainly adds runtime borrowing feature and has a new locking scheme
> > >>> to protect CFS runtime related fields.
> > >>>
> > >>> It would be nice to have some comments on this set!
> > >> I have a question I'd like to ask before diving into the code.
> > >> Consider I'm a user, that has a 4CPUs box 2GHz each and I'd like
> > >> to create a container with 2CPUs 1GHz each. Can I achieve this
> > >> after your patches?
> > > 
> > > I don't think the GHz makes any sense, consider CPUs with frequency
> > > scaling. If I can scale from 1.6GHz to say 2.6GHz or 2GHz to 4GHz,
> > > what does it mean for hard limit control? Hard limits define control
> > > over existing bandwidth, anything else would be superficial and hard
> > > hard to get right for both developers and users.
> > 
> > Two numbers for configuring limits make even less sense OTOH ;)
> > By assigning 2GHz for 4GHz CPU I obviously want half of its power ;)
> > Please, see my reply to vatsa@ in this thread.

> But it makes life more difficult for the administrator to think in
> terms of GHz -- no? Specifically with different heterogeneous systems.
> I think it would be chaotic in a data center to configure GHz for
> every partition. Not to say that it makes it even more confusing when
> running on top of KVM. Lets say I create two vCPUs and I specifiy GHz
> outside, do I expect to see it in /proc/cpuinfo?

> I'd like to hear what others think about GHz as well.

for me, using something like GHz or even BogoMips would
not make any sense whatsoever ... I would be able to
understand percentage, but this has some other implications
as it does not handle granularity ....

Linux-VServer uses interval and rate, which is very similar
to your setup, except that we use two pairs to achieve
a differentiation for 'busy' and 'idle' cases, i.e. when
the cpu(s) would go idle, we switch from R1/I1 to R2/I2
for all guests, allowing to distribute the excess differently
than the 'active' part ... but only one set is needed for
hard limits ...

best,
Herbert

> -- 
> 	Balbir


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 22:30         ` Herbert Poetzl
@ 2009-10-01  5:12           ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2009-10-01  5:12 UTC (permalink / raw)
  To: Balbir Singh, Pavel Emelyanov, linux-kernel, Dhaval Giani,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Peter Zijlstra, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Thu, Oct 01, 2009 at 12:30:39AM +0200, Herbert Poetzl wrote:
> 
> Linux-VServer uses interval and rate, which is very similar
> to your setup, except that we use two pairs to achive
> a differentiation for 'busy' and 'idle' cases, i.e. when
> the cpu(s) would go idle, we switch from R1/I1 to R2/I2
> for all guests, allowing to distribute the excess differently
> than the 'active' part ... but only one set is needed for
> hard limits ...

Do you have examples of how you choose R1, I1 and R2, I2 for guests
running typical workloads ?

Regards,
Bharata.


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-09-30 14:39     ` Srivatsa Vaddagiri
  2009-09-30 15:09       ` Pavel Emelyanov
@ 2009-10-13 11:39       ` Pavel Emelyanov
  2009-10-13 12:03         ` Herbert Poetzl
  2009-10-13 14:49         ` Valdis.Kletnieks
  1 sibling, 2 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-10-13 11:39 UTC (permalink / raw)
  To: vatsa, Bharata B Rao, Balbir Singh
  Cc: linux-kernel, Dhaval Giani, Vaidyanathan Srinivasan,
	Gautham R Shenoy, Ingo Molnar, Peter Zijlstra, Herbert Poetzl,
	Avi Kivity, Chris Friesen, Paul Menage, Mike Waychison

> IMO Pavel's requirement can be met with a hard limit of 25%
> 
> 	2 CPU of 1GHz = (1/2 x 4) (1/2 x 2) GHz CPUs
> 		      = 1/4 x 4 2GHz CPU
> 		      = 25% of (4 2GHz CPU)
> 
> IOW by hard-limiting a container thread to run just 0.5sec every sec on a 2GHz 
> cpu, it is effectively making progress at the rate of 1GHz?

So, any suggestions on this? I'm not against the patches themselves.
I'm just trying to say that setting a cpu limit with 2 numbers is
not a good way to go (at least a clean explanation of how to calculate
them should go with the patches).

I propose to first collect what *can* be done. I see the following
possibilities:

1) two time values (as it is now) - I believe this is inconvenient.
2) the amount in percent (like 50%) - this is how it works in
   OpenVZ and customers are quite happy with it. It's better than
   two numbers, since you need to specify only one clean number.
3) virtual cpu power in M/GHz - I don't agree with Balbir that
   this is difficult for the administrator. This is better than two
   numbers and better than the percentage, since the amount of
   cpu time received by a container will not change after migrating to
   a more powerful CPU.

Thoughts?

> - vatsa
> 



* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 11:39       ` Pavel Emelyanov
@ 2009-10-13 12:03         ` Herbert Poetzl
  2009-10-13 12:19           ` Pavel Emelyanov
  2009-10-13 14:49         ` Valdis.Kletnieks
  1 sibling, 1 reply; 39+ messages in thread
From: Herbert Poetzl @ 2009-10-13 12:03 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: vatsa, Bharata B Rao, Balbir Singh, linux-kernel, Dhaval Giani,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

On Tue, Oct 13, 2009 at 03:39:17PM +0400, Pavel Emelyanov wrote:
> > IMO Pavel's requirement can be met with a hard limit of 25%
> > 
> > 	2 CPU of 1GHz = (1/2 x 4) (1/2 x 2) GHz CPUs
> > 		      = 1/4 x 4 2GHz CPU
> > 		      = 25% of (4 2GHz CPU)
> > 
> > IOW by hard-limiting a container thread to run just 0.5sec every
> > sec on a 2GHz cpu, it is effectively making progress at the rate of
> > 1GHz?

> So, any suggestions on this? I'm not against the patches themselves.
> I'm just trying to tell, that setting a cpulimit with 2 numbers is
> not a good way to go (at least a clean explanation of how to calculate
> them should go with the patches).

as I already stated, it seems perfectly fine for me (to have
two values, period and runtime), IMHO it is quite natural to 
understand, and it allows one (to some degree) to control the
scheduling behaviour by controlling the multiplier (i.e.
use 1s/0.5s vs 100ms/50ms) ...

we already incorporated the patch (for testing) in our
current release, and it seems to work fine and do what we
need/want (see http://linux-vserver.org/util-vserver:Cgroups)

> I propose to first collect what *can* be done. I see the following
> possibilities:

> 1) two times (as it is now) - I believe this is inconvenient.
> 2) the amount in percents (like 50%) - this is how it works in
>    OpenVZ and customers are quite happy with it. It's better than
>    two numbers, since you need to specify only one clean number.

can be trivially mapped to the two values, by choosing a
fixed multiplicative base (let's say '1s' to simplify :) 

  with 50%, you get 1s/0.5s
  with 20%, you get 1s/0.2s
  with  5%, you get 1s/0.05s

well, you get the idea :)
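
as a minimal userspace sketch of that mapping (purely illustrative; the
helper name is made up, and it only assumes the microsecond units of the
cfs_period_us/cfs_runtime_us files and the fixed 1s base period used above):

#include <stdio.h>

/* map a percentage to a (cfs_period_us, cfs_runtime_us) pair */
static void percent_to_cfs(unsigned int percent,
			   unsigned long *period_us, unsigned long *runtime_us)
{
	*period_us = 1000000;				/* fixed 1s base period */
	*runtime_us = (*period_us * percent) / 100;	/* e.g. 50% -> 500000us */
}

int main(void)
{
	unsigned long period, runtime;

	percent_to_cfs(20, &period, &runtime);		/* 20% -> 1s/0.2s */
	printf("period=%luus runtime=%luus\n", period, runtime);
	return 0;
}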

> 3) virtual cpu power in M/GHz - I don't agree with Balbir that
>    this is difficult for administrator. This is better than two
>    numbers and better that the percentage, since the amount of
>    cpu time got by a container will not change after migrating to
>    a more powerful CPU.

I think this is completely artificial, and adds a lot
of silly corner cases, like cpu speed changes (think
SpeedStep and friends) and doesn't help the administration
in any way ... nevertheless, it should also be trivial to
map to the two values if you do the following:

  Host CPU = 4.5GHz
  Desired Guest CPU = 2.0Ghz

  2.0/4.5 = 0.44' -> 44% -> 1s/0.44s

> Thoughts?

so once again, 'we' (Linux-VServer) are perfectly happy
with the current status, the only difference to what we
used to have is that we calculated the period and runtime
in jiffies not micro seconds, and called them interval
and rate (which is as simple to map as the percentage
OpenVZ uses)

best,
Herbert

> > - vatsa
> > 


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:03         ` Herbert Poetzl
@ 2009-10-13 12:19           ` Pavel Emelyanov
  2009-10-13 12:30             ` Dhaval Giani
                               ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-10-13 12:19 UTC (permalink / raw)
  To: Herbert Poetzl, vatsa, Bharata B Rao, Balbir Singh, linux-kernel,
	Dhaval Giani, Vaidyanathan Srinivasan, Gautham R Shenoy,
	Ingo Molnar, Peter Zijlstra, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

> as I already stated, it seems perfectly fine for me 

You're not the only one interested in it, sorry. Besides, I
got your point in "I'm fine with it". Now get mine, which is
about "I am not".

> can be trivially mapped to the two values, by chosing a
> fixed multiplicative base (let's say '1s' to simplify :) 
> 
>   with 50%, you get 1s/0.5s
>   with 20%, you get 1s/0.2s
>   with  5%, you get 1s/0.05s
> 
> well, you get the idea :)

No I don't.
Is 1s/0.5s worse or better than 2s/1s?
How should I make a choice?


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:19           ` Pavel Emelyanov
@ 2009-10-13 12:30             ` Dhaval Giani
  2009-10-13 12:45               ` Pavel Emelyanov
  2009-10-13 14:56             ` Valdis.Kletnieks
  2009-10-13 22:02             ` Herbert Poetzl
  2 siblings, 1 reply; 39+ messages in thread
From: Dhaval Giani @ 2009-10-13 12:30 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Herbert Poetzl, vatsa, Bharata B Rao, Balbir Singh, linux-kernel,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

On Tue, Oct 13, 2009 at 04:19:41PM +0400, Pavel Emelyanov wrote:
> > as I already stated, it seems perfectly fine for me 
> 
> You're not the only one interested in it, sorry. Besides, I
> got your point in "I'm find with it". Now get mine which is
> about "I am not".
> 
> > can be trivially mapped to the two values, by chosing a
> > fixed multiplicative base (let's say '1s' to simplify :) 
> > 
> >   with 50%, you get 1s/0.5s
> >   with 20%, you get 1s/0.2s
> >   with  5%, you get 1s/0.05s
> > 
> > well, you get the idea :)
> 
> No I don't.
> Is 1s/0.5s worse or better than 2s/1s?
> How should I make a choice?

I would say it depends on your requirement. How fast do you want to
respond back to the user? With lower bandwidth, you would want to have
shorter periods so that the user would not get the impression that he
has to "wait" to get CPU time. But having a very short period is not a
good thing, since there are other considerations (such as the overhead of
hard limits).
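
For example (illustrative numbers): with a 10% limit, a 1s period lets the
group run for 100ms and then wait up to 900ms for its runtime to be
refreshed, while a 100ms period caps that wait at 90ms but fires the period
timer ten times as often.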

thanks,
-- 
regards,
Dhaval


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:30             ` Dhaval Giani
@ 2009-10-13 12:45               ` Pavel Emelyanov
  2009-10-13 12:56                 ` Dhaval Giani
  2009-10-13 12:57                 ` Bharata B Rao
  0 siblings, 2 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-10-13 12:45 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Herbert Poetzl, vatsa, Bharata B Rao, Balbir Singh, linux-kernel,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

Dhaval Giani wrote:
> On Tue, Oct 13, 2009 at 04:19:41PM +0400, Pavel Emelyanov wrote:
>>> as I already stated, it seems perfectly fine for me 
>> You're not the only one interested in it, sorry. Besides, I
>> got your point in "I'm find with it". Now get mine which is
>> about "I am not".
>>
>>> can be trivially mapped to the two values, by chosing a
>>> fixed multiplicative base (let's say '1s' to simplify :) 
>>>
>>>   with 50%, you get 1s/0.5s
>>>   with 20%, you get 1s/0.2s
>>>   with  5%, you get 1s/0.05s
>>>
>>> well, you get the idea :)
>> No I don't.
>> Is 1s/0.5s worse or better than 2s/1s?
>> How should I make a choice?
> 
> I would say it depends on your requirement. How fast do you want to
> respond back to the user? Wiht lower bandwidth, you would want to have
> shorter periods so that the user would not get the impression that he
> has to "wait" to get CPU time. But having a very short period is not a
> good thing, since there are other considerations (such as the overhead of
> hard limits).

That's it - a long period is bad for one reason, a short period is bad for
another, and neither tradeoff is clearly described, unlike the
limit itself.

In other words there are two numbers we're essentially playing with:
* the limit (in percent, Hz, whatever)
* and this abstract "badness"

Can't we give the user one of them for "must be configured" usage, put
the other one in some "good for most users" position and let the user
move it later on demand?

Yet again - weights in the CFS cpu-sched, ionice in the CFQ-iosched, bandwidth
in tc (traffic shaping), etc. are all clean for the end user. Plus there are
other fine-tuning knobs, which the user does not have to configure by default,
but which change the default behavior. I propose to create a simple and clean
interface for limits as well. If you think that virtual cpu power is
not good, ok. Let's ask the user for a percentage and give him yet another
option to control this "badness" or "responsiveness".

> thanks,



* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:45               ` Pavel Emelyanov
@ 2009-10-13 12:56                 ` Dhaval Giani
  2009-10-13 12:57                 ` Bharata B Rao
  1 sibling, 0 replies; 39+ messages in thread
From: Dhaval Giani @ 2009-10-13 12:56 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Herbert Poetzl, vatsa, Bharata B Rao, Balbir Singh, linux-kernel,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

On Tue, Oct 13, 2009 at 04:45:15PM +0400, Pavel Emelyanov wrote:
> Dhaval Giani wrote:
> > On Tue, Oct 13, 2009 at 04:19:41PM +0400, Pavel Emelyanov wrote:
> >>> as I already stated, it seems perfectly fine for me 
> >> You're not the only one interested in it, sorry. Besides, I
> >> got your point in "I'm find with it". Now get mine which is
> >> about "I am not".
> >>
> >>> can be trivially mapped to the two values, by chosing a
> >>> fixed multiplicative base (let's say '1s' to simplify :) 
> >>>
> >>>   with 50%, you get 1s/0.5s
> >>>   with 20%, you get 1s/0.2s
> >>>   with  5%, you get 1s/0.05s
> >>>
> >>> well, you get the idea :)
> >> No I don't.
> >> Is 1s/0.5s worse or better than 2s/1s?
> >> How should I make a choice?
> > 
> > I would say it depends on your requirement. How fast do you want to
> > respond back to the user? Wiht lower bandwidth, you would want to have
> > shorter periods so that the user would not get the impression that he
> > has to "wait" to get CPU time. But having a very short period is not a
> > good thing, since there are other considerations (such as the overhead of
> > hard limits).
> 
> That's it - long period is bad for one reason, short period is bad for 
> some other one and neither of them is clearly described unlike the 
> limit itself.
> 
> In other words there are two numbers we're essentially playing with:
> * the limit (int percents, Hz, whatever)
> * and this abstract "badness"
> 
> Can't we give the user one of them for "must be configured" usage, put
> the other one in some "good for most users" position and let the user
> move it later on demand?
> 

Right, but is this not more of a policy decision as opposed to an
infrastructure one and better handled in userspace?

> Yet again - weights in CFQ CPU-sched, ionoce in CFQ-iosched, bandwidth
> in tc (traffic shaping), etc. are all clean for end-user. Plus there are
> other fine tunes, that user should not configure by default, but which
> change the default behavior. I propose to create simple and clean 
> interface for limits as well. If you think that virtual cpu power is 
> not good, ok. Let's ask user for a percentage and give him yet another
> option to control this "badness" or "responsiveness".
> 

How is virtual cpu power defined? GHz is not a clear definition. It
means different things (in terms of performance) for different
processors. Do you want to define it as getting x cycles of a CPU in
y seconds for the cgroup, or do you want to define it as a certain number
of operations in a second? If it is the former, I think we can map it to
the current interface in userspace itself. Why don't we define this
metric, and then look to solve the problem? 

thanks,
-- 
regards,
Dhaval


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:45               ` Pavel Emelyanov
  2009-10-13 12:56                 ` Dhaval Giani
@ 2009-10-13 12:57                 ` Bharata B Rao
  2009-10-13 13:01                   ` Pavel Emelyanov
  1 sibling, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2009-10-13 12:57 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Dhaval Giani, Herbert Poetzl, vatsa, Balbir Singh, linux-kernel,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

On Tue, Oct 13, 2009 at 04:45:15PM +0400, Pavel Emelyanov wrote:
> Dhaval Giani wrote:
> > On Tue, Oct 13, 2009 at 04:19:41PM +0400, Pavel Emelyanov wrote:
> >>> as I already stated, it seems perfectly fine for me 
> >> You're not the only one interested in it, sorry. Besides, I
> >> got your point in "I'm find with it". Now get mine which is
> >> about "I am not".
> >>
> >>> can be trivially mapped to the two values, by chosing a
> >>> fixed multiplicative base (let's say '1s' to simplify :) 
> >>>
> >>>   with 50%, you get 1s/0.5s
> >>>   with 20%, you get 1s/0.2s
> >>>   with  5%, you get 1s/0.05s
> >>>
> >>> well, you get the idea :)
> >> No I don't.
> >> Is 1s/0.5s worse or better than 2s/1s?
> >> How should I make a choice?
> > 
> > I would say it depends on your requirement. How fast do you want to
> > respond back to the user? Wiht lower bandwidth, you would want to have
> > shorter periods so that the user would not get the impression that he
> > has to "wait" to get CPU time. But having a very short period is not a
> > good thing, since there are other considerations (such as the overhead of
> > hard limits).
> 
> That's it - long period is bad for one reason, short period is bad for 
> some other one and neither of them is clearly described unlike the 
> limit itself.
> 
> In other words there are two numbers we're essentially playing with:
> * the limit (int percents, Hz, whatever)
> * and this abstract "badness"
> 
> Can't we give the user one of them for "must be configured" usage, put
> the other one in some "good for most users" position and let the user
> move it later on demand?

This is what we have right now. All the groups will start with a default
period (of .5s) and user has the option of changing it based on his
requirement. However we need to find out if the default period is really
"good for most users"

Regards,
Bharata.


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:57                 ` Bharata B Rao
@ 2009-10-13 13:01                   ` Pavel Emelyanov
  0 siblings, 0 replies; 39+ messages in thread
From: Pavel Emelyanov @ 2009-10-13 13:01 UTC (permalink / raw)
  To: bharata
  Cc: Dhaval Giani, Herbert Poetzl, vatsa, Balbir Singh, linux-kernel,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

> This is what we have right now. All the groups will start with a default
> period (of .5s) and user has the option of changing it based on his
> requirement. However we need to find out if the default period is really
> "good for most users"

Is it? If so, then I'm fine with it. What I understood from the beginning
is that I'll _have_ to specify both numbers.

> Regards,
> Bharata.
> 



* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-09-30 12:52 ` [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling Bharata B Rao
@ 2009-10-13 14:27   ` Peter Zijlstra
  2009-10-14  3:41     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2009-10-13 14:27 UTC (permalink / raw)
  To: bharata
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0f1ea4a..77ace43 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1024,7 +1024,7 @@ struct sched_domain;
>  struct sched_class {
>  	const struct sched_class *next;
>  
> -	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> +	int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
>  	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
>  	void (*yield_task) (struct rq *rq);
>  

I really hate this, it uglifies all the enqueue code in a horrid way
(which is most of this patch).

Why can't we simply enqueue the task on a throttled group just like rt?

> @@ -3414,6 +3443,18 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
>  	}
>  
>  	/*
> +	 * Don't migrate the task if it belongs to a
> +	 * - throttled group on its current cpu
> +	 * - throttled group on this_cpu
> +	 * - group whose hierarchy is throttled on this_cpu
> +	 */
> +	if (cfs_rq_throttled(cfs_rq_of(&p->se)) ||
> +		task_group_throttled(task_group(p), this_cpu)) {
> +		schedstat_inc(p, se.nr_failed_migrations_throttled);
> +		return 0;
> +	}
> +
> +	/*
>  	 * Aggressive migration if:
>  	 * 1) task is cache cold, or
>  	 * 2) too many balance attempts have failed.

Simply don't iterate throttled groups?




* Re: [RFC v2 PATCH 3/8] sched: Bandwidth initialization for fair task groups
  2009-09-30 12:52 ` [RFC v2 PATCH 3/8] sched: Bandwidth initialization for fair task groups Bharata B Rao
@ 2009-10-13 14:27   ` Peter Zijlstra
  2009-10-14  3:49     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2009-10-13 14:27 UTC (permalink / raw)
  To: bharata
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:

> diff --git a/kernel/sched.c b/kernel/sched.c
> index c283d0f..0147f6f 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -262,6 +262,15 @@ static DEFINE_MUTEX(sched_domains_mutex);
>  
>  #include <linux/cgroup.h>
>  
> +#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_CFS_HARD_LIMITS)
> +struct cfs_bandwidth {
> +	spinlock_t		cfs_runtime_lock;
> +	ktime_t			cfs_period;
> +	u64			cfs_runtime;
> +	struct hrtimer		cfs_period_timer;
> +};
> +#endif

too much cfs here..

>  struct cfs_rq;
>  
>  static LIST_HEAD(task_groups);
> @@ -282,6 +291,11 @@ struct task_group {
>  	/* runqueue "owned" by this group on each cpu */
>  	struct cfs_rq **cfs_rq;
>  	unsigned long shares;
> +#ifdef CONFIG_CFS_HARD_LIMITS
> +	struct cfs_bandwidth cfs_bandwidth;
> +	/* If set, throttle when the group exceeds its bandwidth */
> +	int hard_limit_enabled;
> +#endif

What's wrong with doing something like cfs_bandwidth.cfs_runtime ==
RUNTIME_INF ?

>  #endif
>  
>  #ifdef CONFIG_RT_GROUP_SCHED
> @@ -477,6 +491,16 @@ struct cfs_rq {
>  	unsigned long rq_weight;
>  #endif
>  #endif
> +#ifdef CONFIG_CFS_HARD_LIMITS
> +	/* set when the group is throttled  on this cpu */
> +	int cfs_throttled;
> +
> +	/* runtime currently consumed by the group on this rq */
> +	u64 cfs_time;
> +
> +	/* runtime available to the group on this rq */
> +	u64 cfs_runtime;
> +#endif

too much cfs_ again.

>  	/*
>  	 * Number of tasks at this heirarchy.
>  	 */
> @@ -665,6 +689,11 @@ struct rq {
>  	/* BKL stats */
>  	unsigned int bkl_count;
>  #endif
> +	/*
> +	 * Protects the cfs runtime related fields of all cfs_rqs under
> +	 * this rq
> +	 */
> +	spinlock_t runtime_lock;
>  };
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);


> +static inline void rq_runtime_lock(struct rq *rq)
> +{
> +	spin_lock(&rq->runtime_lock);
> +}
> +
> +static inline void rq_runtime_unlock(struct rq *rq)
> +{
> +	spin_unlock(&rq->runtime_lock);
> +}

needless obfuscation.

> CONFIG_RT_GROUP_SCHED
> @@ -10317,6 +10617,23 @@ static struct cftype cpu_files[] = {
>  		.read_u64 = cpu_shares_read_u64,
>  		.write_u64 = cpu_shares_write_u64,
>  	},
> +#ifdef CONFIG_CFS_HARD_LIMITS
> +	{
> +		.name = "cfs_runtime_us",
> +		.read_s64 = cpu_cfs_runtime_read_s64,
> +		.write_s64 = cpu_cfs_runtime_write_s64,
> +	},
> +	{
> +		.name = "cfs_period_us",
> +		.read_u64 = cpu_cfs_period_read_u64,
> +		.write_u64 = cpu_cfs_period_write_u64,
> +	},
> +	{
> +		.name = "cfs_hard_limit",
> +		.read_u64 = cpu_cfs_hard_limit_read_u64,
> +		.write_u64 = cpu_cfs_hard_limit_write_u64,
> +	},
> +#endif /* CONFIG_CFS_HARD_LIMITS */
>  #endif
>  #ifdef CONFIG_RT_GROUP_SCHED
>  	{

I guess that cfs_hard_limit thing is superfluous as well.



* Re: [RFC v2 PATCH 2/8] sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level
  2009-09-30 12:51 ` [RFC v2 PATCH 2/8] sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level Bharata B Rao
@ 2009-10-13 14:27   ` Peter Zijlstra
  2009-10-14  3:42     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2009-10-13 14:27 UTC (permalink / raw)
  To: bharata
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, 2009-09-30 at 18:21 +0530, Bharata B Rao wrote:

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 652e8bd..eeeddb8 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -243,6 +243,27 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
>  
>  #endif	/* CONFIG_FAIR_GROUP_SCHED */
>  
> +static void add_cfs_rq_tasks_running(struct sched_entity *se,
> +		unsigned long count)
> +{
> +	struct cfs_rq *cfs_rq;
> +
> +	for_each_sched_entity(se) {
> +		cfs_rq = cfs_rq_of(se);
> +		cfs_rq->nr_tasks_running += count;
> +	}
> +}
> +
> +static void sub_cfs_rq_tasks_running(struct sched_entity *se,
> +		unsigned long count)
> +{
> +	struct cfs_rq *cfs_rq;
> +
> +	for_each_sched_entity(se) {
> +		cfs_rq = cfs_rq_of(se);
> +		cfs_rq->nr_tasks_running -= count;
> +	}
> +}
>  
>  /**************************************************************
>   * Scheduling class tree data structure manipulation methods:
> @@ -969,6 +990,7 @@ static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
>  		wakeup = 1;
>  	}
>  
> +	add_cfs_rq_tasks_running(&p->se, 1);
>  	hrtick_update(rq);
>  }
>  
> @@ -991,6 +1013,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
>  		sleep = 1;
>  	}
>  
> +	sub_cfs_rq_tasks_running(&p->se, 1);
>  	hrtick_update(rq);
>  }

This seems daft, why not add the increment to the for_each_sched_entity()
loop already present in both functions?





* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 11:39       ` Pavel Emelyanov
  2009-10-13 12:03         ` Herbert Poetzl
@ 2009-10-13 14:49         ` Valdis.Kletnieks
  1 sibling, 0 replies; 39+ messages in thread
From: Valdis.Kletnieks @ 2009-10-13 14:49 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: vatsa, Bharata B Rao, Balbir Singh, linux-kernel, Dhaval Giani,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Herbert Poetzl, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Tue, 13 Oct 2009 15:39:17 +0400, Pavel Emelyanov said:

> 3) virtual cpu power in M/GHz - I don't agree with Balbir that
>    this is difficult for administrator. This is better than two
>    numbers and better that the percentage, since the amount of
>    cpu time got by a container will not change after migrating to
>    a more powerful CPU.

2GHz worth of throughput on an old Xeon is probably not the same as
half of a 4GHz i7.


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:19           ` Pavel Emelyanov
  2009-10-13 12:30             ` Dhaval Giani
@ 2009-10-13 14:56             ` Valdis.Kletnieks
  2009-10-13 22:02             ` Herbert Poetzl
  2 siblings, 0 replies; 39+ messages in thread
From: Valdis.Kletnieks @ 2009-10-13 14:56 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Herbert Poetzl, vatsa, Bharata B Rao, Balbir Singh, linux-kernel,
	Dhaval Giani, Vaidyanathan Srinivasan, Gautham R Shenoy,
	Ingo Molnar, Peter Zijlstra, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Tue, 13 Oct 2009 16:19:41 +0400, Pavel Emelyanov said:

> >   with 50%, you get 1s/0.5s
> >   with 20%, you get 1s/0.2s
> >   with  5%, you get 1s/0.05s
> > 
> > well, you get the idea :)
> 
> No I don't.
> Is 1s/0.5s worse or better than 2s/1s?
> How should I make a choice?

It depends how sensitive you want to be to short bursts of CPU usage.

And of course, if we're going down that route, I'll point out that over in
the network world, 95% percentile billing is pretty popular.  How to do it
in this application, I have not a clue. ;)

http://en.wikipedia.org/wiki/Burstable_billing


* Re: [RFC v2 PATCH 0/8] CFS Hard limits - v2
  2009-10-13 12:19           ` Pavel Emelyanov
  2009-10-13 12:30             ` Dhaval Giani
  2009-10-13 14:56             ` Valdis.Kletnieks
@ 2009-10-13 22:02             ` Herbert Poetzl
  2 siblings, 0 replies; 39+ messages in thread
From: Herbert Poetzl @ 2009-10-13 22:02 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: vatsa, Bharata B Rao, Balbir Singh, linux-kernel, Dhaval Giani,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Ingo Molnar,
	Peter Zijlstra, Avi Kivity, Chris Friesen, Paul Menage,
	Mike Waychison

On Tue, Oct 13, 2009 at 04:19:41PM +0400, Pavel Emelyanov wrote:
> > as I already stated, it seems perfectly fine for me 
> 
> You're not the only one interested in it, sorry. Besides, I
> got your point in "I'm find with it". Now get mine which is
> about "I am not".

> > can be trivially mapped to the two values, by chosing a
> > fixed multiplicative base (let's say '1s' to simplify :) 

> >   with 50%, you get 1s/0.5s
> >   with 20%, you get 1s/0.2s
> >   with  5%, you get 1s/0.05s

> > well, you get the idea :)

> No I don't.
> Is 1s/0.5s worse or better than 2s/1s?
> How should I make a choice?

and because _you_ are unable to make a choice, you
want that choice to be removed altogether?

don't you see that your suggestions are less powerful
than the one provided by the patch, mainly because
they are only a subset (depending on your choice) 

besides that, it _is_ trivial to map the percentage,
but it isn't trivial to 'guess' what the user wants
(i.e. the amount of interactivity)

best,
Herbert



* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-10-13 14:27   ` Peter Zijlstra
@ 2009-10-14  3:41     ` Bharata B Rao
  2009-10-14  9:17       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2009-10-14  3:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Tue, Oct 13, 2009 at 04:27:00PM +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 0f1ea4a..77ace43 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1024,7 +1024,7 @@ struct sched_domain;
> >  struct sched_class {
> >  	const struct sched_class *next;
> >  
> > -	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > +	int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> >  	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
> >  	void (*yield_task) (struct rq *rq);
> >  
> 
> I really hate this, it uglfies all the enqueue code in a horrid way
> (which is most of this patch).
> 
> Why can't we simply enqueue the task on a throttled group just like rt?

We do enqueue a task to its group even if the group is throttled. However, such
throttled groups are not enqueued further. In such scenarios, even though the
task enqueue to its parent group succeeded, it really didn't add any task to
the cpu runqueue (rq). So we need to identify this condition and avoid
incrementing rq->running. That is why this return value is needed.
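
A simplified sketch of the idea (not the exact patch code; on_rq and error
handling omitted):

static int enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
{
	struct sched_entity *se = &p->se;

	for_each_sched_entity(se) {
		struct cfs_rq *cfs_rq = cfs_rq_of(se);

		enqueue_entity(cfs_rq, se, wakeup);
		/*
		 * The entity is queued on its cfs_rq, but a throttled
		 * group is not enqueued further up the hierarchy, so
		 * nothing new became runnable on this rq.
		 */
		if (cfs_rq_throttled(cfs_rq))
			return 0;
		wakeup = 1;
	}
	return 1;	/* the enqueue reached the root cfs_rq */
}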

> 
> > @@ -3414,6 +3443,18 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
> >  	}
> >  
> >  	/*
> > +	 * Don't migrate the task if it belongs to a
> > +	 * - throttled group on its current cpu
> > +	 * - throttled group on this_cpu
> > +	 * - group whose hierarchy is throttled on this_cpu
> > +	 */
> > +	if (cfs_rq_throttled(cfs_rq_of(&p->se)) ||
> > +		task_group_throttled(task_group(p), this_cpu)) {
> > +		schedstat_inc(p, se.nr_failed_migrations_throttled);
> > +		return 0;
> > +	}
> > +
> > +	/*
> >  	 * Aggressive migration if:
> >  	 * 1) task is cache cold, or
> >  	 * 2) too many balance attempts have failed.
> 
> Simply don't iterate throttled groups?

We already do that by setting the h_load of the throttled cfs_rq to 0 and
not considering such a cfs_rq for iteration in load_balance_fair(). So I guess
I can remove the first check in the above code (cfs_rq_throttled(cfs_rq_of(&p->se))).

However the other check is needed because we don't want to pull a task
(whose group is not throttled) from the busiest cpu to a target cpu where its
group or any group in its hierarchy is throttled. This is what the 2nd check
does (task_group_throttled(task_group(p), this_cpu)).

Thanks for looking at the patches!

Regards,
Bharata.


* Re: [RFC v2 PATCH 2/8] sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level
  2009-10-13 14:27   ` Peter Zijlstra
@ 2009-10-14  3:42     ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2009-10-14  3:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Tue, Oct 13, 2009 at 04:27:01PM +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-30 at 18:21 +0530, Bharata B Rao wrote:
> 
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 652e8bd..eeeddb8 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -243,6 +243,27 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
> >  
> >  #endif	/* CONFIG_FAIR_GROUP_SCHED */
> >  
> > +static void add_cfs_rq_tasks_running(struct sched_entity *se,
> > +		unsigned long count)
> > +{
> > +	struct cfs_rq *cfs_rq;
> > +
> > +	for_each_sched_entity(se) {
> > +		cfs_rq = cfs_rq_of(se);
> > +		cfs_rq->nr_tasks_running += count;
> > +	}
> > +}
> > +
> > +static void sub_cfs_rq_tasks_running(struct sched_entity *se,
> > +		unsigned long count)
> > +{
> > +	struct cfs_rq *cfs_rq;
> > +
> > +	for_each_sched_entity(se) {
> > +		cfs_rq = cfs_rq_of(se);
> > +		cfs_rq->nr_tasks_running -= count;
> > +	}
> > +}
> >  
> >  /**************************************************************
> >   * Scheduling class tree data structure manipulation methods:
> > @@ -969,6 +990,7 @@ static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
> >  		wakeup = 1;
> >  	}
> >  
> > +	add_cfs_rq_tasks_running(&p->se, 1);
> >  	hrtick_update(rq);
> >  }
> >  
> > @@ -991,6 +1013,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
> >  		sleep = 1;
> >  	}
> >  
> > +	sub_cfs_rq_tasks_running(&p->se, 1);
> >  	hrtick_update(rq);
> >  }
> 
> This seems daft, why not add the incement to the for_each_sched_entity()
> loop already present in both functions?

Right. There was a reason why it started out like this, but I can't seem to
remember it now. Looking at the current code, I don't see why I can't do
what you suggest. I will try this for the next post.
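
Something like the following, I suppose (a toy userspace sketch of the idea,
not the actual enqueue_task_fair()):

/*
 * Instead of a separate add_cfs_rq_tasks_running() walk, bump the
 * per-level task count inside the hierarchy walk that the enqueue path
 * already performs.
 */
#include <stdio.h>

struct cfs_rq {
	struct cfs_rq *parent;
	unsigned long nr_tasks_running;	/* aggregated task count */
};

static void enqueue_task_hierarchy(struct cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent) {
		/* ... the existing per-level enqueue work goes here ... */
		cfs_rq->nr_tasks_running++;	/* folded into the same walk */
	}
}

int main(void)
{
	struct cfs_rq root = { .parent = NULL };
	struct cfs_rq group = { .parent = &root };

	enqueue_task_hierarchy(&group);
	printf("group=%lu root=%lu\n",
	       group.nr_tasks_running, root.nr_tasks_running);	/* 1 1 */
	return 0;
}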

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC v2 PATCH 3/8] sched: Bandwidth initialization for fair task groups
  2009-10-13 14:27   ` Peter Zijlstra
@ 2009-10-14  3:49     ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2009-10-14  3:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Tue, Oct 13, 2009 at 04:27:01PM +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> 
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index c283d0f..0147f6f 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -262,6 +262,15 @@ static DEFINE_MUTEX(sched_domains_mutex);
> >  
> >  #include <linux/cgroup.h>
> >  
> > +#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_CFS_HARD_LIMITS)
> > +struct cfs_bandwidth {
> > +	spinlock_t		cfs_runtime_lock;
> > +	ktime_t			cfs_period;
> > +	u64			cfs_runtime;
> > +	struct hrtimer		cfs_period_timer;
> > +};
> > +#endif
> 
> too much cfs here..

Right, this will eventually be merged with rt_bandwidth. Dhaval already has
patches for bandwidth code unification between cfs and rt. As I said, the
initial focus is to show what the hard limit code looks like.
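
Just to indicate the direction (a purely hypothetical sketch, not Dhaval's
actual patches), the shared structure could look roughly like:

/*
 * One bandwidth structure that both the cfs and rt group-scheduling code
 * could embed, so the period-timer/runtime plumbing is written once.
 */
#include <stdio.h>
#include <stdint.h>

struct sched_bandwidth {		/* shared by cfs and rt (sketch) */
	uint64_t period_ns;
	uint64_t runtime_ns;
	/* in the kernel, a lock and an hrtimer would live here too */
};

struct task_group_sketch {
	struct sched_bandwidth cfs_bandwidth;
	struct sched_bandwidth rt_bandwidth;
};

int main(void)
{
	struct task_group_sketch tg = {
		.cfs_bandwidth = { 500000000ULL, 450000000ULL },
		.rt_bandwidth  = { 1000000000ULL, 950000000ULL },
	};

	printf("cfs: %llu/%llu ns, rt: %llu/%llu ns\n",
	       (unsigned long long)tg.cfs_bandwidth.runtime_ns,
	       (unsigned long long)tg.cfs_bandwidth.period_ns,
	       (unsigned long long)tg.rt_bandwidth.runtime_ns,
	       (unsigned long long)tg.rt_bandwidth.period_ns);
	return 0;
}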

> 
> >  struct cfs_rq;
> >  
> >  static LIST_HEAD(task_groups);
> > @@ -282,6 +291,11 @@ struct task_group {
> >  	/* runqueue "owned" by this group on each cpu */
> >  	struct cfs_rq **cfs_rq;
> >  	unsigned long shares;
> > +#ifdef CONFIG_CFS_HARD_LIMITS
> > +	struct cfs_bandwidth cfs_bandwidth;
> > +	/* If set, throttle when the group exceeds its bandwidth */
> > +	int hard_limit_enabled;
> > +#endif
> 
> What's wrong with doing something like cfs_bandwidth.cfs_runtime ==
> RUNTIME_INF ?

Can be done.

> 
> >  #endif
> >  
> >  #ifdef CONFIG_RT_GROUP_SCHED
> > @@ -477,6 +491,16 @@ struct cfs_rq {
> >  	unsigned long rq_weight;
> >  #endif
> >  #endif
> > +#ifdef CONFIG_CFS_HARD_LIMITS
> > +	/* set when the group is throttled  on this cpu */
> > +	int cfs_throttled;
> > +
> > +	/* runtime currently consumed by the group on this rq */
> > +	u64 cfs_time;
> > +
> > +	/* runtime available to the group on this rq */
> > +	u64 cfs_runtime;
> > +#endif
> 
> too much cfs_ again.

But this is needed. It is present in rt also.

> 
> >  	/*
> >  	 * Number of tasks at this heirarchy.
> >  	 */
> > @@ -665,6 +689,11 @@ struct rq {
> >  	/* BKL stats */
> >  	unsigned int bkl_count;
> >  #endif
> > +	/*
> > +	 * Protects the cfs runtime related fields of all cfs_rqs under
> > +	 * this rq
> > +	 */
> > +	spinlock_t runtime_lock;
> >  };
> >  
> >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> 
> 
> > +static inline void rq_runtime_lock(struct rq *rq)
> > +{
> > +	spin_lock(&rq->runtime_lock);
> > +}
> > +
> > +static inline void rq_runtime_unlock(struct rq *rq)
> > +{
> > +	spin_unlock(&rq->runtime_lock);
> > +}
> 
> needless obfuscation.

This is needed to keep the code clean for the !CFS_HARD_LIMITS case.
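
To illustrate the pattern (a standalone userspace example, not the kernel
source; a pthread spinlock stands in for the kernel spinlock):

/*
 * With CONFIG_CFS_HARD_LIMITS disabled the wrappers compile away to
 * nothing, so the callers don't need any #ifdefs of their own.
 */
#include <pthread.h>
#include <stdio.h>

struct rq {
#ifdef CONFIG_CFS_HARD_LIMITS
	pthread_spinlock_t runtime_lock;
#endif
	int dummy;
};

#ifdef CONFIG_CFS_HARD_LIMITS
static inline void rq_runtime_lock(struct rq *rq)
{
	pthread_spin_lock(&rq->runtime_lock);
}
static inline void rq_runtime_unlock(struct rq *rq)
{
	pthread_spin_unlock(&rq->runtime_lock);
}
#else
static inline void rq_runtime_lock(struct rq *rq) { }
static inline void rq_runtime_unlock(struct rq *rq) { }
#endif

int main(void)
{
	struct rq rq = { .dummy = 0 };
#ifdef CONFIG_CFS_HARD_LIMITS
	pthread_spin_init(&rq.runtime_lock, PTHREAD_PROCESS_PRIVATE);
#endif
	rq_runtime_lock(&rq);	/* no-op when hard limits are compiled out */
	rq.dummy++;
	rq_runtime_unlock(&rq);
	printf("%d\n", rq.dummy);
	return 0;
}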

> 
> > CONFIG_RT_GROUP_SCHED
> > @@ -10317,6 +10617,23 @@ static struct cftype cpu_files[] = {
> >  		.read_u64 = cpu_shares_read_u64,
> >  		.write_u64 = cpu_shares_write_u64,
> >  	},
> > +#ifdef CONFIG_CFS_HARD_LIMITS
> > +	{
> > +		.name = "cfs_runtime_us",
> > +		.read_s64 = cpu_cfs_runtime_read_s64,
> > +		.write_s64 = cpu_cfs_runtime_write_s64,
> > +	},
> > +	{
> > +		.name = "cfs_period_us",
> > +		.read_u64 = cpu_cfs_period_read_u64,
> > +		.write_u64 = cpu_cfs_period_write_u64,
> > +	},
> > +	{
> > +		.name = "cfs_hard_limit",
> > +		.read_u64 = cpu_cfs_hard_limit_read_u64,
> > +		.write_u64 = cpu_cfs_hard_limit_write_u64,
> > +	},
> > +#endif /* CONFIG_CFS_HARD_LIMITS */
> >  #endif
> >  #ifdef CONFIG_RT_GROUP_SCHED
> >  	{
> 
> I guess that cfs_hard_limit thing is superfluous as well.

Ok, I will try to remove this control and treat the case where
runtime != RUNTIME_INF as the hard-limits-enabled case.
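
That is, something like the following standalone sketch (the RUNTIME_INF
value shown here is an assumption modelled on the rt code, not a quote of
the kernel header):

/*
 * No hard_limit_enabled flag and no cfs_hard_limit cgroup file; a runtime
 * of RUNTIME_INF simply means "not limited".
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define RUNTIME_INF	((uint64_t)~0ULL)

struct cfs_bandwidth {
	uint64_t runtime;	/* in ns; RUNTIME_INF == no hard limit */
	uint64_t period;	/* in ns */
};

static bool cfs_hard_limited(const struct cfs_bandwidth *b)
{
	return b->runtime != RUNTIME_INF;
}

int main(void)
{
	struct cfs_bandwidth unlimited = { RUNTIME_INF, 500000000ULL };
	struct cfs_bandwidth limited   = { 450000000ULL, 500000000ULL };

	printf("unlimited: %d, limited: %d\n",
	       cfs_hard_limited(&unlimited), cfs_hard_limited(&limited));
	return 0;
}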

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-10-14  3:41     ` Bharata B Rao
@ 2009-10-14  9:17       ` Peter Zijlstra
  2009-10-14 11:50         ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2009-10-14  9:17 UTC (permalink / raw)
  To: bharata
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, 2009-10-14 at 09:11 +0530, Bharata B Rao wrote:
> On Tue, Oct 13, 2009 at 04:27:00PM +0200, Peter Zijlstra wrote:
> > On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> > 
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 0f1ea4a..77ace43 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1024,7 +1024,7 @@ struct sched_domain;
> > >  struct sched_class {
> > >     const struct sched_class *next;
> > >  
> > > -   void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > +   int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > >     void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
> > >     void (*yield_task) (struct rq *rq);
> > >  
> > 
> > I really hate this, it uglfies all the enqueue code in a horrid way
> > (which is most of this patch).
> > 
> > Why can't we simply enqueue the task on a throttled group just like rt?
> 
> We do enqueue a task to its group even if the group is throttled. However such
> throttled groups are not enqueued further. In such scenarios, even though the
> task enqueue to its parent group succeeded, it really didn't add any task to
> the cpu runqueue (rq). So we need to identify this condition and don't
> increment rq->running. That is why this return value is needed.

I would still consider those tasks running, the fact that they don't get
to run is a different matter.

This added return value really utterly craps up the code and I'm not
going to take it.

What I'm not seeing is why all this code looks so very much different
from the rt bits.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-10-14  9:17       ` Peter Zijlstra
@ 2009-10-14 11:50         ` Bharata B Rao
  2009-10-14 13:18           ` Herbert Poetzl
  0 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2009-10-14 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Herbert Poetzl, Avi Kivity,
	Chris Friesen, Paul Menage, Mike Waychison

On Wed, Oct 14, 2009 at 11:17:44AM +0200, Peter Zijlstra wrote:
> On Wed, 2009-10-14 at 09:11 +0530, Bharata B Rao wrote:
> > On Tue, Oct 13, 2009 at 04:27:00PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> > > 
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index 0f1ea4a..77ace43 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -1024,7 +1024,7 @@ struct sched_domain;
> > > >  struct sched_class {
> > > >     const struct sched_class *next;
> > > >  
> > > > -   void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > > +   int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > >     void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
> > > >     void (*yield_task) (struct rq *rq);
> > > >  
> > > 
> > > I really hate this, it uglfies all the enqueue code in a horrid way
> > > (which is most of this patch).
> > > 
> > > Why can't we simply enqueue the task on a throttled group just like rt?
> > 
> > We do enqueue a task to its group even if the group is throttled. However such
> > throttled groups are not enqueued further. In such scenarios, even though the
> > task enqueue to its parent group succeeded, it really didn't add any task to
> > the cpu runqueue (rq). So we need to identify this condition and don't
> > increment rq->running. That is why this return value is needed.
> 
> I would still consider those tasks running, the fact that they don't get
> to run is a different matter.

Ok, I realize that is how rt considers them too. I thought that we should
update rq->running when tasks go off the runqueue due to throttling. When a
task is throttled, it is no doubt present on its group's cfs_rq, but it
doesn't contribute to the CPU load since the throttled group entity isn't on
any cfs_rq. rq->running is used to obtain a few load balancing metrics and
they might go wrong if rq->running isn't up to date.

Do you still think we shouldn't update rq->running? If so, I can get rid
of this return value change.
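
As a toy illustration of the concern (just a model of the trade-off, not the
kernel's load balancer):

/*
 * A metric like "average load per running task" gives a different answer
 * depending on whether throttled tasks are counted in the running count.
 */
#include <stdio.h>

int main(void)
{
	unsigned long cpu_load = 3072;	/* arbitrary load units on this cpu */
	unsigned int runnable = 2;	/* tasks that can actually run */
	unsigned int throttled = 2;	/* queued but throttled tasks */

	printf("avg load/task, throttled excluded: %lu\n",
	       cpu_load / runnable);
	printf("avg load/task, throttled included: %lu\n",
	       cpu_load / (runnable + throttled));
	return 0;
}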

> 
> This added return value really utterly craps up the code and I'm not
> going to take it.

OK :) I will work towards making them more acceptable in future iterations.

> 
> What I'm not seeing is why all this code looks so very much different
> from the rt bits.

Throttling code here looks different from rt for the following reasons:

- As I mentioned earlier, I update rq->running during throttling, which
is not done in rt afaics.
- There are special conditions to prevent movement of tasks in and out
of throttled groups during load balancing and migration.
- rt dequeues the throttled entity by walking the entity hierarchy from
update_curr_rt(). But I found it difficult to do the same in cfs because
update_curr() is called from many different places, including places where
we are already walking the entity hierarchy. A second walk (in update_curr())
of the hierarchy while we are in the middle of a hierarchy walk didn't look
all that good. So I resorted to just marking the entity as throttled in
update_curr() and doing the dequeueing later from put_prev_entity()
(a rough sketch of this follows below). Isn't this acceptable?
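
In spirit, something like this self-contained model (the real accounting,
locking and hierarchy details are omitted; names only mirror the patch's
intent):

/*
 * Accounting in update_curr() only marks the group as throttled when its
 * runtime is exhausted; the dequeue happens later, from put_prev_entity(),
 * outside any hierarchy walk.
 */
#include <stdio.h>
#include <stdbool.h>

struct cfs_rq {
	unsigned long long cfs_time;	/* runtime consumed this period */
	unsigned long long cfs_runtime;	/* runtime allowed this period */
	bool cfs_throttled;
	bool queued;			/* entity present on parent cfs_rq */
};

static void update_curr(struct cfs_rq *cfs_rq, unsigned long long delta)
{
	cfs_rq->cfs_time += delta;
	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime)
		cfs_rq->cfs_throttled = true;	/* mark only, no dequeue here */
}

static void put_prev_entity(struct cfs_rq *cfs_rq)
{
	if (cfs_rq->cfs_throttled && cfs_rq->queued)
		cfs_rq->queued = false;		/* deferred dequeue */
}

int main(void)
{
	struct cfs_rq grp = { .cfs_runtime = 450000, .queued = true };

	update_curr(&grp, 500000);	/* overruns the quota */
	put_prev_entity(&grp);
	printf("throttled=%d queued=%d\n", grp.cfs_throttled, grp.queued);
	return 0;
}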

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-10-14 11:50         ` Bharata B Rao
@ 2009-10-14 13:18           ` Herbert Poetzl
  2009-10-15  3:30             ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Herbert Poetzl @ 2009-10-14 13:18 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Peter Zijlstra, linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Wed, Oct 14, 2009 at 05:20:03PM +0530, Bharata B Rao wrote:
> On Wed, Oct 14, 2009 at 11:17:44AM +0200, Peter Zijlstra wrote:
> > On Wed, 2009-10-14 at 09:11 +0530, Bharata B Rao wrote:
> > > On Tue, Oct 13, 2009 at 04:27:00PM +0200, Peter Zijlstra wrote:
> > > > On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> > > > 
> > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > index 0f1ea4a..77ace43 100644
> > > > > --- a/include/linux/sched.h
> > > > > +++ b/include/linux/sched.h
> > > > > @@ -1024,7 +1024,7 @@ struct sched_domain;
> > > > >  struct sched_class {
> > > > >     const struct sched_class *next;
> > > > >  
> > > > > -   void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > > > +   int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > > >     void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
> > > > >     void (*yield_task) (struct rq *rq);
> > > > >  
> > > > 
> > > > I really hate this, it uglfies all the enqueue code in a horrid way
> > > > (which is most of this patch).
> > > > 
> > > > Why can't we simply enqueue the task on a throttled group just like rt?
> > > 
> > > We do enqueue a task to its group even if the group is throttled. However such
> > > throttled groups are not enqueued further. In such scenarios, even though the
> > > task enqueue to its parent group succeeded, it really didn't add any task to
> > > the cpu runqueue (rq). So we need to identify this condition and don't
> > > increment rq->running. That is why this return value is needed.
> > 
> > I would still consider those tasks running, the fact that they don't get
> > to run is a different matter.

> Ok, that's how rt also considers them I realize. I thought that we
> should update rq->running when tasks go off the runqueue due to
> throttling. When a task is throttled, it is no doubt present on its
> group's cfs_rq, but it doesn't contribute to the CPU load as the
> throttled group entity isn't there on any cfs_rq. rq->running is used
> to obtain a few load balancing metrics and they might go wrong if
> rq->running isn't uptodate.

for all practical purposes throttled tasks _are_ running
(i.e. they would like to run, but the hardware/software
doesn't allow them to do more work) ...

> Do you still think we shouldn't update rq->running ? If so, I can get rid
> of this return value change.

Linux-VServer marked throttled tasks as 'H' (on hold)
but counted them as running, which seems to work fine
and reflect the expected behaviour ...

best,
Herbert

> > This added return value really utterly craps up the code and I'm not
> > going to take it.
> 
> OK :) I will work towards making them more acceptable in future iterations.
> 
> > 
> > What I'm not seeing is why all this code looks so very much different
> > from the rt bits.
> 
> Throttling code here looks different than rt for the following reasons:
> 
> - As I mentioned earlier, I update rq->running during throttling which
> is not done in rt afaics.
> - There are special conditions to prevent movement of tasks in and out
> of the throttled groups during load balancing and migration.
> - rt dequeues the throttled entity by walking the entity hierachy from
> update_curr_rt(). But I found it difficult to do the same in cfs because
> update_curr() is called from many different places and from places where
> we are actually walking the entity hiearchy. A second walk (in update_curr)
> of the hiearchy while we are in the middle of a hierarchy walk didn't look
> all that good. So I resorted to just marking the entity as throttled in
> update_curr() and later doing the dequeing from put_prev_entity() ?
> Isn't this acceptable ?
> 
> Regards,
> Bharata.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling
  2009-10-14 13:18           ` Herbert Poetzl
@ 2009-10-15  3:30             ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2009-10-15  3:30 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel, Dhaval Giani, Balbir Singh,
	Vaidyanathan Srinivasan, Gautham R Shenoy, Srivatsa Vaddagiri,
	Ingo Molnar, Pavel Emelyanov, Avi Kivity, Chris Friesen,
	Paul Menage, Mike Waychison

On Wed, Oct 14, 2009 at 03:18:54PM +0200, Herbert Poetzl wrote:
> On Wed, Oct 14, 2009 at 05:20:03PM +0530, Bharata B Rao wrote:
> > On Wed, Oct 14, 2009 at 11:17:44AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-10-14 at 09:11 +0530, Bharata B Rao wrote:
> > > > On Tue, Oct 13, 2009 at 04:27:00PM +0200, Peter Zijlstra wrote:
> > > > > On Wed, 2009-09-30 at 18:22 +0530, Bharata B Rao wrote:
> > > > > 
> > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > > index 0f1ea4a..77ace43 100644
> > > > > > --- a/include/linux/sched.h
> > > > > > +++ b/include/linux/sched.h
> > > > > > @@ -1024,7 +1024,7 @@ struct sched_domain;
> > > > > >  struct sched_class {
> > > > > >     const struct sched_class *next;
> > > > > >  
> > > > > > -   void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > > > > +   int (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
> > > > > >     void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
> > > > > >     void (*yield_task) (struct rq *rq);
> > > > > >  
> > > > > 
> > > > > I really hate this, it uglfies all the enqueue code in a horrid way
> > > > > (which is most of this patch).
> > > > > 
> > > > > Why can't we simply enqueue the task on a throttled group just like rt?
> > > > 
> > > > We do enqueue a task to its group even if the group is throttled. However such
> > > > throttled groups are not enqueued further. In such scenarios, even though the
> > > > task enqueue to its parent group succeeded, it really didn't add any task to
> > > > the cpu runqueue (rq). So we need to identify this condition and don't
> > > > increment rq->running. That is why this return value is needed.
> > > 
> > > I would still consider those tasks running, the fact that they don't get
> > > to run is a different matter.
> 
> > Ok, that's how rt also considers them I realize. I thought that we
> > should update rq->running when tasks go off the runqueue due to
> > throttling. When a task is throttled, it is no doubt present on its
> > group's cfs_rq, but it doesn't contribute to the CPU load as the
> > throttled group entity isn't there on any cfs_rq. rq->running is used
> > to obtain a few load balancing metrics and they might go wrong if
> > rq->running isn't uptodate.
> 
> for all practical purposes throttled tasks _are_ running
> (i.e. they would like to run, but the hardware/software
> doesn't allow them to do more work) ...

Ok, I will take a re-look at this and see if I too can consider them
as running and avoid touching rq->running, which should simplify some code.

> 
> > Do you still think we shouldn't update rq->running ? If so, I can get rid
> > of this return value change.
> 
> Linux-VServer marked throttled tasks as 'H' (on hold)
> but counted them as running, which seems to work fine
> and reflect the expected behaviour ...

So a new task state 'H' ?

Thanks for your comments.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-10-15  3:31 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-30 12:49 [RFC v2 PATCH 0/8] CFS Hard limits - v2 Bharata B Rao
2009-09-30 12:50 ` [RFC v2 PATCH 1/8] sched: Rename sched_rt_period_mask() and use it in CFS also Bharata B Rao
2009-09-30 12:51 ` [RFC v2 PATCH 2/8] sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level Bharata B Rao
2009-10-13 14:27   ` Peter Zijlstra
2009-10-14  3:42     ` Bharata B Rao
2009-09-30 12:52 ` [RFC v2 PATCH 3/8] sched: Bandwidth initialization for fair task groups Bharata B Rao
2009-10-13 14:27   ` Peter Zijlstra
2009-10-14  3:49     ` Bharata B Rao
2009-09-30 12:52 ` [RFC v2 PATCH 4/8] sched: Enforce hard limits by throttling Bharata B Rao
2009-10-13 14:27   ` Peter Zijlstra
2009-10-14  3:41     ` Bharata B Rao
2009-10-14  9:17       ` Peter Zijlstra
2009-10-14 11:50         ` Bharata B Rao
2009-10-14 13:18           ` Herbert Poetzl
2009-10-15  3:30             ` Bharata B Rao
2009-09-30 12:53 ` [RFC v2 PATCH 5/8] sched: Unthrottle the throttled tasks Bharata B Rao
2009-09-30 12:54 ` [RFC v2 PATCH 6/8] sched: Add throttle time statistics to /proc/sched_debug Bharata B Rao
2009-09-30 12:55 ` [RFC v2 PATCH 7/8] sched: Rebalance cfs runtimes Bharata B Rao
2009-09-30 12:55 ` [RFC v2 PATCH 8/8] sched: Hard limits documentation Bharata B Rao
2009-09-30 13:36 ` [RFC v2 PATCH 0/8] CFS Hard limits - v2 Pavel Emelyanov
2009-09-30 14:25   ` Bharata B Rao
2009-09-30 14:39     ` Srivatsa Vaddagiri
2009-09-30 15:09       ` Pavel Emelyanov
2009-10-13 11:39       ` Pavel Emelyanov
2009-10-13 12:03         ` Herbert Poetzl
2009-10-13 12:19           ` Pavel Emelyanov
2009-10-13 12:30             ` Dhaval Giani
2009-10-13 12:45               ` Pavel Emelyanov
2009-10-13 12:56                 ` Dhaval Giani
2009-10-13 12:57                 ` Bharata B Rao
2009-10-13 13:01                   ` Pavel Emelyanov
2009-10-13 14:56             ` Valdis.Kletnieks
2009-10-13 22:02             ` Herbert Poetzl
2009-10-13 14:49         ` Valdis.Kletnieks
2009-09-30 14:38   ` Balbir Singh
2009-09-30 15:10     ` Pavel Emelyanov
2009-09-30 15:30       ` Balbir Singh
2009-09-30 22:30         ` Herbert Poetzl
2009-10-01  5:12           ` Bharata B Rao
