linux-kernel.vger.kernel.org archive mirror
* [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP
@ 2016-03-11 15:41 Tejun Heo
From: Tejun Heo @ 2016-03-11 15:41 UTC
  To: torvalds, akpm, a.p.zijlstra, mingo, lizefan, hannes, pjt
  Cc: linux-kernel, cgroups, linux-api, kernel-team

Hello,

This patchset extends cgroup v2 to support rgroup (resource group) for
in-process hierarchical resource control and implements PRIO_RGRP for
setpriority(2) on top to allow in-process hierarchical CPU cycle
control in a seamless way.

cgroup v1 allowed putting threads of a process in different cgroups
which enabled ad-hoc in-process resource control of some resources.
Unfortunately, this approach was fraught with problems such as
membership ambiguity with per-process resources and lack of isolation
between system management and in-process properties.  For a more
detailed discussion on the subject, please refer to the following
message.

 [1] [RFD] cgroup: thread granularity support for cpu controller

This patchset implements the mechanism outlined in the above message.
The new mechanism is named rgroup (resource group).  When explicitly
designating a non-rgroup cgroup, the term sgroup (system group) is
used.  rgroup has the following properties.

* A rgroup is a cgroup which is invisible on and transparent to the
  system-level cgroupfs interface.

* A rgroup can be created by specifying CLONE_NEWRGRP flag, along with
  CLONE_THREAD, during clone(2).  A new rgroup is created under the
  parent thread's cgroup and the new thread is created in it.

* A rgroup is automatically destroyed when empty.

* A top-level rgroup of a process is a rgroup whose parent cgroup is a
  sgroup.  A process may have multiple top-level rgroups and thus
  multiple rgroup subtrees under the same parent sgroup.

* Unlike sgroups, rgroups are allowed to compete against peer threads.
  Each rgroup competes as if it were a sibling task.

* rgroup subtrees are local to the process.  When the process forks or
  execs, its rgroup subtrees are collapsed.

* When a process is migrated to a different cgroup, its rgroup
  subtrees are preserved.

* A subset of the controllers available on the parent sgroup is
  available to rgroup subtrees.  Controller management on rgroups is
  automatic and implicit and doesn't interfere with system-level
  cgroup controller management.  If a controller is made unavailable
  on the parent sgroup, it's automatically disabled in the child
  rgroup subtrees.
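
To make the top-level and controller rules above concrete, here is a
minimal userspace model in C.  It is only a sketch and not code from
the patchset: struct node, the CTRL_* bits and all function names are
hypothetical, and the real control mask handling (including enabling
controllers along nested rgroups) is more involved than a single
bitwise AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller bits, for illustration only. */
enum { CTRL_CPU = 1u << 0, CTRL_MEMORY = 1u << 1, CTRL_IO = 1u << 2 };

/* Toy stand-in for a cgroup node; NOT the kernel's struct cgroup. */
struct node {
	struct node *parent;     /* NULL for the root */
	bool is_rgroup;          /* false means sgroup */
	unsigned int ctrl_avail; /* sgroup: controllers offered to rgroup subtrees */
	unsigned int ctrl_want;  /* rgroup: controllers the process asked for */
};

/* Walk up to the nearest sgroup ancestor, i.e. the parent sgroup of
 * the rgroup subtree that @n belongs to. */
static const struct node *parent_sgroup(const struct node *n)
{
	while (n && n->is_rgroup)
		n = n->parent;
	return n;
}

/* A top-level rgroup is an rgroup whose parent cgroup is an sgroup. */
static bool is_top_level_rgroup(const struct node *n)
{
	return n->is_rgroup && n->parent && !n->parent->is_rgroup;
}

/* Controllers in effect on an rgroup: only those it wants that the
 * parent sgroup still offers.  Clearing a bit on the sgroup therefore
 * drops it from every rgroup below without touching the rgroups. */
static unsigned int effective_ctrl(const struct node *n)
{
	const struct node *sg;

	assert(n && n->is_rgroup);
	sg = parent_sgroup(n);
	return sg ? (n->ctrl_want & sg->ctrl_avail) : 0;
}
```

In this model, an rgroup that wants cpu and io under an sgroup
offering cpu and memory ends up with cpu only, and loses it as soon as
cpu is cleared from the sgroup's mask.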

rgroup lays the foundation for other kernel mechanisms to make use of
resource controllers while providing proper isolation between system
management and in-process operations.  This removes the awkward and
layer-violating requirement for coordination between individual
applications and system management.  On top of the rgroup mechanism,
PRIO_RGRP is implemented for {set|get}priority(2).

* PRIO_RGRP can only be used if the target task is already in a
  rgroup.  If setpriority(2) is used and the cpu controller is
  available, the cpu controller is enabled down the hierarchy until it
  covers the target rgroup, and the specified nice value is set as the
  weight of the rgroup.

* The specified nice value has the same meaning as for tasks.  For
  example, a rgroup and a task competing under the same parent would
  behave exactly the same as two tasks.

* For top-level rgroups, PRIO_RGRP follows the same rlimit
  restrictions as PRIO_PROCESS; however, as nested rgroups only
  distribute CPU cycles which are already allocated to the process, no
  restriction is applied to them.

PRIO_RGRP allows in-process hierarchical control of CPU cycles in a
manner which is a straightforward and minimal extension of existing
task and priority management.
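
As a rough illustration of what "same meaning as for tasks" implies,
the sketch below computes how siblings under one parent would split
CPU time from their nice values.  It assumes the nice value is mapped
to a weight through the fair scheduler's usual nice-to-weight table
(sched_prio_to_weight[] in mainline); weight_of() and share_permille()
are hypothetical helper names, and a sibling may equally be a task or
a rgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Nice-to-weight table of the mainline fair scheduler; the index is
 * nice + 20, so nice 0 maps to 1024. */
static const unsigned int nice_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/* Weight for a nice value in the valid range [-20, 19]. */
static unsigned int weight_of(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return nice_to_weight[nice + 20];
}

/* Share, in per-mille and rounded down, of the parent's CPU time that
 * sibling @idx receives when all @n siblings are runnable.  Whether a
 * sibling is a task or a rgroup makes no difference here; only the
 * nice value matters. */
static unsigned int share_permille(const int *nice, size_t n, size_t idx)
{
	unsigned long total = 0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++)
		total += weight_of(nice[i]);
	return (unsigned int)(weight_of(nice[idx]) * 1000UL / total);
}
```

With two siblings both at nice 0, each gets an even split regardless
of whether it is a task or a rgroup, which is the behavior described
in the second bullet above.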

There are still some missing pieces.

* Documentation updates.

* A mechanism that applications can use to publish certain rgroups so
  that external entities can determine which IDs to use to change
  rgroup settings.  I already have interface and implementation design
  mostly pinned down.

* Userland updates such as integrating CLONE_NEWRGRP handling into
  pthread or updating renice(1) to handle resource groups.

I'll attach a test program which demonstrates PRIO_RGRP usage in a
follow-up email.

This patchset contains the following 10 patches.

 0001-cgroup-introduce-cgroup_-un-lock.patch
 0002-cgroup-un-inline-cgroup_path-and-friends.patch
 0003-cgroup-introduce-CGRP_MIGRATE_-flags.patch
 0004-signal-make-put_signal_struct-public.patch
 0005-cgroup-fork-add-new_rgrp_cset-p-and-clone_flags-to-c.patch
 0006-cgroup-fork-add-child-and-clone_flags-to-threadgroup.patch
 0007-cgroup-introduce-resource-group.patch
 0008-cgroup-implement-rgroup-control-mask-handling.patch
 0009-cgroup-implement-rgroup-subtree-migration.patch
 0010-cgroup-sched-implement-PRIO_RGRP-for-set-get-priorit.patch

0001-0006 are preparatory patches.
0007-0009 implement rgroup support.
0010 implements PRIO_RGRP.

This patchset is on top of

  cgroup/for-4.6 f6d635ad341d ("cgroup: implement cgroup_subsys->implicit_on_dfl")
+ [2] [PATCH 2/2] cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
+ [3] [PATCHSET REPOST] sched, cgroup: implement cgroup v2 interface for cpu controller

and is available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-rgroup

diffstat follows.

 fs/exec.c                     |    8 
 include/linux/cgroup-defs.h   |   72 ++-
 include/linux/cgroup.h        |   60 +--
 include/linux/sched.h         |   31 +
 include/uapi/linux/resource.h |    1 
 include/uapi/linux/sched.h    |    1 
 kernel/cgroup.c               |  828 ++++++++++++++++++++++++++++++++++++++----
 kernel/fork.c                 |   27 -
 kernel/sched/core.c           |   32 +
 kernel/signal.c               |    6 
 kernel/sys.c                  |   11 
 11 files changed, 917 insertions(+), 160 deletions(-)

Thanks.

--
tejun

[1] http://lkml.kernel.org/g/20160105154503.GC5995@mtj.duckdns.org
[2] http://lkml.kernel.org/g/1456351975-1899-3-git-send-email-tj@kernel.org
[3] http://lkml.kernel.org/g/20160105164758.GD5995@mtj.duckdns.org

Thread overview: 50+ messages
2016-03-11 15:41 [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP Tejun Heo
2016-03-11 15:41 ` [PATCH 01/10] cgroup: introduce cgroup_[un]lock() Tejun Heo
2016-03-11 15:41 ` [PATCH 02/10] cgroup: un-inline cgroup_path() and friends Tejun Heo
2016-03-11 15:41 ` [PATCH 03/10] cgroup: introduce CGRP_MIGRATE_* flags Tejun Heo
2016-03-11 15:41 ` [PATCH 04/10] signal: make put_signal_struct() public Tejun Heo
2016-03-11 15:41 ` [PATCH 05/10] cgroup, fork: add @new_rgrp_cset[p] and @clone_flags to cgroup fork callbacks Tejun Heo
2016-03-11 15:41 ` [PATCH 06/10] cgroup, fork: add @child and @clone_flags to threadgroup_change_begin/end() Tejun Heo
2016-03-11 15:41 ` [PATCH 07/10] cgroup: introduce resource group Tejun Heo
2016-03-11 15:41 ` [PATCH 08/10] cgroup: implement rgroup control mask handling Tejun Heo
2016-03-11 15:41 ` [PATCH 09/10] cgroup: implement rgroup subtree migration Tejun Heo
2016-03-11 15:41 ` [PATCH 10/10] cgroup, sched: implement PRIO_RGRP for {set|get}priority() Tejun Heo
2016-03-11 16:05 ` Example program for PRIO_RGRP Tejun Heo
2016-03-12  6:26 ` [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP Mike Galbraith
2016-03-12 17:04   ` Mike Galbraith
2016-03-12 17:13     ` cgroup NAKs ignored? " Ingo Molnar
2016-03-13 14:42       ` Tejun Heo
2016-03-13 15:00   ` Tejun Heo
2016-03-13 17:40     ` Mike Galbraith
2016-04-07  0:00       ` Tejun Heo
2016-04-07  3:26         ` Mike Galbraith
2016-03-14  2:23     ` Mike Galbraith
2016-03-14 11:30 ` Peter Zijlstra
2016-04-06 15:58   ` Tejun Heo
2016-04-07  6:45     ` Peter Zijlstra
2016-04-07  7:35       ` Johannes Weiner
2016-04-07  8:05         ` Mike Galbraith
2016-04-07  8:08         ` Peter Zijlstra
2016-04-07  9:28           ` Johannes Weiner
2016-04-07 10:42             ` Peter Zijlstra
2016-04-07 19:45           ` Tejun Heo
2016-04-07 20:25             ` Peter Zijlstra
2016-04-08 20:11               ` Tejun Heo
2016-04-09  6:16                 ` Mike Galbraith
2016-04-09 13:39                 ` Peter Zijlstra
2016-04-12 22:29                   ` Tejun Heo
2016-04-13  7:43                     ` Mike Galbraith
2016-04-13 15:59                       ` Tejun Heo
2016-04-13 19:15                         ` Mike Galbraith
2016-04-14  6:07                         ` Mike Galbraith
2016-04-14 19:57                           ` Tejun Heo
2016-04-15  2:42                             ` Mike Galbraith
2016-04-09 16:02                 ` Peter Zijlstra
2016-04-07  8:28         ` Peter Zijlstra
2016-04-07 19:04           ` Johannes Weiner
2016-04-07 19:31             ` Peter Zijlstra
2016-04-07 20:23               ` Johannes Weiner
2016-04-08  3:13                 ` Mike Galbraith
2016-03-15 17:21 ` Michal Hocko
2016-04-06 21:53   ` Tejun Heo
2016-04-07  6:40     ` Peter Zijlstra
