* [PATCH 0/8] cgroups: Task counter subsystem v7
@ 2012-01-13 18:13 Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (16 more replies)
  0 siblings, 17 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul Menage, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Andrew Morton, Kay Sievers,
	Tim Hockin, Tejun Heo, Kirill A. Shutemov, Containers,
	Glauber Costa, Cgroups, Daniel J Walsh, Daniel P. Berrange,
	KAMEZAWA Hiroyuki, Max Kellermann, Mandeep Singh Baines

Hi,

This is the task counter limitation patchset rebased on top
of Tejun's latest cgroup tree (cgroup/for-3.3). In a later
iteration, I also intend to include its selftests once the
selftest subsystem is merged after -rc1.

In fact, the rebase mostly affects the last patch. The others haven't
changed apart from some unnoticeable dusting. Some patches have also been
dropped because either the latest cgroup patches already cover what they
were doing, or they were tiny changes I folded into the last patch (like
a missing include of err.h fixed by Stephen Rothwell).

Please note that Andrew Morton had doubts about whether we want to merge
this upstream at all, so don't merge it too eagerly before we sort out
that debate.


= What is this? =

The task counter subsystem counts the tasks inside a cgroup and
rejects forks and cgroup migrations when they would push the number
of tasks above the user-tunable limit.
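
To give a rough idea of the mechanism, here is a sketch only (the helper
name is made up, this is not code from the series): each cgroup owns a
res_counter chained to its parent's, and the fork path charges one unit
against the whole chain, so a limit set on any ancestor caps its entire
subtree.

/* Sketch only -- hypothetical helper built on <linux/res_counter.h>. */
static int sketch_account_new_task(struct res_counter *cgroup_counter)
{
	struct res_counter *fail_at;

	/* Walks the counter and all its ancestors; fails if any of them
	 * is already at its limit. */
	if (res_counter_charge(cgroup_counter, 1, &fail_at))
		return -EAGAIN;

	return 0;
}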

= Why is this needed? =

We want to be able to run untrusted programs in sandboxes and
secure containers while protecting against forkbombs.

This patchset allows us to:

1) Protect against forkbombs by setting an upper bound on the number of
tasks in a cgroup, which stops a forkbomb from spreading. This is essentially
the RLIMIT_NPROC rlimit but scoped to a cgroup. The traditional RLIMIT_NPROC
doesn't help us here because we don't want one container to starve all the
others by spawning a high number of tasks when all these containers
are running under the same user.

2) Safely kill a cgroup. We want a reliable, non-racy way to kill all
tasks in a cgroup without racing against concurrent forks.

Some practical use cases from people requesting this can be found here:

     https://lkml.org/lkml/2011/12/13/309
     https://lkml.org/lkml/2011/12/13/364

More details are in the last patch, which provides the documentation.


= Can that be used by Systemd? =

Systemd uses cgroups to keep track of services and the processes they
create. A feature has been requested to reliably kill all the processes
in a cgroup, so that systemd can kill services without races.

(Note I'm not debating here whether systemd is doing the right thing by
using cgroups; I'm just focusing on this particular feature request.)

The task counter subsystem could be used to solve this problem. However,
that would involve the whole task counting machinery, which is too much
overhead for system services that tend to fork often.

A simple core latch that rejects forks in a cgroup would be much more efficient
for this precise purpose.


= How does it interact with the RLIMIT_NPROC rlimit? =

Both can be used at the same time. They don't conflict; they are
complementary.


= Why not rather focus on a generic solution to protect against forkbombs? =

If you know of a more generic solution to protect against forkbombs, one
that works not only in containers but in more cases, I'll be happy to drop
this patchset and focus on that instead.

Note we need a solution that meets our requirements for untrusted programs
running in containers, something that also prevents a forkbomb from doing any
damage, even a temporary DoS. We don't want sandboxes and containers to
severely impact the rest of the system.

Thanks.

---
Frederic Weisbecker (7):
  cgroups: add res_counter_write_u64() API
  cgroups: new resource counter inheritance API
  cgroups: ability to stop res charge propagation on bounded ancestor
  res_counter: allow charge failure pointer to be null
  cgroups: pull up res counter charge failure interpretation to caller
  cgroups: allow subsystems to cancel a fork
  cgroups: Add a task counter subsystem

Kirill A. Shutemov (1):
  cgroups: add res counter common ancestor searching

 Documentation/cgroups/resource_counter.txt |   20 ++-
 Documentation/cgroups/task_counter.txt     |  153 ++++++++++++++++
 include/linux/cgroup.h                     |   20 ++-
 include/linux/cgroup_subsys.h              |    8 +
 include/linux/res_counter.h                |   27 +++-
 init/Kconfig                               |    9 +
 kernel/Makefile                            |    1 +
 kernel/cgroup.c                            |   23 ++-
 kernel/cgroup_freezer.c                    |    6 +-
 kernel/cgroup_task_counter.c               |  272 ++++++++++++++++++++++++++++
 kernel/exit.c                              |    2 +-
 kernel/fork.c                              |    7 +-
 kernel/res_counter.c                       |   97 +++++++++--
 13 files changed, 612 insertions(+), 33 deletions(-)
 create mode 100644 Documentation/cgroups/task_counter.txt
 create mode 100644 kernel/cgroup_task_counter.c

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/8] cgroups: add res_counter_write_u64() API
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Kay Sievers, Tim Hockin, Tejun Heo,
	Andrew Morton

Extend the resource counter API with a mirror of res_counter_read_u64() to
make it handy to update a resource counter value from a cgroup subsystem
u64 value file.
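
For illustration, a subsystem's .write_u64 handler can then reduce to
something like this (sketch only; cgroup_to_counter() is a hypothetical
accessor, not part of this series):

static int foo_limit_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
{
	/* cft->private carries the res_counter member being written,
	 * e.g. RES_LIMIT; cgroup_to_counter() is made up for the example. */
	res_counter_write_u64(cgroup_to_counter(cgrp), cft->private, val);

	return 0;
}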

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/res_counter.h |    2 ++
 kernel/res_counter.c        |   25 +++++++++++++++++++------
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..1b3fe05 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -82,6 +82,8 @@ int res_counter_memparse_write_strategy(const char *buf,
 int res_counter_write(struct res_counter *counter, int member,
 		      const char *buffer, write_strategy_fn write_strategy);
 
+void res_counter_write_u64(struct res_counter *counter, int member, u64 val);
+
 /*
  * the field descriptors. one for each member of res_counter
  */
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 6d269cc..58ae85e 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -167,12 +167,26 @@ int res_counter_memparse_write_strategy(const char *buf,
 	return 0;
 }
 
+void res_counter_write_u64(struct res_counter *counter, int member, u64 val)
+{
+	unsigned long long *target;
+	unsigned long flags;
+
+	/*
+	 * We need the lock to protect against concurrent add/dec on 32 bits.
+	 * No need to ifdef it's seldom used.
+	 */
+	spin_lock_irqsave(&counter->lock, flags);
+	target = res_counter_member(counter, member);
+	*target = val;
+	spin_unlock_irqrestore(&counter->lock, flags);
+}
+
 int res_counter_write(struct res_counter *counter, int member,
 		      const char *buf, write_strategy_fn write_strategy)
 {
 	char *end;
-	unsigned long flags;
-	unsigned long long tmp, *val;
+	unsigned long long tmp;
 
 	if (write_strategy) {
 		if (write_strategy(buf, &tmp))
@@ -182,9 +196,8 @@ int res_counter_write(struct res_counter *counter, int member,
 		if (*end != '\0')
 			return -EINVAL;
 	}
-	spin_lock_irqsave(&counter->lock, flags);
-	val = res_counter_member(counter, member);
-	*val = tmp;
-	spin_unlock_irqrestore(&counter->lock, flags);
+
+	res_counter_write_u64(counter, member, tmp);
+
 	return 0;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/8] cgroups: new resource counter inheritance API
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Paul Menage, Li Zefan,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Kay Sievers,
	Tim Hockin, Tejun Heo, Andrew Morton

Provide an API to inherit a counter value from a parent.  This can be
useful to implement cgroup.clone_children on a resource counter.

Still, the resources of the children remain limited by those of the parent,
so this only provides a default setting behaviour when clone_children is
set.
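
As an illustration (sketch only; foo_counter_from_cgroup() is a
hypothetical accessor), a subsystem honouring cgroup.clone_children
could copy the parent's limit from a clone-time hook such as post_clone:

static void foo_post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
{
	/* Give the child the parent's RES_LIMIT as its default value; the
	 * child remains constrained by its ancestors' limits anyway. */
	res_counter_inherit(foo_counter_from_cgroup(cgrp), RES_LIMIT);
}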

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/res_counter.h |    2 ++
 kernel/res_counter.c        |   18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 1b3fe05..109d118 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -84,6 +84,8 @@ int res_counter_write(struct res_counter *counter, int member,
 
 void res_counter_write_u64(struct res_counter *counter, int member, u64 val);
 
+void res_counter_inherit(struct res_counter *counter, int member);
+
 /*
  * the field descriptors. one for each member of res_counter
  */
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 58ae85e..016b4d4 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -201,3 +201,21 @@ int res_counter_write(struct res_counter *counter, int member,
 
 	return 0;
 }
+
+/*
+ * Simple inheritance implementation to get the same value
+ * than a parent. However this doesn't enforce the child value
+ * to be always below the one of the parent. But the child is
+ * subject to its parent limitation anyway.
+ */
+void res_counter_inherit(struct res_counter *counter, int member)
+{
+	struct res_counter *parent;
+	unsigned long long val;
+
+	parent = counter->parent;
+	if (parent) {
+		val = res_counter_read_u64(parent, member);
+		res_counter_write_u64(counter, member, val);
+	}
+}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/8] cgroups: ability to stop res charge propagation on bounded ancestor
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Kay Sievers, Tim Hockin, Tejun Heo,
	Andrew Morton

Moving a task from one cgroup to another may require subtracting its
resource charge from the old cgroup and adding it to the new one.

For this to happen, the uncharge/charge propagation can simply stop when
we reach the common ancestor of the two cgroups.  Beyond the performance
benefit, we also want to avoid temporarily overloading the common
ancestors with an inaccurate resource counter usage when we charge the
new cgroup first and uncharge the old one afterwards.  This is going to
be a requirement for the coming task counter subsystem.

To solve this, provide a pair of new APIs that charge/uncharge a
resource counter only until a given ancestor is reached.
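
For instance, assuming the common ancestor of the two counters is
already known, moving one unit of charge between two cgroups can leave
the shared part of the hierarchy completely untouched (sketch only):

static int sketch_move_charge(struct res_counter *new_res,
			      struct res_counter *old_res,
			      struct res_counter *common_ancestor)
{
	struct res_counter *fail_at;

	/* Stop at the common ancestor: the shared ancestors keep their
	 * existing charge and are never transiently double-charged even
	 * though the destination is charged before the source is
	 * uncharged. */
	if (res_counter_charge_until(new_res, common_ancestor, 1, &fail_at))
		return -EAGAIN;

	res_counter_uncharge_until(old_res, common_ancestor, 1);

	return 0;
}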

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/cgroups/resource_counter.txt |   18 +++++++++++++++++-
 include/linux/res_counter.h                |   20 +++++++++++++++++---
 kernel/res_counter.c                       |   13 ++++++++-----
 3 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index 95b24d7..a2cd05b 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -83,7 +83,15 @@ to work with it.
 	res_counter->lock internally (it must be called with res_counter->lock
 	held).
 
- e. void res_counter_uncharge[_locked]
+ e. int res_counter_charge_until(struct res_counter *counter,
+			     struct res_counter *limit, unsigned long val,
+			     struct res_counter **limit_fail_at)
+
+	The same as res_counter_charge(), but the charge propagation to
+	the hierarchy stops at the limit given in the "limit" parameter.
+
+
+ f. void res_counter_uncharge[_locked]
 			(struct res_counter *rc, unsigned long val)
 
 	When a resource is released (freed) it should be de-accounted
@@ -92,6 +100,14 @@ to work with it.
 
 	The _locked routines imply that the res_counter->lock is taken.
 
+
+ g. void res_counter_uncharge_until(struct res_counter *counter,
+				struct res_counter *limit,
+				unsigned long val)
+
+	The same as res_counter_charge, but the uncharge propagation to
+	the hierarchy stops at the limit given in the "limit" parameter.
+
  2.1 Other accounting routines
 
     There are more routines that may help you with common needs, like
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 109d118..de4ba29 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -117,8 +117,16 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
 
 int __must_check res_counter_charge_locked(struct res_counter *counter,
 		unsigned long val);
-int __must_check res_counter_charge(struct res_counter *counter,
-		unsigned long val, struct res_counter **limit_fail_at);
+int __must_check res_counter_charge_until(struct res_counter *counter,
+					  struct res_counter *limit,
+					  unsigned long val,
+					  struct res_counter **limit_fail_at);
+static inline int __must_check
+res_counter_charge(struct res_counter *counter, unsigned long val,
+		   struct res_counter **limit_fail_at)
+{
+	return res_counter_charge_until(counter, NULL, val, limit_fail_at);
+}
 
 /*
  * uncharge - tell that some portion of the resource is released
@@ -131,7 +139,13 @@ int __must_check res_counter_charge(struct res_counter *counter,
  */
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
-void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_until(struct res_counter *counter,
+				struct res_counter *limit, unsigned long val);
+static inline void res_counter_uncharge(struct res_counter *counter,
+					unsigned long val)
+{
+	res_counter_uncharge_until(counter, NULL, val);
+}
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 016b4d4..8ef09a4 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -35,8 +35,9 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 	return 0;
 }
 
-int res_counter_charge(struct res_counter *counter, unsigned long val,
-			struct res_counter **limit_fail_at)
+int res_counter_charge_until(struct res_counter *counter,
+			     struct res_counter *limit, unsigned long val,
+			     struct res_counter **limit_fail_at)
 {
 	int ret;
 	unsigned long flags;
@@ -44,7 +45,7 @@ int res_counter_charge(struct res_counter *counter, unsigned long val,
 
 	*limit_fail_at = NULL;
 	local_irq_save(flags);
-	for (c = counter; c != NULL; c = c->parent) {
+	for (c = counter; c != limit; c = c->parent) {
 		spin_lock(&c->lock);
 		ret = res_counter_charge_locked(c, val);
 		spin_unlock(&c->lock);
@@ -74,13 +75,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
 	counter->usage -= val;
 }
 
-void res_counter_uncharge(struct res_counter *counter, unsigned long val)
+void res_counter_uncharge_until(struct res_counter *counter,
+				struct res_counter *limit,
+				unsigned long val)
 {
 	unsigned long flags;
 	struct res_counter *c;
 
 	local_irq_save(flags);
-	for (c = counter; c != NULL; c = c->parent) {
+	for (c = counter; c != limit; c = c->parent) {
 		spin_lock(&c->lock);
 		res_counter_uncharge_locked(c, val);
 		spin_unlock(&c->lock);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/8] cgroups: add res counter common ancestor searching
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Kirill A. Shutemov, Glauber Costa, Cgroups, Daniel J Walsh,
	Daniel P. Berrange, KAMEZAWA Hiroyuki, Max Kellermann,
	Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Paul Menage,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Kay Sievers,
	Tim Hockin, Tejun Heo, Andrew Morton

From: "Kirill A. Shutemov" <kirill@shutemov.name>

Add a new API to find the common ancestor between two resource counters.
This includes the passed resource counters themselves.
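
For illustration, this provides the boundary for the *_until() walks
introduced in the previous patch (sketch only):

static struct res_counter *sketch_migration_boundary(struct res_counter *old_res,
						     struct res_counter *new_res)
{
	/*
	 * The result may be old_res or new_res themselves (when one is an
	 * ancestor of the other), or NULL when the two counters share no
	 * hierarchy -- in which case the charge/uncharge walks the whole
	 * chain, as before.
	 */
	return res_counter_common_ancestor(old_res, new_res);
}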

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/res_counter.h |    3 +++
 kernel/res_counter.c        |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index de4ba29..558f39b 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -147,6 +147,9 @@ static inline void res_counter_uncharge(struct res_counter *counter,
 	res_counter_uncharge_until(counter, NULL, val);
 }
 
+struct res_counter *res_counter_common_ancestor(struct res_counter *l,
+						struct res_counter *r);
+
 /**
  * res_counter_margin - calculate chargeable space of a counter
  * @cnt: the counter
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 8ef09a4..86d72b8 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -91,6 +91,39 @@ void res_counter_uncharge_until(struct res_counter *counter,
 	local_irq_restore(flags);
 }
 
+/*
+ * Walk through r1 and r2 parents and try to find the closest common one
+ * between both. If none is found, it returns NULL.
+ */
+struct res_counter *
+res_counter_common_ancestor(struct res_counter *r1, struct res_counter *r2)
+{
+	struct res_counter *iter;
+	int r1_depth = 0, r2_depth = 0;
+
+	for (iter = r1; iter; iter = iter->parent)
+		r1_depth++;
+
+	for (iter = r2; iter; iter = iter->parent)
+		r2_depth++;
+
+	while (r1_depth > r2_depth) {
+		r1 = r1->parent;
+		r1_depth--;
+	}
+
+	while (r2_depth > r1_depth) {
+		r2 = r2->parent;
+		r2_depth--;
+	}
+
+	while (r1 != r2) {
+		r1 = r1->parent;
+		r2 = r2->parent;
+	}
+
+	return r1;
+}
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/8] res_counter: allow charge failure pointer to be null
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Paul Menage, Li Zefan,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Kay Sievers,
	Tim Hockin, Tejun Heo, Andrew Morton

So that callers of res_counter_charge() don't have to create and pass this
pointer even if they aren't interested in it.
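
A minimal sketch of a caller that doesn't care which ancestor hit its
limit:

static int sketch_charge_one(struct res_counter *res)
{
	/* Before this patch, a dummy &fail_at pointer was mandatory here. */
	return res_counter_charge(res, 1, NULL);
}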

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/res_counter.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 86d72b8..644898a 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -43,14 +43,16 @@ int res_counter_charge_until(struct res_counter *counter,
 	unsigned long flags;
 	struct res_counter *c, *u;
 
-	*limit_fail_at = NULL;
+	if (limit_fail_at)
+		*limit_fail_at = NULL;
 	local_irq_save(flags);
 	for (c = counter; c != limit; c = c->parent) {
 		spin_lock(&c->lock);
 		ret = res_counter_charge_locked(c, val);
 		spin_unlock(&c->lock);
 		if (ret < 0) {
-			*limit_fail_at = c;
+			if (limit_fail_at)
+				*limit_fail_at = c;
 			goto undo;
 		}
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/8] cgroups: pull up res counter charge failure interpretation to caller
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Paul Menage, Li Zefan,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Kay Sievers,
	Tim Hockin, Tejun Heo, Andrew Morton

res_counter_charge() always returns -ENOMEM when the limit is reached and
the charge thus can't happen.

However it's up to the caller to interpret this failure and return the
appropriate error value.  The task counter subsystem will need to report
to the user that a fork() has been cancelled because a limit was reached,
not because we are short on memory.

Fix this by returning -1 when res_counter_charge() fails.
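
In sketch form (the function name is made up), each caller now
translates the bare failure into whatever error fits its operation:

static int foo_try_charge(struct res_counter *res)
{
	/* res_counter only reports that a limit was hit (-1); the caller
	 * decides how that surfaces to userspace. */
	if (res_counter_charge(res, 1, NULL) < 0)
		return -EAGAIN;	/* a rejected fork, not an OOM condition */

	return 0;
}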

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/cgroups/resource_counter.txt |    2 ++
 kernel/res_counter.c                       |    2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index a2cd05b..24ec61c 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -76,6 +76,8 @@ to work with it.
 	limit_fail_at parameter is set to the particular res_counter element
 	where the charging failed.
 
+	It returns 0 on success and -1 on failure.
+
  d. int res_counter_charge_locked
 			(struct res_counter *rc, unsigned long val)
 
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 644898a..8f47ac6 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -26,7 +26,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 {
 	if (counter->usage + val > counter->limit) {
 		counter->failcnt++;
-		return -ENOMEM;
+		return -1;
 	}
 
 	counter->usage += val;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/8] cgroups: allow subsystems to cancel a fork
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:13 ` Frederic Weisbecker
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Paul Menage, Li Zefan,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Kay Sievers,
	Tim Hockin, Tejun Heo, Andrew Morton

Let the subsystems' fork callbacks return an error value so that they can
cancel a fork.  This is going to be used by the task counter subsystem to
implement the limit.
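
For illustration, a fork callback under the new int-returning prototype
could look like this (sketch only; foo_res_of() is a hypothetical
accessor):

static int foo_fork(struct cgroup_subsys *ss, struct task_struct *task)
{
	/* A non-zero return value makes copy_process() cancel the fork. */
	if (res_counter_charge(foo_res_of(task), 1, NULL))
		return -EAGAIN;

	return 0;
}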

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/cgroup.h  |   20 ++++++++++++++------
 kernel/cgroup.c         |   23 +++++++++++++++++++----
 kernel/cgroup_freezer.c |    6 ++++--
 kernel/exit.c           |    2 +-
 kernel/fork.c           |    7 +++++--
 5 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7ad5e40..095e66d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -17,10 +17,11 @@
 #include <linux/rwsem.h>
 #include <linux/idr.h>
 
+struct cgroup_subsys;
+
 #ifdef CONFIG_CGROUPS
 
 struct cgroupfs_root;
-struct cgroup_subsys;
 struct inode;
 struct cgroup;
 struct css_id;
@@ -32,9 +33,11 @@ extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
 extern void cgroup_fork(struct task_struct *p);
-extern void cgroup_fork_callbacks(struct task_struct *p);
+extern int cgroup_fork_callbacks(struct task_struct *p,
+				 struct cgroup_subsys **failed_ss);
 extern void cgroup_post_fork(struct task_struct *p);
-extern void cgroup_exit(struct task_struct *p, int run_callbacks);
+extern void cgroup_exit(struct task_struct *p, int run_callbacks,
+			struct cgroup_subsys *failed_ss);
 extern int cgroupstats_build(struct cgroupstats *stats,
 				struct dentry *dentry);
 extern int cgroup_load_subsys(struct cgroup_subsys *ss);
@@ -494,7 +497,7 @@ struct cgroup_subsys {
 			      struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 		       struct cgroup_taskset *tset);
-	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
+	int (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
 	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			struct cgroup *old_cgrp, struct task_struct *task);
 	int (*populate)(struct cgroup_subsys *ss,
@@ -651,9 +654,14 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline void cgroup_fork_callbacks(struct task_struct *p) {}
+static inline int cgroup_fork_callbacks(struct task_struct *p,
+					struct cgroup_subsys **failed_ss)
+{
+	return 0;
+}
 static inline void cgroup_post_fork(struct task_struct *p) {}
-static inline void cgroup_exit(struct task_struct *p, int callbacks) {}
+static inline void cgroup_exit(struct task_struct *p, int callbacks,
+			       struct cgroup_subsys *failed_ss) {}
 
 static inline void cgroup_lock(void) {}
 static inline void cgroup_unlock(void) {}
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 39c7cae..109530a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4568,8 +4568,11 @@ void cgroup_fork(struct task_struct *child)
  * tasklist. No need to take any locks since no-one can
  * be operating on this task.
  */
-void cgroup_fork_callbacks(struct task_struct *child)
+int cgroup_fork_callbacks(struct task_struct *child,
+			  struct cgroup_subsys **failed_ss)
 {
+	int err;
+
 	if (need_forkexit_callback) {
 		int i;
 		/*
@@ -4579,10 +4582,17 @@ void cgroup_fork_callbacks(struct task_struct *child)
 		 */
 		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
-			if (ss->fork)
-				ss->fork(ss, child);
+			if (ss->fork) {
+				err = ss->fork(ss, child);
+				if (err) {
+					*failed_ss = ss;
+					return err;
+				}
+			}
 		}
 	}
+
+	return 0;
 }
 
 /**
@@ -4649,7 +4659,8 @@ void cgroup_post_fork(struct task_struct *child)
  *    which wards off any cgroup_attach_task() attempts, or task is a failed
  *    fork, never visible to cgroup_attach_task.
  */
-void cgroup_exit(struct task_struct *tsk, int run_callbacks)
+void cgroup_exit(struct task_struct *tsk, int run_callbacks,
+		 struct cgroup_subsys *failed_ss)
 {
 	struct css_set *cg;
 	int i;
@@ -4678,6 +4689,10 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 		 */
 		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			if (ss == failed_ss)
+				break;
+
 			if (ss->exit) {
 				struct cgroup *old_cgrp =
 					rcu_dereference_raw(cg->subsys[i])->cgroup;
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0e74805..6cdce8f 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -178,7 +178,7 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
 	return 0;
 }
 
-static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
+static int freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 {
 	struct freezer *freezer;
 
@@ -198,7 +198,7 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	 * following check.
 	 */
 	if (!freezer->css.cgroup->parent)
-		return;
+		return 0;
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
@@ -207,6 +207,8 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	if (freezer->state == CGROUP_FREEZING)
 		freeze_task(task);
 	spin_unlock_irq(&freezer->lock);
+
+	return 0;
 }
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index 95a4141..b9728d9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -990,7 +990,7 @@ NORET_TYPE void do_exit(long code)
 	 */
 	perf_event_exit_task(tsk);
 
-	cgroup_exit(tsk, 1);
+	cgroup_exit(tsk, 1, NULL);
 
 	if (group_dead)
 		disassociate_ctty(1);
diff --git a/kernel/fork.c b/kernel/fork.c
index d4ac9e3..097c597 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1049,6 +1049,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	int retval;
 	struct task_struct *p;
 	int cgroup_callbacks_done = 0;
+	struct cgroup_subsys *cgroup_failed_ss = NULL;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1306,8 +1307,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	/* Now that the task is set up, run cgroup callbacks if
 	 * necessary. We need to run them before the task is visible
 	 * on the tasklist. */
-	cgroup_fork_callbacks(p);
+	retval = cgroup_fork_callbacks(p, &cgroup_failed_ss);
 	cgroup_callbacks_done = 1;
+	if (retval)
+		goto bad_fork_free_pid;
 
 	/* Need tasklist lock for parent etc handling! */
 	write_lock_irq(&tasklist_lock);
@@ -1408,7 +1411,7 @@ bad_fork_cleanup_cgroup:
 #endif
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_change_end(current);
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_exit(p, cgroup_callbacks_done, cgroup_failed_ss);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/8] cgroups: allow subsystems to cancel a fork
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2012-01-13 18:13 ` [PATCH 7/8] cgroups: allow subsystems to cancel a fork Frederic Weisbecker
@ 2012-01-13 18:13 ` Frederic Weisbecker
  2012-01-13 18:14 ` [PATCH 8/8] cgroups: Add a task counter subsystem Frederic Weisbecker
  2012-01-13 18:14 ` Frederic Weisbecker
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul Menage, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Kay Sievers, Tim Hockin, Tejun Heo,
	Andrew Morton

Let a subsystem's fork callback return an error value so that it can
cancel a fork.  This is going to be used by the task counter subsystem to
implement the limit.
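
As an illustration only (the subsystem, state structure and helper below
are hypothetical, not part of this patch), a callback using the new
prototype could reject a fork like this:

	static int example_fork(struct cgroup_subsys *ss,
				struct task_struct *task)
	{
		/* hypothetical lookup of this subsystem's per-cgroup state */
		struct example_state *st = example_state_of(task);

		/* a non-zero return makes copy_process() abort the fork */
		if (st->forks_disabled)
			return -EAGAIN;

		return 0;
	}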

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/cgroup.h  |   20 ++++++++++++++------
 kernel/cgroup.c         |   23 +++++++++++++++++++----
 kernel/cgroup_freezer.c |    6 ++++--
 kernel/exit.c           |    2 +-
 kernel/fork.c           |    7 +++++--
 5 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7ad5e40..095e66d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -17,10 +17,11 @@
 #include <linux/rwsem.h>
 #include <linux/idr.h>
 
+struct cgroup_subsys;
+
 #ifdef CONFIG_CGROUPS
 
 struct cgroupfs_root;
-struct cgroup_subsys;
 struct inode;
 struct cgroup;
 struct css_id;
@@ -32,9 +33,11 @@ extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
 extern void cgroup_fork(struct task_struct *p);
-extern void cgroup_fork_callbacks(struct task_struct *p);
+extern int cgroup_fork_callbacks(struct task_struct *p,
+				 struct cgroup_subsys **failed_ss);
 extern void cgroup_post_fork(struct task_struct *p);
-extern void cgroup_exit(struct task_struct *p, int run_callbacks);
+extern void cgroup_exit(struct task_struct *p, int run_callbacks,
+			struct cgroup_subsys *failed_ss);
 extern int cgroupstats_build(struct cgroupstats *stats,
 				struct dentry *dentry);
 extern int cgroup_load_subsys(struct cgroup_subsys *ss);
@@ -494,7 +497,7 @@ struct cgroup_subsys {
 			      struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 		       struct cgroup_taskset *tset);
-	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
+	int (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
 	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			struct cgroup *old_cgrp, struct task_struct *task);
 	int (*populate)(struct cgroup_subsys *ss,
@@ -651,9 +654,14 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline void cgroup_fork_callbacks(struct task_struct *p) {}
+static inline int cgroup_fork_callbacks(struct task_struct *p,
+					struct cgroup_subsys **failed_ss)
+{
+	return 0;
+}
 static inline void cgroup_post_fork(struct task_struct *p) {}
-static inline void cgroup_exit(struct task_struct *p, int callbacks) {}
+static inline void cgroup_exit(struct task_struct *p, int callbacks,
+			       struct cgroup_subsys *failed_ss) {}
 
 static inline void cgroup_lock(void) {}
 static inline void cgroup_unlock(void) {}
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 39c7cae..109530a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4568,8 +4568,11 @@ void cgroup_fork(struct task_struct *child)
  * tasklist. No need to take any locks since no-one can
  * be operating on this task.
  */
-void cgroup_fork_callbacks(struct task_struct *child)
+int cgroup_fork_callbacks(struct task_struct *child,
+			  struct cgroup_subsys **failed_ss)
 {
+	int err;
+
 	if (need_forkexit_callback) {
 		int i;
 		/*
@@ -4579,10 +4582,17 @@ void cgroup_fork_callbacks(struct task_struct *child)
 		 */
 		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
-			if (ss->fork)
-				ss->fork(ss, child);
+			if (ss->fork) {
+				err = ss->fork(ss, child);
+				if (err) {
+					*failed_ss = ss;
+					return err;
+				}
+			}
 		}
 	}
+
+	return 0;
 }
 
 /**
@@ -4649,7 +4659,8 @@ void cgroup_post_fork(struct task_struct *child)
  *    which wards off any cgroup_attach_task() attempts, or task is a failed
  *    fork, never visible to cgroup_attach_task.
  */
-void cgroup_exit(struct task_struct *tsk, int run_callbacks)
+void cgroup_exit(struct task_struct *tsk, int run_callbacks,
+		 struct cgroup_subsys *failed_ss)
 {
 	struct css_set *cg;
 	int i;
@@ -4678,6 +4689,10 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 		 */
 		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			if (ss == failed_ss)
+				break;
+
 			if (ss->exit) {
 				struct cgroup *old_cgrp =
 					rcu_dereference_raw(cg->subsys[i])->cgroup;
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0e74805..6cdce8f 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -178,7 +178,7 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
 	return 0;
 }
 
-static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
+static int freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 {
 	struct freezer *freezer;
 
@@ -198,7 +198,7 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	 * following check.
 	 */
 	if (!freezer->css.cgroup->parent)
-		return;
+		return 0;
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
@@ -207,6 +207,8 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	if (freezer->state == CGROUP_FREEZING)
 		freeze_task(task);
 	spin_unlock_irq(&freezer->lock);
+
+	return 0;
 }
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index 95a4141..b9728d9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -990,7 +990,7 @@ NORET_TYPE void do_exit(long code)
 	 */
 	perf_event_exit_task(tsk);
 
-	cgroup_exit(tsk, 1);
+	cgroup_exit(tsk, 1, NULL);
 
 	if (group_dead)
 		disassociate_ctty(1);
diff --git a/kernel/fork.c b/kernel/fork.c
index d4ac9e3..097c597 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1049,6 +1049,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	int retval;
 	struct task_struct *p;
 	int cgroup_callbacks_done = 0;
+	struct cgroup_subsys *cgroup_failed_ss = NULL;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1306,8 +1307,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	/* Now that the task is set up, run cgroup callbacks if
 	 * necessary. We need to run them before the task is visible
 	 * on the tasklist. */
-	cgroup_fork_callbacks(p);
+	retval = cgroup_fork_callbacks(p, &cgroup_failed_ss);
 	cgroup_callbacks_done = 1;
+	if (retval)
+		goto bad_fork_free_pid;
 
 	/* Need tasklist lock for parent etc handling! */
 	write_lock_irq(&tasklist_lock);
@@ -1408,7 +1411,7 @@ bad_fork_cleanup_cgroup:
 #endif
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_change_end(current);
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_exit(p, cgroup_callbacks_done, cgroup_failed_ss);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/8] cgroups: Add a task counter subsystem
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-13 18:14 ` Frederic Weisbecker
  2012-01-16 12:38   ` Kirill A. Shutemov
  2012-01-13 18:14 ` Frederic Weisbecker
  16 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:14 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Glauber Costa, Cgroups, Kirill A. Shutemov,
	Daniel J Walsh, Daniel P. Berrange, KAMEZAWA Hiroyuki,
	Max Kellermann, Mandeep Singh Baines, Paul Menage, Li Zefan,
	Johannes Weiner, Aditya Kali, Oleg Nesterov, Andrew Morton,
	Kay Sievers, Tim Hockin, Tejun Heo, Containers

Add a new subsystem to limit the number of running tasks,
similar to the NR_PROC rlimit but in the scope of a cgroup.

The user can set an upper bound that is checked every
time a task forks in a cgroup or is moved into a cgroup
with that subsystem bound.

The primary goal is to protect against forkbombs that explode
inside a container. The traditional NR_PROC rlimit is not
effective in that case because if we run containers in parallel
under the same user, one of them could starve all the others
by spawning a high number of tasks close to the user-wide limit.

This is a prevention against forkbombs, so it's not meant to
cure the effects of a forkbomb once the system is already in a
state where it's not responsive. It's aimed at preventing the
system from ever reaching that state and at stopping the spread
of tasks early. While defining the limit on the allowed number
of tasks, it's up to the user to find the right balance between
the resources its containers may need and what it can afford
to provide.

As it's totally dissociated from the NR_PROC rlimit, both
can be complementary: the cgroup task counter can set an upper
bound per container and the rlimit can be an upper bound on the
overall set of containers.

Also this subsystem can be used to kill all the tasks in a cgroup
without racing against concurrent forks: by setting the limit of
tasks to 0, any further fork can be rejected. This is a good
way to kill a forkbomb in a container, or simply to kill any
container, without the need to retry an unbounded number of times.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Containers <containers@lists.linux-foundation.org>
---
 Documentation/cgroups/task_counter.txt |  153 ++++++++++++++++++
 include/linux/cgroup_subsys.h          |    8 +
 init/Kconfig                           |    9 +
 kernel/Makefile                        |    1 +
 kernel/cgroup_task_counter.c           |  272 ++++++++++++++++++++++++++++++++
 5 files changed, 443 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/cgroups/task_counter.txt
 create mode 100644 kernel/cgroup_task_counter.c

diff --git a/Documentation/cgroups/task_counter.txt b/Documentation/cgroups/task_counter.txt
new file mode 100644
index 0000000..1562d88
--- /dev/null
+++ b/Documentation/cgroups/task_counter.txt
@@ -0,0 +1,153 @@
+Task counter subsystem
+
+1. Description
+
+The task counter subsystem limits the number of tasks running
+inside a given cgroup. It behaves like the NR_PROC rlimit but in
+the scope of a cgroup instead of a user.
+
+It has two typical use cases, although more can probably be found:
+
+1.1 Protection against forkbomb in a container
+
+One use case is to protect against forkbombs that explode inside
+a container when that container is implemented using a cgroup. The
+NR_PROC rlimit is known to be a working protection against this type
+of attack but is not suitable anymore when we run containers in
+parallel under the same user. One container could starve all the
+others by spawning a high number of tasks close to the rlimit
+boundary. So in this case we need the limit to be enforced at a
+per-cgroup granularity.
+
+Note this works by preventing forkbomb propagation. It doesn't cure
+the effects of a forkbomb that has already grown enough to make
+the system barely responsive. While defining the limit on the number
+of tasks, it's up to the admin to find the right balance between the
+possible needs of a container and the resources the system can afford
+to provide.
+
+Also the NR_PROC rlimit and this cgroup subsystem are totally
+dissociated. But they can be complementary. The task counter limits
+the containers and the rlimit can provide an upper bound on the whole
+set of containers.
+
+
+1.2 Kill tasks inside a cgroup
+
+Another use case comes along with the forkbomb prevention: it brings
+the ability to kill all tasks inside a cgroup without races. By
+setting the limit of running tasks to 0, one can prevent any
+further fork inside a cgroup and then kill all of its tasks without
+the need to retry an unbounded number of times due to races between
+kills and forks running in parallel (more details in the "Kill a cgroup
+safely" paragraph).
+
+This is useful to kill a forkbomb for example. When its gazillion
+forks are competing with the kills, one needs to ensure this
+operation won't run in a nearly endless loop of retries.
+
+And more generally it is useful to kill a cgroup in a bounded number
+of passes.
+
+
+2. Interface
+
+When a hierarchy is mounted with the task counter subsystem bound, it
+adds two files into the cgroups directories, except the root one:
+
+- tasks.usage contains the number of tasks running inside a cgroup and
+its children in the hierarchy (see paragraph about Inheritance).
+
+- tasks.limit contains the maximum number of tasks that can run inside
+a cgroup. We check this limit when a task forks or when it is migrated
+to a cgroup.
+
+Note that the tasks.limit value can be forced below tasks.usage, in which
+case any new task in the cgroup will be rejected until the tasks.usage
+value goes below tasks.limit.
+
+For optimization reasons, the root directory of a hierarchy doesn't have
+a task counter.
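+
+As an illustration, assuming the hierarchy is mounted on /cgroups (the
+mount point and cgroup names here are arbitrary):
+
+	mount -t cgroup -o tasks cgroup /cgroups
+	mkdir /cgroups/container0
+	echo 100 > /cgroups/container0/tasks.limit
+	cat /cgroups/container0/tasks.usage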
+
+
+3. Inheritance
+
+When a task is added to a cgroup, by way of a cgroup migration or a fork,
+it increases the task counter of that cgroup and of all its ancestors.
+Hence a cgroup is also subject to the limit of its ancestors.
+
+In the following hierarchy:
+
+
+             A
+             |
+             B
+           /   \
+          C     D
+
+
+We have 1 task running in B, one running in C and none running in D.
+It means we have tasks.usage = 1 in C and tasks.usage = 2 in B because
+B counts its task and those of its children.
+
+Now let's set tasks.limit = 2 in B and tasks.limit = 1 in D.
+If we move a new task in D, it will be refused because the limit in B has
+been reached already.
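+
+For instance, with the hierarchy above created under a mounted task
+counter hierarchy, and with A as the current directory (paths are
+illustrative), the limit set in B also constrains C and D:
+
+	echo 2 > B/tasks.limit
+	echo 1 > B/D/tasks.limit
+	# With tasks.usage already at 2 in B, attaching one more task to D
+	# is rejected even though D's own limit is not reached.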
+
+
+4. Kill a cgroup safely
+
+As explained in the description, this subsystem is also helpful to
+kill all tasks in a cgroup safely, after setting tasks.limit to 0,
+so that we don't race against parallel forks in an unbounded number
+of kill iterations.
+
+But there is a small detail to be aware of when using this feature
+that way.
+
+A typical way to proceed would be:
+
+	echo 0 > tasks.limit
+	for TASK in $(cat cgroup.procs)
+	do
+		kill -KILL $TASK
+	done
+
+However there is a small race window where a task can be in the middle
+of being forked but hasn't completed the fork far enough for the PID of
+the child to appear in the cgroup.procs file.
+
+The only way to get it right is to run a loop that reads tasks.usage, kills
+all the tasks in cgroup.procs and exits the loop only if the value in
+tasks.usage was the same as the number of tasks that were in cgroup.procs,
+i.e. the number of tasks that were killed.
+
+It works because the new child appears in tasks.usage right before we check,
+in the fork path, whether the parent has a pending signal, in which case the
+fork is cancelled anyway. So relying on tasks.usage is fine and non-racy.
+
+This race window is tiny and unlikely to happen, so most of the time a single
+kill iteration should be enough. But it's worth knowing about that corner
+case spotted by Oleg Nesterov.
+
+An example of safe use would be:
+
+	echo 0 > tasks.limit
+	END=false
+
+	while [ $END == false ]
+	do
+		NR_TASKS=$(cat tasks.usage)
+		NR_KILLED=0
+
+		for TASK in $(cat cgroup.procs)
+		do
+			let NR_KILLED=NR_KILLED+1
+			kill -KILL $TASK
+		done
+
+		if [ "$NR_TASKS" = "$NR_KILLED" ]
+		then
+			END=true
+		fi
+	done
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ac663c1..5425822 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -59,8 +59,16 @@ SUBSYS(net_cls)
 SUBSYS(blkio)
 #endif
 
+/* */
+
 #ifdef CONFIG_CGROUP_PERF
 SUBSYS(perf)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_TASK_COUNTER
+SUBSYS(tasks)
+#endif
+
+/* */
diff --git a/init/Kconfig b/init/Kconfig
index 43298f9..6dfc8c3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -690,6 +690,15 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
 	  select this option (if, for some reason, they need to disable it
 	  then swapaccount=0 does the trick).
 
+config CGROUP_TASK_COUNTER
+	bool "Control number of tasks in a cgroup"
+	depends on RESOURCE_COUNTERS
+	help
+	  Let the user set an upper bound on the number of tasks allowed to
+	  run in a cgroup. When a task forks or is migrated to a cgroup that
+	  has this subsystem bound, the limit is checked to either accept or
+	  reject the fork/migration.
+
 config CGROUP_PERF
 	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 	depends on PERF_EVENTS && CGROUPS
diff --git a/kernel/Makefile b/kernel/Makefile
index e898c5b..833b692 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
+obj-$(CONFIG_CGROUP_TASK_COUNTER) += cgroup_task_counter.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_USER_NS) += user_namespace.o
diff --git a/kernel/cgroup_task_counter.c b/kernel/cgroup_task_counter.c
new file mode 100644
index 0000000..a4d87ac
--- /dev/null
+++ b/kernel/cgroup_task_counter.c
@@ -0,0 +1,272 @@
+/*
+ * Limits on number of tasks subsystem for cgroups
+ *
+ * Copyright (C) 2011-2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
+ *
+ * Thanks to Andrew Morton, Johannes Weiner, Li Zefan, Oleg Nesterov and
+ * Paul Menage for their suggestions.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/res_counter.h>
+
+
+struct task_counter {
+	struct res_counter		res;
+	struct cgroup_subsys_state	css;
+};
+
+/*
+ * The root task counter doesn't exist because it's not part of the
+ * whole task counting. We want to optimize the trivial case where
+ * only the root cgroup exists.
+ */
+static struct cgroup_subsys_state root_css;
+
+
+static inline struct task_counter *cgroup_task_counter(struct cgroup *cgrp)
+{
+	if (!cgrp->parent)
+		return NULL;
+
+	return container_of(cgroup_subsys_state(cgrp, tasks_subsys_id),
+			    struct task_counter, css);
+}
+
+static inline struct res_counter *cgroup_task_res_counter(struct cgroup *cgrp)
+{
+	struct task_counter *cnt;
+
+	cnt = cgroup_task_counter(cgrp);
+	if (!cnt)
+		return NULL;
+
+	return &cnt->res;
+}
+
+static struct cgroup_subsys_state *
+task_counter_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct task_counter *cnt;
+	struct res_counter *parent_res;
+
+	if (!cgrp->parent)
+		return &root_css;
+
+	cnt = kzalloc(sizeof(*cnt), GFP_KERNEL);
+	if (!cnt)
+		return ERR_PTR(-ENOMEM);
+
+	parent_res = cgroup_task_res_counter(cgrp->parent);
+
+	res_counter_init(&cnt->res, parent_res);
+
+	return &cnt->css;
+}
+
+/*
+ * Inherit the limit value of the parent. This is not really to enforce
+ * a limit below or equal to the one of the parent which can be changed
+ * concurrently anyway. This is just to honour the clone flag.
+ */
+static void task_counter_post_clone(struct cgroup_subsys *ss,
+				    struct cgroup *cgrp)
+{
+	/* cgrp can't be root, so cgroup_task_res_counter() can't return NULL */
+	res_counter_inherit(cgroup_task_res_counter(cgrp), RES_LIMIT);
+}
+
+static void task_counter_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct task_counter *cnt = cgroup_task_counter(cgrp);
+
+	kfree(cnt);
+}
+
+/* Uncharge the cgroup the task was attached to */
+static void task_counter_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup *old_cgrp, struct task_struct *task)
+{
+	/* Optimize for the root cgroup case */
+	if (old_cgrp->parent)
+		res_counter_uncharge(cgroup_task_res_counter(old_cgrp), 1);
+}
+
+static void task_counter_cancel_attach_until(struct res_counter *res,
+					     struct cgroup_taskset *tset,
+					     struct task_struct *until)
+{
+	struct task_struct *tsk;
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		if (tsk == until)
+			break;
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+		res_counter_uncharge_until(res, common_ancestor, 1);
+	}
+}
+
+/*
+ * This does more than just probing the ability to attach to the dest cgroup.
+ * We can not just _check_ if we can attach to the destination and do the real
+ * attachment later in task_counter_attach() because a task in the dest
+ * cgroup can fork in the meantime and steal the last remaining count.
+ * Thus we need to charge the dest cgroup right now.
+ */
+static int task_counter_can_attach(struct cgroup_subsys *ss,
+				   struct cgroup *cgrp,
+				   struct cgroup_taskset *tset)
+{
+	struct res_counter *res = cgroup_task_res_counter(cgrp);
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+	struct task_struct *tsk;
+	int err = 0;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		/*
+		 * When moving a task from a cgroup to another, we don't want
+		 * to charge the common ancestors, even though they will be
+		 * uncharged later from attach_task(), because during that
+		 * short window between charge and uncharge, a task could fork
+		 * in the ancestor and spuriously fail due to the temporary
+		 * charge.
+		 */
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+
+		/*
+		 * If cgrp is the root then res is NULL, however in this case
+		 * the common ancestor is NULL as well, making the below a NOP.
+		 */
+		err = res_counter_charge_until(res, common_ancestor, 1, NULL);
+		if (err) {
+			task_counter_cancel_attach_until(res, tset, tsk);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/* Uncharge the dest cgroup that we charged in task_counter_can_attach() */
+static void task_counter_cancel_attach(struct cgroup_subsys *ss,
+				       struct cgroup *cgrp,
+				       struct cgroup_taskset *tset)
+{
+	task_counter_cancel_attach_until(cgroup_task_res_counter(cgrp),
+					 tset, NULL);
+}
+
+/*
+ * This uncharges the old cgroups. We can do that now that we are sure the
+ * attachment can't be cancelled anymore, because this uncharge operation
+ * couldn't be reverted later: a task in the old cgroup could fork after
+ * we uncharge and reach the task counter limit, making a return to the
+ * old cgroup impossible.
+ */
+static void task_counter_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+				struct cgroup_taskset *tset)
+{
+	struct res_counter *res = cgroup_task_res_counter(cgrp);
+	struct task_struct *tsk;
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+		res_counter_uncharge_until(old_res, common_ancestor, 1);
+	}
+}
+
+static u64 task_counter_read_u64(struct cgroup *cgrp, struct cftype *cft)
+{
+	int type = cft->private;
+
+	return res_counter_read_u64(cgroup_task_res_counter(cgrp), type);
+}
+
+static int task_counter_write_u64(struct cgroup *cgrp, struct cftype *cft,
+				  u64 val)
+{
+	int type = cft->private;
+
+	res_counter_write_u64(cgroup_task_res_counter(cgrp), type, val);
+
+	return 0;
+}
+
+static struct cftype files[] = {
+	{
+		.name		= "limit",
+		.read_u64	= task_counter_read_u64,
+		.write_u64	= task_counter_write_u64,
+		.private	= RES_LIMIT,
+	},
+
+	{
+		.name		= "usage",
+		.read_u64	= task_counter_read_u64,
+		.private	= RES_USAGE,
+	},
+};
+
+static int task_counter_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	if (!cgrp->parent)
+		return 0;
+
+	return cgroup_add_files(cgrp, ss, files, ARRAY_SIZE(files));
+}
+
+/*
+ * Charge the task counter with the new child coming, or reject it if we
+ * reached the limit.
+ */
+static int task_counter_fork(struct cgroup_subsys *ss,
+			     struct task_struct *child)
+{
+	struct cgroup_subsys_state *css;
+	struct cgroup *cgrp;
+	int err;
+
+	css = child->cgroups->subsys[tasks_subsys_id];
+	cgrp = css->cgroup;
+
+	/* Optimize for the root cgroup case, which doesn't have a limit */
+	if (!cgrp->parent)
+		return 0;
+
+	err = res_counter_charge(cgroup_task_res_counter(cgrp), 1, NULL);
+	if (err)
+		return -EAGAIN;
+
+	return 0;
+}
+
+struct cgroup_subsys tasks_subsys = {
+	.name			= "tasks",
+	.subsys_id		= tasks_subsys_id,
+	.create			= task_counter_create,
+	.post_clone		= task_counter_post_clone,
+	.destroy		= task_counter_destroy,
+	.exit			= task_counter_exit,
+	.can_attach		= task_counter_can_attach,
+	.cancel_attach		= task_counter_cancel_attach,
+	.attach			= task_counter_attach,
+	.fork			= task_counter_fork,
+	.populate		= task_counter_populate,
+};
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/8] cgroups: Add a task counter subsystem
  2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2012-01-13 18:14 ` [PATCH 8/8] cgroups: Add a task counter subsystem Frederic Weisbecker
@ 2012-01-13 18:14 ` Frederic Weisbecker
  16 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-13 18:14 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul Menage, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Andrew Morton, Kay Sievers,
	Tim Hockin, Tejun Heo, Kirill A. Shutemov, Containers

Add a new subsystem to limit the number of running tasks,
similar to the NR_PROC rlimit but in the scope of a cgroup.

The user can set an upper bound that is checked every
time a task forks in a cgroup or is moved into a cgroup
with that subsystem bound.

The primary goal is to protect against forkbombs that explode
inside a container. The traditional NR_PROC rlimit is not
effective in that case because if we run containers in parallel
under the same user, one of them could starve all the others
by spawning a high number of tasks close to the user-wide limit.

This is a prevention against forkbombs, so it's not meant to
cure the effects of a forkbomb once the system is already in a
state where it's not responsive. It's aimed at preventing the
system from ever reaching that state and at stopping the spread
of tasks early. While defining the limit on the allowed number
of tasks, it's up to the user to find the right balance between
the resources its containers may need and what it can afford
to provide.

As it's totally dissociated from the NR_PROC rlimit, both
can be complementary: the cgroup task counter can set an upper
bound per container and the rlimit can be an upper bound on the
overall set of containers.

Also this subsystem can be used to kill all the tasks in a cgroup
without racing against concurrent forks: by setting the limit of
tasks to 0, any further fork can be rejected. This is a good
way to kill a forkbomb in a container, or simply to kill any
container, without the need to retry an unbounded number of times.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Aditya Kali <adityakali@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Containers <containers@lists.linux-foundation.org>
---
 Documentation/cgroups/task_counter.txt |  153 ++++++++++++++++++
 include/linux/cgroup_subsys.h          |    8 +
 init/Kconfig                           |    9 +
 kernel/Makefile                        |    1 +
 kernel/cgroup_task_counter.c           |  272 ++++++++++++++++++++++++++++++++
 5 files changed, 443 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/cgroups/task_counter.txt
 create mode 100644 kernel/cgroup_task_counter.c

diff --git a/Documentation/cgroups/task_counter.txt b/Documentation/cgroups/task_counter.txt
new file mode 100644
index 0000000..1562d88
--- /dev/null
+++ b/Documentation/cgroups/task_counter.txt
@@ -0,0 +1,153 @@
+Task counter subsystem
+
+1. Description
+
+The task counter subsystem limits the number of tasks running
+inside a given cgroup. It behaves like the NR_PROC rlimit but in
+the scope of a cgroup instead of a user.
+
+It has two typical use cases, although more can probably be found:
+
+1.1 Protection against forkbomb in a container
+
+One use case is to protect against forkbombs that explode inside
+a container when that container is implemented using a cgroup. The
+NR_PROC rlimit is known to be a working protection against this type
+of attack but is not suitable anymore when we run containers in
+parallel under the same user. One container could starve all the
+others by spawning a high number of tasks close to the rlimit
+boundary. So in this case we need the limit to be enforced at a
+per-cgroup granularity.
+
+Note this works by preventing forkbomb propagation. It doesn't cure
+the effects of a forkbomb that has already grown enough to make
+the system barely responsive. While defining the limit on the number
+of tasks, it's up to the admin to find the right balance between the
+possible needs of a container and the resources the system can afford
+to provide.
+
+Also the NR_PROC rlimit and this cgroup subsystem are totally
+dissociated. But they can be complementary. The task counter limits
+the containers and the rlimit can provide an upper bound on the whole
+set of containers.
+
+
+1.2 Kill tasks inside a cgroup
+
+Another use case comes along with the forkbomb prevention: it brings
+the ability to kill all tasks inside a cgroup without races. By
+setting the limit of running tasks to 0, one can prevent any
+further fork inside a cgroup and then kill all of its tasks without
+the need to retry an unbounded number of times due to races between
+kills and forks running in parallel (more details in the "Kill a cgroup
+safely" paragraph).
+
+This is useful to kill a forkbomb for example. When its gazillion
+forks are competing with the kills, one needs to ensure this
+operation won't run in a nearly endless loop of retries.
+
+And more generally it is useful to kill a cgroup in a bounded number
+of passes.
+
+
+2. Interface
+
+When a hierarchy is mounted with the task counter subsystem bound, it
+adds two files into the cgroups directories, except the root one:
+
+- tasks.usage contains the number of tasks running inside a cgroup and
+its children in the hierarchy (see paragraph about Inheritance).
+
+- tasks.limit contains the maximum number of tasks that can run inside
+a cgroup. We check this limit when a task forks or when it is migrated
+to a cgroup.
+
+Note that the tasks.limit value can be forced below tasks.usage, in which
+case any new task in the cgroup will be rejected until the tasks.usage
+value goes below tasks.limit.
+
+For optimization reasons, the root directory of a hierarchy doesn't have
+a task counter.
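+
+As an illustration, assuming the hierarchy is mounted on /cgroups (the
+mount point and cgroup names here are arbitrary):
+
+	mount -t cgroup -o tasks cgroup /cgroups
+	mkdir /cgroups/container0
+	echo 100 > /cgroups/container0/tasks.limit
+	cat /cgroups/container0/tasks.usage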
+
+
+3. Inheritance
+
+When a task is added to a cgroup, by way of a cgroup migration or a fork,
+it increases the task counter of that cgroup and of all its ancestors.
+Hence a cgroup is also subject to the limit of its ancestors.
+
+In the following hierarchy:
+
+
+             A
+             |
+             B
+           /   \
+          C     D
+
+
+We have 1 task running in B, one running in C and none running in D.
+It means we have tasks.usage = 1 in C and tasks.usage = 2 in B because
+B counts its task and those of its children.
+
+Now let's set tasks.limit = 2 in B and tasks.limit = 1 in D.
+If we move a new task in D, it will be refused because the limit in B has
+been reached already.
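+
+For instance, with the hierarchy above created under a mounted task
+counter hierarchy, and with A as the current directory (paths are
+illustrative), the limit set in B also constrains C and D:
+
+	echo 2 > B/tasks.limit
+	echo 1 > B/D/tasks.limit
+	# With tasks.usage already at 2 in B, attaching one more task to D
+	# is rejected even though D's own limit is not reached.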
+
+
+4. Kill a cgroup safely
+
+As explained in the description, this subsystem is also helpful to
+kill all tasks in a cgroup safely, after setting tasks.limit to 0,
+so that we don't race against parallel forks in an unbounded number
+of kill iterations.
+
+But there is a small detail to be aware of when using this feature
+that way.
+
+A typical way to proceed would be:
+
+	echo 0 > tasks.limit
+	for TASK in $(cat cgroup.procs)
+	do
+		kill -KILL $TASK
+	done
+
+However there is a small race window where a task can be in the middle
+of being forked but hasn't completed the fork far enough for the PID of
+the child to appear in the cgroup.procs file.
+
+The only way to get it right is to run a loop that reads tasks.usage, kills
+all the tasks in cgroup.procs and exits the loop only if the value in
+tasks.usage was the same as the number of tasks that were in cgroup.procs,
+i.e. the number of tasks that were killed.
+
+It works because the new child appears in tasks.usage right before we check,
+in the fork path, whether the parent has a pending signal, in which case the
+fork is cancelled anyway. So relying on tasks.usage is fine and non-racy.
+
+This race window is tiny and unlikely to happen, so most of the time a single
+kill iteration should be enough. But it's worth knowing about that corner
+case spotted by Oleg Nesterov.
+
+An example of safe use would be:
+
+	echo 0 > tasks.limit
+	END=false
+
+	while [ $END == false ]
+	do
+		NR_TASKS=$(cat tasks.usage)
+		NR_KILLED=0
+
+		for TASK in $(cat cgroup.procs)
+		do
+			let NR_KILLED=NR_KILLED+1
+			kill -KILL $TASK
+		done
+
+		if [ "$NR_TASKS" = "$NR_KILLED" ]
+		then
+			END=true
+		fi
+	done
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ac663c1..5425822 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -59,8 +59,16 @@ SUBSYS(net_cls)
 SUBSYS(blkio)
 #endif
 
+/* */
+
 #ifdef CONFIG_CGROUP_PERF
 SUBSYS(perf)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_TASK_COUNTER
+SUBSYS(tasks)
+#endif
+
+/* */
diff --git a/init/Kconfig b/init/Kconfig
index 43298f9..6dfc8c3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -690,6 +690,15 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
 	  select this option (if, for some reason, they need to disable it
 	  then swapaccount=0 does the trick).
 
+config CGROUP_TASK_COUNTER
+	bool "Control number of tasks in a cgroup"
+	depends on RESOURCE_COUNTERS
+	help
+	  Let the user set an upper bound on the number of tasks allowed to
+	  run in a cgroup. When a task forks or is migrated to a cgroup that
+	  has this subsystem bound, the limit is checked to either accept or
+	  reject the fork/migration.
+
 config CGROUP_PERF
 	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 	depends on PERF_EVENTS && CGROUPS
diff --git a/kernel/Makefile b/kernel/Makefile
index e898c5b..833b692 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
+obj-$(CONFIG_CGROUP_TASK_COUNTER) += cgroup_task_counter.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_USER_NS) += user_namespace.o
diff --git a/kernel/cgroup_task_counter.c b/kernel/cgroup_task_counter.c
new file mode 100644
index 0000000..a4d87ac
--- /dev/null
+++ b/kernel/cgroup_task_counter.c
@@ -0,0 +1,272 @@
+/*
+ * Limits on number of tasks subsystem for cgroups
+ *
+ * Copyright (C) 2011-2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
+ *
+ * Thanks to Andrew Morton, Johannes Weiner, Li Zefan, Oleg Nesterov and
+ * Paul Menage for their suggestions.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/res_counter.h>
+
+
+struct task_counter {
+	struct res_counter		res;
+	struct cgroup_subsys_state	css;
+};
+
+/*
+ * The root task counter doesn't exist because it's not part of the
+ * whole task counting. We want to optimize the trivial case where
+ * only the root cgroup exists.
+ */
+static struct cgroup_subsys_state root_css;
+
+
+static inline struct task_counter *cgroup_task_counter(struct cgroup *cgrp)
+{
+	if (!cgrp->parent)
+		return NULL;
+
+	return container_of(cgroup_subsys_state(cgrp, tasks_subsys_id),
+			    struct task_counter, css);
+}
+
+static inline struct res_counter *cgroup_task_res_counter(struct cgroup *cgrp)
+{
+	struct task_counter *cnt;
+
+	cnt = cgroup_task_counter(cgrp);
+	if (!cnt)
+		return NULL;
+
+	return &cnt->res;
+}
+
+static struct cgroup_subsys_state *
+task_counter_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct task_counter *cnt;
+	struct res_counter *parent_res;
+
+	if (!cgrp->parent)
+		return &root_css;
+
+	cnt = kzalloc(sizeof(*cnt), GFP_KERNEL);
+	if (!cnt)
+		return ERR_PTR(-ENOMEM);
+
+	parent_res = cgroup_task_res_counter(cgrp->parent);
+
+	res_counter_init(&cnt->res, parent_res);
+
+	return &cnt->css;
+}
+
+/*
+ * Inherit the limit value of the parent. This is not really to enforce
+ * a limit below or equal to the one of the parent which can be changed
+ * concurrently anyway. This is just to honour the clone flag.
+ */
+static void task_counter_post_clone(struct cgroup_subsys *ss,
+				    struct cgroup *cgrp)
+{
+	/* cgrp can't be root, so cgroup_task_res_counter() can't return NULL */
+	res_counter_inherit(cgroup_task_res_counter(cgrp), RES_LIMIT);
+}
+
+static void task_counter_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct task_counter *cnt = cgroup_task_counter(cgrp);
+
+	kfree(cnt);
+}
+
+/* Uncharge the cgroup the task was attached to */
+static void task_counter_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup *old_cgrp, struct task_struct *task)
+{
+	/* Optimize for the root cgroup case */
+	if (old_cgrp->parent)
+		res_counter_uncharge(cgroup_task_res_counter(old_cgrp), 1);
+}
+
+static void task_counter_cancel_attach_until(struct res_counter *res,
+					     struct cgroup_taskset *tset,
+					     struct task_struct *until)
+{
+	struct task_struct *tsk;
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		if (tsk == until)
+			break;
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+		res_counter_uncharge_until(res, common_ancestor, 1);
+	}
+}
+
+/*
+ * This does more than just probing the ability to attach to the dest cgroup.
+ * We can not just _check_ if we can attach to the destination and do the real
+ * attachment later in task_counter_attach() because a task in the dest
+ * cgroup can fork in the meantime and steal the last remaining count.
+ * Thus we need to charge the dest cgroup right now.
+ */
+static int task_counter_can_attach(struct cgroup_subsys *ss,
+				   struct cgroup *cgrp,
+				   struct cgroup_taskset *tset)
+{
+	struct res_counter *res = cgroup_task_res_counter(cgrp);
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+	struct task_struct *tsk;
+	int err = 0;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		/*
+		 * When moving a task from a cgroup to another, we don't want
+		 * to charge the common ancestors, even though they will be
+		 * uncharged later from attach_task(), because during that
+		 * short window between charge and uncharge, a task could fork
+		 * in the ancestor and spuriously fail due to the temporary
+		 * charge.
+		 */
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+
+		/*
+		 * If cgrp is the root then res is NULL, however in this case
+		 * the common ancestor is NULL as well, making the below a NOP.
+		 */
+		err = res_counter_charge_until(res, common_ancestor, 1, NULL);
+		if (err) {
+			task_counter_cancel_attach_until(res, tset, tsk);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/* Uncharge the dest cgroup that we charged in task_counter_can_attach() */
+static void task_counter_cancel_attach(struct cgroup_subsys *ss,
+				       struct cgroup *cgrp,
+				       struct cgroup_taskset *tset)
+{
+	task_counter_cancel_attach_until(cgroup_task_res_counter(cgrp),
+					 tset, NULL);
+}
+
+/*
+ * This uncharges the old cgroups. We can do that now that we are sure the
+ * attachment can't be cancelled anymore, because this uncharge operation
+ * couldn't be reverted later: a task in the old cgroup could fork after
+ * we uncharge and reach the task counter limit, making a return to the
+ * old cgroup impossible.
+ */
+static void task_counter_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+				struct cgroup_taskset *tset)
+{
+	struct res_counter *res = cgroup_task_res_counter(cgrp);
+	struct task_struct *tsk;
+	struct res_counter *old_res;
+	struct cgroup *old_cgrp;
+	struct res_counter *common_ancestor;
+
+	cgroup_taskset_for_each(tsk, NULL, tset) {
+		old_cgrp = cgroup_taskset_cur_cgroup(tset);
+		old_res = cgroup_task_res_counter(old_cgrp);
+		common_ancestor = res_counter_common_ancestor(res, old_res);
+		res_counter_uncharge_until(old_res, common_ancestor, 1);
+	}
+}
+
+static u64 task_counter_read_u64(struct cgroup *cgrp, struct cftype *cft)
+{
+	int type = cft->private;
+
+	return res_counter_read_u64(cgroup_task_res_counter(cgrp), type);
+}
+
+static int task_counter_write_u64(struct cgroup *cgrp, struct cftype *cft,
+				  u64 val)
+{
+	int type = cft->private;
+
+	res_counter_write_u64(cgroup_task_res_counter(cgrp), type, val);
+
+	return 0;
+}
+
+static struct cftype files[] = {
+	{
+		.name		= "limit",
+		.read_u64	= task_counter_read_u64,
+		.write_u64	= task_counter_write_u64,
+		.private	= RES_LIMIT,
+	},
+
+	{
+		.name		= "usage",
+		.read_u64	= task_counter_read_u64,
+		.private	= RES_USAGE,
+	},
+};
+
+static int task_counter_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	if (!cgrp->parent)
+		return 0;
+
+	return cgroup_add_files(cgrp, ss, files, ARRAY_SIZE(files));
+}
+
+/*
+ * Charge the task counter with the new child coming, or reject it if we
+ * reached the limit.
+ */
+static int task_counter_fork(struct cgroup_subsys *ss,
+			     struct task_struct *child)
+{
+	struct cgroup_subsys_state *css;
+	struct cgroup *cgrp;
+	int err;
+
+	css = child->cgroups->subsys[tasks_subsys_id];
+	cgrp = css->cgroup;
+
+	/* Optimize for the root cgroup case, which doesn't have a limit */
+	if (!cgrp->parent)
+		return 0;
+
+	err = res_counter_charge(cgroup_task_res_counter(cgrp), 1, NULL);
+	if (err)
+		return -EAGAIN;
+
+	return 0;
+}
+
+struct cgroup_subsys tasks_subsys = {
+	.name			= "tasks",
+	.subsys_id		= tasks_subsys_id,
+	.create			= task_counter_create,
+	.post_clone		= task_counter_post_clone,
+	.destroy		= task_counter_destroy,
+	.exit			= task_counter_exit,
+	.can_attach		= task_counter_can_attach,
+	.cancel_attach		= task_counter_cancel_attach,
+	.attach			= task_counter_attach,
+	.fork			= task_counter_fork,
+	.populate		= task_counter_populate,
+};
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] cgroups: add res_counter_write_u64() API
  2012-01-13 18:13 ` Frederic Weisbecker
@ 2012-01-16 12:27   ` Kirill A. Shutemov
  2012-01-25 18:31     ` Frederic Weisbecker
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2012-01-16 12:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Li Zefan, Johannes Weiner, Aditya Kali, Oleg Nesterov,
	Kay Sievers, Tim Hockin, Tejun Heo, Andrew Morton

On Fri, Jan 13, 2012 at 07:13:47PM +0100, Frederic Weisbecker wrote:
> Extend the resource counter API with a mirror of res_counter_read_u64() to
> make it handy to update a resource counter value from a cgroup subsystem
> u64 value file.

I still think it's worth having two versions of res_counter_write_u64().
From my POV, it's clean enough, consistent with res_counter_read_u64()
and faster on 64-bit systems.

What do you think?

---
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..1b3fe05 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -82,6 +82,8 @@ int res_counter_memparse_write_strategy(const char *buf,
 int res_counter_write(struct res_counter *counter, int member,
 		      const char *buffer, write_strategy_fn write_strategy);
 
+void res_counter_write_u64(struct res_counter *counter, int member, u64 val);
+
 /*
  * the field descriptors. one for each member of res_counter
  */
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 6d269cc..aad4ddb 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -167,12 +167,27 @@ int res_counter_memparse_write_strategy(const char *buf,
 	return 0;
 }
 
+#if BITS_PER_LONG == 32
+void res_counter_write_u64(struct res_counter *counter, int member, u64 val)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&counter->lock, flags);
+	*res_counter_member(counter, member) = val;
+	spin_unlock_irqrestore(&counter->lock, flags);
+}
+#else
+void res_counter_write_u64(struct res_counter *counter, int member, u64 val)
+{
+	*res_counter_member(counter, member) = val;
+}
+#endif
+
 int res_counter_write(struct res_counter *counter, int member,
 		      const char *buf, write_strategy_fn write_strategy)
 {
 	char *end;
-	unsigned long flags;
-	unsigned long long tmp, *val;
+	unsigned long long tmp;
 
 	if (write_strategy) {
 		if (write_strategy(buf, &tmp))
@@ -182,9 +197,8 @@ int res_counter_write(struct res_counter *counter, int member,
 		if (*end != '\0')
 			return -EINVAL;
 	}
-	spin_lock_irqsave(&counter->lock, flags);
-	val = res_counter_member(counter, member);
-	*val = tmp;
-	spin_unlock_irqrestore(&counter->lock, flags);
+
+	res_counter_write_u64(counter, member, tmp);
+
 	return 0;
 }
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] cgroups: Add a task counter subsystem
  2012-01-13 18:14 ` [PATCH 8/8] cgroups: Add a task counter subsystem Frederic Weisbecker
@ 2012-01-16 12:38   ` Kirill A. Shutemov
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2012-01-16 12:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Glauber Costa, Cgroups, Daniel J Walsh, Daniel P. Berrange,
	KAMEZAWA Hiroyuki, Max Kellermann, Mandeep Singh Baines,
	Paul Menage, Li Zefan, Johannes Weiner, Aditya Kali,
	Oleg Nesterov, Andrew Morton, Kay Sievers, Tim Hockin, Tejun Heo,
	Containers

On Fri, Jan 13, 2012 at 07:14:00PM +0100, Frederic Weisbecker wrote:
> Add a new subsystem to limit the number of running tasks,
> similar to the NR_PROC rlimit but in the scope of a cgroup.
> 
> The user can set an upper bound that is checked every
> time a task forks in a cgroup or is moved into a cgroup
> with that subsystem bound.
> 
> The primary goal is to protect against forkbombs that explode
> inside a container. The traditional NR_PROC rlimit is not
> effective in that case because if we run containers in parallel
> under the same user, one of them could starve all the others
> by spawning a high number of tasks close to the user-wide limit.
> 
> This is a prevention against forkbombs, so it's not meant to
> cure the effects of a forkbomb once the system is already in a
> state where it's not responsive. It's aimed at preventing the
> system from ever reaching that state and at stopping the spread
> of tasks early. While defining the limit on the allowed number
> of tasks, it's up to the user to find the right balance between
> the resources its containers may need and what it can afford
> to provide.
> 
> As it's totally dissociated from the NR_PROC rlimit, both
> can be complementary: the cgroup task counter can set an upper
> bound per container and the rlimit can be an upper bound on the
> overall set of containers.
> 
> Also this subsystem can be used to kill all the tasks in a cgroup
> without racing against concurrent forks: by setting the limit of
> tasks to 0, any further fork can be rejected. This is a good
> way to kill a forkbomb in a container, or simply to kill any
> container, without the need to retry an unbounded number of times.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Aditya Kali <adityakali@google.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kay Sievers <kay.sievers@vrfy.org>
> Cc: Tim Hockin <thockin@hockin.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Containers <containers@lists.linux-foundation.org>

Acked-by: Kirill A. Shutemov <kirill@shutemov.name>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] cgroups: add res_counter_write_u64() API
  2012-01-16 12:27   ` Kirill A. Shutemov
@ 2012-01-25 18:31     ` Frederic Weisbecker
  2012-01-25 18:35       ` Kirill A. Shutemov
  0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2012-01-25 18:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: LKML, Li Zefan, Johannes Weiner, Aditya Kali, Oleg Nesterov,
	Kay Sievers, Tim Hockin, Tejun Heo, Andrew Morton

On Mon, Jan 16, 2012 at 02:27:13PM +0200, Kirill A. Shutemov wrote:
> On Fri, Jan 13, 2012 at 07:13:47PM +0100, Frederic Weisbecker wrote:
> > Extend the resource counter API with a mirror of res_counter_read_u64() to
> > make it handy to update a resource counter value from a cgroup subsystem
> > u64 value file.
> 
> I still think it's worth having two versions of res_counter_write_u64().
> From my POV, it's clean enough, consistent with res_counter_read_u64()
> and faster on 64-bit systems.
> 
> What do you think?

Yeah right let's do this. Can I get your signed-off-by to take the below?

Thanks.

> 
> ---
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index c9d625c..1b3fe05 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -82,6 +82,8 @@ int res_counter_memparse_write_strategy(const char *buf,
>  int res_counter_write(struct res_counter *counter, int member,
>  		      const char *buffer, write_strategy_fn write_strategy);
>  
> +void res_counter_write_u64(struct res_counter *counter, int member, u64 val);
> +
>  /*
>   * the field descriptors. one for each member of res_counter
>   */
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index 6d269cc..aad4ddb 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -167,12 +167,27 @@ int res_counter_memparse_write_strategy(const char *buf,
>  	return 0;
>  }
>  
> +#if BITS_PER_LONG == 32
> +void res_counter_write_u64(struct res_counter *counter, int member, u64 val)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&counter->lock, flags);
> +	*res_counter_member(counter, member) = val;
> +	spin_unlock_irqrestore(&counter->lock, flags);
> +}
> +#else
> +void res_counter_write_u64(struct res_counter *counter, int member, u64 val)
> +{
> +	*res_counter_member(counter, member) = val;
> +}
> +#endif
> +
>  int res_counter_write(struct res_counter *counter, int member,
>  		      const char *buf, write_strategy_fn write_strategy)
>  {
>  	char *end;
> -	unsigned long flags;
> -	unsigned long long tmp, *val;
> +	unsigned long long tmp;
>  
>  	if (write_strategy) {
>  		if (write_strategy(buf, &tmp))
> @@ -182,9 +197,8 @@ int res_counter_write(struct res_counter *counter, int member,
>  		if (*end != '\0')
>  			return -EINVAL;
>  	}
> -	spin_lock_irqsave(&counter->lock, flags);
> -	val = res_counter_member(counter, member);
> -	*val = tmp;
> -	spin_unlock_irqrestore(&counter->lock, flags);
> +
> +	res_counter_write_u64(counter, member, tmp);
> +
>  	return 0;
>  }
> -- 
>  Kirill A. Shutemov
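
To illustrate the intended use (this is not part of the patch), here is
a rough sketch of a subsystem u64 control file wired to the new helper,
using the cgroup cftype API of that time. The accessor
cgroup_task_res_counter() and the file name are made-up placeholders:

	#include <linux/cgroup.h>
	#include <linux/res_counter.h>

	/* hypothetical helper returning the res_counter attached to a cgroup */
	struct res_counter *cgroup_task_res_counter(struct cgroup *cgrp);

	static u64 task_counter_limit_read(struct cgroup *cgrp, struct cftype *cft)
	{
		return res_counter_read_u64(cgroup_task_res_counter(cgrp),
					    RES_LIMIT);
	}

	static int task_counter_limit_write(struct cgroup *cgrp, struct cftype *cft,
					    u64 val)
	{
		/* the helper hides the 32-bit locking, the handler stays trivial */
		res_counter_write_u64(cgroup_task_res_counter(cgrp),
				      RES_LIMIT, val);
		return 0;
	}

	static struct cftype task_counter_files[] = {
		{
			.name		= "limit",
			.read_u64	= task_counter_limit_read,
			.write_u64	= task_counter_limit_write,
		},
	};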

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] cgroups: add res_counter_write_u64() API
  2012-01-25 18:31     ` Frederic Weisbecker
@ 2012-01-25 18:35       ` Kirill A. Shutemov
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2012-01-25 18:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Li Zefan, Johannes Weiner, Aditya Kali, Oleg Nesterov,
	Kay Sievers, Tim Hockin, Tejun Heo, Andrew Morton

On Wed, Jan 25, 2012 at 07:31:03PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 16, 2012 at 02:27:13PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Jan 13, 2012 at 07:13:47PM +0100, Frederic Weisbecker wrote:
> > > Extend the resource counter API with a mirror of res_counter_read_u64() to
> > > make it handy to update a resource counter value from a cgroup subsystem
> > > u64 value file.
> > 
> > I still think it is worth having two versions of res_counter_write_u64().
> > From my POV it's clean enough, consistent with res_counter_read_u64(),
> > and faster on 64-bit systems.
> > 
> > What do you think?
> 
> Yeah, you're right, let's do this. Can I get your Signed-off-by so I can take the patch below?

Yes.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-01-25 18:35 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-13 18:13 [PATCH 0/8] cgroups: Task counter subsystem v7 Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 1/8] cgroups: add res_counter_write_u64() API Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-16 12:27   ` Kirill A. Shutemov
2012-01-25 18:31     ` Frederic Weisbecker
2012-01-25 18:35       ` Kirill A. Shutemov
2012-01-13 18:13 ` [PATCH 2/8] cgroups: new resource counter inheritance API Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 3/8] cgroups: ability to stop res charge propagation on bounded ancestor Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 4/8] cgroups: add res counter common ancestor searching Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 5/8] res_counter: allow charge failure pointer to be null Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 6/8] cgroups: pull up res counter charge failure interpretation to caller Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:13 ` [PATCH 7/8] cgroups: allow subsystems to cancel a fork Frederic Weisbecker
2012-01-13 18:13 ` Frederic Weisbecker
2012-01-13 18:14 ` [PATCH 8/8] cgroups: Add a task counter subsystem Frederic Weisbecker
2012-01-16 12:38   ` Kirill A. Shutemov
2012-01-13 18:14 ` Frederic Weisbecker
