Subject: [PATCH 0/8] cgroups: Task counter subsystem v7
From: Frederic Weisbecker @ 2012-01-13 18:13 UTC
  To: LKML
  Cc: Frederic Weisbecker, Paul Menage, Li Zefan, Johannes Weiner,
	Aditya Kali, Oleg Nesterov, Andrew Morton, Kay Sievers,
	Tim Hockin, Tejun Heo, Kirill A. Shutemov, Containers,
	Glauber Costa, Cgroups, Daniel J Walsh, Daniel P. Berrange,
	KAMEZAWA Hiroyuki, Max Kellermann, Mandeep Singh Baines

Hi,

This is the task counter limitation patchset rebased on top
of Tejun's latest cgroup tree (cgroup/for-3.3). In a later
iteration, I also intend to include its selftests once the
selftest subsystem is merged after -rc1.

In fact, the rebase mostly concerns the last patch. The others haven't
changed apart from a few trivial cleanups. Some patches have also been
dropped, either because the latest cgroup patches already cover what they
were doing or because they were tiny changes that I folded into the last
patch (like a missing include of err.h fixed by Stephen Rothwell).

Please note that Andrew Morton had doubts about whether we want to merge
it upstream. So please don't merge it too eagerly before we settle
the debate.


= What is this? =

The task counter subsystem counts the tasks inside a cgroup and
rejects forks and cgroup migrations that would push the number
of tasks above a user-tunable limit.
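
The counting rule above can be sketched as a small userspace model. This is
only an illustration and not the code from this patchset: the names
task_counter, task_counter_charge() and task_counter_uncharge() are
hypothetical, and the -EAGAIN return value is an assumption borrowed from
what fork() returns when the per-user process rlimit is hit. The sketch
charges a new task to its cgroup and to every ancestor, and rolls back if
any level is already at its limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's res_counter code. */
struct task_counter {
	unsigned long usage;         /* tasks currently charged here */
	unsigned long limit;         /* user-tunable upper bound */
	struct task_counter *parent; /* NULL at the root of the hierarchy */
};

/*
 * Charge one task to @c and to all of its ancestors. If any level is
 * already at its limit, undo the charges taken so far and fail, so a
 * rejected fork or migration leaves every counter unchanged.
 */
static int task_counter_charge(struct task_counter *c)
{
	struct task_counter *pos, *undo;

	for (pos = c; pos; pos = pos->parent) {
		if (pos->usage >= pos->limit)
			goto fail;
		pos->usage++;
	}
	return 0;

fail:
	/* Roll back the levels below the counter that refused. */
	for (undo = c; undo != pos; undo = undo->parent)
		undo->usage--;
	return -EAGAIN; /* assumed error code, see above */
}

/* Uncharge one task, e.g. when it exits or leaves the cgroup. */
static void task_counter_uncharge(struct task_counter *c)
{
	for (; c; c = c->parent) {
		assert(c->usage > 0);
		c->usage--;
	}
}
```

In this model a fork would call task_counter_charge() on the cgroup of the
parent task and be cancelled on a non-zero return, while an exit would call
task_counter_uncharge().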

= Why is this needed? =

We want to be able to run untrusted programs in sandboxes and
secure containers while protecting against forkbombs.

This patchset allows us to:

1) Protect against forkbombs by setting an upper bound on the number of
tasks in a cgroup. This prevents a forkbomb from spreading. This is
essentially the NR_PROC rlimit (RLIMIT_NPROC), but scoped to a cgroup. The
traditional NR_PROC doesn't help us here because we don't want one container
to starve all the others by spawning a high number of tasks when all these
containers are running under the same user.

2) Safely kill a cgroup. We want a non-racy and reliable way to kill
all tasks in a cgroup, without racing against concurrent forks.
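
As a rough illustration of point 2, the sketch below first sets the task
limit of a cgroup to 0 so that no new task can appear, and only then kills
the processes it contains. The control file name tasks.limit is an
assumption here (the real interface is described in the documentation
patch), as are the helper names cgroup_block_forks() and cgroup_kill_all().
cgroup.procs is the existing cgroup file that lists the process IDs in a
cgroup.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Write 0 to <cgroup_dir>/tasks.limit (assumed file name) to reject forks. */
static int cgroup_block_forks(const char *cgroup_dir)
{
	char path[4096];
	FILE *f;
	int ret = 0;

	snprintf(path, sizeof(path), "%s/tasks.limit", cgroup_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("0\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/*
 * Block forks first, then send SIGKILL to every process listed in
 * cgroup.procs. Returns the number of processes signalled, or -1 if a
 * control file could not be opened or written.
 */
static int cgroup_kill_all(const char *cgroup_dir)
{
	char path[4096];
	FILE *f;
	long pid;
	int killed = 0;

	if (cgroup_block_forks(cgroup_dir) < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fscanf(f, "%ld", &pid) == 1) {
		if (pid > 0 && kill((pid_t)pid, SIGKILL) == 0)
			killed++;
	}
	fclose(f);
	return killed;
}
```

The ordering is the point of the sketch: once forks are rejected, the list
of processes can no longer grow while it is being walked.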

Some practical cases from people who requested this can be found here:

     https://lkml.org/lkml/2011/12/13/309
     https://lkml.org/lkml/2011/12/13/364

More details can be found in the last patch, which provides the documentation.


= Can that be used by Systemd? =

Systemd uses cgroups to keep track of services and the processes it
creates. A feature has been requested to reliably kill all the processes
in a cgroup, so that systemd can kill services without races.

(Note that I'm not debating here whether Systemd is doing the right thing by
using cgroups. I'm just focusing on this particular feature request.)

The task counter subsystem could be used to solve this problem. However,
this involves the whole task counting machinery, which is too much
overhead for system services that tend to fork often.

A simple core latch that rejects forks in a cgroup would be much more efficient
for this precise purpose.


= How does it interact with NR_PROC rlimit? =

Both can be used at the same time. They don't conflict; they
are complementary.


= Why not rather focus on a generic solution to protect against forkbombs? =

If you know a more generic solution to protect against forkbombs that works
not only in containers but in more cases, I'll be happy to drop this patchset
and focus on that instead.

Note that we need a solution that meets our requirements for untrusted
programs running in containers, something that also prevents a forkbomb from
doing any damage, even a temporary denial of service. We don't want sandboxes
and containers to severely impact the rest of the system.

Thanks.

---
Frederic Weisbecker (7):
  cgroups: add res_counter_write_u64() API
  cgroups: new resource counter inheritance API
  cgroups: ability to stop res charge propagation on bounded ancestor
  res_counter: allow charge failure pointer to be null
  cgroups: pull up res counter charge failure interpretation to caller
  cgroups: allow subsystems to cancel a fork
  cgroups: Add a task counter subsystem

Kirill A. Shutemov (1):
  cgroups: add res counter common ancestor searching

 Documentation/cgroups/resource_counter.txt |   20 ++-
 Documentation/cgroups/task_counter.txt     |  153 ++++++++++++++++
 include/linux/cgroup.h                     |   20 ++-
 include/linux/cgroup_subsys.h              |    8 +
 include/linux/res_counter.h                |   27 +++-
 init/Kconfig                               |    9 +
 kernel/Makefile                            |    1 +
 kernel/cgroup.c                            |   23 ++-
 kernel/cgroup_freezer.c                    |    6 +-
 kernel/cgroup_task_counter.c               |  272 ++++++++++++++++++++++++++++
 kernel/exit.c                              |    2 +-
 kernel/fork.c                              |    7 +-
 kernel/res_counter.c                       |   97 +++++++++--
 13 files changed, 612 insertions(+), 33 deletions(-)
 create mode 100644 Documentation/cgroups/task_counter.txt
 create mode 100644 kernel/cgroup_task_counter.c

-- 
1.7.5.4

