* [RFD] Merge task counter into memcg
@ 2012-04-11 18:57 Frederic Weisbecker
       [not found] ` <20120411185715.GA4317-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-11 18:57 UTC (permalink / raw)
  To: Hugh Dickins, Johannes Weiner, Andrew Morton, KAMEZAWA Hiroyuki,
	Glauber Costa, Tejun Heo, Daniel Walsh, Daniel P. Berrange,
	Li Zefan
  Cc: LKML, Cgroups, Containers

Hi,

While talking with Tejun about targeting the cgroup task counter subsystem
for the next merge window, he suggested checking whether it could be merged
into the memcg subsystem rather than creating a whole new cgroup subsystem
just for the purpose of limiting task counts.

So I'm pinging you guys to seek your insight.

I assume not everybody in the Cc list knows what the task counter subsystem
is all about. So here is a summary: it is a cgroup subsystem (latest version
in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
maintain this accounting, which is visible through a special tasks.usage file.
The user can set a limit on the number of tasks by writing to the tasks.limit
file. Further forks or cgroup migrations are then rejected if the limit would
be exceeded.

This feature is especially useful to protect against forkbombs in containers.
Or, more generally, to limit the resource consumption implied by the number of
tasks in a cgroup, since each task involves some kernel memory allocation.

Now the dilemma is: how should it be implemented?

1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)

2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
because this is about limiting kernel memory allocations. We could have a
memory.kmem.tasks.count file.

My personal opinion is that the task counter brings some overhead: a charge
across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
And this overhead happens even in the off case (when the task counter subsystem
is mounted but the limit is left at its default: ULLONG_MAX).

So if we choose the second solution, this overhead will be added unconditionally
to memcg. But I don't expect every user of memcg to need the task counter. So
perhaps the overhead should be kept in its own separate subsystem.

OTOH, the memory.kmem.* interface would have been a good fit.

What do you think?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-11 18:57 [RFD] Merge task counter into memcg Frederic Weisbecker
@ 2012-04-11 19:21     ` Glauber Costa
  2012-04-12  0:56 ` KAMEZAWA Hiroyuki
  2012-04-12  1:07 ` Johannes Weiner
  2 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-11 19:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/11/2012 03:57 PM, Frederic Weisbecker wrote:
> So if we choose the second solution, this overhead will be added unconditionally
> to memcg.
> But I don't expect every users of memcg will need the task counter. So perhaps
> the overhead should be kept in its own separate subsystem.

What we usually do with kmem paths, like the upcoming slab tracking, is to
not account at all if the cgroup is not limited. So if you are not limited
in a particular cgroup, you just don't bother with accounting.

If this suits your needs, you can probably do the same, and then pay the
price only for the users that are interested in it.

Now, whether or not this should be considered memory is a different story.
You can say it is memory, yes, but I bet you could just as well find a bunch
of arguments to consider it "cpu".

Against memcg, consider this: your counter would probably be the first
non-page-based data in memcg. That at least raises a flag.


* Re: [RFD] Merge task counter into memcg
       [not found] ` <20120411185715.GA4317-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
  2012-04-11 19:21     ` Glauber Costa
@ 2012-04-12  0:56   ` KAMEZAWA Hiroyuki
  2012-04-12  1:07   ` Johannes Weiner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12  0:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

(2012/04/12 3:57), Frederic Weisbecker wrote:

> Hi,
> 
> While talking with Tejun about targeting the cgroup task counter subsystem
> for the next merge window, he suggested to check if this could be merged into
> the memcg subsystem rather than creating a new one cgroup subsystem just
> for task count limit purpose.
> 
> So I'm pinging you guys to seek your insight.
> 
> I assume not everybody in the Cc list knows what the task counter subsystem
> is all about. So here is a summary: this is a cgroup subsystem (latest version
> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> maintain this accounting visible to a special tasks.usage file. The user can
> set a limit on the number of tasks by writing on the tasks.limit file.
> Further forks or cgroup migration are then rejected if the limit is exceeded.
> 
> This feature is especially useful to protect against forkbombs in containers.
> Or more generally to limit the resources on the number of tasks on a cgroup
> as it involves some kernel memory allocation.
> 
> Now the dilemma is how to implement it?
> 
> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> 
> 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
> because this is about kernel memory allocation limitation. We could have a
> memory.kmem.tasks.count
> 
> My personal opinion is that the task counter brings some overhead: a charge
> across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
> And this overhead happens even in the off-case (when the task counter subsystem
> is mounted but the limit is the default: ULLONG_MAX).
> 
> So if we choose the second solution, this overhead will be added unconditionally
> to memcg.
> But I don't expect every user of memcg will need the task counter. So perhaps
> the overhead should be kept in its own separate subsystem.
> 
> OTOH memory.kmem.* interface would have been a good fit.
> 
> What do you think?


Sounds interesting to me. Hm, is the 'overhead' of task accounting large
enough to be visible to users? How big is the performance regression?

BTW, right now all of memcg's limit interfaces use bytes as the unit of
accounting. Having a mixture of bytes and numbers of objects for accounting
is a small concern to me. But increasing the number of subsystems is not
very good either....
 
Regards,
-Kame


* Re: [RFD] Merge task counter into memcg
       [not found] ` <20120411185715.GA4317-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
  2012-04-11 19:21     ` Glauber Costa
  2012-04-12  0:56   ` KAMEZAWA Hiroyuki
@ 2012-04-12  1:07   ` Johannes Weiner
  2012-04-12  3:56   ` Alexander Nikiforov
  2012-04-12  4:00     ` Alexander Nikiforov
  4 siblings, 0 replies; 88+ messages in thread
From: Johannes Weiner @ 2012-04-12  1:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Tejun Heo, Cgroups, Andrew Morton

On Wed, Apr 11, 2012 at 08:57:20PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> While talking with Tejun about targeting the cgroup task counter subsystem
> for the next merge window, he suggested to check if this could be merged into
> the memcg subsystem rather than creating a new one cgroup subsystem just
> for task count limit purpose.
> 
> So I'm pinging you guys to seek your insight.

I'm sorry you are being given the runaround like this with that code.

> I assume not everybody in the Cc list knows what the task counter subsystem
> is all about. So here is a summary: this is a cgroup subsystem (latest version
> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> maintain this accounting visible to a special tasks.usage file. The user can
> set a limit on the number of tasks by writing on the tasks.limit file.
> Further forks or cgroup migration are then rejected if the limit is exceeded.
> 
> This feature is especially useful to protect against forkbombs in containers.
> Or more generally to limit the resources on the number of tasks on a cgroup
> as it involves some kernel memory allocation.

You could also twist this around and argue the same for cpu usage and
make it part of the cpu cgroup, but it doesn't really fit in either
subsystem, IMO.

> Now the dilemma is how to implement it?
> 
> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)

What was wrong with that again?

> 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
> because this is about kernel memory allocation limitation. We could have a
> memory.kmem.tasks.count
> 
> My personal opinion is that the task counter brings some overhead: a charge
> across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
> And this overhead happens even in the off-case (when the task counter subsystem
> is mounted but the limit is the default: ULLONG_MAX).

3) Make it an integral part of cgroups, because keeping track of the tasks
in them already is one, so it would be a more natural approach than
bolting it onto the memory controller.

But this has the same overhead.  And even if this turned out to be the
better idea, we could still do it after merging the counter as a separate
controller, as long as we maintain the interface.

> So if we choose the second solution, this overhead will be added unconditionally
> to memcg.
> But I don't expect every user of memcg will need the task counter. So perhaps
> the overhead should be kept in its own separate subsystem.
> 
> OTOH memory.kmem.* interface would have been a good fit.
> 
> What do you think?

Instead of integrating it task-wise, could the problem be solved by
accounting the kernel stack to kmem?  And then having a kmem limit,
which we already want anyway?

After all, we would only restrict the number of tasks for the sake of the
resources they require, not merely to allow some arbitrary number of tasks
(unless one wants to sell Windows 7 Starter style containers, in which
case one can go play with oneself out of tree as far as I'm concerned).


* Re: [RFD] Merge task counter into memcg
  2012-04-12  1:07 ` Johannes Weiner
@ 2012-04-12  2:15       ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12  2:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

On 04/11/2012 10:07 PM, Johannes Weiner wrote:
> You could also twist this around and argue the same for cpu usage and
> make it part of the cpu cgroup, but it doesn't really fit in either
> subsystem, IMO.
I myself really prefer this in the cpu controller.
Besides the bytes-vs-objects thing, whenever you create a process, at
some point it will end up on the runqueues to be scheduled. That is a
natural point of accounting.

Either that, or make it a core feature of cgroups, like limiting the
number of processes in the tasks file (we just have to find a natural way
to make it hierarchical). It will make more and more sense as people
seem to be favoring single hierarchies these days. (Granted, not a
settled discussion, so your views may vary.)


* Re: [RFD] Merge task counter into memcg
  2012-04-12  1:07 ` Johannes Weiner
@ 2012-04-12  3:26       ` Li Zefan
  0 siblings, 0 replies; 88+ messages in thread
From: Li Zefan @ 2012-04-12  3:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

Johannes Weiner wrote:

> On Wed, Apr 11, 2012 at 08:57:20PM +0200, Frederic Weisbecker wrote:
>> Hi,
>>
>> While talking with Tejun about targeting the cgroup task counter subsystem
>> for the next merge window, he suggested to check if this could be merged into
>> the memcg subsystem rather than creating a new one cgroup subsystem just
>> for task count limit purpose.
>>
>> So I'm pinging you guys to seek your insight.
> I'm sorry you are given a runaround like this with that code.

I don't like the idea of putting this stuff into memcg either.

>> I assume not everybody in the Cc list knows what the task counter subsystem
>> is all about. So here is a summary: this is a cgroup subsystem (latest version
>> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
>> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
>> maintain this accounting visible to a special tasks.usage file. The user can
>> set a limit on the number of tasks by writing on the tasks.limit file.
>> Further forks or cgroup migration are then rejected if the limit is exceeded.
>>
>> This feature is especially useful to protect against forkbombs in containers.
>> Or more generally to limit the resources on the number of tasks on a cgroup
>> as it involves some kernel memory allocation.
> 
> You could also twist this around and argue the same for cpu usage and
> make it part of the cpu cgroup, but it doesn't really fit in either
> subsystem, IMO.
> 
>> Now the dilemma is how to implement it?
>>
>> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> 
> What was wrong with that again?
> 
>> 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
>> because this is about kernel memory allocation limitation. We could have a
>> memory.kmem.tasks.count
>>
>> My personal opinion is that the task counter brings some overhead: a charge
>> across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
>> And this overhead happens even in the off-case (when the task counter subsystem
>> is mounted but the limit is the default: ULLONG_MAX).
> 
> 3) Make it an integral part of cgroups, because keeping track of tasks
> in them already is, so it would be a more natural approach than
> bolting it onto the memory controller.
> 

> But this has the same overhead.

This makes the most sense to me. Task counting and limiting sounds like a
natural part of cgroups.

It has overhead, but what makes it worse than that in the single hierarchy
that we are aiming at?

> And even if this would end up being a
> better idea, we could still do this after merging it as a separate
> controller as long as we maintain the interface.

That would add some tricky, messy code, so better not.


* Re: [RFD] Merge task counter into memcg
       [not found] ` <20120411185715.GA4317-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
                     ` (2 preceding siblings ...)
  2012-04-12  1:07   ` Johannes Weiner
@ 2012-04-12  3:56   ` Alexander Nikiforov
       [not found]     ` <4F86527C.2080507-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
  2012-04-17  1:09     ` Frederic Weisbecker
  2012-04-12  4:00     ` Alexander Nikiforov
  4 siblings, 2 replies; 88+ messages in thread
From: Alexander Nikiforov @ 2012-04-12  3:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/11/2012 10:57 PM, Frederic Weisbecker wrote:
> Hi,
>
> While talking with Tejun about targeting the cgroup task counter subsystem
> for the next merge window, he suggested to check if this could be merged into
> the memcg subsystem rather than creating a new one cgroup subsystem just
> for task count limit purpose.
>
> So I'm pinging you guys to seek your insight.
>
> I assume not everybody in the Cc list knows what the task counter subsystem
> is all about. So here is a summary: this is a cgroup subsystem (latest version
> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> maintain this accounting visible to a special tasks.usage file. The user can
> set a limit on the number of tasks by writing on the tasks.limit file.
> Further forks or cgroup migration are then rejected if the limit is exceeded.
>
> This feature is especially useful to protect against forkbombs in containers.
> Or more generally to limit the resources on the number of tasks on a cgroup
> as it involves some kernel memory allocation.
>
> Now the dilemma is how to implement it?
>
> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
>
> 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
> because this is about kernel memory allocation limitation. We could have a
> memory.kmem.tasks.count
>
> My personal opinion is that the task counter brings some overhead: a charge
> across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
> And this overhead happens even in the off-case (when the task counter subsystem
> is mounted but the limit is the default: ULLONG_MAX).
>
> So if we choose the second solution, this overhead will be added unconditionally
> to memcg.
> But I don't expect every user of memcg will need the task counter. So perhaps
> the overhead should be kept in its own separate subsystem.
>
> OTOH memory.kmem.* interface would have been a good fit.
>
> What do you think?

Hi,

I'm agree that this is memory related thing, but I prefer this as a 
separate subsystem.
Yes it has some impact on a system, but on the other hand we will have 
some very useful tool to track tasks state.
As I wrote before

http://comments.gmane.org/gmane.linux.kernel.cgroups/1448

it'll very useful to have event in the userspace about fork/exit about 
group of the processes.



-- 
Best regards,
      Alex Nikiforov,
      Mobile SW, Advanced Software Group,
      Moscow R&D center, Samsung Electronics


* Re: [RFD] Merge task counter into memcg
  2012-04-11 18:57 [RFD] Merge task counter into memcg Frederic Weisbecker
@ 2012-04-12  4:00     ` Alexander Nikiforov
  2012-04-12  0:56 ` KAMEZAWA Hiroyuki
  2012-04-12  1:07 ` Johannes Weiner
  2 siblings, 0 replies; 88+ messages in thread
From: Alexander Nikiforov @ 2012-04-12  4:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/11/2012 10:57 PM, Frederic Weisbecker wrote:
> Hi,
>
> While talking with Tejun about targeting the cgroup task counter subsystem
> for the next merge window, he suggested checking whether it could be merged
> into the memcg subsystem rather than creating a whole new cgroup subsystem
> just for task count limiting.
>
> So I'm pinging you guys to seek your insight.
>
> I assume not everybody in the Cc list knows what the task counter subsystem
> is all about, so here is a summary: it is a cgroup subsystem (latest version
> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> maintain this accounting, which is visible through a special tasks.usage
> file. The user can set a limit on the number of tasks by writing to the
> tasks.limit file. Further forks or cgroup migrations are then rejected if
> the limit would be exceeded.
>
> This feature is especially useful to protect against forkbombs in containers,
> or more generally to limit the number of tasks in a cgroup, since every task
> involves some kernel memory allocation.
>
> Now the dilemma is: how do we implement it?
>
> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
>
> 2) As a feature in memcg, as part of the memory.kmem.* files. This makes
> sense because this is about limiting kernel memory allocation. We could have
> a memory.kmem.tasks.count file.
>
> My personal opinion is that the task counter brings some overhead: a charge
> across the whole hierarchy at every fork, and the mirrored uncharge on task
> exit. And this overhead happens even in the off-case (when the task counter
> subsystem is mounted but the limit is left at the default: ULLONG_MAX).
>
> So if we choose the second solution, this overhead will be added
> unconditionally to memcg. But I don't expect every user of memcg to need the
> task counter, so perhaps the overhead should be kept in its own separate
> subsystem.
>
> OTOH, the memory.kmem.* interface would have been a good fit.
>
> What do you think?

Hi,

I agree that this is a memory-related thing, but I would prefer to keep
it as a separate subsystem.
Yes, it has some impact on the system, but on the other hand we would
gain a very useful tool for tracking task state.
As I wrote before,

http://comments.gmane.org/gmane.linux.kernel.cgroups/1448

it would be very useful to get userspace events about fork/exit for a
group of processes.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]     ` <4F85D9C6.5000202-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-12 11:19       ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 11:19 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On Wed, Apr 11, 2012 at 04:21:42PM -0300, Glauber Costa wrote:
> On 04/11/2012 03:57 PM, Frederic Weisbecker wrote:
> >So if we choose the second solution, this overhead will be added unconditionally
> >to memcg.
> >But I don't expect every user of memcg to need the task counter. So perhaps
> >the overhead should be kept in its own separate subsystem.
> 
> What we're usually doing with kmem paths, like the upcoming slab
> tracking, is to not account if it is not limited. So if you are not
> limited in a particular cgroup, you just don't bother with accounting.

So that's a good point. I can start accounting tasks and applying the
limit only once somebody writes to the limit file.
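
Glauber's scheme could look roughly like the sketch below. This is an
illustrative userspace model only: the struct layout, the function names
(task_counter_charge, hierarchy_limited) and the ULLONG_MAX convention are
made up for this sketch, not the actual kernel API. The point is simply that
the charge path bails out immediately unless some level of the hierarchy has
set a limit:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of "don't account if not limited"; names are
 * invented for this sketch, not taken from the kernel. */
struct task_counter {
	struct task_counter *parent;
	unsigned long long usage;
	unsigned long long limit;	/* ULLONG_MAX means "unlimited" */
};

/* True if any level of the hierarchy has set a limit. */
static bool hierarchy_limited(const struct task_counter *tc)
{
	for (; tc; tc = tc->parent)
		if (tc->limit != ULLONG_MAX)
			return true;
	return false;
}

/* Called at fork time: returns false if the fork must be rejected. */
static bool task_counter_charge(struct task_counter *tc)
{
	struct task_counter *c;

	if (!hierarchy_limited(tc))
		return true;		/* off-case: skip accounting entirely */

	for (c = tc; c; c = c->parent) {
		if (c->usage >= c->limit) {
			/* Roll back the levels already charged. */
			for (; tc != c; tc = tc->parent)
				tc->usage--;
			return false;
		}
		c->usage++;
	}
	return true;
}
```

One corner case this sketch glosses over: when a limit is first written to a
group that already contains tasks, the counter would have to be initialized
from the current task count, since nothing was accounted until then.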

> 
> If this suits your needs, you can probably do the same, and then
> pay the price just for the users that are interested in it.
> 
> Now, whether or not this should be considered memory is a different
> story. You can say it is memory, yes, but I bet you could very well
> find a bunch of arguments to consider it "cpu" as well.
> 
> Against memcg, consider this: your counter would probably be the
> first non-page-based data in memcg. That at least raises a flag.

Good points.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12  0:56 ` KAMEZAWA Hiroyuki
@ 2012-04-12 11:32       ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 11:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On Thu, Apr 12, 2012 at 09:56:49AM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/12 3:57), Frederic Weisbecker wrote:
> 
> > Hi,
> > 
> > While talking with Tejun about targeting the cgroup task counter subsystem
> > for the next merge window, he suggested checking whether it could be merged
> > into the memcg subsystem rather than creating a whole new cgroup subsystem
> > just for task count limiting.
> > 
> > So I'm pinging you guys to seek your insight.
> > 
> > I assume not everybody in the Cc list knows what the task counter subsystem
> > is all about, so here is a summary: it is a cgroup subsystem (latest version
> > in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> > present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> > maintain this accounting, which is visible through a special tasks.usage
> > file. The user can set a limit on the number of tasks by writing to the
> > tasks.limit file. Further forks or cgroup migrations are then rejected if
> > the limit would be exceeded.
> > 
> > This feature is especially useful to protect against forkbombs in containers,
> > or more generally to limit the number of tasks in a cgroup, since every task
> > involves some kernel memory allocation.
> > 
> > Now the dilemma is: how do we implement it?
> > 
> > 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> > 
> > 2) As a feature in memcg, as part of the memory.kmem.* files. This makes
> > sense because this is about limiting kernel memory allocation. We could have
> > a memory.kmem.tasks.count file.
> > 
> > My personal opinion is that the task counter brings some overhead: a charge
> > across the whole hierarchy at every fork, and the mirrored uncharge on task
> > exit. And this overhead happens even in the off-case (when the task counter
> > subsystem is mounted but the limit is left at the default: ULLONG_MAX).
> > 
> > So if we choose the second solution, this overhead will be added
> > unconditionally to memcg. But I don't expect every user of memcg to need the
> > task counter, so perhaps the overhead should be kept in its own separate
> > subsystem.
> > 
> > OTOH, the memory.kmem.* interface would have been a good fit.
> > 
> > What do you think?
> 
> 
> Sounds interesting to me. Hm, is the 'overhead' of your task accounting
> large enough to be visible to users? How big is the performance regression?

I haven't measured. But on every fork, we do a res_counter_charge() that
walks through the css_set and all its ancestors, takes a spinlock and
increments a counter at every level. In terms of cache thrashing and
algorithmic complexity, I believe the issue is real.
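
To make the cost concrete, here is a loose userspace model of that charge
path. A pthread mutex stands in for the kernel's spinlock, and the struct
and function names only loosely mimic res_counter; none of this is the real
kernel code. The shape is what matters: every fork takes one lock and
dirties one cache line per hierarchy level, and every exit does the
mirrored walk.

```c
#include <pthread.h>

/* Loose userspace model of the per-fork hierarchical charge; a pthread
 * mutex stands in for the kernel's spinlock. */
struct res_counter {
	struct res_counter *parent;
	pthread_mutex_t lock;
	unsigned long long usage;
};

/* On every fork: one lock/unlock round trip per hierarchy level. */
static void res_counter_charge_hierarchy(struct res_counter *rc)
{
	struct res_counter *c;

	for (c = rc; c; c = c->parent) {
		pthread_mutex_lock(&c->lock);
		c->usage++;
		pthread_mutex_unlock(&c->lock);
	}
}

/* The mirrored uncharge on task exit: same walk, same locks. */
static void res_counter_uncharge_hierarchy(struct res_counter *rc)
{
	struct res_counter *c;

	for (c = rc; c; c = c->parent) {
		pthread_mutex_lock(&c->lock);
		c->usage--;
		pthread_mutex_unlock(&c->lock);
	}
}
```

With a deep hierarchy, the lock traffic at the upper levels is shared by
every forking task below them, which is where the contention and cache-line
bouncing come from.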

> BTW, all of memcg's limit interfaces currently use bytes as the unit of
> accounting. It's a small concern to me to have a mixture of bytes and
> numbers of objects for accounting.

Indeed, this can be confusing for users.

> But I think increasing the number of subsystems is not very good....

If the result is better granularity of the overhead, I believe this
can be a good thing.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]       ` <20120412113217.GB11455-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2012-04-12 11:43         ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 11:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/12/2012 08:32 AM, Frederic Weisbecker wrote:
>> But I think increasing the number of subsystems is not very good....
> If the result is better granularity of the overhead, I believe this
> can be a good thing.

But again, since there are quite a number of people trying to merge this
stuff together, you are just swimming against the tide.

If this gets really integrated, all of a sudden the overhead will appear.
So better to care about it now.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]         ` <4F86BFC6.2050400-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-12 12:32           ` Johannes Weiner
  0 siblings, 0 replies; 88+ messages in thread
From: Johannes Weiner @ 2012-04-12 12:32 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

On Thu, Apr 12, 2012 at 08:43:02AM -0300, Glauber Costa wrote:
> On 04/12/2012 08:32 AM, Frederic Weisbecker wrote:
> >>But I think increasing the number of subsystems is not very good....
> >If the result is better granularity of the overhead, I believe this
> >can be a good thing.
> 
> But again, since there are quite a number of people trying to merge
> this stuff together, you are just swimming against the tide.

I don't see where merging unrelated controllers together is being
discussed, do you have a reference?

> If this gets really integrated, all of a sudden the overhead will
> appear. So better to care about it now.

Forcing people that want to account/limit one resource to take the hit
for something else they are not interested in requires justification.
You can optimize only so much; in the end, the hierarchical accounting
is just expensive and unacceptable if you don't care about a certain
resource.  For that reason, I think controllers should stay opt-in.

Btw, can we please have a discussion where raised concerns are
supported by more than gut feeling?  "I think X is not very good" is
hardly an argument.  Where is the technical problem in increasing the
number of available controllers?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 12:32         ` Johannes Weiner
@ 2012-04-12 13:12               ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 13:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

On 04/12/2012 09:32 AM, Johannes Weiner wrote:
> On Thu, Apr 12, 2012 at 08:43:02AM -0300, Glauber Costa wrote:
>> On 04/12/2012 08:32 AM, Frederic Weisbecker wrote:
>>>> But I think increasing the number of subsystems is not very good....
>>> If the result is better granularity of the overhead, I believe this
>>> can be a good thing.
>>
>> But again, since there are quite a number of people trying to merge
>> this stuff together, you are just swimming against the tide.
>
> I don't see where merging unrelated controllers together is being
> discussed, do you have a reference?

https://lkml.org/lkml/2012/2/21/379

But also, I believe this has been widely discussed in person, in separate
groups. Maybe Tejun can do a small writeup of where we stand?

I would also point out that this is exactly what it is (IMHO): an
ongoing discussion. You are more than welcome to chime in.

>> If this gets really integrated, all of a sudden the overhead will
>> appear. So better to care about it now.
>
> Forcing people that want to account/limit one resource to take the hit
> for something else they are not interested in requires justification.

Agreed. Even people aiming for unified hierarchies are okay with an
opt-in/out system, I believe. So the controllers need not be active
at all times. One way of doing this is what I suggested to Frederic: if
you don't limit, don't account.

> You can optimize only so much, in the end, the hierarchical accounting
> is just expensive and unacceptable if you don't care about a certain
> resource.  For that reason, I think controllers should stay opt-in.

see above.

> Btw, can we please have a discussion where raised concerns are
> supported by more than gut feeling?  "I think X is not very good" is
> hardly an argument.  Where is the technical problem in increasing the
> number of available controllers?

Kame said that, not me. But FWIW, I don't disagree. And this is hardly
gut feeling.

A big number of controllers creates complexity. When coding, we can
assume a lot less about their relationships, and more importantly, at
some point people get confused. Sometimes *we* get confused about which
controller does what, where its responsibility ends and where another's
begins. And we're the ones writing it! Avoiding complexity is an
engineering principle, not a gut feeling.

Now, of course, we should aim to make things as simple as possible, but
not simpler: so you could argue that in Frederic's specific case it is
justified, and I'd be fine with that 100%. If I agreed...

There are two natural points for inclusion here:

1) Every cgroup has a task counter by itself. If we're putting the tasks
there anyway, this provides a natural point of accounting.

2) The cpu cgroup, in the end, is the realm of the scheduler. We
determine what % of the cpu a process will get, its bandwidth, time spent
by tasks, and all that. It is also more natural for this feature, because
it is task based.

Don't get me wrong: I actually love the feature Frederic is working on.
I just don't believe a different controller is justified. Nor do I
believe memcg is the place for it (especially now that I've slept on it).

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12  1:07 ` Johannes Weiner
@ 2012-04-12 14:55       ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 14:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Tejun Heo, Cgroups, Andrew Morton

On Thu, Apr 12, 2012 at 03:07:45AM +0200, Johannes Weiner wrote:
> On Wed, Apr 11, 2012 at 08:57:20PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > While talking with Tejun about targeting the cgroup task counter subsystem
> > for the next merge window, he suggested checking whether it could be merged
> > into the memcg subsystem rather than creating a whole new cgroup subsystem
> > just for task count limiting.
> > 
> > So I'm pinging you guys to seek your insight.
> 
> I'm sorry you are given a runaround like this with that code.

Never mind, as long as I end up with something most people are fine
with.

> 
> > I assume not everybody in the Cc list knows what the task counter subsystem
> > is all about, so here is a summary: it is a cgroup subsystem (latest version
> > in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> > present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> > maintain this accounting, which is visible through a special tasks.usage
> > file. The user can set a limit on the number of tasks by writing to the
> > tasks.limit file. Further forks or cgroup migrations are then rejected if
> > the limit would be exceeded.
> > 
> > This feature is especially useful to protect against forkbombs in containers,
> > or more generally to limit the number of tasks in a cgroup, since every task
> > involves some kernel memory allocation.
> 
> You could also twist this around and argue the same for cpu usage and
> make it part of the cpu cgroup, but it doesn't really fit in either
> subsystem, IMO.

Ok.

> 
> > Now the dilemma is: how do we implement it?
> > 
> > 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> 
> What was wrong with that again?

Nothing. Tejun and I just wanted to do a last check to see if we are not
missing an existing interface/subsys where it would potentially fit.

> 
> > 2) As a feature in memcg, as part of the memory.kmem.* files. This makes
> > sense because this is about limiting kernel memory allocation. We could have
> > a memory.kmem.tasks.count file.
> > 
> > My personal opinion is that the task counter brings some overhead: a charge
> > across the whole hierarchy at every fork, and the mirrored uncharge on task
> > exit. And this overhead happens even in the off-case (when the task counter
> > subsystem is mounted but the limit is left at the default: ULLONG_MAX).
> 
> 3) Make it an integral part of cgroups, because keeping track of tasks
> in them already is, so it would be a more natural approach than
> bolting it onto the memory controller.

(Adding Kosaki in Cc because he proposed me the same at the collab
summit).

Yeah. But keeping track of tasks is not unconditional in cgroups. It
triggers only after the first call to cgroup_iter_start(). It seems
we've tried hard to keep the check for this lockless. The end result
is that we account new tasks in cgroup_post_fork(): after the task is
added on the tasklist.

If we want to use the task tracking for counting purpose on top of
which we can cancel a fork, we need to move it before the task is added
to the tasklist. Because afterward it can't be cancelled anymore.

Doing this means that we can't do the task tracking conditionally
anymore.

That said, this lockless off-case is an issue on some other areas.
Like this race in the cgroup freezer: https://lkml.org/lkml/2012/3/8/69

So for now doing this in the cgroup core involves a real overhead even
in the off-case.

> 
> But this has the same overhead.  And even if this would end up being a
> better idea, we could still do this after merging it as a separate
> controller as long as we maintain the interface.

Yeah indeed.

> 
> > So if we choose the second solution, this overhead will be added unconditionally
> > to memcg.
> > But I don't expect every users of memcg will need the task counter. So perhaps
> > the overhead should be kept in its own separate subsystem.
> > 
> > OTOH memory.kmem.* interface would have be a good fit.
> > 
> > What do you think?
> 
> Instead of integrating it task-wise, could the problem be solved by
> accounting the kernel stack to kmem?  And then have a kmem limit,
> which we already want anway?

I don't know how the kernel stack is allocated for tasks. Do you mean
that we allocate a chunck of it for each new task and we could rely
on that?

> After all, we would only restrict the number of tasks for the
> resources they require

It depends if the kernel stack can have other kind of "consumer".

>, not to only allow an arbitrary number of tasks
> (unless one wants to sell Windows 7 Starter style containers, in which
> case one can go play with oneself out of tree as far as I'm concerned)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
@ 2012-04-12 14:55       ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 14:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, Andrew Morton, KAMEZAWA Hiroyuki, Glauber Costa,
	Tejun Heo, Daniel Walsh, Daniel P. Berrange, Li Zefan, LKML,
	Cgroups, Containers

On Thu, Apr 12, 2012 at 03:07:45AM +0200, Johannes Weiner wrote:
> On Wed, Apr 11, 2012 at 08:57:20PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > While talking with Tejun about targeting the cgroup task counter subsystem
> > for the next merge window, he suggested checking whether this could be merged
> > into the memcg subsystem rather than creating a new cgroup subsystem just
> > for task-count limiting purposes.
> > 
> > So I'm pinging you guys to seek your insight.
> 
> I'm sorry you are given a runaround like this with that code.

Never mind, as long as I end up with something most people are fine
with.

> 
> > I assume not everybody in the Cc list knows what the task counter subsystem
> > is all about. So here is a summary: this is a cgroup subsystem (latest version
> > in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> > present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> > maintain this accounting, which is visible through a special tasks.usage file.
> > The user can set a limit on the number of tasks by writing to the tasks.limit
> > file. Further forks or cgroup migrations are then rejected if the limit is
> > exceeded.
> > 
> > This feature is especially useful to protect against forkbombs in containers,
> > or more generally to limit the number of tasks in a cgroup, since each task
> > involves some kernel memory allocation.
> 
> You could also twist this around and argue the same for cpu usage and
> make it part of the cpu cgroup, but it doesn't really fit in either
> subsystem, IMO.

Ok.

> 
> > Now the dilemma is: how to implement it?
> > 
> > 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> 
> What was wrong with that again?

Nothing. Tejun and I just wanted to do one last check to see whether we
are missing an existing interface/subsystem where it would potentially fit.

> 
> > 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
> > because this is about limiting kernel memory allocation. We could have a
> > memory.kmem.tasks.count
> > 
> > My personal opinion is that the task counter brings some overhead: a charge
> > across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
> > And this overhead happens even in the off-case (when the task counter subsystem
> > is mounted but the limit is the default: ULLONG_MAX).
> 
> 3) Make it an integral part of cgroups, because keeping track of tasks
> in them already is one, so it would be a more natural approach than
> bolting it onto the memory controller.

(Adding Kosaki to Cc because he proposed the same thing to me at the
Collaboration Summit.)

Yeah. But keeping track of tasks is not unconditional in cgroups. It
triggers only after the first call to cgroup_iter_start(). It seems
we've tried hard to keep the check for this lockless. The end result
is that we account new tasks in cgroup_post_fork(): after the task is
added to the tasklist.

If we want to use the task tracking for counting purposes, on top of
which we can cancel a fork, we need to move it to before the task is
added to the tasklist, because afterward the fork can't be cancelled
anymore.

Doing this means that we can't do the task tracking conditionally
anymore.

That said, this lockless off-case is an issue in some other areas,
like this race in the cgroup freezer: https://lkml.org/lkml/2012/3/8/69

So for now doing this in the cgroup core involves a real overhead even
in the off-case.

> 
> But this has the same overhead.  And even if this would end up being a
> better idea, we could still do this after merging it as a separate
> controller as long as we maintain the interface.

Yeah indeed.

> 
> > So if we choose the second solution, this overhead will be added unconditionally
> > to memcg.
> > But I don't expect every user of memcg to need the task counter. So perhaps
> > the overhead should be kept in its own separate subsystem.
> > 
> > OTOH the memory.kmem.* interface would be a good fit.
> > 
> > What do you think?
> 
> Instead of integrating it task-wise, could the problem be solved by
> accounting the kernel stack to kmem?  And then have a kmem limit,
> which we already want anyway?

I don't know how the kernel stack is allocated for tasks. Do you mean
that we allocate a chunk of it for each new task and that we could rely
on that?

> After all, we would only restrict the number of tasks for the
> resources they require

It depends on whether the kernel stack can have other kinds of "consumers".

>, not to only allow an arbitrary number of tasks
> (unless one wants to sell Windows 7 Starter style containers, in which
> case one can go play with oneself out of tree as far as I'm concerned)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]               ` <4F86D4BD.1040305-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-12 15:30                 ` Johannes Weiner
  0 siblings, 0 replies; 88+ messages in thread
From: Johannes Weiner @ 2012-04-12 15:30 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

On Thu, Apr 12, 2012 at 10:12:29AM -0300, Glauber Costa wrote:
> On 04/12/2012 09:32 AM, Johannes Weiner wrote:
> >On Thu, Apr 12, 2012 at 08:43:02AM -0300, Glauber Costa wrote:
> >>On 04/12/2012 08:32 AM, Frederic Weisbecker wrote:
> >>>>But I think increasing number of subsystem is not very good....
> >>>If the result is a better granularity on the overhead, I believe this
> >>>can be a good thing.
> >>
> >>But again, since there are quite a number of people trying to merge
> >>this stuff together, you are just swimming against the tide.
> >
> >I don't see where merging unrelated controllers together is being
> >discussed, do you have a reference?
> 
> https://lkml.org/lkml/2012/2/21/379
> 
> But also, I believe this has been widely discussed in person by
> people, in separate groups. Maybe Tejun can do a small writeup of
> where we stand?
> 
> I would also point out that this is exactly what it is (IMHO): an
> ongoing discussion. You are more than welcome to chime in.

I thought the conclusion was that nobody really had any sane use case
for multiple hierarchies.  So while nobody wanted to just disable them
for fear of breaking someone's use case, individual controllers can
still only be active in a single hierarchy.  I don't see why the task
controller should now, as a precedent, support a level of flexibility
that is very doubtful in the first place.

> >>If this gets really integrated, out of a sudden the overhead will
> >>appear. So better care about it now.
> >
> >Forcing people that want to account/limit one resource to take the hit
> >for something else they are not interested in requires justification.
> 
> Agree. Even people aiming for unified hierarchies are okay with an
> opt-in/out system, I believe. So the controllers need not to be
> active at all times. One way of doing this is what I suggested to
> Frederic: If you don't limit, don't account.

I don't agree; it's a valid use case to monitor a workload without
limiting it in any way.  I do it all the time.

> >You can optimize only so much; in the end, the hierarchical accounting
> >is just expensive and unacceptable if you don't care about a certain
> >resource.  For that reason, I think controllers should stay opt-in.
> 
> see above.
> 
> >Btw, can we please have a discussion where raised concerns are
> >supported by more than gut feeling?  "I think X is not very good" is
> >hardly an argument.  Where is the technical problem in increasing the
> >number of available controllers?
> 
> Kame said that, not me. But FWIW, I don't disagree. And this is
> hardly gut feeling.
> 
> A big number of controllers creates complexity. When coding, we can
> assume a lot less about their relationships, and more
> importantly: at some point people get confused. Fuck, sometimes *we*
> get confused about which controller does what, where its
> responsibility ends and where the other's begins. And we're the ones
> writing it! Avoiding complexity is an engineering principle, not a
> gut feeling.

And that's why I have a horrible feeling about extending the cgroup
core to do hierarchical accounting and limiting.  See below.

> Now, of course, we should aim to make things as simple as possible,
> but not simpler: So you can argue that in Frederic's specific case,
> it is justified. And I'd be fine with that 100 %. If I agreed...
> 
> There are two natural points for inclusion here:
> 
> 1) every cgroup has a task counter by itself. If we're putting the
> tasks there anyway, this provides a natural point of accounting.

I do think there is a big difference between having a list of tasks
per individual cgroup to manage basic task-cgroup relationship on one
hand, and accounting and limiting the number of allowed tasks over
multi-level group hierarchies on the other.  It may seem natural on
the face of it, but it really isn't, IMO.  One is basic plumbing, the
other is applying actual semantics to a hierarchy of groups, which has
always been the domain of controllers.  It's simply a layering
violation in my eyes.

> 2) The cpu cgroup, in the end, is the realm of the scheduler. We
> determine which % of the cpu the process will get, bandwidth, time
> spent by tasks, and all that. It is also more natural for that,
> because it is task based.
> 
> Don't get me wrong: I actually love the feature Frederic is working on.
> I just don't believe a different controller is justified. Nor do I
> believe memcg is the place for that (especially now that I've thought
> about it overnight).

To reraise a point from my other email that was ignored: do users
actually really care about the number of tasks when they want to
prevent forkbombs?  If tasks used neither CPU nor memory, you
would not be interested in limiting their number.

Because the number of tasks is not a resource.  CPU and memory are.

So again, if we included the memory impact of tasks properly
(structures, kernel stack pages) in the kernel memory counters which
we allow to limit, shouldn't this solve our problem?

You said in private email that you didn't like the idea because
administrators wouldn't know how big the kernel stack was and that the
number of tasks would be a more natural thing to limit.  But I think
that is actually an argument in favor of the kmem approach: the user
has no idea how much impact a task actually has resource-wise!  On the
other hand, he knows exactly how much memory and CPU his machine has
and how he wants to distribute these resources.  So why provide him
with an interface to control some number in an unknown unit?

You don't propose we allow limiting the number of dcache entries,
either, but rather the memory they use.

The historical limiting of the number of tasks through rlimit is anything
but scientific or natural.  You essentially set it to a random value
between allowing most users to do their job and preventing things from
taking down the machine.  With proper resource accounting, which we
want to have anyway, we can do much better than that, so why shouldn't
we?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 14:55       ` Frederic Weisbecker
@ 2012-04-12 16:34           ` Glauber Costa
  -1 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 16:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/12/2012 11:55 AM, Frederic Weisbecker wrote:
> I don't know how the kernel stack is allocated for tasks. Do you mean
> that we allocate a chunk of it for each new task and we could rely
> on that?
>
More than this: the amount of kernel stack is a really indirect measure
if what you want to track is the number of processes. Now, Hannes made a
fair point in his other e-mail about what is a resource and what is not.

>> >  After all, we would only restrict the number of tasks for the
>> >  resources they require
> It depends on whether the kernel stack can have other kinds of "consumers".
>
It also depends on what you really want to achieve.
If you want to prevent fork bombs, limiting kernel stack will do just fine.

Is there anything for which you need to know exactly the number of 
processes?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 15:30               ` Johannes Weiner
@ 2012-04-12 16:38                     ` Tejun Heo
  2012-04-12 16:54                 ` Glauber Costa
  1 sibling, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-12 16:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Cgroups, Andrew Morton

Hello, Johannes.

On Thu, Apr 12, 2012 at 05:30:55PM +0200, Johannes Weiner wrote:
> > But also, I believe this has been widely discussed in person by
> > people, in separate groups. Maybe Tejun can do a small writeup of
> > where we stand?

I'm still mulling over it.  What I want is a single hierarchy with more
unified rules regarding how different controllers handle the hierarchy,
in such a way that different controllers may interact - i.e. memcg and
blkio can share the same page tags, or cgroup-freezer can provide
freezing service to other controllers.  I want it so that if a task
belongs to a cgroup, it belongs to _the_ cgroup and you can figure out
all cgroup-related stuff from there.

I also want to move away from this notion that any random userland
application can modify and access the cgroupfs hierarchies directly.
It's way too low level and cgroup doesn't have nearly enough
multiplexing capability to support such usage.  We end up where
everyone is wading through fog hoping not to step on someone else's
toe, and the interface is a bit too integrated with internal
mechanisms to be exposed directly to a random userland application
without another layer of abstraction / indirection / control.

> > I would also point out that this is exactly what it is (IMHO): an
> > ongoing discussion. You are more than welcome to chime in.
> 
> I thought the conclusion was that nobody really had any sane use case
> for multiple hierarchies.  So while nobody wanted to just disable them
> in fear of breaking someones usecase, individual controllers still can
> only be active in a single hierarchy.  I don't see why the task
> controller should now as a precedence support a level of flexibility
> that is very doubtful in the first place.

The reason why I asked Frederic whether it would make more sense as
part of memcg wasn't about flexibility but mostly about the type of
the resource.  I'll continue below.

> > Agree. Even people aiming for unified hierarchies are okay with an
> > opt-in/out system, I believe. So the controllers need not to be
> > active at all times. One way of doing this is what I suggested to
> > Frederic: If you don't limit, don't account.
> 
> I don't agree, it's a valid usecase to monitor a workload without
> limiting it in any way.  I do it all the time.

AFAICS, this seems to be the most valid use case for different
controllers seeing different parts of the hierarchy, even if the
hierarchies aren't completely separate.  Accounting and control being
in separate controllers is pretty sucky too as it ends up accounting
things multiple times.  Maybe all controllers should learn how to do
accounting w/o applying limits?  Not sure yet.

> To reraise a point from my other email that was ignored: do users
> actually really care about the number of tasks when they want to
> prevent forkbombs?  If a task would use neither CPU nor memory, you
> would not be interested in limiting the number of tasks.
> 
> Because the number of tasks is not a resource.  CPU and memory are.
>
> So again, if we would include the memory impact of tasks properly
> (structures, kernel stack pages) in the kernel memory counters which
> we allow to limit, shouldn't this solve our problem?

The task counter is trying to control the *number* of tasks, which is
purely memory overhead.  Translating #tasks into the actual amount of
memory isn't too trivial though - the task stack isn't the only
allocation, and the numbers should somehow make sense to the userland
in a consistent way.  Also, I'm not sure whether this particular limit
should live in its own silo or should be summed up together as part of
kmem (kmem itself is in its own silo after all apart from user memory,
right?).  So, if those can be settled, I think protecting against fork
bombs could fit memcg better in the sense that the whole thing makes
more sense.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                 ` <20120412153055.GL1787-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
  2012-04-12 16:38                     ` Tejun Heo
@ 2012-04-12 16:54                   ` Glauber Costa
  1 sibling, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 16:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton


>
>>>> If this gets really integrated, out of a sudden the overhead will
>>>> appear. So better care about it now.
>>>
>>> Forcing people that want to account/limit one resource to take the hit
>>> for something else they are not interested in requires justification.
>>
>> Agree. Even people aiming for unified hierarchies are okay with an
>> opt-in/out system, I believe. So the controllers need not to be
>> active at all times. One way of doing this is what I suggested to
>> Frederic: If you don't limit, don't account.
>
> I don't agree, it's a valid usecase to monitor a workload without
> limiting it in any way.  I do it all the time.

That's side-tracking. This is one way to do it, not the only way to do it.
The main point is that a controller can trivially be made present in a
hierarchy without actually doing anything.


>>
>> A big number of controllers creates complexity. When coding, we can
>> assume a lot less things about their relationships, and more
>> importantly: at some point people get confused. Fuck, sometimes *we*
>> get confused about which controller do what, where its
>> responsibility end and where the other's begin. And we're the ones
>> writing it! Avoiding complexity is an engineering principle, not a
>> gut feeling.
>
> And that's why I have a horrible feeling about extending the cgroup
> core to do hierarchical accounting and limiting.  See below.
>
>> Now, of course, we should aim to make things as simple as possible,
>> but not simpler: So you can argue that in Frederic's specific case,
>> it is justified. And I'd be fine with that 100 %. If I agreed...
>>
>> There are two natural points for inclusion here:
>>
>> 1) every cgroup has a task counter by itself. If we're putting the
>> tasks there anyway, this provides a natural point of accounting.
>
> I do think there is a big difference between having a list of tasks
> per individual cgroup to manage basic task-cgroup relationship on one
> hand, and accounting and limiting the number of allowed tasks over
> multi-level group hierarchies on the other.  It may seem natural on
> the face of it, but it really isn't, IMO.

It makes less sense to me now after reading Frederic's last e-mail.
Indeed, you are both right on this point.

> To reraise a point from my other email that was ignored: do users
> actually really care about the number of tasks when they want to
> prevent forkbombs?  If a task would use neither CPU nor memory, you
> would not be interested in limiting the number of tasks.
>
> Because the number of tasks is not a resource.  CPU and memory are.
>
> So again, if we would include the memory impact of tasks properly
> (structures, kernel stack pages) in the kernel memory counters which
> we allow to limit, shouldn't this solve our problem?
>
> You said in private email that you didn't like the idea because
> administrators wouldn't know how big the kernel stack was and that the
> number of tasks would be a more natural thing to limit.  But I think
> that is actually an argument in favor of the kmem approach: the user
> has no idea how much impact a task actually has resource-wise!  On the
> other hand, he knows exactly how much memory and CPU his machine has
> and how he wants to distribute these resources.  So why provide him
> with an interface to control some number in an unknown unit?
>
> You don't propose we allow limiting the number of dcache entries,
> either, but rather the memory they use.
>
> The historical limiting of number of tasks through rlimit is anything
> but scientific or natural.  You essentially set it to a random value
> between allowing most users to do their job and preventing things from
> taking down the machine.  With proper resource accounting, which we
> want to have anyway, we can do much better than that, so why shouldn't
> we?

Okay.

I may agree with you, I might not.

It really depends on Frederic's actual use case (Frederic, please
comment on it).

If we're trying to limit the number of processes *as a way* of limiting 
the amount of memory they use, then yes, what you say makes total sense.

I was always under the assumption that they wanted something more. One
of the things I remember reading in the descriptions was that some
services shouldn't be allowed to fork after a certain point. Then you
could limit their number of processes to whatever value they have now.

For that, stack usage may not help much.

Now, my personal take on this: use cases like that, if really needed,
can be achieved in other ways that don't even involve cgroups.

One of the things planned for the near future is to start putting more
kinds of data in the kmem controller. Things like page tables and the
stack are the natural candidates. I am on Hannes' side in saying that
this should be enough to prevent any malicious container from doing
harm.

But it is not even necessary!

- Each process has a task struct.
- The task struct comes from the slab.

Even my slab accounting patches are enough to prevent harm outside the
container, because if you fill all your kmem with task_structs, you
will be stopped from going any further.

As a matter of fact, I've been doing exactly that during the last
few days while testing the patchset.
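[To make the argument concrete: a toy model of how a purely byte-based
kmem cap already bounds the number of tasks, with no task counter at
all. The per-task cost below is an invented round number; the real
task_struct plus kernel stack size varies by kernel version and
config.]

```python
# Invented round number standing in for task_struct + kernel stack.
TASK_COST = 8 * 1024            # bytes of kernel memory per task

class KmemCgroup:
    def __init__(self, kmem_limit):
        self.kmem_limit = kmem_limit
        self.kmem_usage = 0

    def fork(self):
        """Fork charges the task's slab allocation to the cgroup."""
        if self.kmem_usage + TASK_COST > self.kmem_limit:
            return False        # slab charge fails, so the fork fails
        self.kmem_usage += TASK_COST
        return True

cg = KmemCgroup(kmem_limit=1024 * 1024)   # 1 MiB of kernel memory
spawned = 0
while cg.fork():                # a forkbomb loop
    spawned += 1
# the bomb stops by itself once kmem is exhausted: 1 MiB / 8 KiB = 128 tasks
```

The emergent task limit (128 here) falls out of the byte limit; nothing
ever counts tasks explicitly.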

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 16:34           ` Glauber Costa
@ 2012-04-12 16:59               ` Frederic Weisbecker
  -1 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 16:59 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On Thu, Apr 12, 2012 at 01:34:50PM -0300, Glauber Costa wrote:
> On 04/12/2012 11:55 AM, Frederic Weisbecker wrote:
> >I don't know how the kernel stack is allocated for tasks. Do you mean
> >that we allocate a chunk of it for each new task and we could rely
> >on that?
> >
> More than this: amount of kernel stack is really, really something
> indirect if what you want to track is # of processes. Now, Hannes
> made a fair point in his other e-mail about what is a resource and
> what is not.

I'm starting to consider this option. Are there other people interested
in accounting/limiting the kernel stack as well?

> 
> >>>  After all, we would only restrict the number of tasks for the
> >>>  resources they require
> >It depends if the kernel stack can have other kind of "consumer".
> >
> It also depends on what you really want to achieve.
> If you want to prevent fork bombs, limiting kernel stack will do just fine.

I want:

a) to prevent the forkbomb from going far enough to DDOS the machine
b) to be able to kill that forkbomb once detected, in one go, without
racing against concurrent forks.

I think a) can work just fine with kernel stack limiting. I also need
to be notified when we reach the limit. And b) should be feasible with
the help of the cgroup freezer.
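[A toy model of the freeze-then-kill sequence in b), showing why
freezing first makes the kill race-free. Illustrative Python only, not
the actual freezer interface; pids and names are invented.]

```python
class FreezerCgroup:
    """Toy model: a frozen group cannot fork, so a task snapshot is complete."""
    def __init__(self, pids):
        self.tasks = set(pids)
        self.frozen = False

    def fork(self, parent, child):
        if self.frozen or parent not in self.tasks:
            return False            # frozen tasks make no progress
        self.tasks.add(child)
        return True

    def freeze(self):
        self.frozen = True          # roughly: echo FROZEN > freezer.state

    def kill_all(self):
        victims = sorted(self.tasks)    # snapshot can't race with forks now
        self.tasks.clear()              # kill every pid in the snapshot
        return victims

bomb = FreezerCgroup(pids=[100])
bomb.fork(100, 101)                 # the bomb is still spreading...
bomb.freeze()                       # ...until the limit notification fires
assert not bomb.fork(101, 102)      # no new children can sneak in
killed = bomb.kill_all()            # one pass kills everything
```

Without the freeze step, the snapshot/kill loop can chase forks
forever; with it, one pass is guaranteed to be complete.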

> 
> Is there anything for which you need to know exactly the number of
> processes?

No, it's really about preventing/killing forkbombs as far as I'm concerned.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Cgroup in a single hierarchy (Was: Re: [RFD] Merge task counter into memcg)
  2012-04-12 16:38                     ` Tejun Heo
@ 2012-04-12 17:04                         ` Glauber Costa
  -1 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 17:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

On 04/12/2012 01:38 PM, Tejun Heo wrote:
> Hello, Johannes.
>
> On Thu, Apr 12, 2012 at 05:30:55PM +0200, Johannes Weiner wrote:
>>> But also, I believe this has been widely discussed in person by
>>> people, in separate groups. Maybe Tejun can do a small writeup of
>>> where we stand?
>
> I'm still mulling over it.  What I want is single hierarchy with more
> unified rules regarding how different controllers handle the hierarchy
> in such way that different controllers may interact - ie. memcg and
> blkio can share the same page tags, or cgroup-freezer can provide
> freezing service to other controllers.  I want if a task belongs to
> cgroup, it belongs to _the_ cgroup and you can figure out all cgroup
> related stuff from there.

Agreed.
If you would allow me to side-track (I'll answer the rest of the e-mail
separately to avoid confusion):

One of the things I attempted in cpu/cpuacct is to patch in code with
static_branches when they are comounted. So far I have failed because
of some locking-order rules, and haven't yet been able to go back to it.

But we could do something more general that could work at least until
we finally finish whatever rework we're doing.

1) Right now, the controllers work independently, and that code will
have to live for at least some years anyway. So leave it there.

2) But also, insert optimization code that can be enabled/disabled when
companion cgroups are in the same hierarchy.

3) After we mount the cgroup, apply those optimizations to all of them
from the cgroup core (the current bind stuff is just way too weird for
that, IMHO).

4) Then we start telling userspace people to favor co-mounts as much as
they can.

5) Pray.

Of course this is a sketch, but what do you think?
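[A toy model of steps 2) and 3) above, with a plain set of co-mounted
pairs standing in for per-pair static_branches. Illustrative Python
only; all names are invented.]

```python
# Pairs of controllers sharing a hierarchy; stands in for static_branches.
comounted = set()

def mount_hierarchy(controllers):
    """Step 3: after mount, the cgroup core flips on the pairwise fast paths."""
    for a in controllers:
        for b in controllers:
            if a != b:
                comounted.add((a, b))

def cpu_tick():
    """Step 2: one combined pass when co-mounted, two independent ones otherwise."""
    if ("cpu", "cpuacct") in comounted:
        return ["combined cpu+cpuacct charge"]
    return ["cpu charge", "separate cpuacct charge"]

mount_hierarchy(["cpu", "cpuacct"])     # step 4: userspace favors co-mounts
```

The existing independent code paths (step 1) stay; the combined path is
purely an opt-in optimization toggled at mount time.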

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 16:38                     ` Tejun Heo
@ 2012-04-12 17:13                         ` Glauber Costa
  -1 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 17:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

>
> The reason why I asked Frederic whether it would make more sense as
> part of memcg wasn't about flexibility but mostly about the type of
> the resource.  I'll continue below.
>
>>> Agree. Even people aiming for unified hierarchies are okay with an
>>> opt-in/out system, I believe. So the controllers need not to be
>>> active at all times. One way of doing this is what I suggested to
>>> Frederic: If you don't limit, don't account.
>>
>> I don't agree, it's a valid usecase to monitor a workload without
>> limiting it in any way.  I do it all the time.
>
> AFAICS, this seems to be the most valid use case for different
> controllers seeing different part of the hierarchy, even if the
> hierarchies aren't completely separate.  Accounting and control being
> in separate controllers is pretty sucky too as it ends up accounting
> things multiple times.  Maybe all controllers should learn how to do
> accounting w/o applying limits?  Not sure yet.

Well...

* I don't know how the blkio cgroup applies limits
* the cpu cgroup is limiting by nature, in the sense that it divides
shares in proportion to the number of cgroups in a hierarchy
* memcg has a RESOURCE_MAX default limit that is bigger than anything
you can possibly count

So one of the problems is that "limiting" may mean a different thing to
each controller.

I am mostly talking about the memory cgroup here. There, "accounting
without limiting" can trivially be done by setting the limit to
RESOURCE_MAX-delta. This won't work once we have machines with 2^64
bytes of physical memory, but I guess we have some time until that
happens.

The way I see it, it's just a technicality over a way to runtime-disable
the accounting of a resource without filling the hierarchy with flags.


>> To reraise a point from my other email that was ignored: do users
>> actually really care about the number of tasks when they want to
>> prevent forkbombs?  If a task would use neither CPU nor memory, you
>> would not be interested in limiting the number of tasks.
>>
>> Because the number of tasks is not a resource.  CPU and memory are.
>>
>> So again, if we would include the memory impact of tasks properly
>> (structures, kernel stack pages) in the kernel memory counters which
>> we allow to limit, shouldn't this solve our problem?
>
> The task counter is trying to control the *number* of tasks, which is
> purely memory overhead.

No, it is not. As we talk, it is becoming increasingly clear that,
given the use case, the correct term is "translating tasks *back* into
the actual amount of memory".

> Translating #tasks into the actual amount of
> memory isn't too trivial tho - the task stack isn't the only
> allocation and the numbers should somehow make sense to the userland
> in consistent way.  Also, I'm not sure whether this particular limit
> should live in its silo or should be summed up together as part of
> kmem (kmem itself is in its own silo after all apart from user memory,
> right?).


It is accounted together, but limited separately. Setting
memory.kmem.limit > memory.limit is a trivial way to say "don't limit
kmem" (and yet account it).

The same thing would go for a stack limit (well, assuming it won't be
merged into kmem itself as well).

> So, if those can be settled, I think protecting against fork
> bombs could fit memcg better in the sense that the whole thing makes
> more sense.

I myself will advise against merging anything that is not byte-based
into memcg.
A "task counter" is not byte-based.
A "fork bomb preventer" might be.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                     ` <20120412163825.GB13069-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-04-12 17:04                         ` Glauber Costa
  2012-04-12 17:13                         ` Glauber Costa
@ 2012-04-12 17:23                       ` Johannes Weiner
  2 siblings, 0 replies; 88+ messages in thread
From: Johannes Weiner @ 2012-04-12 17:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Cgroups, Andrew Morton

On Thu, Apr 12, 2012 at 09:38:25AM -0700, Tejun Heo wrote:
> Hello, Johannes.
> 
> On Thu, Apr 12, 2012 at 05:30:55PM +0200, Johannes Weiner wrote:
> > To reraise a point from my other email that was ignored: do users
> > actually really care about the number of tasks when they want to
> > prevent forkbombs?  If a task would use neither CPU nor memory, you
> > would not be interested in limiting the number of tasks.
> > 
> > Because the number of tasks is not a resource.  CPU and memory are.
> >
> > So again, if we would include the memory impact of tasks properly
> > (structures, kernel stack pages) in the kernel memory counters which
> > we allow to limit, shouldn't this solve our problem?
> 
> The task counter is trying to control the *number* of tasks, which is
> purely memory overhead.  Translating #tasks into the actual amount of
> memory isn't too trivial tho - the task stack isn't the only
> allocation and the numbers should somehow make sense to the userland
> in consistent way.

But why would you even care about that number?  It has no intrinsic
value.  We used it in the past because we had no other control over
kernel memory and CPU usage.

Even if we start out accounting just the kernel stack (which should be
the biggest chunk), it won't be less accurate than limiting the number
of tasks.  It's just a different unit, but one which we can account and
limit with less extra code, and even improve as we go along.

[ You could have tuned your task counter limit perfectly to one kernel
  version, the next version will have changed the memory required per
  task, file, random object, and suddenly your working setup runs out
  of memory.  So it's not like starting with kernel stack and adding
  more stuff later would be any less predictable. ]

I don't think anyone wants to come back in a few months and discuss
where the nr-of-open-files counter subsystem should live.

> Also, I'm not sure whether this particular limit should live in its
> silo or should be summed up together as part of kmem (kmem itself is
> in its own silo after all apart from user memory, right?).

There is k and u+k.  I don't see a technical problem with adding a
separate stat for it later, but also no particular reason to treat it
differently, because it's nothing special.  It's just kernel memory.
Do you care whether your cgroup has 2M tasks with one open socket
each, or one task with 2M sockets, as long as the group plays along
nicely with the others?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-12 17:23                     ` Johannes Weiner
@ 2012-04-12 17:41                           ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-12 17:41 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Cgroups, Andrew Morton

Hello, Johannes.

On Thu, Apr 12, 2012 at 07:23:09PM +0200, Johannes Weiner wrote:
> > The task counter is trying to control the *number* of tasks, which is
> > purely memory overhead.  Translating #tasks into the actual amount of
> > memory isn't too trivial tho - the task stack isn't the only
> > allocation and the numbers should somehow make sense to the userland
> > in consistent way.
> 
> But why would you ever even care about that number, though?  It has no
> intrinsic value.  We used it in the past because we had no other control
> over kernel memory and CPU usage.

I was describing task_counter as implemented mostly to point out that
it's not CPU related.  It's fundamentally a memory overhead which is
coarsely / imprecisely mapped to some number, so umm... we're actually
agreeing.

> Even if we start out accounting just the kernel stack (which should be
> the biggest chunk), it won't be less accurate than limiting numbers of
> tasks.  It's just a different unit, but one which we can account and
> limit with less extra code, and even improve as we go along.
> 
> [ You could have tuned your task counter limit perfectly to one kernel
>   version, the next version will have changed the memory required per
>   task, file, random object, and suddenly your working setup runs out
>   of memory.  So it's not like starting with kernel stack and adding
>   more stuff later would be any less predictable. ]
> 
> I don't think anyone wants to come back in a few months and discuss
> where the nr-of-open-files counter subsystem should live.
>
> > Also, I'm not sure whether this particular limit should live in its
> > silo or should be summed up together as part of kmem (kmem itself is
> > in its own silo after all apart from user memory, right?).
> 
> There is k and u+k.  I don't see a technical problem with adding a
> separate stat for it later, but also not a particular reason to treat
> it differently, because it's nothing special.  It's just kernel
> memory.  Do you care if your cgroup has 2M tasks with one open socket
> each or one task with 2M sockets, as long as the group plays along
> nicely with the others?

I'm still split on the issue.

* #tasks as unit of accounting / limiting is well understood (or at
  least known).  I think the same holds for #open files, to a
  lesser extent.  It means there are and will continue to be people
  wanting them.  So, they have some value in familiarity - "but... I
  want to limit the resources consumed by tasks cuz that's what I
  know!" factor.

* People could want counting and limiting #tasks or #open files
  without the overhead of tracking all memory resources.  This stems
  from the same reason #tasks was used for this sort of thing in the
  first place.  Counting tasks or open files tends to be easier and
  cheaper than tracking all memory allocations.

So, there's a spectrum of solutions between merging the task counter and
just directing everyone to kmem without distinguishing task resources
at all, and at the moment the voices in my head are succeeding at making
cases for both directions.  What do you guys think about the above two
issues?
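The "cheap counting" alternative in the second bullet is what the task
counter patch set implements with res_counter: every fork charges one unit
in the group and each of its ancestors, and the fork is rejected with
-EAGAIN if any level is at its limit.  A toy user-space model of that
hierarchical charge/uncharge logic (hypothetical Python names; the kernel
code is C built on res_counter):

```python
class TaskCounterGroup:
    """Toy model of a hierarchical task counter (not the kernel code)."""

    def __init__(self, parent=None, limit=None):
        self.parent = parent   # ancestor group, also charged on every fork
        self.limit = limit     # None means unlimited (ULLONG_MAX in the patches)
        self.usage = 0

    def charge(self, n=1):
        """Charge n tasks up the hierarchy; roll back and fail if any level is full."""
        charged = []
        node = self
        while node is not None:
            if node.limit is not None and node.usage + n > node.limit:
                for c in charged:          # roll back partial charges
                    c.usage -= n
                return False               # the kernel would fail fork() with -EAGAIN
            node.usage += n
            charged.append(node)
            node = node.parent
        return True

    def uncharge(self, n=1):
        """Mirror of charge(), run on task exit."""
        node = self
        while node is not None:
            node.usage -= n
            node = node.parent


root = TaskCounterGroup(limit=4)
child = TaskCounterGroup(parent=root, limit=3)

assert child.charge() and child.charge() and child.charge()
assert not child.charge()    # child is at its limit of 3
assert root.usage == 3       # charges propagated to the ancestor
child.uncharge()
assert child.charge()        # room again after an exit
```

This is the overhead being weighed above: even with the limit unset, every
fork and exit walks the ancestor chain.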

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                           ` <20120412174155.GC13069-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-04-12 17:53                             ` Glauber Costa
  2012-04-13  1:42                             ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-12 17:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

On 04/12/2012 02:41 PM, Tejun Heo wrote:
> I'm still split on the issue.
>
> * #tasks as unit of accounting / limiting is well understood (or at
>    least known).  I think the same holds for #open files, to a
>    lesser extent.  It means there are and will continue to be people
>    wanting them.  So, they have some value in familiarity - "but... I
>    want to limit the resources consumed by tasks cuz that's what I
>    know!" factor.
>
> * People could want counting and limiting #tasks or #open files
>    without the overhead of tracking all memory resources.  This stems
>    from the same reason #tasks was used for this sort of thing in the
>    first place.  Counting tasks or open files tends to be easier and
>    cheaper than tracking all memory allocations.
>
> So, there's a spectrum of solutions between merging the task counter and
> just directing everyone to kmem without distinguishing task resources
> at all, and at the moment the voices in my head are succeeding at making
> cases for both directions.  What do you guys think about the above two
> issues?
>

About each of your points:

1) Quite honestly, if we were implementing what people say they want...
we'd have a Lisp interpreter in the kernel by now.
At the very best, it is an issue of getting the communication right.
I really don't think this is of any concern.

2) It depends on the previous question/answer. Do people really
want to account and limit that? Or do they just think they do?

Also note that we need to make memcg cheaper anyway... And right now it
is not *that* expensive if you are not using a deep hierarchy.

User pages get cached through the stock mechanism, and slab pages are not
allocated very frequently (because first you need to exhaust the free
objects on the slab, etc.).

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                           ` <20120412174155.GC13069-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-04-12 17:53                             ` Glauber Costa
@ 2012-04-13  1:42                             ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  1:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

(2012/04/13 2:41), Tejun Heo wrote:

> Hello, Johannes.

> I'm still split on the issue.
> 
> * #tasks as unit of accounting / limiting is well understood (or at
>   least known).  I think the same holds for #open files, to a
>   lesser extent.  It means there are and will continue to be people
>   wanting them.  So, they have some value in familiarity - "but... I
>   want to limit the resources consumed by tasks cuz that's what I
>   know!" factor.
> 
> * People could want counting and limiting #tasks or #open files
>   without the overhead of tracking all memory resources.  This stems
>   from the same reason #tasks was used for this sort of thing in the
>   first place.  Counting tasks or open files tends to be easier and
>   cheaper than tracking all memory allocations.
> 
> So, there's a spectrum of solutions between merging the task counter and
> just directing everyone to kmem without distinguishing task resources
> at all, and at the moment the voices in my head are succeeding at making
> cases for both directions.  What do you guys think about the above two
> issues?
> 


To be honest, I doubt that the task counter is necessary... memcg can catch
the OOM situation well. I often test 'make -j' under memcg.

To the questions:
*   It sounds like a 'ulimit' cgroup. How about overriding
    ulimit values via cgroup? (Does that sound like a joke?) Then the overhead
    will be small, but I'm not sure it can be hierarchical without breaking userland.

    If people want to limit the number of tasks, I think the interface should provide it
    in units of objects. Then I'm OK with having another subsystem for counting things.
    A fork bomb's memory overhead can be prevented by memcg. What memcg cannot handle
    is ulimit: if a fork bomb exhausts all ulimit tasks, the user cannot log in.
    So, having a task-limit cgroup subsys for a sandbox will make sense in some situations.

In short, I don't think it's better to have task counting and fd counting in memcg.
It's kmem, but it's more than that, I think.
Please provide a subsys like ulimit.
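For comparison, the existing non-cgroup mechanism closest to this is the
per-user RLIMIT_NPROC resource limit, which already makes fork() fail with
EAGAIN at a task cap but is neither hierarchical nor per-container.  A small
illustrative sketch of that interface:

```python
import resource

# RLIMIT_NPROC caps the number of processes per real user ID; once the
# cap is reached, fork() fails with EAGAIN.  Unlike a cgroup limit it
# applies per user rather than per container, and it is not hierarchical.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("NPROC soft/hard limit:", soft, hard)

# An unprivileged process may lower its soft limit (never raise it past
# the hard limit); the change affects only this process and its children,
# which is exactly why containers want a cgroup-level control instead.
if soft == resource.RLIM_INFINITY or soft > 4096:
    resource.setrlimit(resource.RLIMIT_NPROC, (4096, hard))
```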

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-13  1:42                           ` KAMEZAWA Hiroyuki
@ 2012-04-13  1:50                                 ` Glauber Costa
       [not found]                             ` <4F878480.60505-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
  1 sibling, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-13  1:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Tejun Heo,
	Cgroups, Andrew Morton

On 04/12/2012 10:42 PM, KAMEZAWA Hiroyuki wrote:
> To be honest, I doubt that the task counter is necessary... memcg can catch
> the OOM situation well. I often test 'make -j' under memcg.
>
> To the questions:
> *   It sounds like a 'ulimit' cgroup. How about overriding
>      ulimit values via cgroup? (Does that sound like a joke?) Then the overhead
>      will be small, but I'm not sure it can be hierarchical without breaking userland.
>
>      If people want to limit the number of tasks, I think the interface should provide it
>      in units of objects. Then I'm OK with having another subsystem for counting things.
>      A fork bomb's memory overhead can be prevented by memcg. What memcg cannot handle
>      is ulimit: if a fork bomb exhausts all ulimit tasks, the user cannot log in.
>      So, having a task-limit cgroup subsys for a sandbox will make sense in some situations.
>
> In short, I don't think it's better to have task counting and fd counting in memcg.
> It's kmem, but it's more than that, I think.
> Please provide a subsys like ulimit.
>
Kame,

You're talking about the memcg that is in the kernel today.
I think the discussion is orbiting around how it is going to be once we
start tracking kernel memory such as slab objects (for task_struct) or
kernel stack pages.

In those scenarios, a fork bomb will be stopped anyway, because it will
need kernel memory it can't grab.
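To make the "just a different unit" point concrete: the kernel memory
pinned by a task is dominated by its kernel stack (8 KiB on x86-64) plus
task_struct and related allocations (on the order of a couple of KiB), so
any tasks.limit value maps to a rough kmem cap.  The figures below are
illustrative and kernel-version dependent, which is exactly the
portability caveat raised earlier in the thread:

```python
# Rough per-task kernel memory footprint (illustrative, kernel-version
# dependent -- precisely why #tasks is only a proxy unit for kmem).
KERNEL_STACK = 8 * 1024   # THREAD_SIZE on x86-64
TASK_STRUCT = 2 * 1024    # order of magnitude for task_struct + friends

PER_TASK = KERNEL_STACK + TASK_STRUCT

def kmem_limit_for_tasks(ntasks):
    """Translate a tasks.limit value into an approximate kmem cap in bytes."""
    return ntasks * PER_TASK

# A forkbomb cap of 1024 tasks corresponds to roughly a 10 MiB kmem limit:
limit = kmem_limit_for_tasks(1024)
print(limit // (1024 * 1024), "MiB")
```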

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                                 ` <4F87865F.5060701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-13  2:48                                   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  2:48 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Tejun Heo,
	Cgroups, Andrew Morton

(2012/04/13 10:50), Glauber Costa wrote:

> On 04/12/2012 10:42 PM, KAMEZAWA Hiroyuki wrote:
>> To be honest, I doubt that the task counter is necessary... memcg can catch
>> the OOM situation well. I often test 'make -j' under memcg.
>>
>> To the questions:
>> *   It sounds like a 'ulimit' cgroup. How about overriding
>>      ulimit values via cgroup? (Does that sound like a joke?) Then the overhead
>>      will be small, but I'm not sure it can be hierarchical without breaking userland.
>>
>>      If people want to limit the number of tasks, I think the interface should provide it
>>      in units of objects. Then I'm OK with having another subsystem for counting things.
>>      A fork bomb's memory overhead can be prevented by memcg. What memcg cannot handle
>>      is ulimit: if a fork bomb exhausts all ulimit tasks, the user cannot log in.
>>      So, having a task-limit cgroup subsys for a sandbox will make sense in some situations.
>>
>> In short, I don't think it's better to have task counting and fd counting in memcg.
>> It's kmem, but it's more than that, I think.
>> Please provide a subsys like ulimit.
>>
> Kame,
> 
> You're talking about the memcg that is in the kernel today.
> I think the discussion is orbiting around how it is going to be once we
> start tracking kernel memory such as slab objects (for task_struct) or
> kernel stack pages.
> 
> In those scenarios, a fork bomb will be stopped anyway, because it will
> need kernel memory it can't grab.
> 



A fork bomb can indeed be caught by such a method.

I was just thinking about a 'task' cgroup. You can already know the number of tasks
by reading the tasks file even without a task cgroup. Because of this, using
a task cgroup just for accounting the number of tasks doesn't make sense to me.
But here, Tejun mentioned accounting the number of 'fd's.

Hearing that, I think of ulimit.

We do resource accounting based on cgroups. But there are other limiting
features such as ulimit, sysctl, etc. This makes the total view of resource accounting
in Linux complicated. So I wonder whether cgroup could become a unified control feature
and have subsystems for ulimit or ipc, overriding those other control mechanisms.

But resources which don't belong to a 'thread', such as memory, may add something
messy to cgroup, since it accounts resources based on threads.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]     ` <4F86527C.2080507-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
@ 2012-04-17  1:09       ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-17  1:09 UTC (permalink / raw)
  To: Alexander Nikiforov
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On Thu, Apr 12, 2012 at 07:56:44AM +0400, Alexander Nikiforov wrote:
> On 04/11/2012 10:57 PM, Frederic Weisbecker wrote:
> >Hi,
> >
> >While talking with Tejun about targetting the cgroup task counter subsystem
> >for the next merge window, he suggested to check if this could be merged into
> >the memcg subsystem rather than creating a new one cgroup subsystem just
> >for task count limit purpose.
> >
> >So I'm pinging you guys to seek your insight.
> >
> >I assume not everybody in the Cc list knows what the task counter subsystem
> >is all about. So here is a summary: this is a cgroup subsystem (latest version
> >in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
> >present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
> >maintain this accounting visible to a special tasks.usage file. The user can
> >set a limit on the number of tasks by writing on the tasks.limit file.
> >Further forks or cgroup migration are then rejected if the limit is exceeded.
> >
> >This feature is especially useful to protect against forkbombs in containers.
> >Or more generally to limit the resources on the number of tasks on a cgroup
> >as it involves some kernel memory allocation.
> >
> >Now the dilemma is how to implement it?
> >
> >1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
> >
> >2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
> >because this is about kernel memory allocation limitation. We could have a
> >memory.kmem.tasks.count
> >
> >My personal opinion is that the task counter brings some overhead: a charge
> >across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
> >And this overhead happens even in the off-case (when the task counter susbsystem
> >is mounted but the limit is the default: ULLONG_MAX).
> >
> >So if we choose the second solution, this overhead will be added unconditionally
> >to memcg.
> >But I don't expect every users of memcg will need the task counter. So perhaps
> >the overhead should be kept in its own separate subsystem.
> >
> >OTOH memory.kmem.* interface would have be a good fit.
> >
> >What do you think?
> 
> Hi,
> 
> I agree that this is a memory-related thing, but I prefer this as a
> separate subsystem.
> Yes, it has some impact on the system, but on the other hand we will
> have a very useful tool to track task state.
> As I wrote before
> 
> http://comments.gmane.org/gmane.linux.kernel.cgroups/1448
> 
> it'll be very useful to have an event in userspace about fork/exit
> for a group of processes.

I need more clarification about your needs. The task counter subsystem
doesn't inform you about forks or exits unless you reach the limit on
the number of tasks.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]       ` <20120417010902.GA14646-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2012-04-17  6:45         ` Alexander Nikiforov
  0 siblings, 0 replies; 88+ messages in thread
From: Alexander Nikiforov @ 2012-04-17  6:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/17/2012 05:09 AM, Frederic Weisbecker wrote:
> On Thu, Apr 12, 2012 at 07:56:44AM +0400, Alexander Nikiforov wrote:
>> On 04/11/2012 10:57 PM, Frederic Weisbecker wrote:
>>> Hi,
>>>
>>> While talking with Tejun about targeting the cgroup task counter subsystem
>>> for the next merge window, he suggested checking whether this could be merged into
>>> the memcg subsystem rather than creating a new cgroup subsystem just
>>> for task-count limiting.
>>>
>>> So I'm pinging you guys to seek your insight.
>>>
>>> I assume not everybody in the Cc list knows what the task counter subsystem
>>> is all about. So here is a summary: this is a cgroup subsystem (latest version
>>> in https://lwn.net/Articles/478631/) that keeps track of the number of tasks
>>> present in a cgroup. Hooks are set in task fork/exit and cgroup migration to
>>> maintain this accounting visible to a special tasks.usage file. The user can
>>> set a limit on the number of tasks by writing on the tasks.limit file.
>>> Further forks or cgroup migration are then rejected if the limit is exceeded.
>>>
>>> This feature is especially useful to protect against forkbombs in containers.
>>> Or more generally to put a resource limit on the number of tasks in a cgroup,
>>> since each task involves some kernel memory allocation.
>>>
>>> Now the dilemma is how to implement it:
>>>
>>> 1) As a standalone subsystem, as it stands currently (https://lwn.net/Articles/478631/)
>>>
>>> 2) As a feature in memcg, part of the memory.kmem.* files. This makes sense
>>> because this is about kernel memory allocation limitation. We could have a
>>> memory.kmem.tasks.count
>>>
>>> My personal opinion is that the task counter brings some overhead: a charge
>>> across the whole hierarchy at every fork, and the mirrored uncharge on task exit.
>>> And this overhead happens even in the off-case (when the task counter subsystem
>>> is mounted but the limit is the default: ULLONG_MAX).
>>>
>>> So if we choose the second solution, this overhead will be added unconditionally
>>> to memcg.
>>> But I don't expect every user of memcg will need the task counter. So perhaps
>>> the overhead should be kept in its own separate subsystem.
>>>
>>> OTOH the memory.kmem.* interface would have been a good fit.
>>>
>>> What do you think?
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> Hi,
>>
>> I agree that this is a memory-related thing, but I prefer this as a
>> separate subsystem.
>> Yes, it has some impact on the system, but on the other hand we would
>> have a very useful tool to track task state.
>> As I wrote before in
>>
>> http://comments.gmane.org/gmane.linux.kernel.cgroups/1448
>>
>> it would be very useful to have userspace events about fork/exit
>> for a group of processes.
> I need more clarification about your needs. The task counter subsystem
> doesn't inform you about forks or exits unless you reach the limit on
> the number of tasks.
>
Hi Frederic,

Yup, it doesn't have this functionality now, but we can add it. Please
look at my previous post about this feature:

http://comments.gmane.org/gmane.linux.kernel.cgroups/1448

Currently userspace tools, for example libcgroup, can't catch an event
when a process dies, forks, or is moved to another group; in short, when
the tasks file changes.
Because of this, a userspace tool ends up in an inconsistent state when
the user manually kills a process. Another example: we may want to
balance the number of processes across several groups and round-robin
between them. Right now we have only one way to get a notification about
the tasks file: inotify(). But that approach works only if the file is
touched from userspace (i.e. a struct file is created, for example with
echo $$ > /sys/abc/tasks). When something happens from the kernel side
(do_fork()/do_exit()) we cannot get any event about the group of the
process. (We can scan the tasks file and count the PIDs, or work with
waitpid(), but IMHO these are ugly solutions.)

Thanks for your reply.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Cgroup in a single hierarchy (Was: Re: [RFD] Merge task counter into memcg)
       [not found]                         ` <4F870B18.5060703-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-17 15:13                           ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-17 15:13 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

Hello, Glauber.

On Thu, Apr 12, 2012 at 02:04:24PM -0300, Glauber Costa wrote:
> 1) Right now, the controllers work independently, and that code will
> have to live for at least some years anyway. So leave it there.
> 
> 2) But also, insert optimization code that can be enabled/disabled
> when companion cgroups are in the same hierarchy.
> 
> 3) After we mount the cgroup, apply those optimization to all of
> them from the cgroup core (the current bind stuff is just way too
> weird for that, IMHO)
> 
> 4) Then we start telling userspace people to favor co-mounts as much
> as they can
> 
> 5) Pray.

Pretty similar to the plan that I was thinking about.

* Provide both mechanisms from the kernel while implementing new
  features / optimizations with the assumption that there's one
  hierarchy.

* Make the switch to single hierarchy from userland, probably by
  implementing a (policy based) userland thing which takes over the
  whole cgroup hierarchy.

* Phase out multiple hierarchy support from kernel slowly.

That said, there are quite a few obstacles including being able to
support most (probably not all) use cases possible under multiple
hierarchies and managing the added complexity over the transition
period.  I don't think it's gonna be easy.  Needs more thinking.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]               ` <20120412165922.GA12484-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2012-04-17 15:17                 ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-17 15:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Cgroups, Andrew Morton

Hello, Frederic.

On Thu, Apr 12, 2012 at 06:59:27PM +0200, Frederic Weisbecker wrote:
> I want:
> 
> a) to prevent the forkbomb from going far enough to DDOS the machine
> b) to be able to kill that forkbomb once detected, in one go without race
> against concurrent forks.
> 
> I think a) can work just fine with kernel stack limiting. I also need
> to be notified about the fact we reached the limit. And b) should
> be feasible with the help of the cgroup freezer. 

kmem allocations fail after reaching the limit, which in turn should
fail task creation.  Isn't that the same effect as the task_counter as
implemented?

> > Is there anything for which you need to know exactly the number of
> > processes?
> 
> No that's really about prevent/kill forkbomb as far as I'm concerned.

Hmm... so, accounting overhead aside, if the only purpose is
preventing the whole machine being brought down by a fork bomb, kmem
limiting is enough, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]         ` <4F8D1171.1090504-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
@ 2012-04-17 15:23           ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-17 15:23 UTC (permalink / raw)
  To: Alexander Nikiforov
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

Hello,

On Tue, Apr 17, 2012 at 10:45:05AM +0400, Alexander Nikiforov wrote:
> between them. Now we have only one way to get a notification about the
> tasks file - inotify(), but this approach works only if the file is
> touched from userspace (e.g. a struct file is created, for example with
> echo $$ > /sys/abc/tasks); when something happens from the kernel side
> (do_fork()/do_exit()) we cannot get any event about the group of the
> process (we can scan the tasks file and count the PIDs, or work with
> waitpid(), but IMHO these are ugly solutions)

Wouldn't simply generating FS_MODIFY event on the tasks file do the
trick?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Cgroup in a single hierarchy (Was: Re: [RFD] Merge task counter into memcg)
  2012-04-17 15:13                         ` Tejun Heo
@ 2012-04-17 15:27                               ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-17 15:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

On 04/17/2012 12:13 PM, Tejun Heo wrote:
> Pretty similar to the plan that I was thinking about.
>
> * Provide both mechanisms from the kernel while implementing new
>    features / optimizations with the assumption that there's one
>    hierarchy.
I believe static_keys, which we are already using for the tcp
buffers, can play a large role here. They can't save us from the
complexity of still supporting multiple hierarchies in the meantime,
but maybe nothing can.

The only problem is that this is proving to be quite fragile. Because
cpusets gets a bunch of function calls with cgroup_mutex held from
within the cpu hotplug notifier, there is a lock dependency between the
hotplug lock and cgroup_mutex, meaning we can't do any jump-label
patching with cgroup_mutex held.

Nor does it seem we can defer the patching to a worker, since that
would create a window in which the information presented is inconsistent.

Sigh...

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                             ` <4F878480.60505-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
  2012-04-13  1:50                                 ` Glauber Costa
@ 2012-04-17 15:41                               ` Tejun Heo
  1 sibling, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2012-04-17 15:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton

Hello, KAMEZAWA.

On Fri, Apr 13, 2012 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
> > So, there's spectrum of solutions between merging task counter and
> > just directing everyone to kmem without distinguishing task resource
> > at all, and at the moment voices in my head are succeeding at making
> > cases for both directions.  What do you guys think about the above two
> > issues?
> 
> 
> To be honest, I doubt that task counter is unnecessary...memcg can catch
> oom situation well. I often test 'make -j' under memcg.

Heh, the double negation is confusing me.  Were you trying to say that
task_counter is necessary or was it the other way around?

> To the questions:
> *   It sounds like a 'ulimit' cgroup. How about overriding
>     ulimit values via cgroup? (sounds like a joke?) Then the overhead will be
>     small, but I'm not sure it can be hierarchical without breaking userland.
> 
>     If people want to limit the number of tasks, I think the interface should
>     provide it in units of objects. Then I'm ok with having another subsystem
>     for counting. A fork-bomb's memory overhead can be prevented by memcg.
>     What memcg cannot handle is ulimit. If a forkbomb exhausts all ulimit/tasks,
>     the user cannot log in. So having a task-limit cgroup subsys for a sandbox
>     makes sense in some situations.
> 
> In short, I don't think it's better to have task-counting and fd-counting in memcg.
> It's kmem, but it's more than that, I think.
> Please provide a subsys like ulimit.

So, you think that while kmem would be enough to prevent fork-bombs,
it would still make sense to limit in more traditional ways
(i.e. ulimit-style object limits).  Hmmm....

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                               ` <20120417154117.GE32402-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-04-17 16:52                                 ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-17 16:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Cgroups,
	Andrew Morton


>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
>> It's kmem, but it's more than that, I think.
>> Please provide subsys like ulimit.
>
> So, you think that while kmem would be enough to prevent fork-bombs,
> it would still make sense to limit in more traditional ways
> (ie. ulimit style object limits).  Hmmm....
>

I personally think this is namespaces business, not cgroups.
If you have a process namespace, an existing interface that limits the
number of processes should keep working within the constraints you
are given.

What doesn't make sense is to create a *new* interface to limit
something that doesn't really need to be limited, just because you
limited a similar resource before.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                                 ` <4F8D9FC4.3080800-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-04-18  6:51                                   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  6:51 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Johannes Weiner, Tejun Heo,
	Cgroups, Andrew Morton

(2012/04/18 1:52), Glauber Costa wrote:

> 
>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
>>> It's kmem, but it's more than that, I think.
>>> Please provide subsys like ulimit.
>>
>> So, you think that while kmem would be enough to prevent fork-bombs,
>> it would still make sense to limit in more traditional ways
>> (ie. ulimit style object limits).  Hmmm....
>>
> 
> I personally think this is namespaces business, not cgroups.
> If you have a process namespace, an interface that works to limit the 
> number of processes should keep working given the constraints you are 
> given.
> 
> What doesn't make sense, is to create a *new* interface to limit 
> something that doesn't really need to be limited, just because you
> limited a similar resource before.
> 


Ok, limiting forkbombs is unnecessary then; ulimit+namespace should work.
What we need is a user-id namespace, isn't it? If we have that, ulimit
works well enough, with no overhead.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
       [not found]                 ` <20120417151753.GB32402-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-04-18  6:54                   ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-18  6:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Cgroups, Andrew Morton

On Tue, Apr 17, 2012 at 08:17:53AM -0700, Tejun Heo wrote:
> Hello, Frederic.
> 
> On Thu, Apr 12, 2012 at 06:59:27PM +0200, Frederic Weisbecker wrote:
> > I want:
> > 
> > a) to prevent the forkbomb from going far enough to DDOS the machine
> > b) to be able to kill that forkbomb once detected, in one go without race
> > against concurrent forks.
> > 
> > I think a) can work just fine with kernel stack limiting. I also need
> > to be notified about the fact we reached the limit. And b) should
> > be feasible with the help of the cgroup freezer. 
> 
> kmem allocation fail after reaching the limit which in turn should
> fail task creation.  Isn't that the same effect as the task_counter as
> implemented?

That's it.

> 
> > > Is there anything for which you need to know exactly the number of
> > > processes?
> > 
> > No that's really about prevent/kill forkbomb as far as I'm concerned.
> 
> Hmm... so, accounting overhead aside, if the only purpose is
> preventing the whole machine being brought down by a fork bomb, kmem
> limiting is enough, right?

I think so yeah.
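The mechanism agreed on here (a kmem limit makes further fork() fail, and the freezer lets you kill the bomb in one pass without racing against new forks) can be sketched against the cgroup v1 interface. The file names follow the memory.kmem.* naming discussed in this thread and the freezer subsystem as it existed at the time, so treat the exact paths as assumptions rather than a shipped interface:

```shell
# Create a container cgroup in both the memory and freezer hierarchies.
mkdir -p /sys/fs/cgroup/memory/box /sys/fs/cgroup/freezer/box

# Cap kernel memory so a runaway fork() chain eventually fails with ENOMEM.
echo $((64 * 1024 * 1024)) > /sys/fs/cgroup/memory/box/memory.kmem.limit_in_bytes

# Put the container's init task into both cgroups ($CONTAINER_PID is a
# placeholder for the container's first process).
echo "$CONTAINER_PID" > /sys/fs/cgroup/memory/box/tasks
echo "$CONTAINER_PID" > /sys/fs/cgroup/freezer/box/tasks

# Once a forkbomb is detected: freeze the whole group so no new forks
# can race with the kill...
echo FROZEN > /sys/fs/cgroup/freezer/box/freezer.state

# ...then kill every frozen task in one pass and thaw.
while read -r pid; do kill -9 "$pid"; done < /sys/fs/cgroup/freezer/box/tasks
echo THAWED > /sys/fs/cgroup/freezer/box/freezer.state
```

The freeze step is what makes the kill race-free: a frozen task cannot fork, so the task list read from the cgroup is complete.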

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
       [not found]                                   ` <4F8E646B.1020807-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2012-04-18  7:53                                     ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-18  7:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

2012/4/18 KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>:
> (2012/04/18 1:52), Glauber Costa wrote:
>
>>
>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
>>>> It's kmem, but it's more than that, I think.
>>>> Please provide subsys like ulimit.
>>>
>>> So, you think that while kmem would be enough to prevent fork-bombs,
>>> it would still make sense to limit in more traditional ways
>>> (ie. ulimit style object limits).  Hmmm....
>>>
>>
>> I personally think this is namespaces business, not cgroups.
>> If you have a process namespace, an interface that works to limit the
>> number of processes should keep working given the constraints you are
>> given.
>>
>> What doesn't make sense, is to create a *new* interface to limit
>> something that doesn't really need to be limited, just because you
>> limited a similar resource before.
>>
>
>
> Ok, limitiing forkbomb is unnecessary. ulimit+namespace should work.
> What we need is user-id namespace, isn't it ? If we have that, ulimit
> works enough fine, no overheads.

I have considered using the RLIMIT_NPROC rlimit on top of user namespaces
to fight forkbombs inside a container,
i.e. one user namespace per container with its own rlimit.

But it doesn't work because we can have multiuser apps running in a
single container.
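The rlimit being discussed is RLIMIT_NPROC, which is accounted per real user ID rather than per container; a minimal userspace sketch makes the mismatch visible (the 512 cap is an arbitrary illustration, not a recommended value):

```python
import resource

# RLIMIT_NPROC caps the number of processes *per real user ID*, not per
# container: all processes owned by one UID share a single budget, so a
# container running several UIDs gets several independent budgets, and
# two containers sharing a UID wrongly share one.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

# Lower only the soft limit (inherited across fork); fork() beyond it
# fails with EAGAIN for this UID's processes.
target = 512 if soft == resource.RLIM_INFINITY else min(soft, 512)
resource.setrlimit(resource.RLIMIT_NPROC, (target, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
```

Because the accounting key is the UID, no arrangement of user namespaces can turn this into a clean per-container limit once a container hosts multiple users.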

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
  2012-04-18  6:54                 ` Frederic Weisbecker
  2012-04-18  8:10                   ` Frederic Weisbecker
@ 2012-04-18  8:10                   ` Frederic Weisbecker
  1 sibling, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-18  8:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Cgroups, Andrew Morton

2012/4/18 Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On Tue, Apr 17, 2012 at 08:17:53AM -0700, Tejun Heo wrote:
>> Hello, Frederic.
>>
>> On Thu, Apr 12, 2012 at 06:59:27PM +0200, Frederic Weisbecker wrote:
>> > I want:
>> >
>> > a) to prevent the forkbomb from going far enough to DDOS the machine
>> > b) to be able to kill that forkbomb once detected, in one go without race
>> > against concurrent forks.
>> >
>> > I think a) can work just fine with kernel stack limiting. I also need
>> > to be notified about the fact we reached the limit. And b) should
>> > be feasible with the help of the cgroup freezer.
>>
>> kmem allocation fail after reaching the limit which in turn should
>> fail task creation.  Isn't that the same effect as the task_counter as
>> implemented?
>
> That's it.
>
>>
>> > > Is there anything for which you need to know exactly the number of
>> > > processes?
>> >
>> > No that's really about prevent/kill forkbomb as far as I'm concerned.
>>
>> Hmm... so, accounting overhead aside, if the only purpose is
>> preventing the whole machine being brought down by a fork bomb, kmem
>> limiting is enough, right?
>
> I think so yeah.

But this needs to be a well-defined kind of kmem, I think. Relying on
kernel memory alone is too general if the goal is just to protect
against forkbombs. Kernel stack, OTOH, should be a good criterion.

But now I'm wondering: do you think this kmem.kernel_stack limitation
is going to be useful for other kinds of usecases?

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
       [not found]                                     ` <CAFTL4hw3C4s6VS07pJzdBawv0ugKJJa+Vnb-Q_9FrWEq4=ka9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-18  8:42                                       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  8:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

(2012/04/18 16:53), Frederic Weisbecker wrote:

> 2012/4/18 KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>:
>> (2012/04/18 1:52), Glauber Costa wrote:
>>
>>>
>>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
>>>>> It's kmem, but it's more than that, I think.
>>>>> Please provide subsys like ulimit.
>>>>
>>>> So, you think that while kmem would be enough to prevent fork-bombs,
>>>> it would still make sense to limit in more traditional ways
>>>> (ie. ulimit style object limits).  Hmmm....
>>>>
>>>
>>> I personally think this is namespaces business, not cgroups.
>>> If you have a process namespace, an interface that works to limit the
>>> number of processes should keep working given the constraints you are
>>> given.
>>>
>>> What doesn't make sense, is to create a *new* interface to limit
>>> something that doesn't really need to be limited, just because you
>>> limited a similar resource before.
>>>
>>
>>
>> Ok, limitiing forkbomb is unnecessary. ulimit+namespace should work.
>> What we need is user-id namespace, isn't it ? If we have that, ulimit
>> works enough fine, no overheads.
> 
> I have considered using NR_PROC rlimit on top of user namespaces to
> fight forkbombs inside a container.
> ie: one user namespace per container with its own rlimit.
> 
> But it doesn't work because we can have multiuser apps running in a
> single container.
> 

Ok, then, the requirements are different from ulimit's; please forget my suggestion.

My concern with using 'kmem' is that the size of the objects can change, and setup
may be more complicated than limiting the 'number' of tasks.
It's very architecture dependent... But hmm...

If slab accounting can handle task_struct accounting, everything you want can
(maybe) be done with it, and the implementation can be shared.
(But another aspect of the problem will be speed of development..)

One idea (I'm not sure whether it's good or bad) is to have the following control files:

 - memory.kmem.task_struct.limit_in_bytes
 - memory.kmem.task_struct.usage_in_bytes
 - memory.kmem.task_struct.size_in_bytes   # size of task struct.

At first, implement this by accounting the task struct (or similar) directly.
Later, if we can, replace the implementation with the slab (kmem) cgroup
and unify the interfaces... a long way to go.

2nd idea is

 - memory.object.task.limit_in_number	# limit the number of tasks.
 - memory.object.task.usage_in_number   # usage


If I'm a user, I prefer #2.
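The usability gap between the two ideas is the unit conversion: under idea #1 a user who thinks in "number of tasks" has to translate to bytes by hand. A sketch of that conversion, with the file names taken from the proposal in this mail (they are hypothetical, not an existing kernel interface):

```python
def task_limit_to_bytes(max_tasks: int, task_struct_size: int) -> int:
    # task_struct_size would be read from the proposed
    # memory.kmem.task_struct.size_in_bytes file; it is architecture
    # and kernel-config dependent, which is the complication idea #2 avoids.
    return max_tasks * task_struct_size

# e.g. cap a cgroup at 1024 tasks, assuming a (hypothetical) 8 KiB task_struct
limit = task_limit_to_bytes(1024, 8192)
```

Idea #2 lets the user write 1024 directly and leaves the byte accounting to the kernel.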

Hmm, 
   global kmem limiting           -> done by bytes.
   special kernel object limiting -> done by the number of objects.

is...complicated ?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
       [not found]                                       ` <4F8E7E76.3020202-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2012-04-18  9:12                                         ` Frederic Weisbecker
  2012-04-18 10:39                                         ` Johannes Weiner
  1 sibling, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2012-04-18  9:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On Wed, Apr 18, 2012 at 05:42:30PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/18 16:53), Frederic Weisbecker wrote:
> 
> > 2012/4/18 KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>:
> >> (2012/04/18 1:52), Glauber Costa wrote:
> >>
> >>>
> >>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
> >>>>> It's kmem, but it's more than that, I think.
> >>>>> Please provide subsys like ulimit.
> >>>>
> >>>> So, you think that while kmem would be enough to prevent fork-bombs,
> >>>> it would still make sense to limit in more traditional ways
> >>>> (ie. ulimit style object limits).  Hmmm....
> >>>>
> >>>
> >>> I personally think this is namespaces business, not cgroups.
> >>> If you have a process namespace, an interface that works to limit the
> >>> number of processes should keep working given the constraints you are
> >>> given.
> >>>
> >>> What doesn't make sense, is to create a *new* interface to limit
> >>> something that doesn't really need to be limited, just because you
> >>> limited a similar resource before.
> >>>
> >>
> >>
> >> Ok, limitiing forkbomb is unnecessary. ulimit+namespace should work.
> >> What we need is user-id namespace, isn't it ? If we have that, ulimit
> >> works enough fine, no overheads.
> > 
> > I have considered using NR_PROC rlimit on top of user namespaces to
> > fight forkbombs inside a container.
> > ie: one user namespace per container with its own rlimit.
> > 
> > But it doesn't work because we can have multiuser apps running in a
> > single container.
> > 
> 
> Ok, then, requirements is different from ulimit. ok, please forget my words.
> 
> My concern for using 'kmem' is that size of object can be changed, and set up
> may be more complicated than limiting 'number' of tasks.
> It's very architecture dependent....But hmm... 

Sure. But I believe the user can easily cope with that. One just needs
to create a cgroup, move a task there, and look at the accounted kmem.kernel_stack
to get the size used by one task.

That's less intuitive for the user than a task counter, of course. But it
may be more generally useful than just forkbomb protection. At least I hope so,
because I haven't heard about other possible usecases.
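The calibration step described here can be sketched against a cgroup v1 memcg mount; the kernel_stack usage file name follows this thread's naming and is an assumption, not a guaranteed interface:

```shell
# Empirically probe the per-task kernel-stack charge
# (file name hypothetical, following the naming discussed in this thread).
CG=/sys/fs/cgroup/memory/probe
mkdir -p "$CG"
echo $$ > "$CG/tasks"                 # move this shell into the cgroup

before=$(cat "$CG/memory.kmem.kernel_stack.usage_in_bytes")
sleep 1000 &                          # spawn exactly one extra task
after=$(cat "$CG/memory.kmem.kernel_stack.usage_in_bytes")

echo "per-task kernel stack: $((after - before)) bytes"
kill %1                               # clean up the helper task
```

Dividing a desired task count by this measured per-task cost gives the byte limit to write, which is the "less intuitive" step compared to a direct task counter.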

> 
> If slab accounting can handle task_struct accounting, all you wants can be
> done by it (maybe). And implementation can be duplicated.
> (But another aspect of the problem will be speed of development..)
> 
> One idea is (I'm not sure good or bad)...having following control files.
> 
>  - memory.kmem.task_struct.limit_in_bytes
>  - memory.kmem.task_struct.usage_in_bytes
>  - memory.kmem.task_struct.size_in_bytes   # size of task struct.

I'm fine either way. Counting task_struct memory usage is also a way
to count the tasks for me. But is it going to be more generally useful
than counting kernel stack?

> 
> At 1st, implement this by accounting task struct(or some) directly.
> Later, if we can, replace the implementation with slab(kmem) cgroup..
> and unify interfaces.....a long way to go.
> 
> 2nd idea is
> 
>  - memory.object.task.limit_in_number	# limit the number of tasks.
>  - memory.object.task.usage_in_number   # usage
> 
> 
> If I'm a user, I prefer #2.

People seem to object to defining the number of tasks as a relevant unit of resource.
It's indeed a semantic resource built on top of the lower-level memory resource
(and CPU as well).

And if it can be mapped back to the memory resource, it might be more generally
useful to limit at that level.

At least I hope so...

> 
> Hmm, 
>    global kmem limiting           -> done by bytes.
>    special kernel object limiting -> done by the number of objects.
> 
> is...complicated ?
> 
> Thanks,
> -Kame
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread


* Re: [RFD] Merge task counter into memcg
       [not found]                                       ` <4F8E7E76.3020202-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
  2012-04-18  9:12                                         ` Frederic Weisbecker
@ 2012-04-18 10:39                                         ` Johannes Weiner
  1 sibling, 0 replies; 88+ messages in thread
From: Johannes Weiner @ 2012-04-18 10:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

On Wed, Apr 18, 2012 at 05:42:30PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/18 16:53), Frederic Weisbecker wrote:
> 
> > 2012/4/18 KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>:
> >> (2012/04/18 1:52), Glauber Costa wrote:
> >>
> >>>
> >>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
> >>>>> It's kmem, but it's more than that, I think.
> >>>>> Please provide subsys like ulimit.
> >>>>
> >>>> So, you think that while kmem would be enough to prevent fork-bombs,
> >>>> it would still make sense to limit in more traditional ways
> >>>> (ie. ulimit style object limits).  Hmmm....
> >>>>
> >>>
> >>> I personally think this is namespaces business, not cgroups.
> >>> If you have a process namespace, an interface that works to limit the
> >>> number of processes should keep working given the constraints you are
> >>> given.
> >>>
> >>> What doesn't make sense, is to create a *new* interface to limit
> >>> something that doesn't really need to be limited, just because you
> >>> limited a similar resource before.
> >>>
> >>
> >>
> >> Ok, limiting forkbombs is unnecessary. ulimit+namespace should work.
> >> What we need is a user-id namespace, isn't it? If we have that, ulimit
> >> works well enough, with no overheads.
> > 
> > I have considered using NR_PROC rlimit on top of user namespaces to
> > fight forkbombs inside a container.
> > ie: one user namespace per container with its own rlimit.
> > 
> > But it doesn't work because we can have multiuser apps running in a
> > single container.
> > 
> 
> Ok, then, the requirements are different from ulimit's. OK, please forget my words.
> 
> My concern with using 'kmem' is that the size of an object can change, and
> setup may be more complicated than limiting the 'number' of tasks.
> It's very architecture dependent... But hmm...

BECAUSE it is architecture/kernel version/runtime dependent how big a
task really is, limiting available kernel memory is much more
meaningful than limiting a container to a number of units of unknown
and dynamically changing size.

How could this argument ever work IN FAVOR of limiting the number of
tasks?

> If slab accounting can handle task_struct accounting, all you want can be
> done by it (maybe). And implementation can be duplicated.
> (But another aspect of the problem will be speed of development..)
> 
> One idea is (I'm not sure good or bad)...having following control files.
> 
>  - memory.kmem.task_struct.limit_in_bytes
>  - memory.kmem.task_struct.usage_in_bytes
>  - memory.kmem.task_struct.size_in_bytes   # size of task struct.

A task's memory impact is not just its task_struct.

> At 1st, implement this by accounting task struct(or some) directly.
> Later, if we can, replace the implementation with slab(kmem) cgroup..
> and unify interfaces.....a long way to go.
> 
> 2nd idea is
> 
>  - memory.object.task.limit_in_number	# limit the number of tasks.
>  - memory.object.task.usage_in_number   # usage
> 
> If I'm a user, I prefer #2.

The memory controller is there to partition physical memory.  This is
usually measured in bytes and that's why the user-visible object size
in the memory controller is a byte.  When you add other types of
objects, you force the user to know about them and give them a method
of knowing the object size in bytes, which in case of a task, can vary
at runtime.

I will agree to this interface the moment I can buy RAM whose quantity
is measured in number of tasks.

> Hmm, 
>    global kmem limiting           -> done by bytes.
>    special kernel object limiting -> done by the number of objects.
> 
> is...complicated ?

Yes, and you don't provide any arguments!

What are you trying to do that would make limiting the number of tasks
a useful mechanism?

Why should some kernel objects be special?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                                         ` <20120418103930.GA1771-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2012-04-18 11:00                                           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 88+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18 11:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML, Tejun Heo, Cgroups,
	Andrew Morton

(2012/04/18 19:39), Johannes Weiner wrote:

> On Wed, Apr 18, 2012 at 05:42:30PM +0900, KAMEZAWA Hiroyuki wrote:
>> (2012/04/18 16:53), Frederic Weisbecker wrote:
>>
>>> 2012/4/18 KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>:
>>>> (2012/04/18 1:52), Glauber Costa wrote:
>>>>
>>>>>
>>>>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
>>>>>>> It's kmem, but it's more than that, I think.
>>>>>>> Please provide subsys like ulimit.
>>>>>>
>>>>>> So, you think that while kmem would be enough to prevent fork-bombs,
>>>>>> it would still make sense to limit in more traditional ways
>>>>>> (ie. ulimit style object limits).  Hmmm....
>>>>>>
>>>>>
>>>>> I personally think this is namespaces business, not cgroups.
>>>>> If you have a process namespace, an interface that works to limit the
>>>>> number of processes should keep working given the constraints you are
>>>>> given.
>>>>>
>>>>> What doesn't make sense, is to create a *new* interface to limit
>>>>> something that doesn't really need to be limited, just because you
>>>>> limited a similar resource before.
>>>>>
>>>>
>>>>
>>>> Ok, limiting forkbombs is unnecessary. ulimit+namespace should work.
>>>> What we need is a user-id namespace, isn't it? If we have that, ulimit
>>>> works well enough, with no overheads.
>>>
>>> I have considered using NR_PROC rlimit on top of user namespaces to
>>> fight forkbombs inside a container.
>>> ie: one user namespace per container with its own rlimit.
>>>
>>> But it doesn't work because we can have multiuser apps running in a
>>> single container.
>>>
>>
>> Ok, then, the requirements are different from ulimit's. OK, please forget my words.
>>
>> My concern with using 'kmem' is that the size of an object can change, and
>> setup may be more complicated than limiting the 'number' of tasks.
>> It's very architecture dependent... But hmm...
> 
> BECAUSE it is architecture/kernel version/runtime dependent how big a
> task really is, limiting available kernel memory is much more
> meaningful than limiting a container to a number of units of unknown
> and dynamically changing size.
> 
> How could this argument ever work IN FAVOR of limiting the number of
> tasks?


I think this shows that limiting the number of tasks (alongside memory
limitation) is difficult. Ah, I realize I don't like limiting task numbers.

> 
>> If slab accounting can handle task_struct accounting, all you want can be
>> done by it (maybe). And implementation can be duplicated.
>> (But another aspect of the problem will be speed of development..)
>>
>> One idea is (I'm not sure good or bad)...having following control files.
>>
>>  - memory.kmem.task_struct.limit_in_bytes
>>  - memory.kmem.task_struct.usage_in_bytes
>>  - memory.kmem.task_struct.size_in_bytes   # size of task struct.
> 
> A task's memory impact is not just its task_struct.
> 

Yes. It's the sum of several objects, including page tables, the kernel stack, etc.


>> At 1st, implement this by accounting task struct(or some) directly.
>> Later, if we can, replace the implementation with slab(kmem) cgroup..
>> and unify interfaces.....a long way to go.
>>
>> 2nd idea is
>>
>>  - memory.object.task.limit_in_number	# limit the number of tasks.
>>  - memory.object.task.usage_in_number   # usage
>>
>> If I'm a user, I prefer #2.
> 
> The memory controller is there to partition physical memory.  This is
> usually measured in bytes and that's why the user-visible object size
> in the memory controller is a byte.  When you add other types of
> objects, you force the user to know about them and give them a method
> of knowing the object size in bytes, which in case of a task, can vary
> at runtime.
> 
> I will agree to this interface the moment I can buy RAM whose quantity
> is measured in number of tasks.
> 
>> Hmm, 
>>    global kmem limiting           -> done by bytes.
>>    special kernel object limiting -> done by the number of objects.
>>
>> is...complicated ?
> 
> Yes, and you don't provide any arguments!
> 
> What are you trying to do that would make limiting the number of tasks
> a useful mechanism?


In the end, I was just considering what is simple, easy to use, and meets the requirements.

> 
> Why should some kernel objects be special?
> 

I mentioned that above because I remembered some people proposing a feature
to set a limit per slab type. I'm sorry if I remember it wrong.

And 'task' has limitations other than cgroups: ulimit, sysctl, etc.
Someone may want to isolate those (is that a namespace problem?).
The number of tasks itself has some meaning in the system.

If the forkbomb problem is just a problem of memory usage, it's simple:
what's required is a global kmem limit, not a limit on tasks.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
       [not found]                     ` <CAFTL4hxXT+hXWEnKop84JQ8ieHX4e=otpHnXYxdxaPgsiZYCiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-18 12:00                       ` Glauber Costa
  0 siblings, 0 replies; 88+ messages in thread
From: Glauber Costa @ 2012-04-18 12:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Daniel P. Berrange, Containers, Daniel Walsh, Hugh Dickins, LKML,
	Johannes Weiner, Tejun Heo, Cgroups, Andrew Morton

On 04/18/2012 05:10 AM, Frederic Weisbecker wrote:
> 2012/4/18 Frederic Weisbecker<fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> On Tue, Apr 17, 2012 at 08:17:53AM -0700, Tejun Heo wrote:
>>> Hello, Frederic.
>>>
>>> On Thu, Apr 12, 2012 at 06:59:27PM +0200, Frederic Weisbecker wrote:
>>>> I want:
>>>>
>>>> a) to prevent the forkbomb from going far enough to DDOS the machine
>>>> b) to be able to kill that forkbomb once detected, in one go without race
>>>> against concurrent forks.
>>>>
>>>> I think a) can work just fine with kernel stack limiting. I also need
>>>> to be notified about the fact we reached the limit. And b) should
>>>> be feasible with the help of the cgroup freezer.
>>>
>>> kmem allocation fail after reaching the limit which in turn should
>>> fail task creation.  Isn't that the same effect as the task_counter as
>>> implemented?
>>
>> That's it.
>>
>>>
>>>>> Is there anything for which you need to know exactly the number of
>>>>> processes?
>>>>
>>>> No that's really about prevent/kill forkbomb as far as I'm concerned.
>>>
>>> Hmm... so, accounting overhead aside, if the only purpose is
>>> preventing the whole machine being brought down by a fork bomb, kmem
>>> limiting is enough, right?
>>
>> I think so yeah.
>
> But this needs to be a well-defined kind of kmem, I think. Relying on
> kernel memory alone is too general to protect just against forkbombs.
> Kernel stack, OTOH, should be a good criterion.

The problem is not that it is too general. The problem is that it is
slab-based, and it takes a lot of allocations to fill a slab page. The
object is small, but you still have one per task. If you set the limit
too high, it won't help you; if you set it too low, it will harm other
object users.

> But now I'm wondering: do you think this kmem.kernel_stack limitation is
> going to be useful for other kinds of usecases?

Yes. Ultimately, we want to track as many kinds of kernel memory as
possible, to avoid having one container harm the others. Page tables and
stacks were already briefly discussed, so I think we would get there
eventually anyway.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFD] Merge task counter into memcg
  2012-04-17 15:23         ` Tejun Heo
@ 2012-04-19  3:34               ` Alexander Nikiforov
  0 siblings, 0 replies; 88+ messages in thread
From: Alexander Nikiforov @ 2012-04-19  3:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel P. Berrange, Frederic Weisbecker, Containers,
	Daniel Walsh, Hugh Dickins, LKML,
	d.solodkiy-Sze3O3UU22JBDgjK7y7TUQ, Johannes Weiner, Cgroups,
	Andrew Morton

On 04/17/2012 07:23 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 17, 2012 at 10:45:05AM +0400, Alexander Nikiforov wrote:
>> between them. Now we have only 1 way to get notification about tasks
>> file - inotify(), but this approach works only if you work from
>> userspace with file (e.g. create struct file, for example with echo
>> $$ > /sys/abc/tasks), but when something happens from kernel side
>> (do_fork()/do_exit) we cannot get any event about group of the
>> process (we can scan tasks file and count number of PID, or work
>> with
>> waitpid(), but IMHO this is ugly solutions)
> Wouldn't simply generating FS_MODIFY event on the tasks file do the
> trick?
>
> Thanks.
>
Maybe it will, but in my mind it should come with an event. I thought about 
inotify before my RFD mail. Here we have a file update, so FS_MODIFY is very 
natural. But on the other hand, memcg already has an event mechanism, so an 
event would be the conventional approach.

If this functionality is acceptable in cgroup and FS_MODIFY is the better 
fit, we'll make a new patch with this approach.

-- 
Best regards,
      Alex Nikiforov,
      Mobile SW, Advanced Software Group,
      Moscow R&D center, Samsung Electronics

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2012-04-19  3:34 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11 18:57 [RFD] Merge task counter into memcg Frederic Weisbecker
     [not found] ` <20120411185715.GA4317-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
2012-04-11 19:21   ` Glauber Costa
2012-04-11 19:21     ` Glauber Costa
2012-04-12 11:19     ` Frederic Weisbecker
     [not found]     ` <4F85D9C6.5000202-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-12 11:19       ` Frederic Weisbecker
2012-04-12  0:56   ` KAMEZAWA Hiroyuki
2012-04-12  1:07   ` Johannes Weiner
2012-04-12  3:56   ` Alexander Nikiforov
     [not found]     ` <4F86527C.2080507-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2012-04-17  1:09       ` Frederic Weisbecker
2012-04-17  1:09     ` Frederic Weisbecker
2012-04-17  6:45       ` Alexander Nikiforov
2012-04-17 15:23         ` Tejun Heo
     [not found]           ` <20120417152350.GC32402-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-04-19  3:34             ` Alexander Nikiforov
2012-04-19  3:34               ` Alexander Nikiforov
     [not found]         ` <4F8D1171.1090504-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2012-04-17 15:23           ` Tejun Heo
     [not found]       ` <20120417010902.GA14646-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
2012-04-17  6:45         ` Alexander Nikiforov
2012-04-12  4:00   ` Alexander Nikiforov
2012-04-12  4:00     ` Alexander Nikiforov
2012-04-12  0:56 ` KAMEZAWA Hiroyuki
2012-04-12 11:32 ` Frederic Weisbecker
2012-04-12 11:43   ` Glauber Costa
2012-04-12 12:32     ` Johannes Weiner
2012-04-12 13:12       ` Glauber Costa
2012-04-12 15:30         ` Johannes Weiner
2012-04-12 16:38           ` Tejun Heo
2012-04-12 17:04             ` Cgroup in a single hierarchy (Was: Re: [RFD] Merge task counter into memcg) Glauber Costa
2012-04-17 15:13               ` Tejun Heo
2012-04-17 15:27                 ` Glauber Costa
2012-04-12 17:13             ` [RFD] Merge task counter into memcg Glauber Costa
2012-04-12 17:23             ` Johannes Weiner
2012-04-12 17:41               ` Tejun Heo
2012-04-12 17:53                 ` Glauber Costa
2012-04-13  1:42                 ` KAMEZAWA Hiroyuki
2012-04-17 15:41                   ` Tejun Heo
2012-04-17 16:52                     ` Glauber Costa
2012-04-18  6:51                       ` KAMEZAWA Hiroyuki
2012-04-18  7:53                         ` Frederic Weisbecker
2012-04-18  8:42                           ` KAMEZAWA Hiroyuki
2012-04-18  9:12                             ` Frederic Weisbecker
2012-04-18 10:39                             ` Johannes Weiner
2012-04-18 11:00                               ` KAMEZAWA Hiroyuki
2012-04-13  1:50                   ` Glauber Costa
2012-04-13  2:48                     ` KAMEZAWA Hiroyuki
2012-04-12 16:54           ` Glauber Costa
2012-04-12  1:07 ` Johannes Weiner
2012-04-12  2:15   ` Glauber Costa
2012-04-12  3:26   ` Li Zefan
2012-04-12 14:55   ` Frederic Weisbecker
2012-04-12 16:34     ` Glauber Costa
2012-04-12 16:59       ` Frederic Weisbecker
2012-04-17 15:17         ` Tejun Heo
2012-04-18  6:54           ` Frederic Weisbecker
2012-04-18  8:10             ` Frederic Weisbecker
2012-04-18 12:00               ` Glauber Costa
