From: Glauber Costa <glommer@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Daniel Walsh <dwalsh@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Li Zefan <lizf@cn.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Cgroups <cgroups@vger.kernel.org>,
	Containers <containers@lists.linux-foundation.org>
Subject: Re: [RFD] Merge task counter into memcg
Date: Thu, 12 Apr 2012 13:54:10 -0300
Message-ID: <4F8708B2.5050301@parallels.com>
In-Reply-To: <20120412153055.GL1787@cmpxchg.org>


>
>>>> If this gets really integrated, all of a sudden the overhead will
>>>> appear. So better care about it now.
>>>
>>> Forcing people that want to account/limit one resource to take the hit
>>> for something else they are not interested in requires justification.
>>
>> Agree. Even people aiming for unified hierarchies are okay with an
>> opt-in/out system, I believe. So the controllers need not be
>> active at all times. One way of doing this is what I suggested to
>> Frederic: If you don't limit, don't account.
>
> I don't agree, it's a valid usecase to monitor a workload without
> limiting it in any way.  I do it all the time.

That's side-tracking. This is one way to do it, not the way to do it.
The main point is that a controller can be trivially made present in a 
hierarchy, without doing anything.


>>
>> A large number of controllers creates complexity. When coding, we can
>> assume a lot fewer things about their relationships, and more
>> importantly: at some point people get confused. Fuck, sometimes *we*
>> get confused about which controller does what, where its
>> responsibility ends and where the other's begins. And we're the ones
>> writing it! Avoiding complexity is an engineering principle, not a
>> gut feeling.
>
> And that's why I have a horrible feeling about extending the cgroup
> core to do hierarchical accounting and limiting.  See below.
>
>> Now, of course, we should aim to make things as simple as possible,
>> but not simpler: So you can argue that in Frederic's specific case,
>> it is justified. And I'd be fine with that 100 %. If I agreed...
>>
>> There are two natural points for inclusion here:
>>
>> 1) every cgroup has a task counter by itself. If we're putting the
>> tasks there anyway, this provides a natural point of accounting.
>
> I do think there is a big difference between having a list of tasks
> per individual cgroup to manage basic task-cgroup relationship on one
> hand, and accounting and limiting the number of allowed tasks over
> multi-level group hierarchies on the other.  It may seem natural on
> the face of it, but it really isn't, IMO.

It makes less sense to me now after I read Frederic's last e-mail.
Indeed, you are both right on this point.

> To reraise a point from my other email that was ignored: do users
> actually really care about the number of tasks when they want to
> prevent forkbombs?  If a task would use neither CPU nor memory, you
> would not be interested in limiting the number of tasks.
>
> Because the number of tasks is not a resource.  CPU and memory are.
>
> So again, if we would include the memory impact of tasks properly
> (structures, kernel stack pages) in the kernel memory counters which
> we allow to limit, shouldn't this solve our problem?
>
> You said in private email that you didn't like the idea because
> administrators wouldn't know how big the kernel stack was and that the
> number of tasks would be a more natural thing to limit.  But I think
> that is actually an argument in favor of the kmem approach: the user
> has no idea how much impact a task actually has resource-wise!  On the
> other hand, he knows exactly how much memory and CPU his machine has
> and how he wants to distribute these resources.  So why provide him
> with an interface to control some number in an unknown unit?
>
> You don't propose we allow limiting the number of dcache entries,
> either, but rather the memory they use.
>
> The historical limiting of number of tasks through rlimit is anything
> but scientific or natural.  You essentially set it to a random value
> between allowing most users to do their job and preventing things from
> taking down the machine.  With proper resource accounting, which we
> want to have anyway, we can do much better than that, so why shouldn't
> we?

Okay.

I may agree with you, I might not.

It really depends on Frederic's real use case - (Frederic, please 
comment on it).

If we're trying to limit the number of processes *as a way* of limiting 
the amount of memory they use, then yes, what you say makes total sense.

I was always under the assumption that they wanted something more. One
of the things I remember reading in the descriptions was that some
services shouldn't be allowed to fork after a certain point. Then you
could limit their number of processes to whatever value it has now.

For that, stack usage may not help much.

Now, my personal take on this: use cases like that, if really needed,
can be achieved in other ways that do not even involve cgroups.

One of the things for the near future is to start putting more kinds of
data in the kmem controller. Things like page tables and the stack are
the natural candidates. I am on Hannes' side in saying that it should be
enough to prevent any malicious container from doing any harm.

But it is not even necessary!

- Each process has a task struct
- task struct comes from the slab.

Even my slab accounting patches are enough to prevent harm outside the
container, because if you fill all your kmem with task_structs, you will
be stopped from going any further.

As a matter of fact, I've been doing it all the time during the last
few days while testing the patchset.
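
The argument above can be sketched with a toy userspace model. This is
not kernel code and not the slab accounting patchset: KmemCounter,
TASK_STRUCT_BYTES, fork_until_refused and the 1 MiB limit are all
hypothetical names and values chosen for illustration. It only shows
the idea that if every fork charges a task struct to a limited kmem
counter, a fork bomb is refused once the limit is reached, without any
separate task counter.

```python
# Toy userspace model of the argument, not kernel code.
# All names and sizes below are hypothetical.

TASK_STRUCT_BYTES = 8 * 1024  # assumed per-task slab cost, illustration only


class KmemCounter:
    """Flat stand-in for a per-cgroup kernel memory counter."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.usage = 0

    def charge(self, nbytes):
        # Refuse the allocation if it would push usage past the limit.
        if self.usage + nbytes > self.limit:
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        # Called when a task exits and its task struct is freed.
        self.usage = max(0, self.usage - nbytes)


def fork_until_refused(counter):
    """Simulate a fork bomb: charge one task struct per fork until refused."""
    forks = 0
    while counter.charge(TASK_STRUCT_BYTES):
        forks += 1
    return forks


counter = KmemCounter(limit_bytes=1024 * 1024)  # 1 MiB of kmem for the group
tasks = fork_until_refused(counter)
print(tasks, counter.usage)
```

In this model the number of tasks is bounded as a side effect of the
memory limit, which is the point made above. It does not cover the
other use case mentioned, freezing a service at its current number of
processes.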
