From mboxrd@z Thu Jan 1 00:00:00 1970
From: Greg Thelen
Subject: Re: [PATCH] per-cgroup tcp buffer limitation
Date: Wed, 7 Sep 2011 14:35:17 -0700
Message-ID:
References: <1315276556-10970-1-git-send-email-glommer@parallels.com> <4E664766.40200@parallels.com> <4E66A0A9.3060403@parallels.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To: <4E66A0A9.3060403@parallels.com>
Sender: owner-linux-mm@kvack.org
To: Glauber Costa
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, containers@lists.osdl.org, netdev@vger.kernel.org, xemul@parallels.com, "David S. Miller", Hiroyouki Kamezawa, "Eric W. Biederman", Suleiman Souhlal
List-Id: containers.vger.kernel.org

On Tue, Sep 6, 2011 at 3:37 PM, Glauber Costa wrote:
> I think memcg's usage is really all you need here. At the end of the day, it
> tells you how many pages your container has available. The whole
> point of kmem cgroup is not any kind of reservation or accounting.

The memcg does not reserve memory. It provides upper-bound limits on
memory usage. A careful admin can configure soft_limit_in_bytes as an
approximation of a memory reservation, but the soft limit is really more
of a reclaim target when there is global memory pressure.

> Once a container (or cgroup) reaches a number of objects *pinned* in memory
> (and therefore non-reclaimable), you won't be able to grab anything from it.
>
>> So
>> far my use cases involve a single memory limit which includes both
>> kernel and user memory. So I would need a user space agent to poll
>> {memcg,kmem}.usage_in_bytes to apply pressure to memcg if kmem grows
>> and vice versa.
>
> Maybe not.
> If userspace memory works for you today (supposing it does), why change?

Good question. The current upstream memcg user space memory limit does
not work for me today. I should have made that more obvious (sorry). See
below for details.
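To be concrete about the hard vs. soft limit distinction above, here is
roughly what configuring both looks like (illustrative only; paths assume
a cgroup v1 memcg mount at /sys/fs/cgroup/memory and a group name of my
choosing -- adjust for your setup):

```shell
# Create a memcg with a 512M hard limit and a 256M soft limit.
# The hard limit is an enforced upper bound; the soft limit is only a
# target the kernel reclaims the group back toward under *global*
# memory pressure -- it is not a reservation.
mkdir /sys/fs/cgroup/memory/container0
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/container0/memory.limit_in_bytes
echo $((256 * 1024 * 1024)) > /sys/fs/cgroup/memory/container0/memory.soft_limit_in_bytes
# Move the current shell into the group.
echo $$ > /sys/fs/cgroup/memory/container0/tasks
```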
> Right now you assign X bytes of user memory to a container, and the kernel
> memory is shared among all of them. If this works for you, kmem_cgroup won't
> change that. It will just impose limits over which
> your kernel objects can't grow.
>
> So you don't *need* a userspace agent doing this calculation, because
> fundamentally, nothing changed: I am not unbilling memory in memcg to bill
> it back in kmem_cg. Of course, once it is in, you will be able to do it in
> such a fine-grained fashion if you decide to do so.
>
>> Do you foresee instantiation of multiple kmem cgroups, so that a
>> process could be added into kmem/K1 or kmem/K2? If so, do you plan on
>> supporting migration between cgroups and/or migration of kmem charges
>> from K1 to K2?
>
> Yes, each container should have its own cgroup, so at least in the use
> cases I am concerned with, we will have a lot of them. But the usual
> lifecycle is create, execute and die. Mobility between them
> is not something I am overly concerned about right now.
>
>>>> Do you foresee the kmem cgroup growing to include reclaimable slab,
>>>> where freeing one type of memory allows for reclaim of the other?
>>>
>>> Yes, absolutely.

Now I see that you're using kmem to limit the amount of unreclaimable
kernel memory.

We have a work-in-progress patch series that adds kernel memory
accounting to memcg. These patches allow an admin to specify a single
memory limit for a cgroup which encompasses both user memory (as
upstream memcg does) and many kernel memory allocations (especially
slab and page tables). When kernel memory grows it puts pressure on user
memory; when user memory grows it puts pressure on reclaimable kernel
memory using registered shrinkers. We are in the process of cleaning up
these memcg slab accounting patches.

In my use cases there is a single memory limit that applies to both
kernel and user memory.
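The single-limit semantics I am describing can be sketched as a toy
user-space model (purely illustrative Python, not the patch itself;
class and method names are mine): user and kernel charges share one
limit, and hitting it reclaims whichever kind of memory is reclaimable.

```python
class UnifiedMemcg:
    """Toy model of a cgroup where user and kernel memory share one limit."""

    def __init__(self, limit):
        self.limit = limit
        self.user = 0        # user pages (reclaimable in this model)
        self.kernel = 0      # kernel objects
        self.shrinkers = []  # callbacks that free reclaimable kernel memory

    def register_shrinker(self, fn):
        # fn(nr_requested) -> bytes of kernel memory actually freed
        self.shrinkers.append(fn)

    def _reclaim(self, need):
        freed = 0
        # User pages can be swapped or dropped.
        drop = min(self.user, need)
        self.user -= drop
        freed += drop
        # Then ask registered shrinkers for reclaimable kernel memory.
        for fn in self.shrinkers:
            if freed >= need:
                break
            got = fn(need - freed)
            self.kernel -= got
            freed += got
        return freed

    def charge(self, nbytes, kind):
        # Both kinds of memory are charged against the same limit, so
        # growth of one kind automatically pressures the other.
        while self.user + self.kernel + nbytes > self.limit:
            need = self.user + self.kernel + nbytes - self.limit
            if self._reclaim(need) == 0:
                return False  # nothing reclaimable left: would OOM
        if kind == "user":
            self.user += nbytes
        else:
            self.kernel += nbytes
        return True
```

With this model a kernel allocation past the limit evicts user pages,
and a user allocation past the limit invokes the shrinkers -- which is
the pressure flow the patches implement.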
If a separate kmem cgroup is introduced to manage kernel memory outside
of memcg with a distinct limit, then I would need a user space daemon to
balance memory between the kmem and memcg subsystems. As kmem grows, the
daemon would apply pressure to memcg, and as memcg grows, pressure would
be applied to kmem. As you stated, kernel memory is not necessarily
reclaimable, so such reclaim may fail.

My resistance to this approach is that with a single memory cgroup
admins can do a better job packing a machine. If balancing daemons are
employed, then more memory would need to be reserved and more user space
CPU time would be needed to apply VM pressure between the two types of
memory.

While there are people (like me) who want a combined memory usage limit,
there are also people (like you) who want separate user and kernel
limits. I have toyed with the idea of a per-cgroup flag that determines
whether kernel and user memory are charged against a single combined
limit or against separate limits. I have also wondered if there were a
way to wire the usage of the two subsystems together; that would also
meet my needs, but I am not sure how to do that.
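For the record, the balancing daemon I would be forced to run looks
roughly like this (a sketch only: the memcg file names are real cgroup
v1 interfaces, but the kmem/ mount point and kmem.* file names are
hypothetical, since that interface does not exist yet; the proportional
split policy is just one possible choice):

```python
#!/usr/bin/env python
# Hypothetical daemon balancing a combined budget between separate
# memcg and kmem cgroup subsystems for one container.

import time

MEMCG = "/sys/fs/cgroup/memory/container0/"  # real cgroup v1 memcg path
KMEM = "/sys/fs/cgroup/kmem/container0/"     # hypothetical kmem mount
TOTAL = 512 * 1024 * 1024                    # combined memory budget


def read_bytes(path):
    with open(path) as f:
        return int(f.read())


def write_bytes(path, val):
    with open(path, "w") as f:
        f.write(str(val))


def rebalance(user_usage, kmem_usage, total):
    """Split the combined budget in proportion to current usage, so
    growth on one side shrinks the other side's limit and thereby
    applies reclaim pressure to it.  Pure function for clarity."""
    used = user_usage + kmem_usage
    if used == 0:
        return total // 2, total - total // 2
    user_limit = total * user_usage // used
    return user_limit, total - user_limit


if __name__ == "__main__":
    while True:
        u = read_bytes(MEMCG + "memory.usage_in_bytes")
        k = read_bytes(KMEM + "kmem.usage_in_bytes")    # hypothetical file
        user_limit, kmem_limit = rebalance(u, k, TOTAL)
        write_bytes(MEMCG + "memory.limit_in_bytes", user_limit)
        write_bytes(KMEM + "kmem.limit_in_bytes", kmem_limit)  # hypothetical
        time.sleep(1)
```

This is exactly the reserved memory and user space CPU overhead I would
rather avoid by keeping a single combined limit in memcg.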