From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 17 Jan 2011 20:14:00 +0100
From: Johannes Weiner
Subject: [LSF/MM TOPIC] memory control groups
Message-ID: <20110117191359.GI2212@cmpxchg.org>
To: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org
Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse

Hello,

At the MM summit, I would like to talk about the current state of
memory control groups, the features and extensions that are currently
being developed for it, and what their status is.

I am especially interested in talking about the current runtime memory
overhead memcg comes with (1% of RAM) and what we can do to shrink it.

In comparison to how efficiently struct page is packed, and given that
distro kernels come with memcg enabled by default, I think we should
put a bit more thought into how struct page_cgroup (which also exists
for every page in the system) is organized.

I have a patch series that removes the page backpointer from struct
page_cgroup by storing a node ID (or section ID, depending on whether
sparsemem is configured) in the free bits of pc->flags.

I also plan on replacing the pc->mem_cgroup pointer with an ID
(KAMEZAWA-san has patches for that) and moving it to pc->flags too.
Every flag we do not use doubles the number of possible control
groups, so I have patches that get rid of some of the flags currently
allocated, including PCG_CACHE, PCG_ACCT_LRU, and PCG_MIGRATION.

[ I meant to send those out much earlier, but a bug in the migration
rework was not responding to my yelling 'Marco', and now my changes
collide horribly with THP, so it will take another rebase. ]

The per-memcg dirty accounting work, for example, allocates a bunch of
new bits in pc->flags, and I'd like to hash out whether this leaves
enough room for the structure packing I described, or whether we can
come up with a different way of tracking state.

Would other people be interested in discussing this?

Hannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 10:10:57 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-Id: <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110117191359.GI2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org>
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse

On Mon, 17 Jan 2011 20:14:00 +0100
Johannes Weiner wrote:

> Hello,
>
> [...]
>
> I have a patch series that removes the page backpointer from struct
> page_cgroup by storing a node ID (or section ID, depending on whether
> sparsemem is configured) in the free bits of pc->flags.
>
> I also plan on replacing the pc->mem_cgroup pointer with an ID
> (KAMEZAWA-san has patches for that) and moving it to pc->flags too.
> Every flag we do not use doubles the number of possible control
> groups, so I have patches that get rid of some of the flags currently
> allocated, including PCG_CACHE, PCG_ACCT_LRU, and PCG_MIGRATION.
>
> [...]
>
> The per-memcg dirty accounting work, for example, allocates a bunch of
> new bits in pc->flags, and I'd like to hash out whether this leaves
> enough room for the structure packing I described, or whether we can
> come up with a different way of tracking state.
>

I see that there are requests for shrinking page_cgroup. And yes, I think
we should do so. There is a trade-off between performance and memory
usage, so could you show the numbers when we discuss it?

BTW, I think we can...

 - The PCG_ACCT_LRU bit can be dropped. (I think list_empty(&pc->lru) can
   be used. The ROOT cgroup will not be a problem.)
 - pc->mem_cgroup can be replaced with an ID.
   But moving it into the flags field seems difficult because of races.
 - pc->page can be replaced with some lookup routine.
   But the section-bit encoding may be somewhat obscure, and the lookup
   cost will be a problem.
 - The PCG_CACHE bit duplicates information already in 'page', so we can
   use PageAnon().
 - I'm not sure about PCG_MIGRATION. It is there to avoid races.

Note: we'll need to use 16 bits for blkio tracking.

Another idea is dynamic allocation of page_cgroup. It may help in a THP
environment, but it will not work well (it just adds overhead) for
file-cache workloads.

Anyway, my development priorities for memcg this year are:

 1. dirty ratio support
 2. background reclaim (kswapd)
 3. blkio tracking

Putting page_cgroup on a diet should be done step by step. We've seen
many regressions when new features came to memory cgroup.

Thanks,
-Kame
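To make the bit budget discussed above concrete, here is a rough C sketch of one 64-bit flags word that carries PCG_* flag bits, a memcg ID, and a section/node ID, as proposed at the start of the thread. The layout, the example widths (8 flag bits, a 20-bit section ID) and every name in it (struct pc_layout, pc_pack(), and so on) are hypothetical; this is not the kernel's encoding, and it ignores the races Kame raises. For scale: assuming struct page_cgroup consists of the four fields mentioned in this thread (pc->flags, pc->mem_cgroup, pc->page and the pc->lru list head) on a 64-bit machine with 4 KiB pages, that is 8 + 8 + 8 + 16 = 40 bytes per page, or 40/4096 ≈ 0.98% of RAM, in line with the ~1% figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit pc->flags layout (illustration only, not the
 * kernel's actual encoding):
 *
 *   | section/node ID | memcg ID         | PCG_* flags |
 *   | section_bits    | whatever is left | nr_flags    |
 */
struct pc_layout {
	unsigned nr_flags;     /* PCG_* bits still in use (assumed) */
	unsigned section_bits; /* width of the section/node ID (assumed) */
};

/* Bits left for the memcg ID once flags and section ID are placed. */
static unsigned id_bits(struct pc_layout l)
{
	assert(l.section_bits > 0 && l.nr_flags + l.section_bits < 64);
	return 64 - l.nr_flags - l.section_bits;
}

/* Number of distinct memcg IDs this layout can represent. */
static uint64_t max_memcg_ids(struct pc_layout l)
{
	return UINT64_C(1) << id_bits(l);
}

static uint64_t pc_pack(struct pc_layout l, uint64_t flags,
			uint64_t memcg_id, uint64_t section)
{
	unsigned ib = id_bits(l);

	/* Every field must fit in its slot. */
	assert((flags >> l.nr_flags) == 0);
	assert((memcg_id >> ib) == 0);
	assert((section >> l.section_bits) == 0);
	return (section << (l.nr_flags + ib)) | (memcg_id << l.nr_flags) | flags;
}

static uint64_t pc_flag_bits(struct pc_layout l, uint64_t word)
{
	return word & ((UINT64_C(1) << l.nr_flags) - 1);
}

static uint64_t pc_memcg_id(struct pc_layout l, uint64_t word)
{
	return (word >> l.nr_flags) & (max_memcg_ids(l) - 1);
}

static uint64_t pc_section(struct pc_layout l, uint64_t word)
{
	return word >> (l.nr_flags + id_bits(l));
}
```

Each flag bit removed from nr_flags adds one bit to the ID field and so doubles max_memcg_ids(), which is the trade-off described in the first message. In this model, reserving the 16 bits Kame mentions for blkio tracking would divide the number of representable IDs by 2^16.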
From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20110117191359.GI2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org>
Date: Tue, 18 Jan 2011 00:17:53 -0800
Subject: Re: [LSF/MM TOPIC] memory control groups
From: Michel Lespinasse
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han

On Mon, Jan 17, 2011 at 11:14 AM, Johannes Weiner wrote:
> At the MM summit, I would like to talk about the current state of
> memory control groups, the features and extensions that are currently
> being developed for it, and what their status is.

+1 - there is a lot to discuss about memcg...

> I am especially interested in talking about the current runtime memory
> overhead memcg comes with (1% of RAM) and what we can do to shrink it.
>
> [...]
>
> The per-memcg dirty accounting work, for example, allocates a bunch of
> new bits in pc->flags, and I'd like to hash out whether this leaves
> enough room for the structure packing I described, or whether we can
> come up with a different way of tracking state.

This is probably longer term, but I would love to get rid of the
duplication between the global LRU and the per-cgroup LRU. The global
LRU could be approximated by scanning all per-cgroup LRU lists (in
amounts proportional to the list lengths).

> Would other people be interested in discussing this?

Definitely.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
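Michel's idea of approximating the global LRU by scanning every per-cgroup list in amounts proportional to its length could look roughly like the following C sketch. split_scan() is a hypothetical helper, not an existing kernel function; it uses floor division, hands the rounding remainder to the longest list, and ignores overflow for very large inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split one global scan target across per-cgroup
 * LRU lists in proportion to their lengths, so that walking every cgroup
 * approximates scanning a single global LRU.
 */
static void split_scan(const uint64_t *lru_len, uint64_t *nr_scan,
		       size_t nr_groups, uint64_t nr_to_scan)
{
	uint64_t total = 0, assigned = 0;
	size_t longest = 0;

	for (size_t i = 0; i < nr_groups; i++) {
		total += lru_len[i];
		if (lru_len[i] > lru_len[longest])
			longest = i;
	}
	for (size_t i = 0; i < nr_groups; i++) {
		/* floor(nr_to_scan * len / total); overflow ignored for brevity */
		nr_scan[i] = total ? nr_to_scan * lru_len[i] / total : 0;
		assigned += nr_scan[i];
	}
	/* Sum of floors never exceeds the target. */
	assert(assigned <= nr_to_scan);
	/* Hand the rounding remainder to the longest list. */
	if (total)
		nr_scan[longest] += nr_to_scan - assigned;
}
```

For example, lists of 100, 300 and 600 pages with a target of 32 pages receive 3, 9 and 20 pages respectively (19, plus the remainder of 1, for the longest list).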
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 09:40:13 +0100
From: Johannes Weiner
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-ID: <20110118084013.GK2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org> <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse

On Tue, Jan 18, 2011 at 10:10:57AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 17 Jan 2011 20:14:00 +0100
> Johannes Weiner wrote:
>
> > [...]
>
> I see that there are requests for shrinking page_cgroup. And yes, I think
> we should do so. There is a trade-off between performance and memory
> usage, so could you show the numbers when we discuss it?

Yep, I will prepare them anyway for submission.

> BTW, I think we can...
>
>  - The PCG_ACCT_LRU bit can be dropped. (I think list_empty(&pc->lru) can
>    be used. The ROOT cgroup will not be a problem.)

Yes, that's what I did. It should be protected by the lru lock, and
root cgroup pages can easily be marked so that list_empty() works on
them.

>  - pc->mem_cgroup can be replaced with an ID.
>    But moving it into the flags field seems difficult because of races.
>  - pc->page can be replaced with some lookup routine.
>    But the section-bit encoding may be somewhat obscure, and the lookup
>    cost will be a problem.

Why is that?

The lookup is actually straightforward, like lookup_page_cgroup().
And we only need it when coming from the per-cgroup LRU, i.e. in
reclaim and force_empty.

>  - The PCG_CACHE bit duplicates information already in 'page', so we can
>    use PageAnon().

I did that, too. But for this to work, we need to make sure that
pages are always rmapped when they are charged and uncharged. This is
one point where I collide with THP. It's also why I complained that
migration clears page->mapping of replaced anonymous pages :)

>  - I'm not sure about PCG_MIGRATION. It is there to avoid races.

That's also a scary patch... Yeah, it's to prevent uncharging of
oldpage in case migration fails and it has to be reused. I changed
the migration sequence for memcg a bit so that we don't have to do
that anymore. It survived basic testing.

> Note: we'll need to use 16 bits for blkio tracking.
>
> Another idea is dynamic allocation of page_cgroup. It may help in a THP
> environment, but it will not work well (it just adds overhead) for
> file-cache workloads.
>
> Anyway, my development priorities for memcg this year are:
>
>  1. dirty ratio support
>  2. background reclaim (kswapd)
>  3. blkio tracking
>
> Putting page_cgroup on a diet should be done step by step. We've seen
> many regressions when new features came to memory cgroup.

Yes, and that's what I'm afraid of. We would never be able to add a
side-feature that makes struct page grow by an arbitrary amount.

If the feature is sufficiently important and there is no other way, it
should of course be an option. But it should not be done carelessly.

E.g. I have a suspicion that we might be able to do dirty accounting
without all the flags (we have them in the page anyway!) but use
proportions instead. It's not page-accurate, but I think the
fundamental problem is solved: when the dirty ratio is exceeded,
throttle the cgroup with the biggest dirty share.

But yes, that's sort of what I want to discuss :)

Hannes
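Hannes's alternative to per-page dirty flags can be sketched roughly as follows: each cgroup keeps only a count of the pages it dirtied recently, and when global dirty pages exceed the dirty ratio, the cgroup with the largest count is throttled. The struct, the plain (non-decaying) counters and the function names are assumptions made for illustration; a real implementation would presumably need some aging so that old activity fades out. est_dirty_pages() shows why the result is "not page-accurate": it scales a share of the global count rather than counting a cgroup's own pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup bookkeeping: no dirty flag per page_cgroup,
 * just a count of the pages this cgroup dirtied recently. */
struct memcg_dirty {
	uint64_t dirtied;
};

static uint64_t total_dirtied(const struct memcg_dirty *cg, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += cg[i].dirtied;
	return total;
}

/* Approximate a cgroup's dirty pages by applying its share of recent
 * dirtying to the global dirty count (not page-accurate). */
static uint64_t est_dirty_pages(const struct memcg_dirty *cg, size_t n,
				size_t idx, uint64_t global_dirty)
{
	uint64_t total = total_dirtied(cg, n);

	assert(idx < n);
	return total ? global_dirty * cg[idx].dirtied / total : 0;
}

/* If global dirty pages exceed ratio percent of dirtyable memory,
 * return the index of the cgroup with the biggest dirty share,
 * otherwise -1. */
static long pick_throttle_target(const struct memcg_dirty *cg, size_t n,
				 uint64_t global_dirty, uint64_t dirtyable,
				 unsigned ratio)
{
	size_t best = 0;

	if (n == 0 || global_dirty * 100 <= dirtyable * ratio)
		return -1;
	for (size_t i = 1; i < n; i++)
		if (cg[i].dirtied > cg[best].dirtied)
			best = i;
	return (long)best;
}
```

With recent dirtying counts of 100, 500 and 400 pages, 250 globally dirty pages and 1000 dirtyable pages, a 20% ratio is exceeded (25%), so cgroup 1 is chosen; its estimated dirty footprint is 250 × 500 / 1000 = 125 pages.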
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 17:45:23 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-Id: <20110118174523.5c79a032.kamezawa.hiroyu@jp.fujitsu.com>
References: <20110117191359.GI2212@cmpxchg.org>
To: Michel Lespinasse
Cc: Johannes Weiner, linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han

On Tue, 18 Jan 2011 00:17:53 -0800
Michel Lespinasse wrote:

> > The per-memcg dirty accounting work, for example, allocates a bunch of
> > new bits in pc->flags, and I'd like to hash out whether this leaves
> > enough room for the structure packing I described, or whether we can
> > come up with a different way of tracking state.
>
> This is probably longer term, but I would love to get rid of the
> duplication between the global LRU and the per-cgroup LRU. The global
> LRU could be approximated by scanning all per-cgroup LRU lists (in
> amounts proportional to the list lengths).
>

I can't answer why the first implementation chose a design in which
memory cgroup's meta-page (page_cgroup) has its own LRU rather than
reusing page->lru, because I didn't take part in the birth of memcg.
Does anyone remember the reason or the discussion?

For my part, I review memcg patches from the viewpoint of "Will this
patch affect the global LRU? Could it break the page reclaim algorithm
of the global LRU?"

Thanks,
-Kame
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 03:53:40 -0500 (EST)
From: CAI Qian
Message-ID: <1805847943.28168.1295340820009.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
In-Reply-To: <20110117191359.GI2212@cmpxchg.org>
Subject: Re: [LSF/MM TOPIC] memory control groups
To: Johannes Weiner
Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse, linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org

----- Original Message -----
> Hello,
>
> [...]
>
> Would other people be interested in discussing this?

I would love to present the testing we have done here at work, and to
gather some ideas from the testing angle as a QE engineer, if there is
an invitation that would let me obtain a visa, travel budget, etc.

CAI Qian
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 18:17:57 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-Id: <20110118181757.2aefcf87.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110118084013.GK2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org> <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com> <20110118084013.GK2212@cmpxchg.org>
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse

On Tue, 18 Jan 2011 09:40:13 +0100
Johannes Weiner wrote:

> On Tue, Jan 18, 2011 at 10:10:57AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 17 Jan 2011 20:14:00 +0100
> > Johannes Weiner wrote:
> >  - pc->mem_cgroup can be replaced with an ID.
> >    But moving it into the flags field seems difficult because of races.
> >  - pc->page can be replaced with some lookup routine.
> >    But the section-bit encoding may be somewhat obscure, and the lookup
> >    cost will be a problem.
>
> Why is that?
>
> The lookup is actually straightforward, like lookup_page_cgroup().
> And we only need it when coming from the per-cgroup LRU, i.e. in
> reclaim and force_empty.
>

I see that pc->page is not used very frequently. But I wonder whether we
should revisit the performance of lookup_page_cgroup() before adding more
weight to it.

> >  - The PCG_CACHE bit duplicates information already in 'page', so we can
> >    use PageAnon().
>
> I did that, too. But for this to work, we need to make sure that
> pages are always rmapped when they are charged and uncharged. This is
> one point where I collide with THP. It's also why I complained that
> migration clears page->mapping of replaced anonymous pages :)
>
> >  - I'm not sure about PCG_MIGRATION. It is there to avoid races.
>
> That's also a scary patch... Yeah, it's to prevent uncharging of
> oldpage in case migration fails and it has to be reused. I changed
> the migration sequence for memcg a bit so that we don't have to do
> that anymore. It survived basic testing.
>

Hmm. I have seen migration under memcg regress several times, so I don't
want to modify the working code without a good enough reason.
I guess all SECTION_BITS can be encoded in pc->flags without putting the
flags on a diet.

> > [...]
> >
> > Putting page_cgroup on a diet should be done step by step. We've seen
> > many regressions when new features came to memory cgroup.
>
> Yes, and that's what I'm afraid of. We would never be able to add a
> side-feature that makes struct page grow by an arbitrary amount.
>
> If the feature is sufficiently important and there is no other way, it
> should of course be an option. But it should not be done carelessly.
>
> E.g. I have a suspicion that we might be able to do dirty accounting
> without all the flags (we have them in the page anyway!) but use
> proportions instead. It's not page-accurate, but I think the
> fundamental problem is solved: when the dirty ratio is exceeded,
> throttle the cgroup with the biggest dirty share.
>
> But yes, that's sort of what I want to discuss :)
>

Using proportions is one option. But, IIUC, users of memcg want
something like /proc/meminfo, and proportions don't match that.
If I were a container user, I would want /proc/meminfo-like
information for my container.

Anyway, if the kernel is going to merge IO-less page reclaim, dirty
ratio support is the first thing we have to implement. Without it,
memcg will easily OOM.

Thanks,
-Kame
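The disagreement about the cost of replacing pc->page can be made concrete with a toy model. Assuming memory is split into fixed-size sections, each with its own struct page array and page_cgroup array, the page can be recovered from a page_cgroup by reading the section ID out of the flags word and reusing the page_cgroup's offset within its section's array. Everything below (array names, sizes, the bit position of the section ID) is hypothetical and far simpler than the kernel's sparsemem code; it only illustrates that, in this model, the reverse lookup is a shift, an index and a pointer subtraction, mirroring the page-to-page_cgroup direction that lookup_page_cgroup() covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory model: NR_SECTIONS sections of PAGES_PER_SECTION pages, each
 * with its own page array and page_cgroup array. All names and sizes are
 * hypothetical. */
#define PAGES_PER_SECTION 4u
#define NR_SECTIONS       3u
#define SECTION_SHIFT     48u /* assumed position of the section ID */

struct model_page {
	unsigned long pfn;
};

struct model_page_cgroup {
	uint64_t flags; /* section ID in the top bits, no page backpointer */
};

static struct model_page mem_map[NR_SECTIONS][PAGES_PER_SECTION];
static struct model_page_cgroup pc_map[NR_SECTIONS][PAGES_PER_SECTION];

static void model_init(void)
{
	for (unsigned s = 0; s < NR_SECTIONS; s++) {
		for (unsigned i = 0; i < PAGES_PER_SECTION; i++) {
			mem_map[s][i].pfn = s * PAGES_PER_SECTION + i;
			pc_map[s][i].flags = (uint64_t)s << SECTION_SHIFT;
		}
	}
}

/* page -> page_cgroup: the direction lookup_page_cgroup() covers. */
static struct model_page_cgroup *model_lookup_pc(const struct model_page *page)
{
	unsigned long s = page->pfn / PAGES_PER_SECTION;

	assert(s < NR_SECTIONS);
	return &pc_map[s][page->pfn % PAGES_PER_SECTION];
}

/* page_cgroup -> page without a pc->page backpointer: read the section
 * ID, then reuse the pc's offset inside its section's array. */
static struct model_page *model_pc_to_page(struct model_page_cgroup *pc)
{
	unsigned long s = (unsigned long)(pc->flags >> SECTION_SHIFT);

	assert(s < NR_SECTIONS);
	ptrdiff_t idx = pc - pc_map[s];
	return &mem_map[s][idx];
}
```

Whether the real lookup is as cheap as this depends on how sections are actually laid out, which is exactly the performance question Kame wants revisited.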
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jan 2011 11:20:06 +0100
From: Johannes Weiner
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-ID: <20110118102006.GL2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org> <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com> <20110118084013.GK2212@cmpxchg.org> <20110118181757.2aefcf87.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110118181757.2aefcf87.kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura, Balbir Singh, Greg Thelen, Ying Han, Michel Lespinasse

On Tue, Jan 18, 2011 at 06:17:57PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 18 Jan 2011 09:40:13 +0100
> Johannes Weiner wrote:
> > On Tue, Jan 18, 2011 at 10:10:57AM +0900, KAMEZAWA Hiroyuki wrote:
> > >  - pc->page can be replaced with some lookup routine.
> > >    But the section-bit encoding may be somewhat obscure, and the lookup
> > >    cost will be a problem.
> >
> > Why is that?
> >
> > The lookup is actually straightforward, like lookup_page_cgroup().
> > And we only need it when coming from the per-cgroup LRU, i.e. in
> > reclaim and force_empty.
> >
>
> I see that pc->page is not used very frequently. But I wonder whether we
> should revisit the performance of lookup_page_cgroup() before adding more
> weight to it.

I think those are two different things to tackle. But I will make sure
to check for performance overhead when removing pc->page.

> > >  - I'm not sure about PCG_MIGRATION. It is there to avoid races.
> >
> > That's also a scary patch... Yeah, it's to prevent uncharging of
> > oldpage in case migration fails and it has to be reused. I changed
> > the migration sequence for memcg a bit so that we don't have to do
> > that anymore. It survived basic testing.
> >
>
> Hmm. I have seen migration under memcg regress several times, so I don't
> want to modify the working code without a good enough reason.
> I guess all SECTION_BITS can be encoded in pc->flags without putting the
> flags on a diet.

That's true, there is enough room for that.

I only wrote those reduction patches so that I could also pack the
pc->mem_cgroup ID into pc->flags, but these are two independent
problems.

I would not have finished the patch only for that one tiny flag, but
it actually saved code and made it IMO a bit easier to understand. I
consider this a serious upside for code that has a history of breaking.

But one at a time: first I will finish testing and benchmarking the
pc->page removal.

> > E.g. I have a suspicion that we might be able to do dirty accounting
> > without all the flags (we have them in the page anyway!) but use
> > proportions instead. It's not page-accurate, but I think the
> > fundamental problem is solved: when the dirty ratio is exceeded,
> > throttle the cgroup with the biggest dirty share.
>
> Using proportions is one option. But, IIUC, users of memcg want
> something like /proc/meminfo, and proportions don't match that.
> If I were a container user, I would want /proc/meminfo-like
> information for my container.

I totally agree that this is information that needs exporting.

But you can easily calculate an absolute number of bytes by applying a
memcg's relative proportion to the absolute amount of dirty pages, for
example. The only difference is that it probably won't be 100%
accurate, but a difference of a few pages should really not matter for
user-visible statistics.

No?

> Anyway, if the kernel is going to merge IO-less page reclaim, dirty
> ratio support is the first thing we have to implement. Without it,
> memcg will easily OOM.

Agreed. I am not saying that my memory footprint concerns should
stand in the way of merging important infrastructure. This is work
that can still be done even after dirty accounting is merged.

Thanks,
Hannes

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 19 Jan 2011 09:14:29 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-Id: <20110119091429.e69ce1f8.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110118102006.GL2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org> <20110118101057.51d20ed7.kamezawa.hiroyu@jp.fujitsu.com> <20110118084013.GK2212@cmpxchg.org> <20110118181757.2aefcf87.kamezawa.hiroyu@jp.fujitsu.com> <20110118102006.GL2212@cmpxchg.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura , Balbir Singh , Greg Thelen , Ying Han , Michel Lespinasse
List-ID:

On Tue, 18 Jan 2011 11:20:06 +0100
Johannes Weiner wrote:

> On Tue, Jan 18, 2011 at 06:17:57PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > - I'm not sure about PCG_MIGRATION. It's for avoiding races.
> > >
> > > That's also a scary patch...  Yeah, it's to prevent uncharging of
> > > oldpage in case migration fails and it has to be reused.  I changed
> > > the migration sequence for memcg a bit so that we don't have to do
> > > that anymore.  It survived basic testing.
> > >
> >
> > Hmm. I have seen migration under memcg regress several times. So, I don't
> > want to modify the code that is running now without a good enough reason.
> > I guess all SECTION_BITS can be encoded in pc->flags without putting the flags on a diet.
>
> That's true, there is enough room for that.
>
> I only wrote those reduction patches so that I could also pack the
> pc->mem_cgroup ID into pc->flags, but these are two independent problems.
>

That packing is dangerous because we have a lock bit in pc->flags and some
accesses to pc->mem_cgroup are lockless. IIUC, it's difficult to avoid races
with modifications of pc->mem_cgroup.
Hm, if we remove PCG_ACCT_LRU, it may be possible, but I'm not sure how
FILESTAT etc. would stay safe.

> I would not have finished the patch only for that one tiny flag, but
> it actually saved code and made it IMO a bit easier to understand.  I
> consider this a serious upside for code that has a history of breaking.
>
> But one at a time: first I will finish testing and benchmarking the
> pc->page removal.
>
Sure.

> > > E.g. I have a suspicion that we might be able to do dirty accounting
> > > without all the flags (we have them in the page anyway!) but use
> > > proportionals instead.
> > > It's not page-accurate, but I think the
> > > fundamental problem is solved: when the dirty ratio is exceeded,
> > > throttle the cgroup with the biggest dirty share.
> >
> > Using proportionals is a choice. But, IIUC, users of memcg want
> > something like /proc/meminfo. It doesn't match.
> > If I'm a user of a container, I want information like /proc/meminfo for
> > the container.
>
> I totally agree that this is information that needs exporting.
>
> But you can easily calculate an absolute number of bytes by applying a
> memcg's relative proportion to the absolute amount of dirty pages, for
> example.  The only difference is that it probably won't be 100%
> accurate, but a few pages difference should really not matter for
> user-visible statistics.
>
> No?
>

With proportionals, we can't handle moving the accounting between cgroups.
That means rmdir, force_empty, and task_move could turn the dirty statistics
into a mess.

Thanks,
-Kame
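The proportional approach debated above can be sketched as follows, under stated assumptions: each memcg keeps a hypothetical counter of recently dirtied pages that is periodically halved, its share of the sum stands in for its share of the globally dirty pages, and once a global dirty limit is exceeded the memcg with the largest share is picked for throttling. The names (memcg_dirty_sketch, dirty_age, dirty_estimate, pick_throttle_target) are illustrative and are not the kernel's proportions code. Note that nothing is recorded per page, which is exactly why rmdir, force_empty or task_move have no state to transfer, the weakness KAMEZAWA-san points out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-memcg state: a decaying count of "page dirtied" events. */
struct memcg_dirty_sketch {
	uint64_t recent_dirtied;
};

/* Halve every counter; called periodically so that old activity fades out. */
static void dirty_age(struct memcg_dirty_sketch *m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		m[i].recent_dirtied /= 2;
}

static uint64_t dirty_total(const struct memcg_dirty_sketch *m, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += m[i].recent_dirtied;
	return sum;
}

/*
 * Estimated dirty pages of memcg i: its share of recent dirtying applied to
 * the global dirty page count (a sketch: the product may overflow for huge inputs).
 */
static uint64_t dirty_estimate(const struct memcg_dirty_sketch *m, size_t n,
			       size_t i, uint64_t global_dirty)
{
	uint64_t total = dirty_total(m, n);

	if (total == 0)
		return 0;
	return global_dirty * m[i].recent_dirtied / total;
}

/* Over the limit: return the index of the memcg with the biggest share, else -1. */
static long pick_throttle_target(const struct memcg_dirty_sketch *m, size_t n,
				 uint64_t global_dirty, uint64_t dirty_limit)
{
	long best = -1;

	if (global_dirty <= dirty_limit)
		return -1;
	for (size_t i = 0; i < n; i++)
		if (best < 0 || m[i].recent_dirtied > m[best].recent_dirtied)
			best = (long)i;
	return best;
}
```

With counters of 100, 300 and 600 and 5000 dirty pages globally, the estimates come out as 500, 1500 and 3000 pages, and the third memcg is the one throttled when the limit is exceeded.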
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id D00138D003A for ; Thu, 20 Jan 2011 05:19:02 -0500 (EST)
Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [202.81.31.245]) by e23smtp01.au.ibm.com (8.14.4/8.13.1) with ESMTP id p0KAFGjJ029145 for ; Thu, 20 Jan 2011 21:15:16 +1100
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p0KAImKT2408626 for ; Thu, 20 Jan 2011 21:18:48 +1100
Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p0KAImo2020003 for ; Thu, 20 Jan 2011 21:18:48 +1100
Date: Thu, 20 Jan 2011 15:48:44 +0530
From: Balbir Singh
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-ID: <20110120101844.GI2897@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20110117191359.GI2212@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <20110117191359.GI2212@cmpxchg.org>
Sender: owner-linux-mm@kvack.org
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, KAMEZAWA Hiroyuki , Daisuke Nishimura , Greg Thelen , Ying Han , Michel Lespinasse
List-ID:

* Johannes Weiner [2011-01-17 20:14:00]:

> Hello,
>
> on the MM summit, I would like to talk about the current state of
> memory control groups, the features and extensions that are currently
> being developed for it, and what their status is.
>
> I am especially interested in talking about the current runtime memory
> overhead memcg comes with (1% of ram) and what we can do to shrink it.
>
> In comparison to how efficiently struct page is packed, and given that
> distro kernels come with memcg enabled per default, I think we should
> put a bit more thought into how struct page_cgroup (which exists for
> every page in the system as well) is organized.
>
> I have a patch series that removes the page backpointer from struct
> page_cgroup by storing a node ID (or section ID, depending on whether
> sparsemem is configured) in the free bits of pc->flags.
>
> I also plan on replacing the pc->mem_cgroup pointer with an ID
> (KAMEZAWA-san has patches for that), and move it to pc->flags too.
> Every flag not used means doubling the amount of possible control
> groups, so I have patches that get rid of some flags currently
> allocated, including PCG_CACHE, PCG_ACCT_LRU, and PCG_MIGRATION.
>
> [ I meant to send those out much earlier already, but a bug in the
> migration rework was not responding to my yelling 'Marco', and now my
> changes collide horribly with THP, so it will take another rebase. ]
>
> The per-memcg dirty accounting work e.g. allocates a bunch of new bits
> in pc->flags and I'd like to hash out if this leaves enough room for
> the structure packing I described, or whether we can come up with a
> different way of tracking state.
>
> Would other people be interested in discussing this?
>

I would definitely be interested, if I am invited to the LSF/MM summit.
Even otherwise, we should discuss this over email.

--
	Three Cheers,
	Balbir

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with ESMTP id D7C278D0039 for ; Sun, 6 Feb 2011 10:45:18 -0500 (EST)
Received: from wpaz13.hot.corp.google.com (wpaz13.hot.corp.google.com [172.24.198.77]) by smtp-out.google.com with ESMTP id p16Fj96p013090 for ; Sun, 6 Feb 2011 07:45:09 -0800
Received: from qwe4 (qwe4.prod.google.com [10.241.194.4]) by wpaz13.hot.corp.google.com with ESMTP id p16Fj5su024347 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NOT) for ; Sun, 6 Feb 2011 07:45:08 -0800
Received: by qwe4 with SMTP id 4so3858171qwe.15 for ; Sun, 06 Feb 2011 07:45:05 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20110117191359.GI2212@cmpxchg.org>
References: <20110117191359.GI2212@cmpxchg.org>
Date: Sun, 6 Feb 2011 07:45:05 -0800
Message-ID:
Subject: Re: [LSF/MM TOPIC] memory control groups
From: Michel Lespinasse
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Johannes Weiner
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, KAMEZAWA Hiroyuki , Daisuke Nishimura , Balbir Singh , Greg Thelen , Ying Han

On Mon, Jan 17, 2011 at 11:14 AM, Johannes Weiner wrote:
> on the MM summit, I would like to talk about the current state of
> memory control groups, the features and extensions that are currently
> being developed for it, and what their status is.
>
> I am especially interested in talking about the current runtime memory
> overhead memcg comes with (1% of ram) and what we can do to shrink it.
> [...]
> Would other people be interested in discussing this?

Well, YES :)

In addition to what you mentioned, I believe it would be possible to
avoid the duplication of global vs per-cgroup LRU lists. Global
scanning would translate into proportional scanning of all per-cgroup
lists.
If we could get that done, it would IMO become reasonable to
integrate the remaining few page_cgroup fields back into struct page
itself...

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 80F658D0039 for ; Mon, 7 Feb 2011 00:28:24 -0500 (EST)
Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58]) by e28smtp05.in.ibm.com (8.14.4/8.13.1) with ESMTP id p175SIpr022097 for ; Mon, 7 Feb 2011 10:58:18 +0530
Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p175SIDL4395178 for ; Mon, 7 Feb 2011 10:58:18 +0530
Received: from d28av03.in.ibm.com (loopback [127.0.0.1]) by d28av03.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p175SHjG008870 for ; Mon, 7 Feb 2011 16:28:18 +1100
Date: Mon, 7 Feb 2011 10:56:08 +0530
From: Balbir Singh
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-ID: <20110207052608.GF27729@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20110117191359.GI2212@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michel Lespinasse
Cc: Johannes Weiner , linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, KAMEZAWA Hiroyuki , Daisuke Nishimura , Greg Thelen , Ying Han

* Michel Lespinasse [2011-02-06 07:45:05]:

> On Mon, Jan 17, 2011 at 11:14 AM, Johannes Weiner wrote:
> > on the MM summit, I would like to talk about the current state of
> > memory control groups, the features and extensions that are currently
> > being developed for it, and what their status is.
> >
> > I am especially interested in talking about the current runtime memory
> > overhead memcg comes with (1% of ram) and what we can do to shrink it.
> > [...]
> > Would other people be interested in discussing this?
>
> Well, YES :)
>
> In addition to what you mentioned, I believe it would be possible to
> avoid the duplication of global vs per-cgroup LRU lists. Global
> scanning would translate into proportional scanning of all per-cgroup
> lists. If we could get that done, it would IMO become reasonable to
> integrate the remaining few page_cgroup fields back into struct page
> itself...
>

We thought about the duplication and proportional scanning quite a bit
prior to the final design and integration, but it does not scale well as
the number of cgroups increases.

I would also like to discuss things like accounting of shared pages, etc.

--
	Three Cheers,
	Balbir
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id DFC4E8D0039 for ; Mon, 7 Feb 2011 00:28:29 -0500 (EST)
Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246]) by e23smtp04.au.ibm.com (8.14.4/8.13.1) with ESMTP id p175N0XY005778 for ; Mon, 7 Feb 2011 16:23:00 +1100
Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.235.138]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p175SIsA1704154 for ; Mon, 7 Feb 2011 16:28:23 +1100
Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p175SITT014154 for ; Mon, 7 Feb 2011 16:28:18 +1100
Date: Mon, 7 Feb 2011 10:57:44 +0530
From: Balbir Singh
Subject: Re: [LSF/MM TOPIC] memory control groups
Message-ID: <20110207052744.GG27729@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20110117191359.GI2212@cmpxchg.org> <20110118174523.5c79a032.kamezawa.hiroyu@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <20110118174523.5c79a032.kamezawa.hiroyu@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: KAMEZAWA Hiroyuki
Cc: Michel Lespinasse , Johannes Weiner , linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, Daisuke Nishimura , Greg Thelen , Ying Han

* KAMEZAWA Hiroyuki [2011-01-18 17:45:23]:

> On Tue, 18 Jan 2011 00:17:53 -0800
> Michel Lespinasse wrote:
>
> > > The per-memcg dirty accounting work e.g. allocates a bunch of new bits
> > > in pc->flags and I'd like to hash out if this leaves enough room for
> > > the structure packing I described, or whether we can come up with a
> > > different way of tracking state.
> >
> > This is probably longer term, but I would love to get rid of the
> > duplication between global LRU and per-cgroup LRU. Global LRU could be
> > approximated by scanning all per-cgroup LRU lists (in amounts
> > proportional to the list lengths).
> >
>
> I can't answer why the design in which memory cgroup's meta-page has its own
> LRU, rather than reusing page->lru, was selected for the first implementation,
> because I didn't take part in the birth of memcg. Does anyone remember the
> reason or the discussion?
>

The discussions can be found on LKML; some happened during OLS. Keeping a
local LRU and a global LRU was very important because we wanted to make
sure global reclaim was not broken or affected.

We can discuss this further.

--
	Three Cheers,
	Balbir
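Michel's suggestion of approximating global reclaim by scanning every per-cgroup LRU list in amounts proportional to its length can be sketched as a helper that splits one reclaim target across the lists. This is a hypothetical illustration (split_scan_proportionally is not a kernel function, and real reclaim also weighs priorities and active/inactive lists); rounding leftovers are handed out one page at a time in list order. The loop over every list on each pass also makes visible the cost Balbir mentions: the work grows with the number of cgroups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split a global reclaim target across per-cgroup LRU lists in proportion
 * to each list's length. scan[i] receives the number of pages to scan from
 * list i; the scan[] entries always add up to min(nr_to_scan, total pages).
 */
static void split_scan_proportionally(const uint64_t *lru_len, uint64_t *scan,
				      size_t n, uint64_t nr_to_scan)
{
	uint64_t total = 0, assigned = 0;

	for (size_t i = 0; i < n; i++)
		total += lru_len[i];
	if (nr_to_scan > total)
		nr_to_scan = total;	/* cannot scan more pages than exist */

	/* Proportional share of each list, rounded down. */
	for (size_t i = 0; i < n; i++) {
		scan[i] = total ? nr_to_scan * lru_len[i] / total : 0;
		assigned += scan[i];
	}

	/*
	 * Hand out the rounding leftovers (fewer than n pages) to lists that
	 * still have unscanned pages; one pass suffices because every non-empty
	 * list lost less than one page to rounding.
	 */
	for (size_t i = 0; assigned < nr_to_scan && i < n; i++) {
		if (scan[i] < lru_len[i]) {
			scan[i]++;
			assigned++;
		}
	}
}
```

For lists of 100, 300 and 600 pages and a target of 7, the rounded-down shares are 0, 2 and 4, and the one leftover page goes to the first list, giving 1, 2 and 4.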