From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH v1 0/7] memcg remove pre_destroy Date: Thu, 12 Apr 2012 20:17:18 +0900 Message-ID: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "linux-mm@kvack.org" Cc: "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton , KAMEZAWA Hiroyuki In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. By pre_destroy(), rmdir of cgroup can return -EBUSY or some error. It makes cgroup complicated and unstable. I said O.K. to remove it and this patch is modification for memcg. One of problem in current implementation is that memcg moves all charges to parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may hit parent's limit and may return -EBUSY. To fix this problem, this patch changes behavior of rmdir() as - if use_hierarchy=0, all remaining charges will go to root cgroup. - if use_hierarchy=1, all remaining charges will go to the parent. By this, rmdir failure will not be caused by parent's limitation. And I think this meets meaning of use_hierarchy. This series does - add above change of behavior - use workqueue to move all pages to parent - remove unnecessary codes. I'm sorry if my reply is delayed, I'm not sure I can have enough time in this weekend. Any comments are welcomed. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx117.postini.com [74.125.245.117]) by kanga.kvack.org (Postfix) with SMTP id D9E4B6B00F9 for ; Thu, 12 Apr 2012 07:21:58 -0400 (EDT) Received: from m1.gw.fujitsu.co.jp (unknown [10.0.50.71]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 7D2973EE0B5 for ; Thu, 12 Apr 2012 20:21:57 +0900 (JST) Received: from smail (m1 [127.0.0.1]) by outgoing.m1.gw.fujitsu.co.jp (Postfix) with ESMTP id F387C45DE61 for ; Thu, 12 Apr 2012 20:21:53 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (s1.gw.fujitsu.co.jp [10.0.50.91]) by m1.gw.fujitsu.co.jp (Postfix) with ESMTP id DB12045DE5A for ; Thu, 12 Apr 2012 20:21:53 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id CBD411DB8051 for ; Thu, 12 Apr 2012 20:21:53 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id 7962F1DB804B for ; Thu, 12 Apr 2012 20:21:53 +0900 (JST) Message-ID: <4F86BA66.2010503@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:20:06 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton This function is used for moving accounting information to its parent in the hierarchy of res_counter. Signed-off-by: KAMEZAWA Hiroyuki --- include/linux/res_counter.h | 3 +++ kernel/res_counter.c | 13 +++++++++++++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h index da81af0..8919d3c 100644 --- a/include/linux/res_counter.h +++ b/include/linux/res_counter.h @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); void res_counter_uncharge(struct res_counter *counter, unsigned long val); +/* move resource to parent counter...i.e. just forget accounting in a child */ +void res_counter_move_parent(struct res_counter *counter, unsigned long val); + /** * res_counter_margin - calculate chargeable space of a counter * @cnt: the counter diff --git a/kernel/res_counter.c b/kernel/res_counter.c index d508363..fafebf0 100644 --- a/kernel/res_counter.c +++ b/kernel/res_counter.c @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) local_irq_restore(flags); } +/* + * In hierarchical accounting, child's usage is accounted into ancestors. + * To move local usage to its parent, just forget current level usage. + */ +void res_counter_move_parent(struct res_counter *counter, unsigned long val) +{ + unsigned long flags; + + BUG_ON(!counter->parent); + spin_lock_irqsave(&counter->lock, flags); + res_counter_uncharge_locked(counter, val); + spin_unlock_irqrestore(&counter->lock, flags); +} static inline unsigned long long * res_counter_member(struct res_counter *counter, int member) -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id D271D6B00FB for ; Thu, 12 Apr 2012 07:23:13 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 5C1A33EE0B5 for ; Thu, 12 Apr 2012 20:23:12 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 4002445DE52 for ; Thu, 12 Apr 2012 20:23:12 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 24F5F45DE4F for ; Thu, 12 Apr 2012 20:23:12 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 16366E08002 for ; Thu, 12 Apr 2012 20:23:12 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id BF2561DB802F for ; Thu, 12 Apr 2012 20:23:11 +0900 (JST) Message-ID: <4F86BAB0.5030809@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:21:20 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 2/7] memcg: move charge to parent only when necessary. References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton When memcg->use_hierarchy==true, the parent res_counter includes the usage in child's usage. So, it's not necessary to call try_charge() in the parent. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- 1 files changed, 32 insertions(+), 7 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index fa01106..3215880 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, res_counter_uncharge(&memcg->memsw, bytes); } } +/* + * Moving usage between a child to its parent if use_hierarchy==true. + */ +static void __mem_cgroup_move_charge_parent(struct mem_cgroup *memcg, + unsigned int nr_pages) +{ + if (!mem_cgroup_is_root(memcg)) { + unsigned long bytes = nr_pages * PAGE_SIZE; + + res_counter_move_parent(&memcg->res, bytes); + if (do_swap_account) + res_counter_move_parent(&memcg->memsw, bytes); + } +} /* * A helper function to get mem_cgroup from ID. must be called under @@ -2666,18 +2680,29 @@ static int mem_cgroup_move_parent(struct page *page, goto put; nr_pages = hpage_nr_pages(page); - parent = mem_cgroup_from_cont(pcg); - ret = __mem_cgroup_try_charge(NULL, gfp_mask, nr_pages, &parent, false); - if (ret) - goto put_back; + + if (!parent->use_hierarchy) { + ret = __mem_cgroup_try_charge(NULL, gfp_mask, + nr_pages, &parent, false); + if (ret) + goto put_back; + } if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, true); - if (ret) - __mem_cgroup_cancel_charge(parent, nr_pages); + if (!parent->use_hierarchy) { + ret = mem_cgroup_move_account(page, nr_pages, pc, + child, parent, true); + if (ret) + __mem_cgroup_cancel_charge(parent, nr_pages); + } else { + ret = mem_cgroup_move_account(page, nr_pages, pc, + child, parent, false); + if (!ret) + __mem_cgroup_move_charge_parent(child, nr_pages); + } if (nr_pages > 1) compound_unlock_irqrestore(page, flags); -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx133.postini.com [74.125.245.133]) by kanga.kvack.org (Postfix) with SMTP id E4C746B00FD for ; Thu, 12 Apr 2012 07:24:34 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 0907A3EE081 for ; Thu, 12 Apr 2012 20:24:33 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id E0B7E45DE4E for ; Thu, 12 Apr 2012 20:24:32 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id C466345DD74 for ; Thu, 12 Apr 2012 20:24:32 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id B6BCB1DB803A for ; Thu, 12 Apr 2012 20:24:32 +0900 (JST) Received: from m105.s.css.fujitsu.com (m105.s.css.fujitsu.com [10.240.81.145]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 5FE661DB802C for ; Thu, 12 Apr 2012 20:24:32 +0900 (JST) Message-ID: <4F86BB02.2060607@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:22:42 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 3/7] memcg: move charges to root at rmdir() References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton As recently discussed, Tejun Heo, the cgroup maintainer, tries to remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir(). To do that, in memcg, handling case of use_hierarchy==false is a problem. We move memcg's charges to its parent at rmdir(). If use_hierarchy==true, it's already accounted in the parent, no problem. If use_hierarchy==false, we cannot guarantee we can move all charges to the parent. This patch changes the behavior to move all charges to root_mem_cgroup if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship. With this, we can remove -ENOMEM error check at pre_destroy(), called at rmdir. Signed-off-by: KAMEZAWA Hiroyuki --- Documentation/cgroups/memory.txt | 6 ++++-- mm/memcontrol.c | 38 ++++++++++++-------------------------- 2 files changed, 16 insertions(+), 28 deletions(-) diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 84d4f00..f717f6a 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -377,8 +377,10 @@ cgroup might have some charge associated with it, even though all tasks have migrated away from it. (because we charge against pages, not against tasks.) -Such charges are freed or moved to their parent. At moving, both of RSS -and CACHES are moved to parent. +Such charges are freed or moved to their parent if use_hierarchy==true. +If use_hierarchy==false, charges are moved to root memory cgroup. + +At moving, both of RSS and CACHES are moved to parent. rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also. Charges recorded in swap information is not updated at removal of cgroup. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3215880..8246418 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2662,15 +2662,14 @@ static int mem_cgroup_move_parent(struct page *page, struct mem_cgroup *child, gfp_t gfp_mask) { - struct cgroup *cg = child->css.cgroup; - struct cgroup *pcg = cg->parent; struct mem_cgroup *parent; unsigned int nr_pages; unsigned long uninitialized_var(flags); + bool need_cancel = false; int ret; /* Is ROOT ? */ - if (!pcg) + if (mem_cgroup_is_root(child)) return -EINVAL; ret = -EBUSY; @@ -2680,33 +2679,23 @@ static int mem_cgroup_move_parent(struct page *page, goto put; nr_pages = hpage_nr_pages(page); - parent = mem_cgroup_from_cont(pcg); - - if (!parent->use_hierarchy) { - ret = __mem_cgroup_try_charge(NULL, gfp_mask, - nr_pages, &parent, false); - if (ret) - goto put_back; + parent = parent_mem_cgroup(child); + if (!parent) { + parent = root_mem_cgroup; + need_cancel = true; } if (nr_pages > 1) flags = compound_lock_irqsave(page); - if (!parent->use_hierarchy) { - ret = mem_cgroup_move_account(page, nr_pages, pc, - child, parent, true); - if (ret) - __mem_cgroup_cancel_charge(parent, nr_pages); - } else { - ret = mem_cgroup_move_account(page, nr_pages, pc, - child, parent, false); - if (!ret) - __mem_cgroup_move_charge_parent(child, nr_pages); - } + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, + need_cancel); + if (!need_cancel && !ret) + __mem_cgroup_move_charge_parent(child, nr_pages); if (nr_pages > 1) compound_unlock_irqrestore(page, flags); -put_back: + putback_lru_page(page); put: put_page(page); @@ -3677,7 +3666,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, pc = lookup_page_cgroup(page); ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL); - if (ret == -ENOMEM || ret == -EINTR) + if (ret == -EINTR) break; if (ret == -EBUSY || ret == -EINVAL) { @@ -3738,9 +3727,6 @@ move_account: } mem_cgroup_end_move(memcg); memcg_oom_recover(memcg); - /* it seems parent cgroup doesn't have enough mem */ - if (ret == -ENOMEM) - goto try_to_free; cond_resched(); /* "ret" should also be checked to ensure all lists are empty. */ } while (memcg->res.usage > 0 || ret); -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx106.postini.com [74.125.245.106]) by kanga.kvack.org (Postfix) with SMTP id A3D326B00FF for ; Thu, 12 Apr 2012 07:26:03 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 3F5BC3EE0C0 for ; Thu, 12 Apr 2012 20:26:02 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 227F145DE51 for ; Thu, 12 Apr 2012 20:26:02 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id F209045DE4E for ; Thu, 12 Apr 2012 20:26:01 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id E44831DB8041 for ; Thu, 12 Apr 2012 20:26:01 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 9173C1DB803E for ; Thu, 12 Apr 2012 20:26:01 +0900 (JST) Message-ID: <4F86BB5E.6080509@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:24:14 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account() References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Only one call passes 'true'. remove it and handle it in caller. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 29 ++++++++++++----------------- 1 files changed, 12 insertions(+), 17 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8246418..9ac7984 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head) * @pc: page_cgroup of the page. * @from: mem_cgroup which the page is moved from. * @to: mem_cgroup which the page is moved to. @from != @to. - * @uncharge: whether we should call uncharge and css_put against @from. * * The caller must confirm following. * - page is not on LRU (isolate_page() is useful.) * - compound_lock is held when nr_pages > 1 * - * This function doesn't do "charge" nor css_get to new cgroup. It should be - * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is - * true, this function does "uncharge" from old cgroup, but it doesn't if - * @uncharge is false, so a caller should do "uncharge". + * This function doesn't access res_counter at all. Caller should take + * care of it. */ static int mem_cgroup_move_account(struct page *page, unsigned int nr_pages, struct page_cgroup *pc, struct mem_cgroup *from, - struct mem_cgroup *to, - bool uncharge) + struct mem_cgroup *to) { unsigned long flags; int ret; @@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page, preempt_enable(); } mem_cgroup_charge_statistics(from, anon, -nr_pages); - if (uncharge) - /* This is not "cancel", but cancel_charge does all we need. */ - __mem_cgroup_cancel_charge(from, nr_pages); /* caller should have done css_get */ pc->mem_cgroup = to; @@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page, if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, - need_cancel); - if (!need_cancel && !ret) - __mem_cgroup_move_charge_parent(child, nr_pages); + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent); + if (!ret) { + if (need_cancel) + __mem_cgroup_cancel_charge(child, nr_pages); + else + __mem_cgroup_move_charge_parent(child, nr_pages); + } if (nr_pages > 1) compound_unlock_irqrestore(page, flags); @@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, if (!isolate_lru_page(page)) { pc = lookup_page_cgroup(page); if (!mem_cgroup_move_account(page, HPAGE_PMD_NR, - pc, mc.from, mc.to, - false)) { + pc, mc.from, mc.to)) { mc.precharge -= HPAGE_PMD_NR; mc.moved_charge += HPAGE_PMD_NR; } @@ -5482,7 +5477,7 @@ retry: goto put; pc = lookup_page_cgroup(page); if (!mem_cgroup_move_account(page, 1, pc, - mc.from, mc.to, false)) { + mc.from, mc.to)) { mc.precharge--; /* we uncharge from mc.from later. */ mc.moved_charge++; -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx194.postini.com [74.125.245.194]) by kanga.kvack.org (Postfix) with SMTP id 22E1D6B0101 for ; Thu, 12 Apr 2012 07:30:39 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id B4D7F3EE0AE for ; Thu, 12 Apr 2012 20:30:37 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 9AB9345DEB4 for ; Thu, 12 Apr 2012 20:30:37 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 8087F45DEB2 for ; Thu, 12 Apr 2012 20:30:37 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 4DF131DB803C for ; Thu, 12 Apr 2012 20:30:37 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id E98DFE08006 for ; Thu, 12 Apr 2012 20:30:36 +0900 (JST) Message-ID: <4F86BC71.9070403@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:28:49 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Now, at rmdir, memory cgroup's charge will be moved to - parent if use_hierarchy=1 - root if use_hierarchy=0 Then, we don't have to have memory reclaim code at destroying memcg. This patch divides force_empty to 2 functions as - memory_cgroup_recharge() ... try to move all charges to ancestors. - memory_cgroup_force_empty().. try to reclaim all memory. After this patch, memory.force_empty will _not_ move charges to ancestors but just reclaim all pages. (This meets documenation.) rmdir() will not reclaim any memory but moves charge to other cgroup, parent or root. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 59 +++++++++++++++++++++++++++---------------------------- 1 files changed, 29 insertions(+), 30 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9ac7984..22c8faa 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order, } /* - * This routine traverse page_cgroup in given list and drop them all. - * *And* this routine doesn't reclaim page itself, just removes page_cgroup. + * This routine traverse page in given list and move them all. */ -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, +static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, int node, int zid, enum lru_list lru) { struct mem_cgroup_per_zone *mz; @@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, return ret; } -/* - * make mem_cgroup's charge to be 0 if there is no task. - * This enables deleting this mem_cgroup. - */ -static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) + +static int mem_cgroup_recharge(struct mem_cgroup *memcg) { - int ret; - int node, zid, shrink; - int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + int ret, node, zid; struct cgroup *cgrp = memcg->css.cgroup; - css_get(&memcg->css); - - shrink = 0; - /* should free all ? */ - if (free_all) - goto try_to_free; -move_account: do { ret = -EBUSY; if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) @@ -3712,7 +3699,7 @@ move_account: for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) { enum lru_list lru; for_each_lru(lru) { - ret = mem_cgroup_force_empty_list(memcg, + ret = mem_cgroup_recharge_lru(memcg, node, zid, lru); if (ret) break; @@ -3722,24 +3709,33 @@ move_account: break; } mem_cgroup_end_move(memcg); - memcg_oom_recover(memcg); cond_resched(); /* "ret" should also be checked to ensure all lists are empty. */ } while (memcg->res.usage > 0 || ret); out: - css_put(&memcg->css); return ret; +} + + +/* + * make mem_cgroup's charge to be 0 if there is no task. This is only called + * by memory.force_empty file, an user request. + */ +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) +{ + int ret = 0; + int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + struct cgroup *cgrp = memcg->css.cgroup; + + css_get(&memcg->css); -try_to_free: /* returns EBUSY if there is a task or if we come here twice. */ - if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children) || shrink) { + if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) { ret = -EBUSY; goto out; } /* we call try-to-free pages for make this cgroup empty */ lru_add_drain_all(); - /* try to free all pages in this cgroup */ - shrink = 1; while (nr_retries && memcg->res.usage > 0) { int progress; @@ -3754,16 +3750,19 @@ try_to_free: /* maybe some writeback is necessary */ congestion_wait(BLK_RW_ASYNC, HZ/10); } - } - lru_add_drain(); + if (!nr_retries) + ret = -ENOMEM; +out: + memcg_oom_recover(memcg); + css_put(&memcg->css); /* try move_account...there may be some *locked* pages. */ - goto move_account; + return ret; } int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event) { - return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true); + return mem_cgroup_force_empty(mem_cgroup_from_cont(cont)); } @@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - return mem_cgroup_force_empty(memcg, false); + return mem_cgroup_recharge(memcg); } static void mem_cgroup_destroy(struct cgroup *cont) -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id 569D76B0103 for ; Thu, 12 Apr 2012 07:32:11 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 551143EE0AE for ; Thu, 12 Apr 2012 20:32:09 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 3109445DE50 for ; Thu, 12 Apr 2012 20:32:09 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 138D345DD74 for ; Thu, 12 Apr 2012 20:32:09 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 053FE1DB803C for ; Thu, 12 Apr 2012 20:32:09 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id A185A1DB802C for ; Thu, 12 Apr 2012 20:32:08 +0900 (JST) Message-ID: <4F86BCCE.5050802@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:30:22 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 6/7] memcg: remove pre_destroy() References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to prevent rmdir() from failure of EBUSY or some. This patch removes pre_destroy() in memcg. All remaining charges will be moved to other cgroup, without any failure, ->destroy() just schedule a work and it will destroy the memcg. Then, rmdir will never fail. The kernel will take care of remaining resources in the cgroup to be accounted correctly. After this patch, memcg will be destroyed by workqueue in asynchrnous way. Then, we can modify 'moving' logic to work asynchrnously, i.e, we don't force users to wait for the end of rmdir(), now. We don't need to use heavy synchronous calls. This patch modifies logics as - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync. - lru_add_drain_all() will be called only when necessary, in a lazy way. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 52 ++++++++++++++++++++++------------------------------ 1 files changed, 22 insertions(+), 30 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 22c8faa..e466809 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -315,6 +315,8 @@ struct mem_cgroup { #ifdef CONFIG_INET struct tcp_memcontrol tcp_mem; #endif + + struct work_struct work_destroy; }; /* Stuffs for move charges at task migration. */ @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg) mutex_unlock(&percpu_charge_mutex); } -/* This is a synchronous drain interface. */ static void drain_all_stock_sync(struct mem_cgroup *root_memcg) { /* called when force_empty is called */ @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, pc = lookup_page_cgroup(page); ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL); - if (ret == -EINTR) - break; - if (ret == -EBUSY || ret == -EINVAL) { + VM_BUG_ON(ret != 0 && ret != -EBUSY); + if (ret) { /* found lock contention or "pc" is obsolete. */ busy = page; cond_resched(); @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, return ret; } - -static int mem_cgroup_recharge(struct mem_cgroup *memcg) +/* + * This function is called after ->destroy(). So, we cannot access cgroup + * of this memcg. + */ +static void mem_cgroup_recharge(struct work_struct *work) { + struct mem_cgroup *memcg; int ret, node, zid; - struct cgroup *cgrp = memcg->css.cgroup; + memcg = container_of(work, struct mem_cgroup, work_destroy); + /* No task points this memcg. call this only once */ + drain_all_stock_async(memcg); do { - ret = -EBUSY; - if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) - goto out; - ret = -EINTR; - if (signal_pending(current)) - goto out; - /* This is for making all *used* pages to be on LRU. */ - lru_add_drain_all(); - drain_all_stock_sync(memcg); ret = 0; mem_cgroup_start_move(memcg); for_each_node_state(node, N_HIGH_MEMORY) { @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg) } mem_cgroup_end_move(memcg); cond_resched(); - /* "ret" should also be checked to ensure all lists are empty. */ - } while (memcg->res.usage > 0 || ret); -out: - return ret; + /* drain LRU only when we canoot find pages on LRU */ + if (res_counter_read_u64(&memcg->res, RES_USAGE) && + !mem_cgroup_nr_lru_pages(memcg, LRU_ALL)) + lru_add_drain_all(); + } while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret); + mem_cgroup_put(memcg); } - /* * make mem_cgroup's charge to be 0 if there is no task. This is only called * by memory.force_empty file, an user request. @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work) memcg = container_of(work, struct mem_cgroup, work_freeing); vfree(memcg); } + static void vfree_rcu(struct rcu_head *rcu_head) { struct mem_cgroup *memcg; @@ -4982,20 +4981,14 @@ free_out: return ERR_PTR(error); } -static int mem_cgroup_pre_destroy(struct cgroup *cont) -{ - struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - - return mem_cgroup_recharge(memcg); -} - static void mem_cgroup_destroy(struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); kmem_cgroup_destroy(cont); - mem_cgroup_put(memcg); + INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge); + schedule_work(&memcg->work_destroy); } static int mem_cgroup_populate(struct cgroup_subsys *ss, @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = { .name = "memory", .subsys_id = mem_cgroup_subsys_id, .create = mem_cgroup_create, - .pre_destroy = mem_cgroup_pre_destroy, .destroy = mem_cgroup_destroy, .populate = mem_cgroup_populate, .can_attach = mem_cgroup_can_attach, -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx147.postini.com [74.125.245.147]) by kanga.kvack.org (Postfix) with SMTP id DD0906B0105 for ; Thu, 12 Apr 2012 07:33:31 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 80D973EE0BD for ; Thu, 12 Apr 2012 20:33:30 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 66D2045DE52 for ; Thu, 12 Apr 2012 20:33:30 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 4D5EE45DE4F for ; Thu, 12 Apr 2012 20:33:30 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 4054B1DB802F for ; Thu, 12 Apr 2012 20:33:30 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id EA47C1DB803B for ; Thu, 12 Apr 2012 20:33:29 +0900 (JST) Message-ID: <4F86BD18.4010505@jp.fujitsu.com> Date: Thu, 12 Apr 2012 20:31:36 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: [PATCH 7/7] memcg: remove drain_all_stock_sync. References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Because a function moving pages to ancestor works asynchronously now, drain_all_stock_sync() is unnecessary. Signed-off-by: KAMEAZAWA Hiroyuki --- mm/memcontrol.c | 22 ++-------------------- 1 files changed, 2 insertions(+), 20 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e466809..d42811b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2053,7 +2053,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) * of the hierarchy under it. sync flag says whether we should block * until the work is done. */ -static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync) +static void drain_all_stock(struct mem_cgroup *root_memcg) { int cpu, curcpu; @@ -2077,16 +2077,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync) } } put_cpu(); - - if (!sync) - goto out; - - for_each_online_cpu(cpu) { - struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) - flush_work(&stock->work); - } -out: put_online_cpus(); } @@ -2103,15 +2093,7 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg) */ if (!mutex_trylock(&percpu_charge_mutex)) return; - drain_all_stock(root_memcg, false); - mutex_unlock(&percpu_charge_mutex); -} - -static void drain_all_stock_sync(struct mem_cgroup *root_memcg) -{ - /* called when force_empty is called */ - mutex_lock(&percpu_charge_mutex); - drain_all_stock(root_memcg, true); + drain_all_stock(root_memcg); mutex_unlock(&percpu_charge_mutex); } -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx190.postini.com [74.125.245.190]) by kanga.kvack.org (Postfix) with SMTP id 9E17A6B007E for ; Thu, 12 Apr 2012 09:22:30 -0400 (EDT) Message-ID: <4F86D6B2.2020401@parallels.com> Date: Thu, 12 Apr 2012 10:20:50 -0300 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:17 AM, KAMEZAWA Hiroyuki wrote: > One of problem in current implementation is that memcg moves all charges to > parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may > hit parent's limit and may return -EBUSY. To fix this problem, this patch > changes behavior of rmdir() as > > - if use_hierarchy=0, all remaining charges will go to root cgroup. > - if use_hierarchy=1, all remaining charges will go to the parent. To be quite honest, this is one of those things that we end up overlooking, and just don't think about it in the middle of the complexity. Now that you mention it... When use_hierarchy=0, there is no parent! (At least from where memcg is concerned). So it doesn't make any sense to have it ever have moved it to the "parent" (from the core cgroup perspective). I agree with this new behavior 100 %. Just a nitpick: When use_hierarchy=1, remaining charges need not to "go to the parent". They are already there. I will review your series for the specifics. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Thu, 12 Apr 2012 10:22:59 -0300 Message-ID: <4F86D733.50809@parallels.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86BA66.2010503@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton , Frederic Weisbecker On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: > > This function is used for moving accounting information to its > parent in the hierarchy of res_counter. > > Signed-off-by: KAMEZAWA Hiroyuki Frederic has a patch in his fork cgroup series, that allows you to uncharge a counter until you reach a specific ancestor. You pass the parent as a parameter, and then only you gets uncharged. I think that is a much better interface than this you are proposing. We should probably merge that patch and use it. > --- > include/linux/res_counter.h | 3 +++ > kernel/res_counter.c | 13 +++++++++++++ > 2 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > index da81af0..8919d3c 100644 > --- a/include/linux/res_counter.h > +++ b/include/linux/res_counter.h > @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, > void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); > void res_counter_uncharge(struct res_counter *counter, unsigned long val); > > +/* move resource to parent counter...i.e. just forget accounting in a child */ > +void res_counter_move_parent(struct res_counter *counter, unsigned long val); > + > /** > * res_counter_margin - calculate chargeable space of a counter > * @cnt: the counter > diff --git a/kernel/res_counter.c b/kernel/res_counter.c > index d508363..fafebf0 100644 > --- a/kernel/res_counter.c > +++ b/kernel/res_counter.c > @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) > local_irq_restore(flags); > } > > +/* > + * In hierarchical accounting, child's usage is accounted into ancestors. > + * To move local usage to its parent, just forget current level usage. > + */ > +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > +{ > + unsigned long flags; > + > + BUG_ON(!counter->parent); > + spin_lock_irqsave(&counter->lock, flags); > + res_counter_uncharge_locked(counter, val); > + spin_unlock_irqrestore(&counter->lock, flags); > +} > > static inline unsigned long long * > res_counter_member(struct res_counter *counter, int member) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account() Date: Thu, 12 Apr 2012 10:27:40 -0300 Message-ID: <4F86D84C.1050508@parallels.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BB5E.6080509@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86BB5E.6080509@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:24 AM, KAMEZAWA Hiroyuki wrote: > Only one call passes 'true'. remove it and handle it in caller. > > Signed-off-by: KAMEZAWA Hiroyuki I like the change. I won't ack the patch itself, though, because it has a dependency with the "need_cancel" thing you introduced in your last patch - that I need to think a bit more. > --- > mm/memcontrol.c | 29 ++++++++++++----------------- > 1 files changed, 12 insertions(+), 17 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 8246418..9ac7984 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head) > * @pc: page_cgroup of the page. > * @from: mem_cgroup which the page is moved from. > * @to: mem_cgroup which the page is moved to. @from != @to. > - * @uncharge: whether we should call uncharge and css_put against @from. > * > * The caller must confirm following. > * - page is not on LRU (isolate_page() is useful.) > * - compound_lock is held when nr_pages> 1 > * > - * This function doesn't do "charge" nor css_get to new cgroup. It should be > - * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is > - * true, this function does "uncharge" from old cgroup, but it doesn't if > - * @uncharge is false, so a caller should do "uncharge". > + * This function doesn't access res_counter at all. Caller should take > + * care of it. > */ > static int mem_cgroup_move_account(struct page *page, > unsigned int nr_pages, > struct page_cgroup *pc, > struct mem_cgroup *from, > - struct mem_cgroup *to, > - bool uncharge) > + struct mem_cgroup *to) > { > unsigned long flags; > int ret; > @@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page, > preempt_enable(); > } > mem_cgroup_charge_statistics(from, anon, -nr_pages); > - if (uncharge) > - /* This is not "cancel", but cancel_charge does all we need. */ > - __mem_cgroup_cancel_charge(from, nr_pages); > > /* caller should have done css_get */ > pc->mem_cgroup = to; > @@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page, > if (nr_pages> 1) > flags = compound_lock_irqsave(page); > > - ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, > - need_cancel); > - if (!need_cancel&& !ret) > - __mem_cgroup_move_charge_parent(child, nr_pages); > + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent); > + if (!ret) { > + if (need_cancel) > + __mem_cgroup_cancel_charge(child, nr_pages); > + else > + __mem_cgroup_move_charge_parent(child, nr_pages); > + } > > if (nr_pages> 1) > compound_unlock_irqrestore(page, flags); > @@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, > if (!isolate_lru_page(page)) { > pc = lookup_page_cgroup(page); > if (!mem_cgroup_move_account(page, HPAGE_PMD_NR, > - pc, mc.from, mc.to, > - false)) { > + pc, mc.from, mc.to)) { > mc.precharge -= HPAGE_PMD_NR; > mc.moved_charge += HPAGE_PMD_NR; > } > @@ -5482,7 +5477,7 @@ retry: > goto put; > pc = lookup_page_cgroup(page); > if (!mem_cgroup_move_account(page, 1, pc, > - mc.from, mc.to, false)) { > + mc.from, mc.to)) { > mc.precharge--; > /* we uncharge from mc.from later. */ > mc.moved_charge++; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir Date: Thu, 12 Apr 2012 10:33:29 -0300 Message-ID: <4F86D9A9.9070309@parallels.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BC71.9070403@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86BC71.9070403@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:28 AM, KAMEZAWA Hiroyuki wrote: > Now, at rmdir, memory cgroup's charge will be moved to > - parent if use_hierarchy=1 > - root if use_hierarchy=0 > > Then, we don't have to have memory reclaim code at destroying memcg. > > This patch divides force_empty to 2 functions as > > - memory_cgroup_recharge() ... try to move all charges to ancestors. > - memory_cgroup_force_empty().. try to reclaim all memory. > > After this patch, memory.force_empty will _not_ move charges to ancestors > but just reclaim all pages. (This meets documenation.) > > rmdir() will not reclaim any memory but moves charge to other cgroup, > parent or root. > > Signed-off-by: KAMEZAWA Hiroyuki Seems fine by me... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx140.postini.com [74.125.245.140]) by kanga.kvack.org (Postfix) with SMTP id 879496B007E for ; Thu, 12 Apr 2012 09:37:26 -0400 (EDT) Message-ID: <4F86DA32.9060701@parallels.com> Date: Thu, 12 Apr 2012 10:35:46 -0300 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [PATCH 7/7] memcg: remove drain_all_stock_sync. References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BD18.4010505@jp.fujitsu.com> In-Reply-To: <4F86BD18.4010505@jp.fujitsu.com> Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:31 AM, KAMEZAWA Hiroyuki wrote: > Because a function moving pages to ancestor works asynchronously now, > drain_all_stock_sync() is unnecessary. > > Signed-off-by: KAMEAZAWA Hiroyuki Reviewed-by: Glauber Costa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Frederic Weisbecker Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Thu, 12 Apr 2012 16:30:50 +0200 Message-ID: References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=6Ts5R3OwsmBY3sdW243vL41PAab1fSArujvz2DzcD+Q=; b=NA7HyKQUSkP9F2rHDzRXrhxvD96Phl0u+tuWGp+AZhR2ZqLbS+r6761jqVbzpO5SKk EOED7mAX9Yfw20dSsVLPsFhv6rZqOXXv3yTZTkNXgDjy/ddkNLmVbESd1LuHjrbtkO3v v3wXCCXQN86/EwSTxZob3Le4Qvic3o38ybF9GFfKYZIVoKX2ViQnFASVYCCtig25Hz6e m7KZ8P10OhH/uV1pMtaebxv8OPtrLYOGQW6/8Z/1SgxTanIuvuBrWU3FCa9gLYu7Ha8W U1LE50GTgK/jdYe83mUIKxbDClIB6Fh0IvI85BAvJQR167EV/tBbI36aRs6DZ71w7Hpk w5Aw== In-Reply-To: <4F86D733.50809@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="iso-8859-1" To: Glauber Costa Cc: KAMEZAWA Hiroyuki , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton 2012/4/12 Glauber Costa : > On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: >> >> This function is used for moving accounting information to its >> parent in the hierarchy of res_counter. >> >> Signed-off-by: KAMEZAWA Hiroyuki > > Frederic has a patch in his fork cgroup series, that allows you to > uncharge a counter until you reach a specific ancestor. > You pass the parent as a parameter, and then only you gets uncharged. I'm missing the referring patchset from Kamezawa. Ok I'm going to subscribe to the cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc LKML for cgroup patches? Some comments below: > > I think that is a much better interface than this you are proposing. > We should probably merge that patch and use it. > >> --- >> =A0 include/linux/res_counter.h | =A0 =A03 +++ >> =A0 kernel/res_counter.c =A0 =A0 =A0 =A0| =A0 13 +++++++++++++ >> =A0 2 files changed, 16 insertions(+), 0 deletions(-) >> >> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >> index da81af0..8919d3c 100644 >> --- a/include/linux/res_counter.h >> +++ b/include/linux/res_counter.h >> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct re= s_counter *counter, >> =A0 void res_counter_uncharge_locked(struct res_counter *counter, unsign= ed long val); >> =A0 void res_counter_uncharge(struct res_counter *counter, unsigned long= val); >> >> +/* move resource to parent counter...i.e. just forget accounting in a c= hild */ >> +void res_counter_move_parent(struct res_counter *counter, unsigned long= val); >> + >> =A0 /** >> =A0 =A0* res_counter_margin - calculate chargeable space of a counter >> =A0 =A0* @cnt: the counter >> diff --git a/kernel/res_counter.c b/kernel/res_counter.c >> index d508363..fafebf0 100644 >> --- a/kernel/res_counter.c >> +++ b/kernel/res_counter.c >> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *count= er, unsigned long val) >> =A0 =A0 =A0 local_irq_restore(flags); >> =A0 } >> >> +/* >> + * In hierarchical accounting, child's usage is accounted into ancestor= s. >> + * To move local usage to its parent, just forget current level usage. The way I understand this comment and the changelog matches the opposite of what the below function is doing. The function charges a child and ignore all its parents. The comments says = it charges the parents but not the child. >> + */ >> +void res_counter_move_parent(struct res_counter *counter, unsigned long= val) >> +{ >> + =A0 =A0 unsigned long flags; >> + >> + =A0 =A0 BUG_ON(!counter->parent); >> + =A0 =A0 spin_lock_irqsave(&counter->lock, flags); >> + =A0 =A0 res_counter_uncharge_locked(counter, val); >> + =A0 =A0 spin_unlock_irqrestore(&counter->lock, flags); >> +} >> >> =A0 static inline unsigned long long * >> =A0 res_counter_member(struct res_counter *counter, int member) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx143.postini.com [74.125.245.143]) by kanga.kvack.org (Postfix) with SMTP id 6EE166B0044 for ; Thu, 12 Apr 2012 12:06:48 -0400 (EDT) Received: by dakh32 with SMTP id h32so2927225dak.9 for ; Thu, 12 Apr 2012 09:06:47 -0700 (PDT) Date: Thu, 12 Apr 2012 09:06:42 -0700 From: Tejun Heo Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Message-ID: <20120412160642.GA13069@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, KAMEZAWA. Thanks a lot for doing this. On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. Just to clarify, I'm not intending to ->pre_destroy() per-se but the retry behavior of it, so ->pre_destroy() will be converted to return void and called once on rmdir and rmdir will proceed no matter what. Also, with the deprecated behavior flag set, pre_destroy() doesn't trigger the warning message. Other than that, if memcg people are fine with the change, I'll be happy to route the changes through cgroup/for-3.5 and stack rmdir simplification patches on top. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx152.postini.com [74.125.245.152]) by kanga.kvack.org (Postfix) with SMTP id 383976B0044 for ; Thu, 12 Apr 2012 14:58:13 -0400 (EDT) Received: from /spool/local by e23smtp02.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 12 Apr 2012 18:40:32 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay05.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q3CIpPKC3498218 for ; Fri, 13 Apr 2012 04:51:29 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q3CIvr1J008088 for ; Fri, 13 Apr 2012 04:57:54 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy In-Reply-To: <20120412160642.GA13069@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Fri, 13 Apr 2012 00:27:42 +0530 Message-ID: <877gxksrq1.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo , KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Tejun Heo writes: > Hello, KAMEZAWA. > > Thanks a lot for doing this. > > On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: >> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. > > Just to clarify, I'm not intending to ->pre_destroy() per-se but the > retry behavior of it, so ->pre_destroy() will be converted to return > void and called once on rmdir and rmdir will proceed no matter what. > Also, with the deprecated behavior flag set, pre_destroy() doesn't > trigger the warning message. > > Other than that, if memcg people are fine with the change, I'll be > happy to route the changes through cgroup/for-3.5 and stack rmdir > simplification patches on top. > Any suggestion on how to take HugeTLB memcg extension patches [1] upstream. Current patch series I have is on top of cgroup/for-3.5 because I need cgroup_add_files equivalent and cgroup/for-3.5 have changes around that. So if these memcg patches can also go on top of cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? Can HugeTLB memcg extension patches also go via this tree ? It should actually got via -mm. But then how do we take care of these dependencies ? [1] http://thread.gmane.org/gmane.linux.kernel.cgroups/1517 -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Fri, 13 Apr 2012 08:59:44 +0900 Message-ID: <4F876C70.7060600@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <877gxksrq1.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <877gxksrq1.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: Tejun Heo , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/13 3:57), Aneesh Kumar K.V wrote: > Tejun Heo writes: > >> Hello, KAMEZAWA. >> >> Thanks a lot for doing this. >> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. >> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the >> retry behavior of it, so ->pre_destroy() will be converted to return >> void and called once on rmdir and rmdir will proceed no matter what. >> Also, with the deprecated behavior flag set, pre_destroy() doesn't >> trigger the warning message. >> >> Other than that, if memcg people are fine with the change, I'll be >> happy to route the changes through cgroup/for-3.5 and stack rmdir >> simplification patches on top. >> > > Any suggestion on how to take HugeTLB memcg extension patches [1] > upstream. Current patch series I have is on top of cgroup/for-3.5 > because I need cgroup_add_files equivalent and cgroup/for-3.5 have > changes around that. So if these memcg patches can also go on top of > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? > > Can HugeTLB memcg extension patches also go via this tree ? It > should actually got via -mm. But then how do we take care of these > dependencies ? > I'm not in hurry. To be honest, I cannot update patches until the next Wednesday. So, If changes of cgroup tree you required are included in linux-next. Please post your updated ones. I thought your latest version was near to be merged.... How do you think, Michal ? Please post (and ask Andrew to pull it.) I'll review when I can. I know yours and mine has some conflicts. I think my this series will be onto your series. To do that, I hope your series are merged to linux-next, 1st. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145]) by kanga.kvack.org (Postfix) with SMTP id 6BCC26B00E8 for ; Thu, 12 Apr 2012 20:58:54 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id D6F6B3EE0C2 for ; Fri, 13 Apr 2012 09:58:52 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id BACF845DE53 for ; Fri, 13 Apr 2012 09:58:52 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 9884245DD74 for ; Fri, 13 Apr 2012 09:58:52 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 885AE1DB803F for ; Fri, 13 Apr 2012 09:58:52 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 3529E1DB8038 for ; Fri, 13 Apr 2012 09:58:52 +0900 (JST) Message-ID: <4F8779DF.3080307@jp.fujitsu.com> Date: Fri, 13 Apr 2012 09:57:03 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Frederic Weisbecker Cc: Glauber Costa , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton (2012/04/12 23:30), Frederic Weisbecker wrote: > 2012/4/12 Glauber Costa : >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: >>> >>> This function is used for moving accounting information to its >>> parent in the hierarchy of res_counter. >>> >>> Signed-off-by: KAMEZAWA Hiroyuki >> >> Frederic has a patch in his fork cgroup series, that allows you to >> uncharge a counter until you reach a specific ancestor. >> You pass the parent as a parameter, and then only you gets uncharged. > > I'm missing the referring patchset from Kamezawa. Ok I'm going to > subscribe to the > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc > LKML for cgroup patches? > Ah, sorry. I will do next time. > Some comments below: > >> >> I think that is a much better interface than this you are proposing. >> We should probably merge that patch and use it. >> >>> --- >>> include/linux/res_counter.h | 3 +++ >>> kernel/res_counter.c | 13 +++++++++++++ >>> 2 files changed, 16 insertions(+), 0 deletions(-) >>> >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >>> index da81af0..8919d3c 100644 >>> --- a/include/linux/res_counter.h >>> +++ b/include/linux/res_counter.h >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, >>> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); >>> void res_counter_uncharge(struct res_counter *counter, unsigned long val); >>> >>> +/* move resource to parent counter...i.e. just forget accounting in a child */ >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); >>> + >>> /** >>> * res_counter_margin - calculate chargeable space of a counter >>> * @cnt: the counter >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c >>> index d508363..fafebf0 100644 >>> --- a/kernel/res_counter.c >>> +++ b/kernel/res_counter.c >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) >>> local_irq_restore(flags); >>> } >>> >>> +/* >>> + * In hierarchical accounting, child's usage is accounted into ancestors. >>> + * To move local usage to its parent, just forget current level usage. > > The way I understand this comment and the changelog matches the opposite > of what the below function is doing. > > The function charges a child and ignore all its parents. The comments says it > charges the parents but not the child. > Sure, I'll fix...and look into your code first. "uncharge a counter until you reach a specific ancestor"... Is it in linux-next ? Thanks, -Kame >>> + */ >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >>> +{ >>> + unsigned long flags; >>> + >>> + BUG_ON(!counter->parent); >>> + spin_lock_irqsave(&counter->lock, flags); >>> + res_counter_uncharge_locked(counter, val); >>> + spin_unlock_irqrestore(&counter->lock, flags); >>> +} >>> >>> static inline unsigned long long * >>> res_counter_member(struct res_counter *counter, int member) >> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account() Date: Fri, 13 Apr 2012 10:01:13 +0900 Message-ID: <4F877AD9.6040504@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BB5E.6080509@jp.fujitsu.com> <4F86D84C.1050508@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86D84C.1050508@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton (2012/04/12 22:27), Glauber Costa wrote: > On 04/12/2012 08:24 AM, KAMEZAWA Hiroyuki wrote: >> Only one call passes 'true'. remove it and handle it in caller. >> >> Signed-off-by: KAMEZAWA Hiroyuki > I like the change. I won't ack the patch itself, though, because it has > a dependency with the "need_cancel" thing you introduced in your last > patch - that I need to think a bit more. > I'll check the logic again. Firstly, maybe name of the variable is wrong.. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx161.postini.com [74.125.245.161]) by kanga.kvack.org (Postfix) with SMTP id 83B506B00EB for ; Thu, 12 Apr 2012 21:04:51 -0400 (EDT) Received: by qadb15 with SMTP id b15so5007972qad.9 for ; Thu, 12 Apr 2012 18:04:50 -0700 (PDT) Date: Fri, 13 Apr 2012 03:04:45 +0200 From: Frederic Weisbecker Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Message-ID: <20120413010443.GD12484@somewhere.redhat.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> <4F8779DF.3080307@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F8779DF.3080307@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: Glauber Costa , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote: > (2012/04/12 23:30), Frederic Weisbecker wrote: > > > 2012/4/12 Glauber Costa : > >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: > >>> > >>> This function is used for moving accounting information to its > >>> parent in the hierarchy of res_counter. > >>> > >>> Signed-off-by: KAMEZAWA Hiroyuki > >> > >> Frederic has a patch in his fork cgroup series, that allows you to > >> uncharge a counter until you reach a specific ancestor. > >> You pass the parent as a parameter, and then only you gets uncharged. > > > > I'm missing the referring patchset from Kamezawa. Ok I'm going to > > subscribe to the > > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc > > LKML for cgroup patches? > > > > Ah, sorry. I will do next time. > > > Some comments below: > > > >> > >> I think that is a much better interface than this you are proposing. > >> We should probably merge that patch and use it. > >> > >>> --- > >>> include/linux/res_counter.h | 3 +++ > >>> kernel/res_counter.c | 13 +++++++++++++ > >>> 2 files changed, 16 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > >>> index da81af0..8919d3c 100644 > >>> --- a/include/linux/res_counter.h > >>> +++ b/include/linux/res_counter.h > >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, > >>> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); > >>> void res_counter_uncharge(struct res_counter *counter, unsigned long val); > >>> > >>> +/* move resource to parent counter...i.e. just forget accounting in a child */ > >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); > >>> + > >>> /** > >>> * res_counter_margin - calculate chargeable space of a counter > >>> * @cnt: the counter > >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c > >>> index d508363..fafebf0 100644 > >>> --- a/kernel/res_counter.c > >>> +++ b/kernel/res_counter.c > >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) > >>> local_irq_restore(flags); > >>> } > >>> > >>> +/* > >>> + * In hierarchical accounting, child's usage is accounted into ancestors. > >>> + * To move local usage to its parent, just forget current level usage. > > > > The way I understand this comment and the changelog matches the opposite > > of what the below function is doing. > > > > The function charges a child and ignore all its parents. The comments says it > > charges the parents but not the child. > > > > > Sure, I'll fix...and look into your code first. > "uncharge a counter until you reach a specific ancestor"... > Is it in linux-next ? No, it's part of the task counter so it's still out of tree. You were Cc'ed but if you can't find it in your inbox I can resend it: http://thread.gmane.org/gmane.linux.kernel.containers/22378 Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Fri, 13 Apr 2012 10:05:41 +0900 Message-ID: <4F877BE5.7080208@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> <4F8779DF.3080307@jp.fujitsu.com> <20120413010443.GD12484@somewhere.redhat.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120413010443.GD12484@somewhere.redhat.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Frederic Weisbecker Cc: Glauber Costa , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton (2012/04/13 10:04), Frederic Weisbecker wrote: > On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote: >> (2012/04/12 23:30), Frederic Weisbecker wrote: >> >>> 2012/4/12 Glauber Costa : >>>> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: >>>>> >>>>> This function is used for moving accounting information to its >>>>> parent in the hierarchy of res_counter. >>>>> >>>>> Signed-off-by: KAMEZAWA Hiroyuki >>>> >>>> Frederic has a patch in his fork cgroup series, that allows you to >>>> uncharge a counter until you reach a specific ancestor. >>>> You pass the parent as a parameter, and then only you gets uncharged. >>> >>> I'm missing the referring patchset from Kamezawa. Ok I'm going to >>> subscribe to the >>> cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc >>> LKML for cgroup patches? >>> >> >> Ah, sorry. I will do next time. >> >>> Some comments below: >>> >>>> >>>> I think that is a much better interface than this you are proposing. >>>> We should probably merge that patch and use it. >>>> >>>>> --- >>>>> include/linux/res_counter.h | 3 +++ >>>>> kernel/res_counter.c | 13 +++++++++++++ >>>>> 2 files changed, 16 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >>>>> index da81af0..8919d3c 100644 >>>>> --- a/include/linux/res_counter.h >>>>> +++ b/include/linux/res_counter.h >>>>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, >>>>> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); >>>>> void res_counter_uncharge(struct res_counter *counter, unsigned long val); >>>>> >>>>> +/* move resource to parent counter...i.e. just forget accounting in a child */ >>>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); >>>>> + >>>>> /** >>>>> * res_counter_margin - calculate chargeable space of a counter >>>>> * @cnt: the counter >>>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c >>>>> index d508363..fafebf0 100644 >>>>> --- a/kernel/res_counter.c >>>>> +++ b/kernel/res_counter.c >>>>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) >>>>> local_irq_restore(flags); >>>>> } >>>>> >>>>> +/* >>>>> + * In hierarchical accounting, child's usage is accounted into ancestors. >>>>> + * To move local usage to its parent, just forget current level usage. >>> >>> The way I understand this comment and the changelog matches the opposite >>> of what the below function is doing. >>> >>> The function charges a child and ignore all its parents. The comments says it >>> charges the parents but not the child. >>> >> >> >> Sure, I'll fix...and look into your code first. >> "uncharge a counter until you reach a specific ancestor"... >> Is it in linux-next ? > > No, it's part of the task counter so it's still out of tree. > You were Cc'ed but if you can't find it in your inbox I can > resend it: > > http://thread.gmane.org/gmane.linux.kernel.containers/22378 > Ah, ok. I found it. Thank you!. -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx150.postini.com [74.125.245.150]) by kanga.kvack.org (Postfix) with SMTP id F1C8C6B00F5 for ; Fri, 13 Apr 2012 04:50:22 -0400 (EDT) Date: Fri, 13 Apr 2012 10:50:14 +0200 From: Michal Hocko Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Message-ID: <20120413085014.GA9205@tiehlicka.suse.cz> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com> <877gxksrq1.fsf@skywalker.in.ibm.com> <4F876C70.7060600@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F876C70.7060600@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "Aneesh Kumar K.V" , Tejun Heo , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote: > (2012/04/13 3:57), Aneesh Kumar K.V wrote: > > > Tejun Heo writes: > > > >> Hello, KAMEZAWA. > >> > >> Thanks a lot for doing this. > >> > >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. > >> > >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the > >> retry behavior of it, so ->pre_destroy() will be converted to return > >> void and called once on rmdir and rmdir will proceed no matter what. > >> Also, with the deprecated behavior flag set, pre_destroy() doesn't > >> trigger the warning message. > >> > >> Other than that, if memcg people are fine with the change, I'll be > >> happy to route the changes through cgroup/for-3.5 and stack rmdir > >> simplification patches on top. > >> > > > > Any suggestion on how to take HugeTLB memcg extension patches [1] > > upstream. Current patch series I have is on top of cgroup/for-3.5 > > because I need cgroup_add_files equivalent and cgroup/for-3.5 have > > changes around that. So if these memcg patches can also go on top of > > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? I would suggest working on top of memcg-devel tree or on top linux-next. Just pull the required patch-es from cgroup/for-3.5 tree before your work (I can include that into memcg-devel tree for you if you want). Do you think this is a 3.5 material? I would rather wait some more. I didn't have time to look over it yet and there are still some unresolved issues so it sounds like too early for merging. > > Can HugeTLB memcg extension patches also go via this tree ? It > > should actually got via -mm. But then how do we take care of these > > dependencies ? You are not changing anything generic from cgroup so definitely go via Andrew. > I'm not in hurry. To be honest, I cannot update patches until the next Wednesday. > So, If changes of cgroup tree you required are included in linux-next. Please post > your updated ones. I thought your latest version was near to be merged.... > > How do you think, Michal ? > Please post (and ask Andrew to pull it.) I'll review when I can. I would wait with pulling the patch after the review. > I know yours and mine has some conflicts. I think my this series will > be onto your series. To do that, I hope your series are merged to > linux-next, 1st. > > > Thanks, > -Kame > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email@kvack.org -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx108.postini.com [74.125.245.108]) by kanga.kvack.org (Postfix) with SMTP id 551C46B004A for ; Fri, 13 Apr 2012 18:19:30 -0400 (EDT) Received: from /spool/local by e28smtp01.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sat, 14 Apr 2012 03:49:24 +0530 Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q3DMJMJO1908940 for ; Sat, 14 Apr 2012 03:49:22 +0530 Received: from d28av04.in.ibm.com (loopback [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q3E3nARY015936 for ; Sat, 14 Apr 2012 13:49:10 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy In-Reply-To: <20120413085014.GA9205@tiehlicka.suse.cz> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com> <877gxksrq1.fsf@skywalker.in.ibm.com> <4F876C70.7060600@jp.fujitsu.com> <20120413085014.GA9205@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Sat, 14 Apr 2012 03:49:15 +0530 Message-ID: <87vcl3gtr0.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko , KAMEZAWA Hiroyuki Cc: Tejun Heo , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Michal Hocko writes: > On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote: >> (2012/04/13 3:57), Aneesh Kumar K.V wrote: >> >> > Tejun Heo writes: >> > >> >> Hello, KAMEZAWA. >> >> >> >> Thanks a lot for doing this. >> >> >> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: >> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. >> >> >> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the >> >> retry behavior of it, so ->pre_destroy() will be converted to return >> >> void and called once on rmdir and rmdir will proceed no matter what. >> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't >> >> trigger the warning message. >> >> >> >> Other than that, if memcg people are fine with the change, I'll be >> >> happy to route the changes through cgroup/for-3.5 and stack rmdir >> >> simplification patches on top. >> >> >> > >> > Any suggestion on how to take HugeTLB memcg extension patches [1] >> > upstream. Current patch series I have is on top of cgroup/for-3.5 >> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have >> > changes around that. So if these memcg patches can also go on top of >> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? > > I would suggest working on top of memcg-devel tree or on top linux-next. > Just pull the required patch-es from cgroup/for-3.5 tree before your > work (I can include that into memcg-devel tree for you if you want). I am expecting to have no conflicts with pending memcg changes. But I do have conflicts with cgroup/for-3.5. That is the reason I decided to rebase on top of cgroup/for-3.5. > > Do you think this is a 3.5 material? I would rather wait some more. I > didn't have time to look over it yet and there are still some unresolved > issues so it sounds like too early for merging. I would really like to get it merged for 3.5. I am ready to post V6 that address all review feedback from V5 post. > >> > Can HugeTLB memcg extension patches also go via this tree ? It >> > should actually got via -mm. But then how do we take care of these >> > dependencies ? > > You are not changing anything generic from cgroup so definitely go via > Andrew. > agreed. >> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday. >> So, If changes of cgroup tree you required are included in linux-next. Please post >> your updated ones. I thought your latest version was near to be merged.... >> >> How do you think, Michal ? >> Please post (and ask Andrew to pull it.) I'll review when I can. > > I would wait with pulling the patch after the review. > agreed. So I will do a v6 post and if we all agree with the changes it can be pulled via -mm ? >> I know yours and mine has some conflicts. I think my this series will >> be onto your series. To do that, I hope your series are merged to >> linux-next, 1st. >> -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187]) by kanga.kvack.org (Postfix) with SMTP id 8F7F56B0044 for ; Mon, 16 Apr 2012 18:19:36 -0400 (EDT) Received: by pbcup15 with SMTP id up15so8627464pbc.14 for ; Mon, 16 Apr 2012 15:19:35 -0700 (PDT) Date: Mon, 16 Apr 2012 15:19:24 -0700 From: Tejun Heo Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Message-ID: <20120416221924.GB12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F86BA66.2010503@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: > > This function is used for moving accounting information to its > parent in the hierarchy of res_counter. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > include/linux/res_counter.h | 3 +++ > kernel/res_counter.c | 13 +++++++++++++ > 2 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > index da81af0..8919d3c 100644 > --- a/include/linux/res_counter.h > +++ b/include/linux/res_counter.h > @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, > void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); > void res_counter_uncharge(struct res_counter *counter, unsigned long val); > > +/* move resource to parent counter...i.e. just forget accounting in a child */ Can we drop this comment and > +void res_counter_move_parent(struct res_counter *counter, unsigned long val); > > +/* > + * In hierarchical accounting, child's usage is accounted into ancestors. > + * To move local usage to its parent, just forget current level usage. > + */ make this one proper docbook function comment? > +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > +{ > + unsigned long flags; > + > + BUG_ON(!counter->parent); And let's please do "if (WARN_ON(!counter->parent)) return;" instead. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id EBFE16B0044 for ; Mon, 16 Apr 2012 18:21:24 -0400 (EDT) Received: by pbcup15 with SMTP id up15so8629208pbc.14 for ; Mon, 16 Apr 2012 15:21:24 -0700 (PDT) Date: Mon, 16 Apr 2012 15:21:19 -0700 From: Tejun Heo Subject: Re: [PATCH 2/7] memcg: move charge to parent only when necessary. Message-ID: <20120416222119.GC12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BAB0.5030809@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F86BAB0.5030809@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote: > > When memcg->use_hierarchy==true, the parent res_counter includes > the usage in child's usage. So, it's not necessary to call try_charge() > in the parent. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- > 1 files changed, 32 insertions(+), 7 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index fa01106..3215880 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, > res_counter_uncharge(&memcg->memsw, bytes); > } > } New line missing here. > +/* > + * Moving usage between a child to its parent if use_hierarchy==true. > + */ Prolly "from a child to its parent"? Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx128.postini.com [74.125.245.128]) by kanga.kvack.org (Postfix) with SMTP id 9BA206B0044 for ; Mon, 16 Apr 2012 18:30:17 -0400 (EDT) Received: by dakh32 with SMTP id h32so8031412dak.9 for ; Mon, 16 Apr 2012 15:30:16 -0700 (PDT) Date: Mon, 16 Apr 2012 15:30:12 -0700 From: Tejun Heo Subject: Re: [PATCH 3/7] memcg: move charges to root at rmdir() Message-ID: <20120416223012.GD12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BB02.2060607@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F86BB02.2060607@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote: > As recently discussed, Tejun Heo, the cgroup maintainer, tries to > remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir(). I'm not trying to remove ->pre_destory() per-se. I want to remove css ref draining and ->pre_destroy() vetoing cgroup removal. Probably better wording would be "tries to simplify removal path such that removal always succeeds". > To do that, in memcg, handling case of use_hierarchy==false is a problem. > > We move memcg's charges to its parent at rmdir(). If use_hierarchy==true, > it's already accounted in the parent, no problem. If use_hierarchy==false, > we cannot guarantee we can move all charges to the parent. > > This patch changes the behavior to move all charges to root_mem_cgroup > if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship. Maybe better to break the above line? Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Mon, 16 Apr 2012 15:31:57 -0700 Message-ID: <20120416223157.GE12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=0Hxl/+zhLIMBEmAa82UdAsqafi7k/bQLU74Iay5n+kI=; b=JakGC2vxblskG/Hx7KSWvGZZYQpzIZVxTaarNGckwfJ3ai4l4zL10L7oe0PNvEzOSG gndCHCpEWOhX4wtSBvRS4czu/j4kEtkJ0LTdk4VAPp+8ZrwNZpSDHbFYS+3wqppGyXKH fxKfSWYqnIPavmMdwspJlJzXF0mMFQzdvm7KjSrljvUfDu0ELs5xz57F93taGaylWKUp fiC0VZQR+uHCStHOiecv7ik3U2AteorAFXcKqXkBzHvC/qpb7/9Dv0LbGn/bZsXTWwYs Y/g6wIO7RujyhxREyyY+YsxXus18fMBVKy/kaj/Fp9xmXUlvkxWWpsVbtsZXOh+VHaOw Zk3A== Content-Disposition: inline In-Reply-To: <4F86BA66.2010503@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: > +/* > + * In hierarchical accounting, child's usage is accounted into ancestors. > + * To move local usage to its parent, just forget current level usage. > + */ > +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > +{ > + unsigned long flags; > + > + BUG_ON(!counter->parent); > + spin_lock_irqsave(&counter->lock, flags); > + res_counter_uncharge_locked(counter, val); > + spin_unlock_irqrestore(&counter->lock, flags); > +} On the second thought, do we need this at all? It's as good as doing nothing after all, no? Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 6/7] memcg: remove pre_destroy() Date: Mon, 16 Apr 2012 15:38:00 -0700 Message-ID: <20120416223800.GF12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BCCE.5050802@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=OYcHceiCZPeIUNSvQlbOzS0lhtIyRGH5URgS0tjye2k=; b=uhHWRnp67C2gu2f3WXtInLBnQRbYUj+p2Em1NjvX4GJHrEAJvRhp4uIUm/SNtnvOUq AmMTn4SxBYC8lXXjSlI5vRElxXninwr2xzc3ttXQjIIqxAV2RHL9o8vZXBJCQcw+0q4m xHTtUKApQHeVe+rGAhbPeM50UaacmNNBz4dYuWKZ4DTnXElMolDm5Ap99q/dU2aJlcy5 EcPT0eB4Ue/xgwKcz6ipJ/oaChAE1vHqHSSJwCw72gYmIErq+mYcd9cyKsf/rksjPrRf DZq0wzsyFCIcSPzPjzA/cgJR6+90soMGMzTLBFEHiEZMuc+WgtlWFYk2fNgQeTAj8fyt FT0A== Content-Disposition: inline In-Reply-To: <4F86BCCE.5050802@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote: > +/* > + * This function is called after ->destroy(). So, we cannot access cgroup > + * of this memcg. > + */ > +static void mem_cgroup_recharge(struct work_struct *work) So, ->pre_destroy per-se isn't gonna go away. It's just gonna be this callback which cgroup core uses to unilaterally notify that the cgroup is going away, so no need to do this cleanup asynchronously from ->destroy(). It's okay to keep doing it synchronously from ->pre_destroy(). The only thing is that it can't fail. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx146.postini.com [74.125.245.146]) by kanga.kvack.org (Postfix) with SMTP id EE4F86B0044 for ; Mon, 16 Apr 2012 18:41:23 -0400 (EDT) Received: by dakh32 with SMTP id h32so8041262dak.9 for ; Mon, 16 Apr 2012 15:41:23 -0700 (PDT) Date: Mon, 16 Apr 2012 15:41:18 -0700 From: Tejun Heo Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Message-ID: <20120416224118.GG12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, KAMEZAWA. On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. I did a pretty shallow review of the patchset and other than the unnecessary async destruction, my complaints are mostly trivial. Ooh, and the patchset doesn't apply cleanly on top of cgroup/for-3.5. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.5 Thank you! -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ying Han Subject: Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir Date: Tue, 17 Apr 2012 10:29:55 -0700 Message-ID: References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BC71.9070403@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding:x-system-of-record; bh=IFNcYoTqOw2Qb5QYZ/FaRwrN0E5DYDBq9cDJme2qR/4=; b=CUiFjSc0DTRzuUaH1tUBzql9f0oHmggzmcXvW8t/drv+v6cLnB1G0duvJ1SNqCcJZg u18XTy93iLuIO8u9BnhYq+JelTewILF6obkKxROQGbFenmj1KWxtDumXaSaE2VQTxv9A AvN+N3O/GAGWgXLIdMEQcy7L1yTllOPjkEwJpfIOYAb3uw7cUNwMRjKPi2MsD6XOD4zD w5AEWbPra0OpugRy376mzTIp0qw4qeK01c/50wHMk0flanQLqWXo7GxbcD+z/qeJf2++ OENToKAssKRI2bgw6oIhcNGlO+0vSQRBdIRaA25L4i9m23CydcuCGUToLQC7wi/zSyV4 M9Gw== In-Reply-To: <4F86BC71.9070403@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="iso-8859-1" To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki wrote: > Now, at rmdir, memory cgroup's charge will be moved to > =A0- parent if use_hierarchy=3D1 > =A0- root =A0 if use_hierarchy=3D0 > > Then, we don't have to have memory reclaim code at destroying memcg. > > This patch divides force_empty to 2 functions as > > =A0- memory_cgroup_recharge() ... try to move all charges to ancestors. > =A0- memory_cgroup_force_empty().. try to reclaim all memory. > > After this patch, memory.force_empty will _not_ move charges to ancestors > but just reclaim all pages. (This meets documenation.) Not sure why it matches the documentation: " memory.force_empty>---->------- # trigger forced move charge to parent " and " # echo 0 > memory.force_empty Almost all pages tracked by this memory cgroup will be unmapped and freed= . Some pages cannot be freed because they are locked or in-use. Such pages = are moved to parent and this cgroup will be empty. This may return -EBUSY if VM is too busy to free/move all pages immediately. " --Ying > > rmdir() will not reclaim any memory but moves charge to other cgroup, > parent or root. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > =A0mm/memcontrol.c | =A0 59 +++++++++++++++++++++++++++------------------= ---------- > =A01 files changed, 29 insertions(+), 30 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 9ac7984..22c8faa 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct= zone *zone, int order, > =A0} > > =A0/* > - * This routine traverse page_cgroup in given list and drop them all. > - * *And* this routine doesn't reclaim page itself, just removes page_cgr= oup. > + * This routine traverse page in given list and move them all. > =A0*/ > -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, > +static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int node, = int zid, enum lru_list lru) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup_per_zone *mz; > @@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem= _cgroup *memcg, > =A0 =A0 =A0 =A0return ret; > =A0} > > -/* > - * make mem_cgroup's charge to be 0 if there is no task. > - * This enables deleting this mem_cgroup. > - */ > -static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_al= l) > + > +static int mem_cgroup_recharge(struct mem_cgroup *memcg) > =A0{ > - =A0 =A0 =A0 int ret; > - =A0 =A0 =A0 int node, zid, shrink; > - =A0 =A0 =A0 int nr_retries =3D MEM_CGROUP_RECLAIM_RETRIES; > + =A0 =A0 =A0 int ret, node, zid; > =A0 =A0 =A0 =A0struct cgroup *cgrp =3D memcg->css.cgroup; > > - =A0 =A0 =A0 css_get(&memcg->css); > - > - =A0 =A0 =A0 shrink =3D 0; > - =A0 =A0 =A0 /* should free all ? */ > - =A0 =A0 =A0 if (free_all) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto try_to_free; > -move_account: > =A0 =A0 =A0 =A0do { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D -EBUSY; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (cgroup_task_count(cgrp) || !list_empty= (&cgrp->children)) > @@ -3712,7 +3699,7 @@ move_account: > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for (zid =3D 0; !ret && zi= d < MAX_NR_ZONES; zid++) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0enum lru_l= ist lru; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for_each_l= ru(lru) { > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 ret =3D mem_cgroup_force_empty_list(memcg, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 ret =3D mem_cgroup_recharge_lru(memcg, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0node, zid, lru); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0if (ret) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0break; > @@ -3722,24 +3709,33 @@ move_account: > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mem_cgroup_end_move(memcg); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 memcg_oom_recover(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > =A0 =A0 =A0 =A0/* "ret" should also be checked to ensure all lists are em= pty. */ > =A0 =A0 =A0 =A0} while (memcg->res.usage > 0 || ret); > =A0out: > - =A0 =A0 =A0 css_put(&memcg->css); > =A0 =A0 =A0 =A0return ret; > +} > + > + > +/* > + * make mem_cgroup's charge to be 0 if there is no task. This is only ca= lled > + * by memory.force_empty file, an user request. > + */ > +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) > +{ > + =A0 =A0 =A0 int ret =3D 0; > + =A0 =A0 =A0 int nr_retries =3D MEM_CGROUP_RECLAIM_RETRIES; > + =A0 =A0 =A0 struct cgroup *cgrp =3D memcg->css.cgroup; > + > + =A0 =A0 =A0 css_get(&memcg->css); > > -try_to_free: > =A0 =A0 =A0 =A0/* returns EBUSY if there is a task or if we come here twi= ce. */ > - =A0 =A0 =A0 if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)= || shrink) { > + =A0 =A0 =A0 if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)= ) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D -EBUSY; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out; > =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0/* we call try-to-free pages for make this cgroup empty */ > =A0 =A0 =A0 =A0lru_add_drain_all(); > - =A0 =A0 =A0 /* try to free all pages in this cgroup */ > - =A0 =A0 =A0 shrink =3D 1; > =A0 =A0 =A0 =A0while (nr_retries && memcg->res.usage > 0) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int progress; > > @@ -3754,16 +3750,19 @@ try_to_free: > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* maybe some writeback is= necessary */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0congestion_wait(BLK_RW_ASY= NC, HZ/10); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > - > =A0 =A0 =A0 =A0} > - =A0 =A0 =A0 lru_add_drain(); > + =A0 =A0 =A0 if (!nr_retries) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -ENOMEM; > +out: > + =A0 =A0 =A0 memcg_oom_recover(memcg); > + =A0 =A0 =A0 css_put(&memcg->css); > =A0 =A0 =A0 =A0/* try move_account...there may be some *locked* pages. */ > - =A0 =A0 =A0 goto move_account; > + =A0 =A0 =A0 return ret; > =A0} > > =A0int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int eve= nt) > =A0{ > - =A0 =A0 =A0 return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), t= rue); > + =A0 =A0 =A0 return mem_cgroup_force_empty(mem_cgroup_from_cont(cont)); > =A0} > > > @@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *co= nt) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup *memcg =3D mem_cgroup_from_cont(cont); > > - =A0 =A0 =A0 return mem_cgroup_force_empty(memcg, false); > + =A0 =A0 =A0 return mem_cgroup_recharge(memcg); > =A0} > > =A0static void mem_cgroup_destroy(struct cgroup *cont) > -- > 1.7.4.1 > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. =A0For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter= .ca/ > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ying Han Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Tue, 17 Apr 2012 10:35:22 -0700 Message-ID: References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding:x-system-of-record; bh=O6o0BNffhriaRWwuz7GcK/N6VTohWlW27TsucGP8Y5I=; b=f/+vjWr/cIRWTKVZWpSxY3u90V9gA70b8t5jxo4LaOZJwQ0RWQ013yoc6i61M23dgU iy0AjAarFBIm2XGdmZMhhWvQDva1NZQxYMpA5bBdif4Io7SXaKIiGt4LPz0sbqT0bnU8 Uv+axExekg0M9i5hbNt2VWRQTz6A7WJjMwcxaz5tggckN32gXP/TyoYxh7EvJP5huggF Ns82Qem+EtIXE1uFMpgoWa+c96A12KEmE0qoxUdA6hxPTt5sFX0ZsF/mVr5dQmHdDsf7 LMUK5Kn0Z9OLgS1tJJXYDsHsknJJfXLXR0tDSjh29VKbTPbO4pGRmfSpxZPsrCXlQBL7 fO5w== In-Reply-To: <4F86B9BE.8000105@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="iso-8859-1" To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki wrote: > In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WAR= NING. > > By pre_destroy(), rmdir of cgroup can return -EBUSY or some error. > It makes cgroup complicated and unstable. I said O.K. to remove it and > this patch is modification for memcg. > > One of problem in current implementation is that memcg moves all charges = to > parent in pre_destroy(). At doing so, if use_hierarchy=3D0, pre_destroy()= may > hit parent's limit and may return -EBUSY. To fix this problem, this patch > changes behavior of rmdir() as > > =A0- if use_hierarchy=3D0, all remaining charges will go to root cgroup. > =A0- if use_hierarchy=3D1, all remaining charges will go to the parent. We need to update the "4.3 Removing a cgroup" session in Documentation. --Ying > By this, rmdir failure will not be caused by parent's limitation. And > I think this meets meaning of use_hierarchy. > > This series does > =A0- add above change of behavior > =A0- use workqueue to move all pages to parent > =A0- remove unnecessary codes. > > I'm sorry if my reply is delayed, I'm not sure I can have enough time in > this weekend. Any comments are welcomed. > > Thanks, > -Kame > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. =A0For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter= .ca/ > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx153.postini.com [74.125.245.153]) by kanga.kvack.org (Postfix) with SMTP id 6FB386B004A for ; Tue, 17 Apr 2012 13:47:17 -0400 (EDT) Received: by lbbgp10 with SMTP id gp10so3260099lbb.14 for ; Tue, 17 Apr 2012 10:47:15 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <4F86BCCE.5050802@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BCCE.5050802@jp.fujitsu.com> Date: Tue, 17 Apr 2012 10:47:13 -0700 Message-ID: Subject: Re: [PATCH 6/7] memcg: remove pre_destroy() From: Ying Han Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 4:30 AM, KAMEZAWA Hiroyuki wrote: > Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to > prevent rmdir() from failure of EBUSY or some. > > This patch removes pre_destroy() in memcg. All remaining charges > will be moved to other cgroup, without any failure, =A0->destroy() > just schedule a work and it will destroy the memcg. > Then, rmdir will never fail. The kernel will take care of remaining > resources in the cgroup to be accounted correctly. > > After this patch, memcg will be destroyed by workqueue in asynchrnous way= . Is it necessary to change the destroy asynchronously? Frankly, i don't that that much. It will leave the system in a deterministic state on admin perspective. The current synchronous destroy works fine, and admin can rely on that w/ charging change after the destroy returns. --Ying > Then, we can modify 'moving' logic to work asynchrnously, i.e, > we don't force users to wait for the end of rmdir(), now. We don't > need to use heavy synchronous calls. This patch modifies logics as > > =A0- Use mem_cgroup_drain_stock_async rather tan drain_stock_sync. > =A0- lru_add_drain_all() will be called only when necessary, in a lazy wa= y. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > =A0mm/memcontrol.c | =A0 52 ++++++++++++++++++++++-----------------------= ------- > =A01 files changed, 22 insertions(+), 30 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 22c8faa..e466809 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -315,6 +315,8 @@ struct mem_cgroup { > =A0#ifdef CONFIG_INET > =A0 =A0 =A0 =A0struct tcp_memcontrol tcp_mem; > =A0#endif > + > + =A0 =A0 =A0 struct work_struct work_destroy; > =A0}; > > =A0/* Stuffs for move charges at task migration. */ > @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup= *root_memcg) > =A0 =A0 =A0 =A0mutex_unlock(&percpu_charge_mutex); > =A0} > > -/* This is a synchronous drain interface. */ > =A0static void drain_all_stock_sync(struct mem_cgroup *root_memcg) > =A0{ > =A0 =A0 =A0 =A0/* called when force_empty is called */ > @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgro= up *memcg, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pc =3D lookup_page_cgroup(page); > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D mem_cgroup_move_parent(page, pc, m= emcg, GFP_KERNEL); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret =3D=3D -EINTR) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret =3D=3D -EBUSY || ret =3D=3D -EINVAL= ) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 VM_BUG_ON(ret !=3D 0 && ret !=3D -EBUSY); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* found lock contention o= r "pc" is obsolete. */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0busy =3D page; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgr= oup *memcg, > =A0 =A0 =A0 =A0return ret; > =A0} > > - > -static int mem_cgroup_recharge(struct mem_cgroup *memcg) > +/* > + * This function is called after ->destroy(). So, we cannot access cgrou= p > + * of this memcg. > + */ > +static void mem_cgroup_recharge(struct work_struct *work) > =A0{ > + =A0 =A0 =A0 struct mem_cgroup *memcg; > =A0 =A0 =A0 =A0int ret, node, zid; > - =A0 =A0 =A0 struct cgroup *cgrp =3D memcg->css.cgroup; > > + =A0 =A0 =A0 memcg =3D container_of(work, struct mem_cgroup, work_destro= y); > + =A0 =A0 =A0 /* No task points this memcg. call this only once */ > + =A0 =A0 =A0 drain_all_stock_async(memcg); > =A0 =A0 =A0 =A0do { > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -EBUSY; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (cgroup_task_count(cgrp) || !list_empty(= &cgrp->children)) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -EINTR; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (signal_pending(current)) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* This is for making all *used* pages to b= e on LRU. */ > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 lru_add_drain_all(); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 drain_all_stock_sync(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D 0; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mem_cgroup_start_move(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for_each_node_state(node, N_HIGH_MEMORY) { > @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup = *memcg) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mem_cgroup_end_move(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > - =A0 =A0 =A0 /* "ret" should also be checked to ensure all lists are emp= ty. */ > - =A0 =A0 =A0 } while (memcg->res.usage > 0 || ret); > -out: > - =A0 =A0 =A0 return ret; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* drain LRU only when we canoot find pages= on LRU */ > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (res_counter_read_u64(&memcg->res, RES_U= SAGE) && > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 !mem_cgroup_nr_lru_pages(memcg, LRU= _ALL)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lru_add_drain_all(); > + =A0 =A0 =A0 } while (res_counter_read_u64(&memcg->res, RES_USAGE) || re= t); > + =A0 =A0 =A0 mem_cgroup_put(memcg); > =A0} > > - > =A0/* > =A0* make mem_cgroup's charge to be 0 if there is no task. This is only c= alled > =A0* by memory.force_empty file, an user request. > @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work) > =A0 =A0 =A0 =A0memcg =3D container_of(work, struct mem_cgroup, work_freei= ng); > =A0 =A0 =A0 =A0vfree(memcg); > =A0} > + > =A0static void vfree_rcu(struct rcu_head *rcu_head) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup *memcg; > @@ -4982,20 +4981,14 @@ free_out: > =A0 =A0 =A0 =A0return ERR_PTR(error); > =A0} > > -static int mem_cgroup_pre_destroy(struct cgroup *cont) > -{ > - =A0 =A0 =A0 struct mem_cgroup *memcg =3D mem_cgroup_from_cont(cont); > - > - =A0 =A0 =A0 return mem_cgroup_recharge(memcg); > -} > - > =A0static void mem_cgroup_destroy(struct cgroup *cont) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup *memcg =3D mem_cgroup_from_cont(cont); > > =A0 =A0 =A0 =A0kmem_cgroup_destroy(cont); > > - =A0 =A0 =A0 mem_cgroup_put(memcg); > + =A0 =A0 =A0 INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge); > + =A0 =A0 =A0 schedule_work(&memcg->work_destroy); > =A0} > > =A0static int mem_cgroup_populate(struct cgroup_subsys *ss, > @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys =3D { > =A0 =A0 =A0 =A0.name =3D "memory", > =A0 =A0 =A0 =A0.subsys_id =3D mem_cgroup_subsys_id, > =A0 =A0 =A0 =A0.create =3D mem_cgroup_create, > - =A0 =A0 =A0 .pre_destroy =3D mem_cgroup_pre_destroy, > =A0 =A0 =A0 =A0.destroy =3D mem_cgroup_destroy, > =A0 =A0 =A0 =A0.populate =3D mem_cgroup_populate, > =A0 =A0 =A0 =A0.can_attach =3D mem_cgroup_can_attach, > -- > 1.7.4.1 > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. =A0For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter= .ca/ > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 04C556B0083 for ; Wed, 18 Apr 2012 03:02:05 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 0F3ED3EE0BD for ; Wed, 18 Apr 2012 16:02:04 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id E4B1F45DEB5 for ; Wed, 18 Apr 2012 16:02:03 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id B4B6145DEAD for ; Wed, 18 Apr 2012 16:02:03 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id A64D11DB8044 for ; Wed, 18 Apr 2012 16:02:03 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 5C4711DB803C for ; Wed, 18 Apr 2012 16:02:03 +0900 (JST) Message-ID: <4F8E665B.7050302@jp.fujitsu.com> Date: Wed, 18 Apr 2012 15:59:39 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416221924.GB12421@google.com> In-Reply-To: <20120416221924.GB12421@google.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:19), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: >> >> This function is used for moving accounting information to its >> parent in the hierarchy of res_counter. >> >> Signed-off-by: KAMEZAWA Hiroyuki >> --- >> include/linux/res_counter.h | 3 +++ >> kernel/res_counter.c | 13 +++++++++++++ >> 2 files changed, 16 insertions(+), 0 deletions(-) >> >> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >> index da81af0..8919d3c 100644 >> --- a/include/linux/res_counter.h >> +++ b/include/linux/res_counter.h >> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, >> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); >> void res_counter_uncharge(struct res_counter *counter, unsigned long val); >> >> +/* move resource to parent counter...i.e. just forget accounting in a child */ > > Can we drop this comment and > >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); >> >> +/* >> + * In hierarchical accounting, child's usage is accounted into ancestors. >> + * To move local usage to its parent, just forget current level usage. >> + */ > > make this one proper docbook function comment? > Sure. (I'll use Frederic's one.) Thanks, -Kame >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >> +{ >> + unsigned long flags; >> + >> + BUG_ON(!counter->parent); > > And let's please do "if (WARN_ON(!counter->parent)) return;" instead. > > Thanks. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id 2B9D66B0083 for ; Wed, 18 Apr 2012 03:03:16 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 4249F3EE0C2 for ; Wed, 18 Apr 2012 16:03:14 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 1EF2D45DE56 for ; Wed, 18 Apr 2012 16:03:14 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id EAF7545DE4D for ; Wed, 18 Apr 2012 16:03:13 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id D0F89E08008 for ; Wed, 18 Apr 2012 16:03:13 +0900 (JST) Received: from m106.s.css.fujitsu.com (m106.s.css.fujitsu.com [10.240.81.146]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 8C335E08005 for ; Wed, 18 Apr 2012 16:03:13 +0900 (JST) Message-ID: <4F8E66B7.3050602@jp.fujitsu.com> Date: Wed, 18 Apr 2012 16:01:11 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 2/7] memcg: move charge to parent only when necessary. References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BAB0.5030809@jp.fujitsu.com> <20120416222119.GC12421@google.com> In-Reply-To: <20120416222119.GC12421@google.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:21), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote: >> >> When memcg->use_hierarchy==true, the parent res_counter includes >> the usage in child's usage. So, it's not necessary to call try_charge() >> in the parent. >> >> Signed-off-by: KAMEZAWA Hiroyuki >> --- >> mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- >> 1 files changed, 32 insertions(+), 7 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index fa01106..3215880 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, >> res_counter_uncharge(&memcg->memsw, bytes); >> } >> } > > New line missing here. > >> +/* >> + * Moving usage between a child to its parent if use_hierarchy==true. >> + */ > > Prolly "from a child to its parent"? > Sure. will fix. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 3/7] memcg: move charges to root at rmdir() Date: Wed, 18 Apr 2012 16:02:27 +0900 Message-ID: <4F8E6703.70101@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BB02.2060607@jp.fujitsu.com> <20120416223012.GD12421@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120416223012.GD12421@google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:30), Tejun Heo wrote: > Hello, > > On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote: >> As recently discussed, Tejun Heo, the cgroup maintainer, tries to >> remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir(). > > I'm not trying to remove ->pre_destory() per-se. I want to remove css > ref draining and ->pre_destroy() vetoing cgroup removal. Probably > better wording would be "tries to simplify removal path such that > removal always succeeds". > Ok. >> To do that, in memcg, handling case of use_hierarchy==false is a problem. >> >> We move memcg's charges to its parent at rmdir(). If use_hierarchy==true, >> it's already accounted in the parent, no problem. If use_hierarchy==false, >> we cannot guarantee we can move all charges to the parent. >> >> This patch changes the behavior to move all charges to root_mem_cgroup >> if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship. > > Maybe better to break the above line? > yes, I'll fix it. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id 56BDE6B0083 for ; Wed, 18 Apr 2012 03:06:33 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id EC9C43EE0C0 for ; Wed, 18 Apr 2012 16:06:31 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id D344345DE50 for ; Wed, 18 Apr 2012 16:06:31 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id BBA8645DE4D for ; Wed, 18 Apr 2012 16:06:31 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id AA3F7E08007 for ; Wed, 18 Apr 2012 16:06:31 +0900 (JST) Received: from m106.s.css.fujitsu.com (m106.s.css.fujitsu.com [10.240.81.146]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 5AB69E08003 for ; Wed, 18 Apr 2012 16:06:31 +0900 (JST) Message-ID: <4F8E678A.8000805@jp.fujitsu.com> Date: Wed, 18 Apr 2012 16:04:42 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416223157.GE12421@google.com> In-Reply-To: <20120416223157.GE12421@google.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:31), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: >> +/* >> + * In hierarchical accounting, child's usage is accounted into ancestors. >> + * To move local usage to its parent, just forget current level usage. >> + */ >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >> +{ >> + unsigned long flags; >> + >> + BUG_ON(!counter->parent); >> + spin_lock_irqsave(&counter->lock, flags); >> + res_counter_uncharge_locked(counter, val); >> + spin_unlock_irqrestore(&counter->lock, flags); >> +} > > On the second thought, do we need this at all? It's as good as doing > nothing after all, no? > I considered that, but I think it may make it hard to debug memcg leakage. I'd like to confirm res->usage == 0 at removal of memcg. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx194.postini.com [74.125.245.194]) by kanga.kvack.org (Postfix) with SMTP id 42A666B00E8 for ; Wed, 18 Apr 2012 03:14:11 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id CF4AC3EE0BC for ; Wed, 18 Apr 2012 16:14:09 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id B41AA45DEB7 for ; Wed, 18 Apr 2012 16:14:09 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id E987545DEB5 for ; Wed, 18 Apr 2012 16:14:07 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id D96191DB8040 for ; Wed, 18 Apr 2012 16:14:07 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 8F24B1DB803B for ; Wed, 18 Apr 2012 16:14:07 +0900 (JST) Message-ID: <4F8E6952.9030909@jp.fujitsu.com> Date: Wed, 18 Apr 2012 16:12:18 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 6/7] memcg: remove pre_destroy() References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BCCE.5050802@jp.fujitsu.com> <20120416223800.GF12421@google.com> In-Reply-To: <20120416223800.GF12421@google.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:38), Tejun Heo wrote: > Hello, > > On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote: >> +/* >> + * This function is called after ->destroy(). So, we cannot access cgroup >> + * of this memcg. >> + */ >> +static void mem_cgroup_recharge(struct work_struct *work) > > So, ->pre_destroy per-se isn't gonna go away. It's just gonna be this > callback which cgroup core uses to unilaterally notify that the cgroup > is going away, so no need to do this cleanup asynchronously from > ->destroy(). It's okay to keep doing it synchronously from > ->pre_destroy(). The only thing is that it can't fail. > I see. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158]) by kanga.kvack.org (Postfix) with SMTP id 5CCC36B00E8 for ; Wed, 18 Apr 2012 03:16:23 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 01BDF3EE0BD for ; Wed, 18 Apr 2012 16:16:22 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id D6FB245DEAD for ; Wed, 18 Apr 2012 16:16:21 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id A94E645DEB3 for ; Wed, 18 Apr 2012 16:16:21 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 3094E1DB8040 for ; Wed, 18 Apr 2012 16:16:21 +0900 (JST) Received: from m106.s.css.fujitsu.com (m106.s.css.fujitsu.com [10.240.81.146]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id DE98A1DB803B for ; Wed, 18 Apr 2012 16:16:20 +0900 (JST) Message-ID: <4F8E69D8.4070102@jp.fujitsu.com> Date: Wed, 18 Apr 2012 16:14:32 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BC71.9070403@jp.fujitsu.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Ying Han Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/18 2:29), Ying Han wrote: > On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki > wrote: >> Now, at rmdir, memory cgroup's charge will be moved to >> - parent if use_hierarchy=1 >> - root if use_hierarchy=0 >> >> Then, we don't have to have memory reclaim code at destroying memcg. >> >> This patch divides force_empty to 2 functions as >> >> - memory_cgroup_recharge() ... try to move all charges to ancestors. >> - memory_cgroup_force_empty().. try to reclaim all memory. >> >> After this patch, memory.force_empty will _not_ move charges to ancestors >> but just reclaim all pages. (This meets documenation.) > > Not sure why it matches the documentation: > " > memory.force_empty>---->------- # trigger forced move charge to parent > " I missed this... > > and > " > # echo 0 > memory.force_empty > > Almost all pages tracked by this memory cgroup will be unmapped and freed. > Some pages cannot be freed because they are locked or in-use. Such pages are > moved to parent and this cgroup will be empty. This may return -EBUSY if > VM is too busy to free/move all pages immediately. > " > The 1st feature is "will be unmapped and freed". I'll update Documentation. Thank you. -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx109.postini.com [74.125.245.109]) by kanga.kvack.org (Postfix) with SMTP id 8000A6B00E8 for ; Wed, 18 Apr 2012 03:17:02 -0400 (EDT) Received: from m1.gw.fujitsu.co.jp (unknown [10.0.50.71]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 207703EE0BC for ; Wed, 18 Apr 2012 16:17:01 +0900 (JST) Received: from smail (m1 [127.0.0.1]) by outgoing.m1.gw.fujitsu.co.jp (Postfix) with ESMTP id ED32145DE5B for ; Wed, 18 Apr 2012 16:17:00 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (s1.gw.fujitsu.co.jp [10.0.50.91]) by m1.gw.fujitsu.co.jp (Postfix) with ESMTP id D3D4145DE54 for ; Wed, 18 Apr 2012 16:17:00 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id C1D671DB8040 for ; Wed, 18 Apr 2012 16:17:00 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id 7C8DE1DB804C for ; Wed, 18 Apr 2012 16:17:00 +0900 (JST) Message-ID: <4F8E6A00.50707@jp.fujitsu.com> Date: Wed, 18 Apr 2012 16:15:12 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy References: <4F86B9BE.8000105@jp.fujitsu.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Ying Han Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/18 2:35), Ying Han wrote: > On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki > wrote: >> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. >> >> By pre_destroy(), rmdir of cgroup can return -EBUSY or some error. >> It makes cgroup complicated and unstable. I said O.K. to remove it and >> this patch is modification for memcg. >> >> One of problem in current implementation is that memcg moves all charges to >> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may >> hit parent's limit and may return -EBUSY. To fix this problem, this patch >> changes behavior of rmdir() as >> >> - if use_hierarchy=0, all remaining charges will go to root cgroup. >> - if use_hierarchy=1, all remaining charges will go to the parent. > > > We need to update the "4.3 Removing a cgroup" session in Documentation. > Sure, will do. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx138.postini.com [74.125.245.138]) by kanga.kvack.org (Postfix) with SMTP id 73D5A6B0092 for ; Wed, 18 Apr 2012 13:03:43 -0400 (EDT) Received: by dakh32 with SMTP id h32so10839624dak.9 for ; Wed, 18 Apr 2012 10:03:42 -0700 (PDT) Date: Wed, 18 Apr 2012 10:03:37 -0700 From: Tejun Heo Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Message-ID: <20120418170337.GH19975@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416223157.GE12421@google.com> <4F8E678A.8000805@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F8E678A.8000805@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Wed, Apr 18, 2012 at 04:04:42PM +0900, KAMEZAWA Hiroyuki wrote: > (2012/04/17 7:31), Tejun Heo wrote: > > > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: > >> +/* > >> + * In hierarchical accounting, child's usage is accounted into ancestors. > >> + * To move local usage to its parent, just forget current level usage. > >> + */ > >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > >> +{ > >> + unsigned long flags; > >> + > >> + BUG_ON(!counter->parent); > >> + spin_lock_irqsave(&counter->lock, flags); > >> + res_counter_uncharge_locked(counter, val); > >> + spin_unlock_irqrestore(&counter->lock, flags); > >> +} > > > > On the second thought, do we need this at all? It's as good as doing > > nothing after all, no? > > > > > I considered that, but I think it may make it hard to debug memcg leakage. > I'd like to confirm res->usage == 0 at removal of memcg. Hmmm... then let's name it res_counter_reset() or something. I feel very confused about the function name. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Thu, 12 Apr 2012 20:20:06 +0900 Message-ID: <4F86BA66.2010503@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton This function is used for moving accounting information to its parent in the hierarchy of res_counter. Signed-off-by: KAMEZAWA Hiroyuki --- include/linux/res_counter.h | 3 +++ kernel/res_counter.c | 13 +++++++++++++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h index da81af0..8919d3c 100644 --- a/include/linux/res_counter.h +++ b/include/linux/res_counter.h @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); void res_counter_uncharge(struct res_counter *counter, unsigned long val); +/* move resource to parent counter...i.e. just forget accounting in a child */ +void res_counter_move_parent(struct res_counter *counter, unsigned long val); + /** * res_counter_margin - calculate chargeable space of a counter * @cnt: the counter diff --git a/kernel/res_counter.c b/kernel/res_counter.c index d508363..fafebf0 100644 --- a/kernel/res_counter.c +++ b/kernel/res_counter.c @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) local_irq_restore(flags); } +/* + * In hierarchical accounting, child's usage is accounted into ancestors. + * To move local usage to its parent, just forget current level usage. + */ +void res_counter_move_parent(struct res_counter *counter, unsigned long val) +{ + unsigned long flags; + + BUG_ON(!counter->parent); + spin_lock_irqsave(&counter->lock, flags); + res_counter_uncharge_locked(counter, val); + spin_unlock_irqrestore(&counter->lock, flags); +} static inline unsigned long long * res_counter_member(struct res_counter *counter, int member) -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 2/7] memcg: move charge to parent only when necessary. Date: Thu, 12 Apr 2012 20:21:20 +0900 Message-ID: <4F86BAB0.5030809@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton When memcg->use_hierarchy==true, the parent res_counter includes the usage in child's usage. So, it's not necessary to call try_charge() in the parent. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- 1 files changed, 32 insertions(+), 7 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index fa01106..3215880 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, res_counter_uncharge(&memcg->memsw, bytes); } } +/* + * Moving usage between a child to its parent if use_hierarchy==true. + */ +static void __mem_cgroup_move_charge_parent(struct mem_cgroup *memcg, + unsigned int nr_pages) +{ + if (!mem_cgroup_is_root(memcg)) { + unsigned long bytes = nr_pages * PAGE_SIZE; + + res_counter_move_parent(&memcg->res, bytes); + if (do_swap_account) + res_counter_move_parent(&memcg->memsw, bytes); + } +} /* * A helper function to get mem_cgroup from ID. must be called under @@ -2666,18 +2680,29 @@ static int mem_cgroup_move_parent(struct page *page, goto put; nr_pages = hpage_nr_pages(page); - parent = mem_cgroup_from_cont(pcg); - ret = __mem_cgroup_try_charge(NULL, gfp_mask, nr_pages, &parent, false); - if (ret) - goto put_back; + + if (!parent->use_hierarchy) { + ret = __mem_cgroup_try_charge(NULL, gfp_mask, + nr_pages, &parent, false); + if (ret) + goto put_back; + } if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, true); - if (ret) - __mem_cgroup_cancel_charge(parent, nr_pages); + if (!parent->use_hierarchy) { + ret = mem_cgroup_move_account(page, nr_pages, pc, + child, parent, true); + if (ret) + __mem_cgroup_cancel_charge(parent, nr_pages); + } else { + ret = mem_cgroup_move_account(page, nr_pages, pc, + child, parent, false); + if (!ret) + __mem_cgroup_move_charge_parent(child, nr_pages); + } if (nr_pages > 1) compound_unlock_irqrestore(page, flags); -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 3/7] memcg: move charges to root at rmdir() Date: Thu, 12 Apr 2012 20:22:42 +0900 Message-ID: <4F86BB02.2060607@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton As recently discussed, Tejun Heo, the cgroup maintainer, tries to remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir(). To do that, in memcg, handling case of use_hierarchy==false is a problem. We move memcg's charges to its parent at rmdir(). If use_hierarchy==true, it's already accounted in the parent, no problem. If use_hierarchy==false, we cannot guarantee we can move all charges to the parent. This patch changes the behavior to move all charges to root_mem_cgroup if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship. With this, we can remove -ENOMEM error check at pre_destroy(), called at rmdir. Signed-off-by: KAMEZAWA Hiroyuki --- Documentation/cgroups/memory.txt | 6 ++++-- mm/memcontrol.c | 38 ++++++++++++-------------------------- 2 files changed, 16 insertions(+), 28 deletions(-) diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 84d4f00..f717f6a 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -377,8 +377,10 @@ cgroup might have some charge associated with it, even though all tasks have migrated away from it. (because we charge against pages, not against tasks.) -Such charges are freed or moved to their parent. At moving, both of RSS -and CACHES are moved to parent. +Such charges are freed or moved to their parent if use_hierarchy==true. +If use_hierarchy==false, charges are moved to root memory cgroup. + +At moving, both of RSS and CACHES are moved to parent. rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also. Charges recorded in swap information is not updated at removal of cgroup. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3215880..8246418 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2662,15 +2662,14 @@ static int mem_cgroup_move_parent(struct page *page, struct mem_cgroup *child, gfp_t gfp_mask) { - struct cgroup *cg = child->css.cgroup; - struct cgroup *pcg = cg->parent; struct mem_cgroup *parent; unsigned int nr_pages; unsigned long uninitialized_var(flags); + bool need_cancel = false; int ret; /* Is ROOT ? */ - if (!pcg) + if (mem_cgroup_is_root(child)) return -EINVAL; ret = -EBUSY; @@ -2680,33 +2679,23 @@ static int mem_cgroup_move_parent(struct page *page, goto put; nr_pages = hpage_nr_pages(page); - parent = mem_cgroup_from_cont(pcg); - - if (!parent->use_hierarchy) { - ret = __mem_cgroup_try_charge(NULL, gfp_mask, - nr_pages, &parent, false); - if (ret) - goto put_back; + parent = parent_mem_cgroup(child); + if (!parent) { + parent = root_mem_cgroup; + need_cancel = true; } if (nr_pages > 1) flags = compound_lock_irqsave(page); - if (!parent->use_hierarchy) { - ret = mem_cgroup_move_account(page, nr_pages, pc, - child, parent, true); - if (ret) - __mem_cgroup_cancel_charge(parent, nr_pages); - } else { - ret = mem_cgroup_move_account(page, nr_pages, pc, - child, parent, false); - if (!ret) - __mem_cgroup_move_charge_parent(child, nr_pages); - } + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, + need_cancel); + if (!need_cancel && !ret) + __mem_cgroup_move_charge_parent(child, nr_pages); if (nr_pages > 1) compound_unlock_irqrestore(page, flags); -put_back: + putback_lru_page(page); put: put_page(page); @@ -3677,7 +3666,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, pc = lookup_page_cgroup(page); ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL); - if (ret == -ENOMEM || ret == -EINTR) + if (ret == -EINTR) break; if (ret == -EBUSY || ret == -EINVAL) { @@ -3738,9 +3727,6 @@ move_account: } mem_cgroup_end_move(memcg); memcg_oom_recover(memcg); - /* it seems parent cgroup doesn't have enough mem */ - if (ret == -ENOMEM) - goto try_to_free; cond_resched(); /* "ret" should also be checked to ensure all lists are empty. */ } while (memcg->res.usage > 0 || ret); -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account() Date: Thu, 12 Apr 2012 20:24:14 +0900 Message-ID: <4F86BB5E.6080509@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Only one call passes 'true'. remove it and handle it in caller. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 29 ++++++++++++----------------- 1 files changed, 12 insertions(+), 17 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8246418..9ac7984 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head) * @pc: page_cgroup of the page. * @from: mem_cgroup which the page is moved from. * @to: mem_cgroup which the page is moved to. @from != @to. - * @uncharge: whether we should call uncharge and css_put against @from. * * The caller must confirm following. * - page is not on LRU (isolate_page() is useful.) * - compound_lock is held when nr_pages > 1 * - * This function doesn't do "charge" nor css_get to new cgroup. It should be - * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is - * true, this function does "uncharge" from old cgroup, but it doesn't if - * @uncharge is false, so a caller should do "uncharge". + * This function doesn't access res_counter at all. Caller should take + * care of it. */ static int mem_cgroup_move_account(struct page *page, unsigned int nr_pages, struct page_cgroup *pc, struct mem_cgroup *from, - struct mem_cgroup *to, - bool uncharge) + struct mem_cgroup *to) { unsigned long flags; int ret; @@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page, preempt_enable(); } mem_cgroup_charge_statistics(from, anon, -nr_pages); - if (uncharge) - /* This is not "cancel", but cancel_charge does all we need. */ - __mem_cgroup_cancel_charge(from, nr_pages); /* caller should have done css_get */ pc->mem_cgroup = to; @@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page, if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, - need_cancel); - if (!need_cancel && !ret) - __mem_cgroup_move_charge_parent(child, nr_pages); + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent); + if (!ret) { + if (need_cancel) + __mem_cgroup_cancel_charge(child, nr_pages); + else + __mem_cgroup_move_charge_parent(child, nr_pages); + } if (nr_pages > 1) compound_unlock_irqrestore(page, flags); @@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, if (!isolate_lru_page(page)) { pc = lookup_page_cgroup(page); if (!mem_cgroup_move_account(page, HPAGE_PMD_NR, - pc, mc.from, mc.to, - false)) { + pc, mc.from, mc.to)) { mc.precharge -= HPAGE_PMD_NR; mc.moved_charge += HPAGE_PMD_NR; } @@ -5482,7 +5477,7 @@ retry: goto put; pc = lookup_page_cgroup(page); if (!mem_cgroup_move_account(page, 1, pc, - mc.from, mc.to, false)) { + mc.from, mc.to)) { mc.precharge--; /* we uncharge from mc.from later. */ mc.moved_charge++; -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir Date: Thu, 12 Apr 2012 20:28:49 +0900 Message-ID: <4F86BC71.9070403@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Now, at rmdir, memory cgroup's charge will be moved to - parent if use_hierarchy=1 - root if use_hierarchy=0 Then, we don't have to have memory reclaim code at destroying memcg. This patch divides force_empty to 2 functions as - memory_cgroup_recharge() ... try to move all charges to ancestors. - memory_cgroup_force_empty().. try to reclaim all memory. After this patch, memory.force_empty will _not_ move charges to ancestors but just reclaim all pages. (This meets documenation.) rmdir() will not reclaim any memory but moves charge to other cgroup, parent or root. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 59 +++++++++++++++++++++++++++---------------------------- 1 files changed, 29 insertions(+), 30 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9ac7984..22c8faa 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order, } /* - * This routine traverse page_cgroup in given list and drop them all. - * *And* this routine doesn't reclaim page itself, just removes page_cgroup. + * This routine traverse page in given list and move them all. */ -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, +static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, int node, int zid, enum lru_list lru) { struct mem_cgroup_per_zone *mz; @@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg, return ret; } -/* - * make mem_cgroup's charge to be 0 if there is no task. - * This enables deleting this mem_cgroup. - */ -static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) + +static int mem_cgroup_recharge(struct mem_cgroup *memcg) { - int ret; - int node, zid, shrink; - int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + int ret, node, zid; struct cgroup *cgrp = memcg->css.cgroup; - css_get(&memcg->css); - - shrink = 0; - /* should free all ? */ - if (free_all) - goto try_to_free; -move_account: do { ret = -EBUSY; if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) @@ -3712,7 +3699,7 @@ move_account: for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) { enum lru_list lru; for_each_lru(lru) { - ret = mem_cgroup_force_empty_list(memcg, + ret = mem_cgroup_recharge_lru(memcg, node, zid, lru); if (ret) break; @@ -3722,24 +3709,33 @@ move_account: break; } mem_cgroup_end_move(memcg); - memcg_oom_recover(memcg); cond_resched(); /* "ret" should also be checked to ensure all lists are empty. */ } while (memcg->res.usage > 0 || ret); out: - css_put(&memcg->css); return ret; +} + + +/* + * make mem_cgroup's charge to be 0 if there is no task. This is only called + * by memory.force_empty file, an user request. + */ +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) +{ + int ret = 0; + int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + struct cgroup *cgrp = memcg->css.cgroup; + + css_get(&memcg->css); -try_to_free: /* returns EBUSY if there is a task or if we come here twice. */ - if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children) || shrink) { + if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) { ret = -EBUSY; goto out; } /* we call try-to-free pages for make this cgroup empty */ lru_add_drain_all(); - /* try to free all pages in this cgroup */ - shrink = 1; while (nr_retries && memcg->res.usage > 0) { int progress; @@ -3754,16 +3750,19 @@ try_to_free: /* maybe some writeback is necessary */ congestion_wait(BLK_RW_ASYNC, HZ/10); } - } - lru_add_drain(); + if (!nr_retries) + ret = -ENOMEM; +out: + memcg_oom_recover(memcg); + css_put(&memcg->css); /* try move_account...there may be some *locked* pages. */ - goto move_account; + return ret; } int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event) { - return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true); + return mem_cgroup_force_empty(mem_cgroup_from_cont(cont)); } @@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - return mem_cgroup_force_empty(memcg, false); + return mem_cgroup_recharge(memcg); } static void mem_cgroup_destroy(struct cgroup *cont) -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 6/7] memcg: remove pre_destroy() Date: Thu, 12 Apr 2012 20:30:22 +0900 Message-ID: <4F86BCCE.5050802@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to prevent rmdir() from failure of EBUSY or some. This patch removes pre_destroy() in memcg. All remaining charges will be moved to other cgroup, without any failure, ->destroy() just schedule a work and it will destroy the memcg. Then, rmdir will never fail. The kernel will take care of remaining resources in the cgroup to be accounted correctly. After this patch, memcg will be destroyed by workqueue in asynchrnous way. Then, we can modify 'moving' logic to work asynchrnously, i.e, we don't force users to wait for the end of rmdir(), now. We don't need to use heavy synchronous calls. This patch modifies logics as - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync. - lru_add_drain_all() will be called only when necessary, in a lazy way. Signed-off-by: KAMEZAWA Hiroyuki --- mm/memcontrol.c | 52 ++++++++++++++++++++++------------------------------ 1 files changed, 22 insertions(+), 30 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 22c8faa..e466809 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -315,6 +315,8 @@ struct mem_cgroup { #ifdef CONFIG_INET struct tcp_memcontrol tcp_mem; #endif + + struct work_struct work_destroy; }; /* Stuffs for move charges at task migration. */ @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg) mutex_unlock(&percpu_charge_mutex); } -/* This is a synchronous drain interface. */ static void drain_all_stock_sync(struct mem_cgroup *root_memcg) { /* called when force_empty is called */ @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, pc = lookup_page_cgroup(page); ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL); - if (ret == -EINTR) - break; - if (ret == -EBUSY || ret == -EINVAL) { + VM_BUG_ON(ret != 0 && ret != -EBUSY); + if (ret) { /* found lock contention or "pc" is obsolete. */ busy = page; cond_resched(); @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg, return ret; } - -static int mem_cgroup_recharge(struct mem_cgroup *memcg) +/* + * This function is called after ->destroy(). So, we cannot access cgroup + * of this memcg. + */ +static void mem_cgroup_recharge(struct work_struct *work) { + struct mem_cgroup *memcg; int ret, node, zid; - struct cgroup *cgrp = memcg->css.cgroup; + memcg = container_of(work, struct mem_cgroup, work_destroy); + /* No task points this memcg. call this only once */ + drain_all_stock_async(memcg); do { - ret = -EBUSY; - if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) - goto out; - ret = -EINTR; - if (signal_pending(current)) - goto out; - /* This is for making all *used* pages to be on LRU. */ - lru_add_drain_all(); - drain_all_stock_sync(memcg); ret = 0; mem_cgroup_start_move(memcg); for_each_node_state(node, N_HIGH_MEMORY) { @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg) } mem_cgroup_end_move(memcg); cond_resched(); - /* "ret" should also be checked to ensure all lists are empty. */ - } while (memcg->res.usage > 0 || ret); -out: - return ret; + /* drain LRU only when we canoot find pages on LRU */ + if (res_counter_read_u64(&memcg->res, RES_USAGE) && + !mem_cgroup_nr_lru_pages(memcg, LRU_ALL)) + lru_add_drain_all(); + } while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret); + mem_cgroup_put(memcg); } - /* * make mem_cgroup's charge to be 0 if there is no task. This is only called * by memory.force_empty file, an user request. @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work) memcg = container_of(work, struct mem_cgroup, work_freeing); vfree(memcg); } + static void vfree_rcu(struct rcu_head *rcu_head) { struct mem_cgroup *memcg; @@ -4982,20 +4981,14 @@ free_out: return ERR_PTR(error); } -static int mem_cgroup_pre_destroy(struct cgroup *cont) -{ - struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - - return mem_cgroup_recharge(memcg); -} - static void mem_cgroup_destroy(struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); kmem_cgroup_destroy(cont); - mem_cgroup_put(memcg); + INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge); + schedule_work(&memcg->work_destroy); } static int mem_cgroup_populate(struct cgroup_subsys *ss, @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = { .name = "memory", .subsys_id = mem_cgroup_subsys_id, .create = mem_cgroup_create, - .pre_destroy = mem_cgroup_pre_destroy, .destroy = mem_cgroup_destroy, .populate = mem_cgroup_populate, .can_attach = mem_cgroup_can_attach, -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: [PATCH 7/7] memcg: remove drain_all_stock_sync. Date: Thu, 12 Apr 2012 20:31:36 +0900 Message-ID: <4F86BD18.4010505@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton Because a function moving pages to ancestor works asynchronously now, drain_all_stock_sync() is unnecessary. Signed-off-by: KAMEAZAWA Hiroyuki --- mm/memcontrol.c | 22 ++-------------------- 1 files changed, 2 insertions(+), 20 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e466809..d42811b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2053,7 +2053,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) * of the hierarchy under it. sync flag says whether we should block * until the work is done. */ -static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync) +static void drain_all_stock(struct mem_cgroup *root_memcg) { int cpu, curcpu; @@ -2077,16 +2077,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync) } } put_cpu(); - - if (!sync) - goto out; - - for_each_online_cpu(cpu) { - struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) - flush_work(&stock->work); - } -out: put_online_cpus(); } @@ -2103,15 +2093,7 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg) */ if (!mutex_trylock(&percpu_charge_mutex)) return; - drain_all_stock(root_memcg, false); - mutex_unlock(&percpu_charge_mutex); -} - -static void drain_all_stock_sync(struct mem_cgroup *root_memcg) -{ - /* called when force_empty is called */ - mutex_lock(&percpu_charge_mutex); - drain_all_stock(root_memcg, true); + drain_all_stock(root_memcg); mutex_unlock(&percpu_charge_mutex); } -- 1.7.4.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Thu, 12 Apr 2012 10:20:50 -0300 Message-ID: <4F86D6B2.2020401@parallels.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:17 AM, KAMEZAWA Hiroyuki wrote: > One of problem in current implementation is that memcg moves all charges to > parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may > hit parent's limit and may return -EBUSY. To fix this problem, this patch > changes behavior of rmdir() as > > - if use_hierarchy=0, all remaining charges will go to root cgroup. > - if use_hierarchy=1, all remaining charges will go to the parent. To be quite honest, this is one of those things that we end up overlooking, and just don't think about it in the middle of the complexity. Now that you mention it... When use_hierarchy=0, there is no parent! (At least from where memcg is concerned). So it doesn't make any sense to have it ever have moved it to the "parent" (from the core cgroup perspective). I agree with this new behavior 100 %. Just a nitpick: When use_hierarchy=1, remaining charges need not to "go to the parent". They are already there. I will review your series for the specifics. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH 7/7] memcg: remove drain_all_stock_sync. Date: Thu, 12 Apr 2012 10:35:46 -0300 Message-ID: <4F86DA32.9060701@parallels.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BD18.4010505@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F86BD18.4010505-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On 04/12/2012 08:31 AM, KAMEZAWA Hiroyuki wrote: > Because a function moving pages to ancestor works asynchronously now, > drain_all_stock_sync() is unnecessary. > > Signed-off-by: KAMEAZAWA Hiroyuki Reviewed-by: Glauber Costa From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Thu, 12 Apr 2012 09:06:42 -0700 Message-ID: <20120412160642.GA13069@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=M/ADfApD7rYBWDP+1lU72+qm2+stJ+wDyA8FBMZ/Oxk=; b=Fp4KRXGE2ww2KUp15w0x/9FTL4LuNQOkOxWIV7tV4tL4wLQgLlwxAJiaLm250Ucvtd 1s94Ldij8C+UT219uSIQZGpHZfPZujnyR7wQflGgDZd0ZTBsMO5XGSj3q+LXoeiv22Ol lTxMrJFcPHqwgu2Zr5AS6MtBNLxcmgXUmCpTW6Y1quYniqYtYJv0ZPtT0eM3JOV94WmX okd7Ov6otQ1km/CCwzdSdkVlO3A+AlqLN0lw6X2VICp6BQP89s7TV2aZaugS+oebBDwy IozmZx085bm5BCfHXW+ViBaqZ/tGXXVekOf9WApPsH2ceo6329tAyydFWs0D364xaE9c iBFQ== Content-Disposition: inline In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, KAMEZAWA. Thanks a lot for doing this. On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. Just to clarify, I'm not intending to ->pre_destroy() per-se but the retry behavior of it, so ->pre_destroy() will be converted to return void and called once on rmdir and rmdir will proceed no matter what. Also, with the deprecated behavior flag set, pre_destroy() doesn't trigger the warning message. Other than that, if memcg people are fine with the change, I'll be happy to route the changes through cgroup/for-3.5 and stack rmdir simplification patches on top. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Fri, 13 Apr 2012 00:27:42 +0530 Message-ID: <877gxksrq1.fsf@skywalker.in.ibm.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120412160642.GA13069-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Tejun Heo , KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Tejun Heo writes: > Hello, KAMEZAWA. > > Thanks a lot for doing this. > > On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: >> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. > > Just to clarify, I'm not intending to ->pre_destroy() per-se but the > retry behavior of it, so ->pre_destroy() will be converted to return > void and called once on rmdir and rmdir will proceed no matter what. > Also, with the deprecated behavior flag set, pre_destroy() doesn't > trigger the warning message. > > Other than that, if memcg people are fine with the change, I'll be > happy to route the changes through cgroup/for-3.5 and stack rmdir > simplification patches on top. > Any suggestion on how to take HugeTLB memcg extension patches [1] upstream. Current patch series I have is on top of cgroup/for-3.5 because I need cgroup_add_files equivalent and cgroup/for-3.5 have changes around that. So if these memcg patches can also go on top of cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? Can HugeTLB memcg extension patches also go via this tree ? It should actually got via -mm. But then how do we take care of these dependencies ? [1] http://thread.gmane.org/gmane.linux.kernel.cgroups/1517 -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Fri, 13 Apr 2012 09:57:03 +0900 Message-ID: <4F8779DF.3080307@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Frederic Weisbecker Cc: Glauber Costa , "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton (2012/04/12 23:30), Frederic Weisbecker wrote: > 2012/4/12 Glauber Costa : >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: >>> >>> This function is used for moving accounting information to its >>> parent in the hierarchy of res_counter. >>> >>> Signed-off-by: KAMEZAWA Hiroyuki >> >> Frederic has a patch in his fork cgroup series, that allows you to >> uncharge a counter until you reach a specific ancestor. >> You pass the parent as a parameter, and then only you gets uncharged. > > I'm missing the referring patchset from Kamezawa. Ok I'm going to > subscribe to the > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc > LKML for cgroup patches? > Ah, sorry. I will do next time. > Some comments below: > >> >> I think that is a much better interface than this you are proposing. >> We should probably merge that patch and use it. >> >>> --- >>> include/linux/res_counter.h | 3 +++ >>> kernel/res_counter.c | 13 +++++++++++++ >>> 2 files changed, 16 insertions(+), 0 deletions(-) >>> >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >>> index da81af0..8919d3c 100644 >>> --- a/include/linux/res_counter.h >>> +++ b/include/linux/res_counter.h >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, >>> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); >>> void res_counter_uncharge(struct res_counter *counter, unsigned long val); >>> >>> +/* move resource to parent counter...i.e. just forget accounting in a child */ >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); >>> + >>> /** >>> * res_counter_margin - calculate chargeable space of a counter >>> * @cnt: the counter >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c >>> index d508363..fafebf0 100644 >>> --- a/kernel/res_counter.c >>> +++ b/kernel/res_counter.c >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) >>> local_irq_restore(flags); >>> } >>> >>> +/* >>> + * In hierarchical accounting, child's usage is accounted into ancestors. >>> + * To move local usage to its parent, just forget current level usage. > > The way I understand this comment and the changelog matches the opposite > of what the below function is doing. > > The function charges a child and ignore all its parents. The comments says it > charges the parents but not the child. > Sure, I'll fix...and look into your code first. "uncharge a counter until you reach a specific ancestor"... Is it in linux-next ? Thanks, -Kame >>> + */ >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >>> +{ >>> + unsigned long flags; >>> + >>> + BUG_ON(!counter->parent); >>> + spin_lock_irqsave(&counter->lock, flags); >>> + res_counter_uncharge_locked(counter, val); >>> + spin_unlock_irqrestore(&counter->lock, flags); >>> +} >>> >>> static inline unsigned long long * >>> res_counter_member(struct res_counter *counter, int member) >> > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Frederic Weisbecker Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Fri, 13 Apr 2012 03:04:45 +0200 Message-ID: <20120413010443.GD12484@somewhere.redhat.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <4F86D733.50809@parallels.com> <4F8779DF.3080307@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=WMvtyUjBXXGpIOLukJ+8QQaUgdjP4SAjMYeVbqkujzI=; b=EjSjyJQ0h8l+T/hn2L1jFKbto+qHsKM/y/ulDb5TS76+Up+DeT0+Y9nJ4+yv9OY3B5 3a1gGZs1PdmpRFV3UprknodSv9jyIrJbnS6Js8lzHDcowCxwmlPE9XmR4UASuR6aBw19 zbgvTJ/212AJOoFnHSG5uQJftERFUesJgxUx/T1KUZXGaeo7i1qnFu+j78GBdcI4gvvm VY7sAVGKznluCvTw7E6q9uUWVZ9QuyTHnKJfzJ3iRUOLdkTMhbZPdL1V5CItgia+jVuC jc3z2xxQeLxZOffevJ2glH94LiIpmXNYty3ZW30z0Q6F1ApvVbWexIT6jcidLgvYpj4e hRyg== Content-Disposition: inline In-Reply-To: <4F8779DF.3080307-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: Glauber Costa , "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Hugh Dickins , Andrew Morton On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote: > (2012/04/12 23:30), Frederic Weisbecker wrote: > > > 2012/4/12 Glauber Costa : > >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote: > >>> > >>> This function is used for moving accounting information to its > >>> parent in the hierarchy of res_counter. > >>> > >>> Signed-off-by: KAMEZAWA Hiroyuki > >> > >> Frederic has a patch in his fork cgroup series, that allows you to > >> uncharge a counter until you reach a specific ancestor. > >> You pass the parent as a parameter, and then only you gets uncharged. > > > > I'm missing the referring patchset from Kamezawa. Ok I'm going to > > subscribe to the > > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc > > LKML for cgroup patches? > > > > Ah, sorry. I will do next time. > > > Some comments below: > > > >> > >> I think that is a much better interface than this you are proposing. > >> We should probably merge that patch and use it. > >> > >>> --- > >>> include/linux/res_counter.h | 3 +++ > >>> kernel/res_counter.c | 13 +++++++++++++ > >>> 2 files changed, 16 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > >>> index da81af0..8919d3c 100644 > >>> --- a/include/linux/res_counter.h > >>> +++ b/include/linux/res_counter.h > >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, > >>> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); > >>> void res_counter_uncharge(struct res_counter *counter, unsigned long val); > >>> > >>> +/* move resource to parent counter...i.e. just forget accounting in a child */ > >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); > >>> + > >>> /** > >>> * res_counter_margin - calculate chargeable space of a counter > >>> * @cnt: the counter > >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c > >>> index d508363..fafebf0 100644 > >>> --- a/kernel/res_counter.c > >>> +++ b/kernel/res_counter.c > >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val) > >>> local_irq_restore(flags); > >>> } > >>> > >>> +/* > >>> + * In hierarchical accounting, child's usage is accounted into ancestors. > >>> + * To move local usage to its parent, just forget current level usage. > > > > The way I understand this comment and the changelog matches the opposite > > of what the below function is doing. > > > > The function charges a child and ignore all its parents. The comments says it > > charges the parents but not the child. > > > > > Sure, I'll fix...and look into your code first. > "uncharge a counter until you reach a specific ancestor"... > Is it in linux-next ? No, it's part of the task counter so it's still out of tree. You were Cc'ed but if you can't find it in your inbox I can resend it: http://thread.gmane.org/gmane.linux.kernel.containers/22378 Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Fri, 13 Apr 2012 10:50:14 +0200 Message-ID: <20120413085014.GA9205@tiehlicka.suse.cz> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com> <877gxksrq1.fsf@skywalker.in.ibm.com> <4F876C70.7060600@jp.fujitsu.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <4F876C70.7060600-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "Aneesh Kumar K.V" , Tejun Heo , "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote: > (2012/04/13 3:57), Aneesh Kumar K.V wrote: > > > Tejun Heo writes: > > > >> Hello, KAMEZAWA. > >> > >> Thanks a lot for doing this. > >> > >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. > >> > >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the > >> retry behavior of it, so ->pre_destroy() will be converted to return > >> void and called once on rmdir and rmdir will proceed no matter what. > >> Also, with the deprecated behavior flag set, pre_destroy() doesn't > >> trigger the warning message. > >> > >> Other than that, if memcg people are fine with the change, I'll be > >> happy to route the changes through cgroup/for-3.5 and stack rmdir > >> simplification patches on top. > >> > > > > Any suggestion on how to take HugeTLB memcg extension patches [1] > > upstream. Current patch series I have is on top of cgroup/for-3.5 > > because I need cgroup_add_files equivalent and cgroup/for-3.5 have > > changes around that. So if these memcg patches can also go on top of > > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? I would suggest working on top of memcg-devel tree or on top linux-next. Just pull the required patch-es from cgroup/for-3.5 tree before your work (I can include that into memcg-devel tree for you if you want). Do you think this is a 3.5 material? I would rather wait some more. I didn't have time to look over it yet and there are still some unresolved issues so it sounds like too early for merging. > > Can HugeTLB memcg extension patches also go via this tree ? It > > should actually got via -mm. But then how do we take care of these > > dependencies ? You are not changing anything generic from cgroup so definitely go via Andrew. > I'm not in hurry. To be honest, I cannot update patches until the next Wednesday. > So, If changes of cgroup tree you required are included in linux-next. Please post > your updated ones. I thought your latest version was near to be merged.... > > How do you think, Michal ? > Please post (and ask Andrew to pull it.) I'll review when I can. I would wait with pulling the patch after the review. > I know yours and mine has some conflicts. I think my this series will > be onto your series. To do that, I hope your series are merged to > linux-next, 1st. > > > Thanks, > -Kame > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Sat, 14 Apr 2012 03:49:15 +0530 Message-ID: <87vcl3gtr0.fsf@skywalker.in.ibm.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <20120412160642.GA13069@google.com> <877gxksrq1.fsf@skywalker.in.ibm.com> <4F876C70.7060600@jp.fujitsu.com> <20120413085014.GA9205@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120413085014.GA9205-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko , KAMEZAWA Hiroyuki Cc: Tejun Heo , "linux-mm@kvack.org" , "cgroups@vger.kernel.org" , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Michal Hocko writes: > On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote: >> (2012/04/13 3:57), Aneesh Kumar K.V wrote: >> >> > Tejun Heo writes: >> > >> >> Hello, KAMEZAWA. >> >> >> >> Thanks a lot for doing this. >> >> >> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: >> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. >> >> >> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the >> >> retry behavior of it, so ->pre_destroy() will be converted to return >> >> void and called once on rmdir and rmdir will proceed no matter what. >> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't >> >> trigger the warning message. >> >> >> >> Other than that, if memcg people are fine with the change, I'll be >> >> happy to route the changes through cgroup/for-3.5 and stack rmdir >> >> simplification patches on top. >> >> >> > >> > Any suggestion on how to take HugeTLB memcg extension patches [1] >> > upstream. Current patch series I have is on top of cgroup/for-3.5 >> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have >> > changes around that. So if these memcg patches can also go on top of >> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ? > > I would suggest working on top of memcg-devel tree or on top linux-next. > Just pull the required patch-es from cgroup/for-3.5 tree before your > work (I can include that into memcg-devel tree for you if you want). I am expecting to have no conflicts with pending memcg changes. But I do have conflicts with cgroup/for-3.5. That is the reason I decided to rebase on top of cgroup/for-3.5. > > Do you think this is a 3.5 material? I would rather wait some more. I > didn't have time to look over it yet and there are still some unresolved > issues so it sounds like too early for merging. I would really like to get it merged for 3.5. I am ready to post V6 that address all review feedback from V5 post. > >> > Can HugeTLB memcg extension patches also go via this tree ? It >> > should actually got via -mm. But then how do we take care of these >> > dependencies ? > > You are not changing anything generic from cgroup so definitely go via > Andrew. > agreed. >> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday. >> So, If changes of cgroup tree you required are included in linux-next. Please post >> your updated ones. I thought your latest version was near to be merged.... >> >> How do you think, Michal ? >> Please post (and ask Andrew to pull it.) I'll review when I can. > > I would wait with pulling the patch after the review. > agreed. So I will do a v6 post and if we all agree with the changes it can be pulled via -mm ? >> I know yours and mine has some conflicts. I think my this series will >> be onto your series. To do that, I hope your series are merged to >> linux-next, 1st. >> -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Mon, 16 Apr 2012 15:19:24 -0700 Message-ID: <20120416221924.GB12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=efISsRD16SpTgjTiyZU11yVAh4uUeSVZP80rwbFbSUU=; b=t7zpzyDvN9kaaumU647kHtsz3Q/Do76nSntpw/scIw2sQ28U8Sw21eQ62bCjdoRu1c SR/N6beANrgwQ8LKsmwHh4g60U3QswD3bB7qq8Y9hBY2d0hpVD0gGd3jdDxhE6Q0gs4o hWg2zSFbBigjX7LB7w6tK/4l9GrJ//zTs9cBj1aGgE/b7CV9GxgrTOLgLltUlZJF326u +tz+wGaqkb7FQLT5VZj/VtpAYQGY8adAYOAwTvsjt8zXrOXLNd4RduEZpwBFanc1NbVx e4s47OgIa31vRJngCmI77Zo16r0rLo1JxJO9RQ1wFAwTDb86EZhzbzBFosiGet3rsNsi 32YQ== Content-Disposition: inline In-Reply-To: <4F86BA66.2010503-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: > > This function is used for moving accounting information to its > parent in the hierarchy of res_counter. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > include/linux/res_counter.h | 3 +++ > kernel/res_counter.c | 13 +++++++++++++ > 2 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > index da81af0..8919d3c 100644 > --- a/include/linux/res_counter.h > +++ b/include/linux/res_counter.h > @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, > void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); > void res_counter_uncharge(struct res_counter *counter, unsigned long val); > > +/* move resource to parent counter...i.e. just forget accounting in a child */ Can we drop this comment and > +void res_counter_move_parent(struct res_counter *counter, unsigned long val); > > +/* > + * In hierarchical accounting, child's usage is accounted into ancestors. > + * To move local usage to its parent, just forget current level usage. > + */ make this one proper docbook function comment? > +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > +{ > + unsigned long flags; > + > + BUG_ON(!counter->parent); And let's please do "if (WARN_ON(!counter->parent)) return;" instead. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 2/7] memcg: move charge to parent only when necessary. Date: Mon, 16 Apr 2012 15:21:19 -0700 Message-ID: <20120416222119.GC12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BAB0.5030809@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=9PG+MkbYjIESrpGxWXyiEzN50CEjQw+AKP867ulpnIc=; b=wqalZpxj85v8Pw1ef80P8xoWaAeqAZNaalgqDjQovdr+HmrGtAQS5qmP80XRwAcy9w 6FEiOnEeoOrfMxCkYWrkgv3QLQED4lSaFjhHaUFxhbUOl0ip1Mju7wO1lsrcMN5B0yCJ 0fBygD1o23f/mminFHBEWiBGraPiYd0vJL/TxrtsekgrbWYfq+sNlCpVCdIQ11OyYbNl dwsxoZG1DDdVUjGwfOZxSYiKlyFMux/fiF/0L+/qxQWOZWtKgneo0hmS1j2uWDGecq0Q YuwSMo2HUb/mg4asET0T7jwUB+tYl1z+HMZwuTy9bcE2UtgpkfpO6slqz5+RN+Qaz9LJ NBhA== Content-Disposition: inline In-Reply-To: <4F86BAB0.5030809-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote: > > When memcg->use_hierarchy==true, the parent res_counter includes > the usage in child's usage. So, it's not necessary to call try_charge() > in the parent. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- > 1 files changed, 32 insertions(+), 7 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index fa01106..3215880 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, > res_counter_uncharge(&memcg->memsw, bytes); > } > } New line missing here. > +/* > + * Moving usage between a child to its parent if use_hierarchy==true. > + */ Prolly "from a child to its parent"? Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 3/7] memcg: move charges to root at rmdir() Date: Mon, 16 Apr 2012 15:30:12 -0700 Message-ID: <20120416223012.GD12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BB02.2060607@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=0bGLclZh2u05b/O6CbrFpFhZovK/IQL3aH70iaG0Jgw=; b=nzJ66LWWlUefaEIvWLMkt4nkpspPUMPy9F2dR5lBbAjkHgmZOgmFMptO+6f6aEwwDp YK15BFhN9p6hC6kKVBnoVjwHcjrYakbmW4G0nmrga++rflhtC2oer5O823HrN+ofNZia FhWgfqQf7oBxsiUpkDtAFyYSHmabEYY+bhyxlQCamN5+hWh6Lym2Ck5twP3bhCvMdTgM ZuuLYNcYwQNhaBEJvMaSnhb4QNGzULiYhWNKhaD2rm01BFJ/w924VQ/AFLXz+w2vckDN 29wqLKnGu7Ao7HSpkdVdqEwb5YnR88JhHkyjDDAa5J43DD6+twcr7buvv8XQtAywmW0t 97kw== Content-Disposition: inline In-Reply-To: <4F86BB02.2060607-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote: > As recently discussed, Tejun Heo, the cgroup maintainer, tries to > remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir(). I'm not trying to remove ->pre_destory() per-se. I want to remove css ref draining and ->pre_destroy() vetoing cgroup removal. Probably better wording would be "tries to simplify removal path such that removal always succeeds". > To do that, in memcg, handling case of use_hierarchy==false is a problem. > > We move memcg's charges to its parent at rmdir(). If use_hierarchy==true, > it's already accounted in the parent, no problem. If use_hierarchy==false, > we cannot guarantee we can move all charges to the parent. > > This patch changes the behavior to move all charges to root_mem_cgroup > if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship. Maybe better to break the above line? Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Mon, 16 Apr 2012 15:41:18 -0700 Message-ID: <20120416224118.GG12421@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=ZF3Vkf1/v6cKQDLI6WSUcVnE3fXxY8bgF+wHse7WiFI=; b=hdXsZuJUSK57C6zmHUIAjTqBD3tt7q0+BEmz+pC3SeP4PhfsqtR3yN56qpdrBZzHm3 hFKUrok1K6Y+KyHOxPhQ+EpxViGi7Cm/ibCzjeeynAi/Noxki2ZkqUCwY/dHBeFAzrnY WIXtGuBdWE5lhFWNa1IhBei3USbUMch+F7vpnNhU6Yy+SQsIfAIaxersLpyjHN20CzQk wjQ2M+hcP7hEsHz/dVhNie6KCwDTeIVShBbs+K/8xacaicd+h7osg0HnCi4sLKWuU80Y NR1PaO4Gd8bH0gB4LhqEQyPRLMkMX+IUovIWz7YVjTn6ZVFLrfLW/w1w3VzZYDpWEG0D IeKA== Content-Disposition: inline In-Reply-To: <4F86B9BE.8000105-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton Hello, KAMEZAWA. On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote: > In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove > ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. I did a pretty shallow review of the patchset and other than the unnecessary async destruction, my complaints are mostly trivial. Ooh, and the patchset doesn't apply cleanly on top of cgroup/for-3.5. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.5 Thank you! -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ying Han Subject: Re: [PATCH 6/7] memcg: remove pre_destroy() Date: Tue, 17 Apr 2012 10:47:13 -0700 Message-ID: References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BCCE.5050802@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding:x-system-of-record; bh=R0f20NRGDLfHiVL+HBFnHJz35to2veDSumafn7UjGPU=; b=kkC0d+j4nKE+cGWSTgB8YJQJFT+aoTxVti6rzW0q7mmjaV3feR9zHtCI8DI94Irypu zMmcoh9g6bAdArSlUrS62T4lGEzHiu3cxHnD3gupUYxvdOF3SHm4CRuWhxaVhOj166em r8vvcgDpCByMaLRMj8Q7C9YnsXMbkOxB1aZ0wca5d7LbVWKFfyFNHXYLMmwQoeKtqpwS R0OwFbz/bicn7lRN5awcZ+cuPJEMiAgEwMgOIzoEH5t/5XeESrdmQHXHnMY1XBMLERXX XNTgru0MRsFpA/yJQtEqY2EUaBGr6ITEHRQiybBsGpjG2FUCd6jLEoNWfIEoN57baJ2u A5fA== In-Reply-To: <4F86BCCE.5050802-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="iso-8859-1" To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton On Thu, Apr 12, 2012 at 4:30 AM, KAMEZAWA Hiroyuki wrote: > Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to > prevent rmdir() from failure of EBUSY or some. > > This patch removes pre_destroy() in memcg. All remaining charges > will be moved to other cgroup, without any failure, =A0->destroy() > just schedule a work and it will destroy the memcg. > Then, rmdir will never fail. The kernel will take care of remaining > resources in the cgroup to be accounted correctly. > > After this patch, memcg will be destroyed by workqueue in asynchrnous= way. Is it necessary to change the destroy asynchronously? =46rankly, i don't that that much. It will leave the system in a deterministic state on admin perspective. The current synchronous destroy works fine, and admin can rely on that w/ charging change after the destroy returns. --Ying > Then, we can modify 'moving' logic to work asynchrnously, i.e, > we don't force users to wait for the end of rmdir(), now. We don't > need to use heavy synchronous calls. This patch modifies logics as > > =A0- Use mem_cgroup_drain_stock_async rather tan drain_stock_sync. > =A0- lru_add_drain_all() will be called only when necessary, in a laz= y way. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > =A0mm/memcontrol.c | =A0 52 ++++++++++++++++++++++-------------------= ----------- > =A01 files changed, 22 insertions(+), 30 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 22c8faa..e466809 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -315,6 +315,8 @@ struct mem_cgroup { > =A0#ifdef CONFIG_INET > =A0 =A0 =A0 =A0struct tcp_memcontrol tcp_mem; > =A0#endif > + > + =A0 =A0 =A0 struct work_struct work_destroy; > =A0}; > > =A0/* Stuffs for move charges at task migration. */ > @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cg= roup *root_memcg) > =A0 =A0 =A0 =A0mutex_unlock(&percpu_charge_mutex); > =A0} > > -/* This is a synchronous drain interface. */ > =A0static void drain_all_stock_sync(struct mem_cgroup *root_memcg) > =A0{ > =A0 =A0 =A0 =A0/* called when force_empty is called */ > @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_= cgroup *memcg, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pc =3D lookup_page_cgroup(page); > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D mem_cgroup_move_parent(page, p= c, memcg, GFP_KERNEL); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret =3D=3D -EINTR) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret =3D=3D -EBUSY || ret =3D=3D -EI= NVAL) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 VM_BUG_ON(ret !=3D 0 && ret !=3D -EBUSY= ); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* found lock contenti= on or "pc" is obsolete. */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0busy =3D page; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem= _cgroup *memcg, > =A0 =A0 =A0 =A0return ret; > =A0} > > - > -static int mem_cgroup_recharge(struct mem_cgroup *memcg) > +/* > + * This function is called after ->destroy(). So, we cannot access c= group > + * of this memcg. > + */ > +static void mem_cgroup_recharge(struct work_struct *work) > =A0{ > + =A0 =A0 =A0 struct mem_cgroup *memcg; > =A0 =A0 =A0 =A0int ret, node, zid; > - =A0 =A0 =A0 struct cgroup *cgrp =3D memcg->css.cgroup; > > + =A0 =A0 =A0 memcg =3D container_of(work, struct mem_cgroup, work_de= stroy); > + =A0 =A0 =A0 /* No task points this memcg. call this only once */ > + =A0 =A0 =A0 drain_all_stock_async(memcg); > =A0 =A0 =A0 =A0do { > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -EBUSY; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (cgroup_task_count(cgrp) || !list_em= pty(&cgrp->children)) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -EINTR; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (signal_pending(current)) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* This is for making all *used* pages = to be on LRU. */ > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 lru_add_drain_all(); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 drain_all_stock_sync(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D 0; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mem_cgroup_start_move(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for_each_node_state(node, N_HIGH_MEMOR= Y) { > @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgr= oup *memcg) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mem_cgroup_end_move(memcg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > - =A0 =A0 =A0 /* "ret" should also be checked to ensure all lists are= empty. */ > - =A0 =A0 =A0 } while (memcg->res.usage > 0 || ret); > -out: > - =A0 =A0 =A0 return ret; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* drain LRU only when we canoot find p= ages on LRU */ > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (res_counter_read_u64(&memcg->res, R= ES_USAGE) && > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 !mem_cgroup_nr_lru_pages(memcg,= LRU_ALL)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lru_add_drain_all(); > + =A0 =A0 =A0 } while (res_counter_read_u64(&memcg->res, RES_USAGE) |= | ret); > + =A0 =A0 =A0 mem_cgroup_put(memcg); > =A0} > > - > =A0/* > =A0* make mem_cgroup's charge to be 0 if there is no task. This is on= ly called > =A0* by memory.force_empty file, an user request. > @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work= ) > =A0 =A0 =A0 =A0memcg =3D container_of(work, struct mem_cgroup, work_f= reeing); > =A0 =A0 =A0 =A0vfree(memcg); > =A0} > + > =A0static void vfree_rcu(struct rcu_head *rcu_head) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup *memcg; > @@ -4982,20 +4981,14 @@ free_out: > =A0 =A0 =A0 =A0return ERR_PTR(error); > =A0} > > -static int mem_cgroup_pre_destroy(struct cgroup *cont) > -{ > - =A0 =A0 =A0 struct mem_cgroup *memcg =3D mem_cgroup_from_cont(cont)= ; > - > - =A0 =A0 =A0 return mem_cgroup_recharge(memcg); > -} > - > =A0static void mem_cgroup_destroy(struct cgroup *cont) > =A0{ > =A0 =A0 =A0 =A0struct mem_cgroup *memcg =3D mem_cgroup_from_cont(cont= ); > > =A0 =A0 =A0 =A0kmem_cgroup_destroy(cont); > > - =A0 =A0 =A0 mem_cgroup_put(memcg); > + =A0 =A0 =A0 INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge); > + =A0 =A0 =A0 schedule_work(&memcg->work_destroy); > =A0} > > =A0static int mem_cgroup_populate(struct cgroup_subsys *ss, > @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys =3D { > =A0 =A0 =A0 =A0.name =3D "memory", > =A0 =A0 =A0 =A0.subsys_id =3D mem_cgroup_subsys_id, > =A0 =A0 =A0 =A0.create =3D mem_cgroup_create, > - =A0 =A0 =A0 .pre_destroy =3D mem_cgroup_pre_destroy, > =A0 =A0 =A0 =A0.destroy =3D mem_cgroup_destroy, > =A0 =A0 =A0 =A0.populate =3D mem_cgroup_populate, > =A0 =A0 =A0 =A0.can_attach =3D mem_cgroup_can_attach, > -- > 1.7.4.1 > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org =A0For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthem= eter.ca/ > Don't email: email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Wed, 18 Apr 2012 15:59:39 +0900 Message-ID: <4F8E665B.7050302@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416221924.GB12421@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120416221924.GB12421-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:19), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: >> >> This function is used for moving accounting information to its >> parent in the hierarchy of res_counter. >> >> Signed-off-by: KAMEZAWA Hiroyuki >> --- >> include/linux/res_counter.h | 3 +++ >> kernel/res_counter.c | 13 +++++++++++++ >> 2 files changed, 16 insertions(+), 0 deletions(-) >> >> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >> index da81af0..8919d3c 100644 >> --- a/include/linux/res_counter.h >> +++ b/include/linux/res_counter.h >> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter, >> void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val); >> void res_counter_uncharge(struct res_counter *counter, unsigned long val); >> >> +/* move resource to parent counter...i.e. just forget accounting in a child */ > > Can we drop this comment and > >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val); >> >> +/* >> + * In hierarchical accounting, child's usage is accounted into ancestors. >> + * To move local usage to its parent, just forget current level usage. >> + */ > > make this one proper docbook function comment? > Sure. (I'll use Frederic's one.) Thanks, -Kame >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >> +{ >> + unsigned long flags; >> + >> + BUG_ON(!counter->parent); > > And let's please do "if (WARN_ON(!counter->parent)) return;" instead. > > Thanks. > From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 2/7] memcg: move charge to parent only when necessary. Date: Wed, 18 Apr 2012 16:01:11 +0900 Message-ID: <4F8E66B7.3050602@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BAB0.5030809@jp.fujitsu.com> <20120416222119.GC12421@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120416222119.GC12421-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:21), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote: >> >> When memcg->use_hierarchy==true, the parent res_counter includes >> the usage in child's usage. So, it's not necessary to call try_charge() >> in the parent. >> >> Signed-off-by: KAMEZAWA Hiroyuki >> --- >> mm/memcontrol.c | 39 ++++++++++++++++++++++++++++++++------- >> 1 files changed, 32 insertions(+), 7 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index fa01106..3215880 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, >> res_counter_uncharge(&memcg->memsw, bytes); >> } >> } > > New line missing here. > >> +/* >> + * Moving usage between a child to its parent if use_hierarchy==true. >> + */ > > Prolly "from a child to its parent"? > Sure. will fix. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Wed, 18 Apr 2012 16:04:42 +0900 Message-ID: <4F8E678A.8000805@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416223157.GE12421@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120416223157.GE12421-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:31), Tejun Heo wrote: > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: >> +/* >> + * In hierarchical accounting, child's usage is accounted into ancestors. >> + * To move local usage to its parent, just forget current level usage. >> + */ >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) >> +{ >> + unsigned long flags; >> + >> + BUG_ON(!counter->parent); >> + spin_lock_irqsave(&counter->lock, flags); >> + res_counter_uncharge_locked(counter, val); >> + spin_unlock_irqrestore(&counter->lock, flags); >> +} > > On the second thought, do we need this at all? It's as good as doing > nothing after all, no? > I considered that, but I think it may make it hard to debug memcg leakage. I'd like to confirm res->usage == 0 at removal of memcg. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 6/7] memcg: remove pre_destroy() Date: Wed, 18 Apr 2012 16:12:18 +0900 Message-ID: <4F8E6952.9030909@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BCCE.5050802@jp.fujitsu.com> <20120416223800.GF12421@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120416223800.GF12421-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/17 7:38), Tejun Heo wrote: > Hello, > > On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote: >> +/* >> + * This function is called after ->destroy(). So, we cannot access cgroup >> + * of this memcg. >> + */ >> +static void mem_cgroup_recharge(struct work_struct *work) > > So, ->pre_destroy per-se isn't gonna go away. It's just gonna be this > callback which cgroup core uses to unilaterally notify that the cgroup > is going away, so no need to do this cleanup asynchronously from > ->destroy(). It's okay to keep doing it synchronously from > ->pre_destroy(). The only thing is that it can't fail. > I see. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir Date: Wed, 18 Apr 2012 16:14:32 +0900 Message-ID: <4F8E69D8.4070102@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BC71.9070403@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Ying Han Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/18 2:29), Ying Han wrote: > On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki > wrote: >> Now, at rmdir, memory cgroup's charge will be moved to >> - parent if use_hierarchy=1 >> - root if use_hierarchy=0 >> >> Then, we don't have to have memory reclaim code at destroying memcg. >> >> This patch divides force_empty to 2 functions as >> >> - memory_cgroup_recharge() ... try to move all charges to ancestors. >> - memory_cgroup_force_empty().. try to reclaim all memory. >> >> After this patch, memory.force_empty will _not_ move charges to ancestors >> but just reclaim all pages. (This meets documenation.) > > Not sure why it matches the documentation: > " > memory.force_empty>---->------- # trigger forced move charge to parent > " I missed this... > > and > " > # echo 0 > memory.force_empty > > Almost all pages tracked by this memory cgroup will be unmapped and freed. > Some pages cannot be freed because they are locked or in-use. Such pages are > moved to parent and this cgroup will be empty. This may return -EBUSY if > VM is too busy to free/move all pages immediately. > " > The 1st feature is "will be unmapped and freed". I'll update Documentation. Thank you. -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH v1 0/7] memcg remove pre_destroy Date: Wed, 18 Apr 2012 16:15:12 +0900 Message-ID: <4F8E6A00.50707@jp.fujitsu.com> References: <4F86B9BE.8000105@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Ying Han Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Tejun Heo , Glauber Costa , Hugh Dickins , Andrew Morton (2012/04/18 2:35), Ying Han wrote: > On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki > wrote: >> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove >> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING. >> >> By pre_destroy(), rmdir of cgroup can return -EBUSY or some error. >> It makes cgroup complicated and unstable. I said O.K. to remove it and >> this patch is modification for memcg. >> >> One of problem in current implementation is that memcg moves all charges to >> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may >> hit parent's limit and may return -EBUSY. To fix this problem, this patch >> changes behavior of rmdir() as >> >> - if use_hierarchy=0, all remaining charges will go to root cgroup. >> - if use_hierarchy=1, all remaining charges will go to the parent. > > > We need to update the "4.3 Removing a cgroup" session in Documentation. > Sure, will do. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent(). Date: Wed, 18 Apr 2012 10:03:37 -0700 Message-ID: <20120418170337.GH19975@google.com> References: <4F86B9BE.8000105@jp.fujitsu.com> <4F86BA66.2010503@jp.fujitsu.com> <20120416223157.GE12421@google.com> <4F8E678A.8000805@jp.fujitsu.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=tmltBXl/3W2+L+8Wgv/KiumtOauX+PiYZZs4fb51aaU=; b=vMSVZ4amNqpeihxF0Q6BCf+KFhE/2W2N/7C1ySMZdbgiafMkRyv6GDD4sxqkpT4CEB 8hUU9ZgokctuVFTRLkZ/De4Bc6P+HbIPSQ/f3FJ2CNknKHgmibjYZWNZBvJbsRn/jxz2 8Q2aC1UMXXPhvFb4YTASBnyqbJJH0ekV8jQ2sArjGKO25FIZMUtragwmMLqOtjHKyBlO HTnNeC+hYWzsAhtGST/B+G6kyLWS+PKua4owZMyL94KXfpyn1oxtDlIUQnb3xJXvJOQj EPfxCyxPCzE2b78r4DmdMQd3WWJZnV5ubtf6abe1jYekua9H4SHovXIRDWV23NLIW9mf TEOA== Content-Disposition: inline In-Reply-To: <4F8E678A.8000805-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Michal Hocko , Johannes Weiner , Glauber Costa , Hugh Dickins , Andrew Morton On Wed, Apr 18, 2012 at 04:04:42PM +0900, KAMEZAWA Hiroyuki wrote: > (2012/04/17 7:31), Tejun Heo wrote: > > > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote: > >> +/* > >> + * In hierarchical accounting, child's usage is accounted into ancestors. > >> + * To move local usage to its parent, just forget current level usage. > >> + */ > >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val) > >> +{ > >> + unsigned long flags; > >> + > >> + BUG_ON(!counter->parent); > >> + spin_lock_irqsave(&counter->lock, flags); > >> + res_counter_uncharge_locked(counter, val); > >> + spin_unlock_irqrestore(&counter->lock, flags); > >> +} > > > > On the second thought, do we need this at all? It's as good as doing > > nothing after all, no? > > > > > I considered that, but I think it may make it hard to debug memcg leakage. > I'd like to confirm res->usage == 0 at removal of memcg. Hmmm... then let's name it res_counter_reset() or something. I feel very confused about the function name. Thanks. -- tejun