All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 11:17 KAMEZAWA Hiroyuki
  2012-04-12 11:20   ` KAMEZAWA Hiroyuki
                   ` (10 more replies)
  0 siblings, 11 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:17 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton, KAMEZAWA Hiroyuki

In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.

By pre_destroy(), rmdir of cgroup can return -EBUSY or some error.
It makes cgroup complicated and unstable. I said O.K. to remove it and
this patch is modification for memcg.

One of problem in current implementation is that memcg moves all charges to
parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
hit parent's limit and may return -EBUSY. To fix this problem, this patch
changes behavior of rmdir() as

 - if use_hierarchy=0, all remaining charges will go to root cgroup.
 - if use_hierarchy=1, all remaining charges will go to the parent.

By this, rmdir failure will not be caused by parent's limitation. And
I think this meets meaning of use_hierarchy.

This series does
  - add above change of behavior
  - use workqueue to move all pages to parent
  - remove unnecessary codes.

I'm sorry if my reply is delayed, I'm not sure I can have enough time in
this weekend. Any comments are welcomed.

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-12 11:20   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton


This function is used for moving accounting information to its
parent in the hierarchy of res_counter.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    3 +++
 kernel/res_counter.c        |   13 +++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..8919d3c 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
+/* move resource to parent counter...i.e. just forget accounting in a child */
+void res_counter_move_parent(struct res_counter *counter, unsigned long val);
+
 /**
  * res_counter_margin - calculate chargeable space of a counter
  * @cnt: the counter
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..fafebf0 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+/*
+ * In hierarchical accounting, child's usage is accounted into ancestors.
+ * To move local usage to its parent, just forget current level usage.
+ */
+void res_counter_move_parent(struct res_counter *counter, unsigned long val)
+{
+	unsigned long flags;
+
+	BUG_ON(!counter->parent);
+	spin_lock_irqsave(&counter->lock, flags);
+	res_counter_uncharge_locked(counter, val);
+	spin_unlock_irqrestore(&counter->lock, flags);
+}
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-12 11:20   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton


This function is used for moving accounting information to its
parent in the hierarchy of res_counter.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 include/linux/res_counter.h |    3 +++
 kernel/res_counter.c        |   13 +++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..8919d3c 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
+/* move resource to parent counter...i.e. just forget accounting in a child */
+void res_counter_move_parent(struct res_counter *counter, unsigned long val);
+
 /**
  * res_counter_margin - calculate chargeable space of a counter
  * @cnt: the counter
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..fafebf0 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+/*
+ * In hierarchical accounting, child's usage is accounted into ancestors.
+ * To move local usage to its parent, just forget current level usage.
+ */
+void res_counter_move_parent(struct res_counter *counter, unsigned long val)
+{
+	unsigned long flags;
+
+	BUG_ON(!counter->parent);
+	spin_lock_irqsave(&counter->lock, flags);
+	res_counter_uncharge_locked(counter, val);
+	spin_unlock_irqrestore(&counter->lock, flags);
+}
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-12 11:21   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton


When memcg->use_hierarchy==true, the parent res_counter includes
the usage in child's usage. So, it's not necessary to call try_charge()
in the parent.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
 1 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fa01106..3215880 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
 			res_counter_uncharge(&memcg->memsw, bytes);
 	}
 }
+/*
+ * Moving usage between a child to its parent if use_hierarchy==true.
+ */
+static void __mem_cgroup_move_charge_parent(struct mem_cgroup *memcg,
+					unsigned int nr_pages)
+{
+	if (!mem_cgroup_is_root(memcg)) {
+		unsigned long bytes = nr_pages * PAGE_SIZE;
+
+		res_counter_move_parent(&memcg->res, bytes);
+		if (do_swap_account)
+			res_counter_move_parent(&memcg->memsw, bytes);
+	}
+}
 
 /*
  * A helper function to get mem_cgroup from ID. must be called under
@@ -2666,18 +2680,29 @@ static int mem_cgroup_move_parent(struct page *page,
 		goto put;
 
 	nr_pages = hpage_nr_pages(page);
-
 	parent = mem_cgroup_from_cont(pcg);
-	ret = __mem_cgroup_try_charge(NULL, gfp_mask, nr_pages, &parent, false);
-	if (ret)
-		goto put_back;
+
+	if (!parent->use_hierarchy) {
+		ret = __mem_cgroup_try_charge(NULL, gfp_mask,
+					nr_pages, &parent, false);
+		if (ret)
+			goto put_back;
+	}
 
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, true);
-	if (ret)
-		__mem_cgroup_cancel_charge(parent, nr_pages);
+	if (!parent->use_hierarchy) {
+		ret = mem_cgroup_move_account(page, nr_pages, pc,
+					child, parent, true);
+		if (ret)
+			__mem_cgroup_cancel_charge(parent, nr_pages);
+	} else {
+		ret = mem_cgroup_move_account(page, nr_pages, pc,
+					child, parent, false);
+		if (!ret)
+			__mem_cgroup_move_charge_parent(child, nr_pages);
+	}
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-12 11:21   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton


When memcg->use_hierarchy==true, the parent res_counter includes
the usage in child's usage. So, it's not necessary to call try_charge()
in the parent.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
 1 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fa01106..3215880 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
 			res_counter_uncharge(&memcg->memsw, bytes);
 	}
 }
+/*
+ * Moving usage between a child to its parent if use_hierarchy==true.
+ */
+static void __mem_cgroup_move_charge_parent(struct mem_cgroup *memcg,
+					unsigned int nr_pages)
+{
+	if (!mem_cgroup_is_root(memcg)) {
+		unsigned long bytes = nr_pages * PAGE_SIZE;
+
+		res_counter_move_parent(&memcg->res, bytes);
+		if (do_swap_account)
+			res_counter_move_parent(&memcg->memsw, bytes);
+	}
+}
 
 /*
  * A helper function to get mem_cgroup from ID. must be called under
@@ -2666,18 +2680,29 @@ static int mem_cgroup_move_parent(struct page *page,
 		goto put;
 
 	nr_pages = hpage_nr_pages(page);
-
 	parent = mem_cgroup_from_cont(pcg);
-	ret = __mem_cgroup_try_charge(NULL, gfp_mask, nr_pages, &parent, false);
-	if (ret)
-		goto put_back;
+
+	if (!parent->use_hierarchy) {
+		ret = __mem_cgroup_try_charge(NULL, gfp_mask,
+					nr_pages, &parent, false);
+		if (ret)
+			goto put_back;
+	}
 
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, true);
-	if (ret)
-		__mem_cgroup_cancel_charge(parent, nr_pages);
+	if (!parent->use_hierarchy) {
+		ret = mem_cgroup_move_account(page, nr_pages, pc,
+					child, parent, true);
+		if (ret)
+			__mem_cgroup_cancel_charge(parent, nr_pages);
+	} else {
+		ret = mem_cgroup_move_account(page, nr_pages, pc,
+					child, parent, false);
+		if (!ret)
+			__mem_cgroup_move_charge_parent(child, nr_pages);
+	}
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 3/7] memcg: move charges to root at rmdir()
@ 2012-04-12 11:22   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

As recently discussed, Tejun Heo, the cgroup maintainer, tries to
remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir().

To do that, in memcg, handling case of use_hierarchy==false is a problem.

We move memcg's charges to its parent at rmdir(). If use_hierarchy==true,
it's already accounted in the parent, no problem. If use_hierarchy==false,
we cannot guarantee we can move all charges to the parent.

This patch changes the behavior to move all charges to root_mem_cgroup
if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship.

With this, we can remove -ENOMEM error check at pre_destroy(), called at rmdir.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/memory.txt |    6 ++++--
 mm/memcontrol.c                  |   38 ++++++++++++--------------------------
 2 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 84d4f00..f717f6a 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -377,8 +377,10 @@ cgroup might have some charge associated with it, even though all
 tasks have migrated away from it. (because we charge against pages, not
 against tasks.)
 
-Such charges are freed or moved to their parent. At moving, both of RSS
-and CACHES are moved to parent.
+Such charges are freed or moved to their parent if use_hierarchy==true.
+If use_hierarchy==false, charges are moved to root memory cgroup.
+
+At moving, both of RSS and CACHES are moved to parent.
 rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.
 
 Charges recorded in swap information is not updated at removal of cgroup.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3215880..8246418 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2662,15 +2662,14 @@ static int mem_cgroup_move_parent(struct page *page,
 				  struct mem_cgroup *child,
 				  gfp_t gfp_mask)
 {
-	struct cgroup *cg = child->css.cgroup;
-	struct cgroup *pcg = cg->parent;
 	struct mem_cgroup *parent;
 	unsigned int nr_pages;
 	unsigned long uninitialized_var(flags);
+	bool need_cancel = false;
 	int ret;
 
 	/* Is ROOT ? */
-	if (!pcg)
+	if (mem_cgroup_is_root(child))
 		return -EINVAL;
 
 	ret = -EBUSY;
@@ -2680,33 +2679,23 @@ static int mem_cgroup_move_parent(struct page *page,
 		goto put;
 
 	nr_pages = hpage_nr_pages(page);
-	parent = mem_cgroup_from_cont(pcg);
-
-	if (!parent->use_hierarchy) {
-		ret = __mem_cgroup_try_charge(NULL, gfp_mask,
-					nr_pages, &parent, false);
-		if (ret)
-			goto put_back;
+	parent = parent_mem_cgroup(child);
+	if (!parent) {
+		parent = root_mem_cgroup;
+		need_cancel = true;
 	}
 
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	if (!parent->use_hierarchy) {
-		ret = mem_cgroup_move_account(page, nr_pages, pc,
-					child, parent, true);
-		if (ret)
-			__mem_cgroup_cancel_charge(parent, nr_pages);
-	} else {
-		ret = mem_cgroup_move_account(page, nr_pages, pc,
-					child, parent, false);
-		if (!ret)
-			__mem_cgroup_move_charge_parent(child, nr_pages);
-	}
+	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent,
+					need_cancel);
+	if (!need_cancel && !ret)
+		__mem_cgroup_move_charge_parent(child, nr_pages);
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
-put_back:
+
 	putback_lru_page(page);
 put:
 	put_page(page);
@@ -3677,7 +3666,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 		pc = lookup_page_cgroup(page);
 
 		ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
-		if (ret == -ENOMEM || ret == -EINTR)
+		if (ret == -EINTR)
 			break;
 
 		if (ret == -EBUSY || ret == -EINVAL) {
@@ -3738,9 +3727,6 @@ move_account:
 		}
 		mem_cgroup_end_move(memcg);
 		memcg_oom_recover(memcg);
-		/* it seems parent cgroup doesn't have enough mem */
-		if (ret == -ENOMEM)
-			goto try_to_free;
 		cond_resched();
 	/* "ret" should also be checked to ensure all lists are empty. */
 	} while (memcg->res.usage > 0 || ret);
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 3/7] memcg: move charges to root at rmdir()
@ 2012-04-12 11:22   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

As recently discussed, Tejun Heo, the cgroup maintainer, tries to
remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir().

To do that, in memcg, handling case of use_hierarchy==false is a problem.

We move memcg's charges to its parent at rmdir(). If use_hierarchy==true,
it's already accounted in the parent, no problem. If use_hierarchy==false,
we cannot guarantee we can move all charges to the parent.

This patch changes the behavior to move all charges to root_mem_cgroup
if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship.

With this, we can remove -ENOMEM error check at pre_destroy(), called at rmdir.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 Documentation/cgroups/memory.txt |    6 ++++--
 mm/memcontrol.c                  |   38 ++++++++++++--------------------------
 2 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 84d4f00..f717f6a 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -377,8 +377,10 @@ cgroup might have some charge associated with it, even though all
 tasks have migrated away from it. (because we charge against pages, not
 against tasks.)
 
-Such charges are freed or moved to their parent. At moving, both of RSS
-and CACHES are moved to parent.
+Such charges are freed or moved to their parent if use_hierarchy==true.
+If use_hierarchy==false, charges are moved to root memory cgroup.
+
+At moving, both of RSS and CACHES are moved to parent.
 rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.
 
 Charges recorded in swap information is not updated at removal of cgroup.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3215880..8246418 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2662,15 +2662,14 @@ static int mem_cgroup_move_parent(struct page *page,
 				  struct mem_cgroup *child,
 				  gfp_t gfp_mask)
 {
-	struct cgroup *cg = child->css.cgroup;
-	struct cgroup *pcg = cg->parent;
 	struct mem_cgroup *parent;
 	unsigned int nr_pages;
 	unsigned long uninitialized_var(flags);
+	bool need_cancel = false;
 	int ret;
 
 	/* Is ROOT ? */
-	if (!pcg)
+	if (mem_cgroup_is_root(child))
 		return -EINVAL;
 
 	ret = -EBUSY;
@@ -2680,33 +2679,23 @@ static int mem_cgroup_move_parent(struct page *page,
 		goto put;
 
 	nr_pages = hpage_nr_pages(page);
-	parent = mem_cgroup_from_cont(pcg);
-
-	if (!parent->use_hierarchy) {
-		ret = __mem_cgroup_try_charge(NULL, gfp_mask,
-					nr_pages, &parent, false);
-		if (ret)
-			goto put_back;
+	parent = parent_mem_cgroup(child);
+	if (!parent) {
+		parent = root_mem_cgroup;
+		need_cancel = true;
 	}
 
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	if (!parent->use_hierarchy) {
-		ret = mem_cgroup_move_account(page, nr_pages, pc,
-					child, parent, true);
-		if (ret)
-			__mem_cgroup_cancel_charge(parent, nr_pages);
-	} else {
-		ret = mem_cgroup_move_account(page, nr_pages, pc,
-					child, parent, false);
-		if (!ret)
-			__mem_cgroup_move_charge_parent(child, nr_pages);
-	}
+	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent,
+					need_cancel);
+	if (!need_cancel && !ret)
+		__mem_cgroup_move_charge_parent(child, nr_pages);
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
-put_back:
+
 	putback_lru_page(page);
 put:
 	put_page(page);
@@ -3677,7 +3666,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 		pc = lookup_page_cgroup(page);
 
 		ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
-		if (ret == -ENOMEM || ret == -EINTR)
+		if (ret == -EINTR)
 			break;
 
 		if (ret == -EBUSY || ret == -EINVAL) {
@@ -3738,9 +3727,6 @@ move_account:
 		}
 		mem_cgroup_end_move(memcg);
 		memcg_oom_recover(memcg);
-		/* it seems parent cgroup doesn't have enough mem */
-		if (ret == -ENOMEM)
-			goto try_to_free;
 		cond_resched();
 	/* "ret" should also be checked to ensure all lists are empty. */
 	} while (memcg->res.usage > 0 || ret);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account()
@ 2012-04-12 11:24   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

Only one call passes 'true'. remove it and handle it in caller.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   29 ++++++++++++-----------------
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8246418..9ac7984 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head)
  * @pc:	page_cgroup of the page.
  * @from: mem_cgroup which the page is moved from.
  * @to:	mem_cgroup which the page is moved to. @from != @to.
- * @uncharge: whether we should call uncharge and css_put against @from.
  *
  * The caller must confirm following.
  * - page is not on LRU (isolate_page() is useful.)
  * - compound_lock is held when nr_pages > 1
  *
- * This function doesn't do "charge" nor css_get to new cgroup. It should be
- * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is
- * true, this function does "uncharge" from old cgroup, but it doesn't if
- * @uncharge is false, so a caller should do "uncharge".
+ * This function doesn't access res_counter at all. Caller should take
+ * care of it.
  */
 static int mem_cgroup_move_account(struct page *page,
 				   unsigned int nr_pages,
 				   struct page_cgroup *pc,
 				   struct mem_cgroup *from,
-				   struct mem_cgroup *to,
-				   bool uncharge)
+				   struct mem_cgroup *to)
 {
 	unsigned long flags;
 	int ret;
@@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page,
 		preempt_enable();
 	}
 	mem_cgroup_charge_statistics(from, anon, -nr_pages);
-	if (uncharge)
-		/* This is not "cancel", but cancel_charge does all we need. */
-		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
 	pc->mem_cgroup = to;
@@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page,
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent,
-					need_cancel);
-	if (!need_cancel && !ret)
-		__mem_cgroup_move_charge_parent(child, nr_pages);
+	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent);
+	if (!ret) {
+		if (need_cancel)
+			__mem_cgroup_cancel_charge(child, nr_pages);
+		else
+			__mem_cgroup_move_charge_parent(child, nr_pages);
+	}
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
@@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			if (!isolate_lru_page(page)) {
 				pc = lookup_page_cgroup(page);
 				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
-							     pc, mc.from, mc.to,
-							     false)) {
+							pc, mc.from, mc.to)) {
 					mc.precharge -= HPAGE_PMD_NR;
 					mc.moved_charge += HPAGE_PMD_NR;
 				}
@@ -5482,7 +5477,7 @@ retry:
 				goto put;
 			pc = lookup_page_cgroup(page);
 			if (!mem_cgroup_move_account(page, 1, pc,
-						     mc.from, mc.to, false)) {
+						     mc.from, mc.to)) {
 				mc.precharge--;
 				/* we uncharge from mc.from later. */
 				mc.moved_charge++;
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account()
@ 2012-04-12 11:24   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

Only one call passes 'true'. remove it and handle it in caller.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/memcontrol.c |   29 ++++++++++++-----------------
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8246418..9ac7984 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head)
  * @pc:	page_cgroup of the page.
  * @from: mem_cgroup which the page is moved from.
  * @to:	mem_cgroup which the page is moved to. @from != @to.
- * @uncharge: whether we should call uncharge and css_put against @from.
  *
  * The caller must confirm following.
  * - page is not on LRU (isolate_page() is useful.)
  * - compound_lock is held when nr_pages > 1
  *
- * This function doesn't do "charge" nor css_get to new cgroup. It should be
- * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is
- * true, this function does "uncharge" from old cgroup, but it doesn't if
- * @uncharge is false, so a caller should do "uncharge".
+ * This function doesn't access res_counter at all. Caller should take
+ * care of it.
  */
 static int mem_cgroup_move_account(struct page *page,
 				   unsigned int nr_pages,
 				   struct page_cgroup *pc,
 				   struct mem_cgroup *from,
-				   struct mem_cgroup *to,
-				   bool uncharge)
+				   struct mem_cgroup *to)
 {
 	unsigned long flags;
 	int ret;
@@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page,
 		preempt_enable();
 	}
 	mem_cgroup_charge_statistics(from, anon, -nr_pages);
-	if (uncharge)
-		/* This is not "cancel", but cancel_charge does all we need. */
-		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
 	pc->mem_cgroup = to;
@@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page,
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
-	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent,
-					need_cancel);
-	if (!need_cancel && !ret)
-		__mem_cgroup_move_charge_parent(child, nr_pages);
+	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent);
+	if (!ret) {
+		if (need_cancel)
+			__mem_cgroup_cancel_charge(child, nr_pages);
+		else
+			__mem_cgroup_move_charge_parent(child, nr_pages);
+	}
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
@@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			if (!isolate_lru_page(page)) {
 				pc = lookup_page_cgroup(page);
 				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
-							     pc, mc.from, mc.to,
-							     false)) {
+							pc, mc.from, mc.to)) {
 					mc.precharge -= HPAGE_PMD_NR;
 					mc.moved_charge += HPAGE_PMD_NR;
 				}
@@ -5482,7 +5477,7 @@ retry:
 				goto put;
 			pc = lookup_page_cgroup(page);
 			if (!mem_cgroup_move_account(page, 1, pc,
-						     mc.from, mc.to, false)) {
+						     mc.from, mc.to)) {
 				mc.precharge--;
 				/* we uncharge from mc.from later. */
 				mc.moved_charge++;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
@ 2012-04-12 11:28   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

Now, at rmdir, memory cgroup's charge will be moved to
  - parent if use_hierarchy=1
  - root   if use_hierarchy=0

Then, we don't have to have memory reclaim code at destroying memcg.

This patch divides force_empty to 2 functions as

 - memory_cgroup_recharge() ... try to move all charges to ancestors.
 - memory_cgroup_force_empty().. try to reclaim all memory.

After this patch, memory.force_empty will _not_ move charges to ancestors
but just reclaim all pages. (This meets documenation.)

rmdir() will not reclaim any memory but moves charge to other cgroup,
parent or root.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   59 +++++++++++++++++++++++++++----------------------------
 1 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9ac7984..22c8faa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 }
 
 /*
- * This routine traverse page_cgroup in given list and drop them all.
- * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
+ * This routine traverse page in given list and move them all.
  */
-static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
+static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 				int node, int zid, enum lru_list lru)
 {
 	struct mem_cgroup_per_zone *mz;
@@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 	return ret;
 }
 
-/*
- * make mem_cgroup's charge to be 0 if there is no task.
- * This enables deleting this mem_cgroup.
- */
-static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
+
+static int mem_cgroup_recharge(struct mem_cgroup *memcg)
 {
-	int ret;
-	int node, zid, shrink;
-	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
+	int ret, node, zid;
 	struct cgroup *cgrp = memcg->css.cgroup;
 
-	css_get(&memcg->css);
-
-	shrink = 0;
-	/* should free all ? */
-	if (free_all)
-		goto try_to_free;
-move_account:
 	do {
 		ret = -EBUSY;
 		if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
@@ -3712,7 +3699,7 @@ move_account:
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
 				enum lru_list lru;
 				for_each_lru(lru) {
-					ret = mem_cgroup_force_empty_list(memcg,
+					ret = mem_cgroup_recharge_lru(memcg,
 							node, zid, lru);
 					if (ret)
 						break;
@@ -3722,24 +3709,33 @@ move_account:
 				break;
 		}
 		mem_cgroup_end_move(memcg);
-		memcg_oom_recover(memcg);
 		cond_resched();
 	/* "ret" should also be checked to ensure all lists are empty. */
 	} while (memcg->res.usage > 0 || ret);
 out:
-	css_put(&memcg->css);
 	return ret;
+}
+
+
+/*
+ * make mem_cgroup's charge to be 0 if there is no task. This is only called
+ * by memory.force_empty file, an user request.
+ */
+static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
+{
+	int ret = 0;
+	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
+	struct cgroup *cgrp = memcg->css.cgroup;
+
+	css_get(&memcg->css);
 
-try_to_free:
 	/* returns EBUSY if there is a task or if we come here twice. */
-	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children) || shrink) {
+	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) {
 		ret = -EBUSY;
 		goto out;
 	}
 	/* we call try-to-free pages for make this cgroup empty */
 	lru_add_drain_all();
-	/* try to free all pages in this cgroup */
-	shrink = 1;
 	while (nr_retries && memcg->res.usage > 0) {
 		int progress;
 
@@ -3754,16 +3750,19 @@ try_to_free:
 			/* maybe some writeback is necessary */
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 		}
-
 	}
-	lru_add_drain();
+	if (!nr_retries)
+		ret = -ENOMEM;
+out:
+	memcg_oom_recover(memcg);
+	css_put(&memcg->css);
 	/* try move_account...there may be some *locked* pages. */
-	goto move_account;
+	return ret;
 }
 
 int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event)
 {
-	return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true);
+	return mem_cgroup_force_empty(mem_cgroup_from_cont(cont));
 }
 
 
@@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	return mem_cgroup_recharge(memcg);
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
@ 2012-04-12 11:28   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

Now, at rmdir, memory cgroup's charge will be moved to
  - parent if use_hierarchy=1
  - root   if use_hierarchy=0

Then, we don't have to have memory reclaim code at destroying memcg.

This patch divides force_empty to 2 functions as

 - memory_cgroup_recharge() ... try to move all charges to ancestors.
 - memory_cgroup_force_empty().. try to reclaim all memory.

After this patch, memory.force_empty will _not_ move charges to ancestors
but just reclaim all pages. (This meets documenation.)

rmdir() will not reclaim any memory but moves charge to other cgroup,
parent or root.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/memcontrol.c |   59 +++++++++++++++++++++++++++----------------------------
 1 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9ac7984..22c8faa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 }
 
 /*
- * This routine traverse page_cgroup in given list and drop them all.
- * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
+ * This routine traverse page in given list and move them all.
  */
-static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
+static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 				int node, int zid, enum lru_list lru)
 {
 	struct mem_cgroup_per_zone *mz;
@@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 	return ret;
 }
 
-/*
- * make mem_cgroup's charge to be 0 if there is no task.
- * This enables deleting this mem_cgroup.
- */
-static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
+
+static int mem_cgroup_recharge(struct mem_cgroup *memcg)
 {
-	int ret;
-	int node, zid, shrink;
-	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
+	int ret, node, zid;
 	struct cgroup *cgrp = memcg->css.cgroup;
 
-	css_get(&memcg->css);
-
-	shrink = 0;
-	/* should free all ? */
-	if (free_all)
-		goto try_to_free;
-move_account:
 	do {
 		ret = -EBUSY;
 		if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
@@ -3712,7 +3699,7 @@ move_account:
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
 				enum lru_list lru;
 				for_each_lru(lru) {
-					ret = mem_cgroup_force_empty_list(memcg,
+					ret = mem_cgroup_recharge_lru(memcg,
 							node, zid, lru);
 					if (ret)
 						break;
@@ -3722,24 +3709,33 @@ move_account:
 				break;
 		}
 		mem_cgroup_end_move(memcg);
-		memcg_oom_recover(memcg);
 		cond_resched();
 	/* "ret" should also be checked to ensure all lists are empty. */
 	} while (memcg->res.usage > 0 || ret);
 out:
-	css_put(&memcg->css);
 	return ret;
+}
+
+
+/*
+ * make mem_cgroup's charge to be 0 if there is no task. This is only called
+ * by memory.force_empty file, an user request.
+ */
+static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
+{
+	int ret = 0;
+	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
+	struct cgroup *cgrp = memcg->css.cgroup;
+
+	css_get(&memcg->css);
 
-try_to_free:
 	/* returns EBUSY if there is a task or if we come here twice. */
-	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children) || shrink) {
+	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) {
 		ret = -EBUSY;
 		goto out;
 	}
 	/* we call try-to-free pages for make this cgroup empty */
 	lru_add_drain_all();
-	/* try to free all pages in this cgroup */
-	shrink = 1;
 	while (nr_retries && memcg->res.usage > 0) {
 		int progress;
 
@@ -3754,16 +3750,19 @@ try_to_free:
 			/* maybe some writeback is necessary */
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 		}
-
 	}
-	lru_add_drain();
+	if (!nr_retries)
+		ret = -ENOMEM;
+out:
+	memcg_oom_recover(memcg);
+	css_put(&memcg->css);
 	/* try move_account...there may be some *locked* pages. */
-	goto move_account;
+	return ret;
 }
 
 int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event)
 {
-	return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true);
+	return mem_cgroup_force_empty(mem_cgroup_from_cont(cont));
 }
 
 
@@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	return mem_cgroup_recharge(memcg);
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-12 11:30   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to
prevent rmdir() from failure of EBUSY or some.

This patch removes pre_destroy() in memcg. All remaining charges
will be moved to other cgroup, without any failure,  ->destroy()
just schedule a work and it will destroy the memcg.
Then, rmdir will never fail. The kernel will take care of remaining
resources in the cgroup to be accounted correctly.

After this patch, memcg will be destroyed by workqueue in asynchrnous way.
Then, we can modify 'moving' logic to work asynchrnously, i.e,
we don't force users to wait for the end of rmdir(), now. We don't
need to use heavy synchronous calls. This patch modifies logics as

 - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync.
 - lru_add_drain_all() will be called only when necessary, in a lazy way.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   52 ++++++++++++++++++++++------------------------------
 1 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 22c8faa..e466809 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -315,6 +315,8 @@ struct mem_cgroup {
 #ifdef CONFIG_INET
 	struct tcp_memcontrol tcp_mem;
 #endif
+
+	struct work_struct work_destroy;
 };
 
 /* Stuffs for move charges at task migration. */
@@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-/* This is a synchronous drain interface. */
 static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
 {
 	/* called when force_empty is called */
@@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 		pc = lookup_page_cgroup(page);
 
 		ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
-		if (ret == -EINTR)
-			break;
 
-		if (ret == -EBUSY || ret == -EINVAL) {
+		VM_BUG_ON(ret != 0 && ret != -EBUSY);
+		if (ret) {
 			/* found lock contention or "pc" is obsolete. */
 			busy = page;
 			cond_resched();
@@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 	return ret;
 }
 
-
-static int mem_cgroup_recharge(struct mem_cgroup *memcg)
+/*
+ * This function is called after ->destroy(). So, we cannot access cgroup
+ * of this memcg.
+ */
+static void mem_cgroup_recharge(struct work_struct *work)
 {
+	struct mem_cgroup *memcg;
 	int ret, node, zid;
-	struct cgroup *cgrp = memcg->css.cgroup;
 
+	memcg = container_of(work, struct mem_cgroup, work_destroy);
+	/* No task points this memcg. call this only once */
+	drain_all_stock_async(memcg);
 	do {
-		ret = -EBUSY;
-		if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
-			goto out;
-		ret = -EINTR;
-		if (signal_pending(current))
-			goto out;
-		/* This is for making all *used* pages to be on LRU. */
-		lru_add_drain_all();
-		drain_all_stock_sync(memcg);
 		ret = 0;
 		mem_cgroup_start_move(memcg);
 		for_each_node_state(node, N_HIGH_MEMORY) {
@@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg)
 		}
 		mem_cgroup_end_move(memcg);
 		cond_resched();
-	/* "ret" should also be checked to ensure all lists are empty. */
-	} while (memcg->res.usage > 0 || ret);
-out:
-	return ret;
+		/* drain LRU only when we canoot find pages on LRU */
+		if (res_counter_read_u64(&memcg->res, RES_USAGE) &&
+		    !mem_cgroup_nr_lru_pages(memcg, LRU_ALL))
+			lru_add_drain_all();
+	} while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret);
+	mem_cgroup_put(memcg);
 }
 
-
 /*
  * make mem_cgroup's charge to be 0 if there is no task. This is only called
  * by memory.force_empty file, an user request.
@@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work)
 	memcg = container_of(work, struct mem_cgroup, work_freeing);
 	vfree(memcg);
 }
+
 static void vfree_rcu(struct rcu_head *rcu_head)
 {
 	struct mem_cgroup *memcg;
@@ -4982,20 +4981,14 @@ free_out:
 	return ERR_PTR(error);
 }
 
-static int mem_cgroup_pre_destroy(struct cgroup *cont)
-{
-	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-
-	return mem_cgroup_recharge(memcg);
-}
-
 static void mem_cgroup_destroy(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
 	kmem_cgroup_destroy(cont);
 
-	mem_cgroup_put(memcg);
+	INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge);
+	schedule_work(&memcg->work_destroy);
 }
 
 static int mem_cgroup_populate(struct cgroup_subsys *ss,
@@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
 	.subsys_id = mem_cgroup_subsys_id,
 	.create = mem_cgroup_create,
-	.pre_destroy = mem_cgroup_pre_destroy,
 	.destroy = mem_cgroup_destroy,
 	.populate = mem_cgroup_populate,
 	.can_attach = mem_cgroup_can_attach,
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-12 11:30   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to
prevent rmdir() from failure of EBUSY or some.

This patch removes pre_destroy() in memcg. All remaining charges
will be moved to other cgroup, without any failure,  ->destroy()
just schedule a work and it will destroy the memcg.
Then, rmdir will never fail. The kernel will take care of remaining
resources in the cgroup to be accounted correctly.

After this patch, memcg will be destroyed by workqueue in asynchrnous way.
Then, we can modify 'moving' logic to work asynchrnously, i.e,
we don't force users to wait for the end of rmdir(), now. We don't
need to use heavy synchronous calls. This patch modifies logics as

 - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync.
 - lru_add_drain_all() will be called only when necessary, in a lazy way.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/memcontrol.c |   52 ++++++++++++++++++++++------------------------------
 1 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 22c8faa..e466809 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -315,6 +315,8 @@ struct mem_cgroup {
 #ifdef CONFIG_INET
 	struct tcp_memcontrol tcp_mem;
 #endif
+
+	struct work_struct work_destroy;
 };
 
 /* Stuffs for move charges at task migration. */
@@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-/* This is a synchronous drain interface. */
 static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
 {
 	/* called when force_empty is called */
@@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 		pc = lookup_page_cgroup(page);
 
 		ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
-		if (ret == -EINTR)
-			break;
 
-		if (ret == -EBUSY || ret == -EINVAL) {
+		VM_BUG_ON(ret != 0 && ret != -EBUSY);
+		if (ret) {
 			/* found lock contention or "pc" is obsolete. */
 			busy = page;
 			cond_resched();
@@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
 	return ret;
 }
 
-
-static int mem_cgroup_recharge(struct mem_cgroup *memcg)
+/*
+ * This function is called after ->destroy(). So, we cannot access cgroup
+ * of this memcg.
+ */
+static void mem_cgroup_recharge(struct work_struct *work)
 {
+	struct mem_cgroup *memcg;
 	int ret, node, zid;
-	struct cgroup *cgrp = memcg->css.cgroup;
 
+	memcg = container_of(work, struct mem_cgroup, work_destroy);
+	/* No task points this memcg. call this only once */
+	drain_all_stock_async(memcg);
 	do {
-		ret = -EBUSY;
-		if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
-			goto out;
-		ret = -EINTR;
-		if (signal_pending(current))
-			goto out;
-		/* This is for making all *used* pages to be on LRU. */
-		lru_add_drain_all();
-		drain_all_stock_sync(memcg);
 		ret = 0;
 		mem_cgroup_start_move(memcg);
 		for_each_node_state(node, N_HIGH_MEMORY) {
@@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg)
 		}
 		mem_cgroup_end_move(memcg);
 		cond_resched();
-	/* "ret" should also be checked to ensure all lists are empty. */
-	} while (memcg->res.usage > 0 || ret);
-out:
-	return ret;
+		/* drain LRU only when we canoot find pages on LRU */
+		if (res_counter_read_u64(&memcg->res, RES_USAGE) &&
+		    !mem_cgroup_nr_lru_pages(memcg, LRU_ALL))
+			lru_add_drain_all();
+	} while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret);
+	mem_cgroup_put(memcg);
 }
 
-
 /*
  * make mem_cgroup's charge to be 0 if there is no task. This is only called
  * by memory.force_empty file, an user request.
@@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work)
 	memcg = container_of(work, struct mem_cgroup, work_freeing);
 	vfree(memcg);
 }
+
 static void vfree_rcu(struct rcu_head *rcu_head)
 {
 	struct mem_cgroup *memcg;
@@ -4982,20 +4981,14 @@ free_out:
 	return ERR_PTR(error);
 }
 
-static int mem_cgroup_pre_destroy(struct cgroup *cont)
-{
-	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-
-	return mem_cgroup_recharge(memcg);
-}
-
 static void mem_cgroup_destroy(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
 	kmem_cgroup_destroy(cont);
 
-	mem_cgroup_put(memcg);
+	INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge);
+	schedule_work(&memcg->work_destroy);
 }
 
 static int mem_cgroup_populate(struct cgroup_subsys *ss,
@@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
 	.subsys_id = mem_cgroup_subsys_id,
 	.create = mem_cgroup_create,
-	.pre_destroy = mem_cgroup_pre_destroy,
 	.destroy = mem_cgroup_destroy,
 	.populate = mem_cgroup_populate,
 	.can_attach = mem_cgroup_can_attach,
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 7/7] memcg: remove drain_all_stock_sync.
@ 2012-04-12 11:31   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

Because a function moving pages to ancestor works asynchronously now,
drain_all_stock_sync() is unnecessary.

Signed-off-by: KAMEAZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e466809..d42811b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2053,7 +2053,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  * of the hierarchy under it. sync flag says whether we should block
  * until the work is done.
  */
-static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
+static void drain_all_stock(struct mem_cgroup *root_memcg)
 {
 	int cpu, curcpu;
 
@@ -2077,16 +2077,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 		}
 	}
 	put_cpu();
-
-	if (!sync)
-		goto out;
-
-	for_each_online_cpu(cpu) {
-		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
-			flush_work(&stock->work);
-	}
-out:
  	put_online_cpus();
 }
 
@@ -2103,15 +2093,7 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
 	 */
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
-	drain_all_stock(root_memcg, false);
-	mutex_unlock(&percpu_charge_mutex);
-}
-
-static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
-{
-	/* called when force_empty is called */
-	mutex_lock(&percpu_charge_mutex);
-	drain_all_stock(root_memcg, true);
+	drain_all_stock(root_memcg);
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 7/7] memcg: remove drain_all_stock_sync.
@ 2012-04-12 11:31   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 11:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

Because a function moving pages to ancestor works asynchronously now,
drain_all_stock_sync() is unnecessary.

Signed-off-by: KAMEAZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/memcontrol.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e466809..d42811b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2053,7 +2053,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  * of the hierarchy under it. sync flag says whether we should block
  * until the work is done.
  */
-static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
+static void drain_all_stock(struct mem_cgroup *root_memcg)
 {
 	int cpu, curcpu;
 
@@ -2077,16 +2077,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 		}
 	}
 	put_cpu();
-
-	if (!sync)
-		goto out;
-
-	for_each_online_cpu(cpu) {
-		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
-			flush_work(&stock->work);
-	}
-out:
  	put_online_cpus();
 }
 
@@ -2103,15 +2093,7 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
 	 */
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
-	drain_all_stock(root_memcg, false);
-	mutex_unlock(&percpu_charge_mutex);
-}
-
-static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
-{
-	/* called when force_empty is called */
-	mutex_lock(&percpu_charge_mutex);
-	drain_all_stock(root_memcg, true);
+	drain_all_stock(root_memcg);
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 13:20   ` Glauber Costa
  0 siblings, 0 replies; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton

On 04/12/2012 08:17 AM, KAMEZAWA Hiroyuki wrote:
> One of problem in current implementation is that memcg moves all charges to
> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
> hit parent's limit and may return -EBUSY. To fix this problem, this patch
> changes behavior of rmdir() as
> 
>   - if use_hierarchy=0, all remaining charges will go to root cgroup.
>   - if use_hierarchy=1, all remaining charges will go to the parent.
To be quite honest, this is one of those things that we end up
overlooking, and just don't think about it in the middle of the complexity.

Now that you mention it... When use_hierarchy=0,  there is no parent!
(At least from where memcg is concerned). So it doesn't make any sense
to have it ever have moved it to the "parent" (from the core cgroup
perspective).

I agree with this new behavior 100 %.

Just a nitpick: When use_hierarchy=1, remaining charges need not to "go
to the parent". They are already there.

I will review your series for the specifics.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 13:20   ` Glauber Costa
  0 siblings, 0 replies; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Hugh Dickins,
	Andrew Morton

On 04/12/2012 08:17 AM, KAMEZAWA Hiroyuki wrote:
> One of problem in current implementation is that memcg moves all charges to
> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
> hit parent's limit and may return -EBUSY. To fix this problem, this patch
> changes behavior of rmdir() as
> 
>   - if use_hierarchy=0, all remaining charges will go to root cgroup.
>   - if use_hierarchy=1, all remaining charges will go to the parent.
To be quite honest, this is one of those things that we end up
overlooking, and just don't think about it in the middle of the complexity.

Now that you mention it... When use_hierarchy=0,  there is no parent!
(At least from where memcg is concerned). So it doesn't make any sense
to have it ever have moved it to the "parent" (from the core cgroup
perspective).

I agree with this new behavior 100 %.

Just a nitpick: When use_hierarchy=1, remaining charges need not to "go
to the parent". They are already there.

I will review your series for the specifics.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
  2012-04-12 11:20   ` KAMEZAWA Hiroyuki
  (?)
@ 2012-04-12 13:22   ` Glauber Costa
  2012-04-12 14:30     ` Frederic Weisbecker
  -1 siblings, 1 reply; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton, Frederic Weisbecker

On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
> 
> This function is used for moving accounting information to its
> parent in the hierarchy of res_counter.
>
> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>

Frederic has a patch in his fork cgroup series, that allows you to
uncharge a counter until you reach a specific ancestor.
You pass the parent as a parameter, and then only you gets uncharged.

I think that is a much better interface than this you are proposing.
We should probably merge that patch and use it.

> ---
>   include/linux/res_counter.h |    3 +++
>   kernel/res_counter.c        |   13 +++++++++++++
>   2 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index da81af0..8919d3c 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> 
> +/* move resource to parent counter...i.e. just forget accounting in a child */
> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
> +
>   /**
>    * res_counter_margin - calculate chargeable space of a counter
>    * @cnt: the counter
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index d508363..fafebf0 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>   	local_irq_restore(flags);
>   }
> 
> +/*
> + * In hierarchical accounting, child's usage is accounted into ancestors.
> + * To move local usage to its parent, just forget current level usage.
> + */
> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> +{
> +	unsigned long flags;
> +
> +	BUG_ON(!counter->parent);
> +	spin_lock_irqsave(&counter->lock, flags);
> +	res_counter_uncharge_locked(counter, val);
> +	spin_unlock_irqrestore(&counter->lock, flags);
> +}
> 
>   static inline unsigned long long *
>   res_counter_member(struct res_counter *counter, int member)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account()
  2012-04-12 11:24   ` KAMEZAWA Hiroyuki
  (?)
@ 2012-04-12 13:27   ` Glauber Costa
  2012-04-13  1:01     ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton

On 04/12/2012 08:24 AM, KAMEZAWA Hiroyuki wrote:
> Only one call passes 'true'. remove it and handle it in caller.
> 
> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
I like the change. I won't ack the patch itself, though, because it has
a dependency with the "need_cancel" thing you introduced in your last
patch - that I need to think a bit more.

> ---
>   mm/memcontrol.c |   29 ++++++++++++-----------------
>   1 files changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8246418..9ac7984 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2576,23 +2576,19 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>    * @pc:	page_cgroup of the page.
>    * @from: mem_cgroup which the page is moved from.
>    * @to:	mem_cgroup which the page is moved to. @from != @to.
> - * @uncharge: whether we should call uncharge and css_put against @from.
>    *
>    * The caller must confirm following.
>    * - page is not on LRU (isolate_page() is useful.)
>    * - compound_lock is held when nr_pages>  1
>    *
> - * This function doesn't do "charge" nor css_get to new cgroup. It should be
> - * done by a caller(__mem_cgroup_try_charge would be useful). If @uncharge is
> - * true, this function does "uncharge" from old cgroup, but it doesn't if
> - * @uncharge is false, so a caller should do "uncharge".
> + * This function doesn't access res_counter at all. Caller should take
> + * care of it.
>    */
>   static int mem_cgroup_move_account(struct page *page,
>   				   unsigned int nr_pages,
>   				   struct page_cgroup *pc,
>   				   struct mem_cgroup *from,
> -				   struct mem_cgroup *to,
> -				   bool uncharge)
> +				   struct mem_cgroup *to)
>   {
>   	unsigned long flags;
>   	int ret;
> @@ -2626,9 +2622,6 @@ static int mem_cgroup_move_account(struct page *page,
>   		preempt_enable();
>   	}
>   	mem_cgroup_charge_statistics(from, anon, -nr_pages);
> -	if (uncharge)
> -		/* This is not "cancel", but cancel_charge does all we need. */
> -		__mem_cgroup_cancel_charge(from, nr_pages);
> 
>   	/* caller should have done css_get */
>   	pc->mem_cgroup = to;
> @@ -2688,10 +2681,13 @@ static int mem_cgroup_move_parent(struct page *page,
>   	if (nr_pages>  1)
>   		flags = compound_lock_irqsave(page);
> 
> -	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent,
> -					need_cancel);
> -	if (!need_cancel&&  !ret)
> -		__mem_cgroup_move_charge_parent(child, nr_pages);
> +	ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent);
> +	if (!ret) {
> +		if (need_cancel)
> +			__mem_cgroup_cancel_charge(child, nr_pages);
> +		else
> +			__mem_cgroup_move_charge_parent(child, nr_pages);
> +	}
> 
>   	if (nr_pages>  1)
>   		compound_unlock_irqrestore(page, flags);
> @@ -5451,8 +5447,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>   			if (!isolate_lru_page(page)) {
>   				pc = lookup_page_cgroup(page);
>   				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> -							     pc, mc.from, mc.to,
> -							     false)) {
> +							pc, mc.from, mc.to)) {
>   					mc.precharge -= HPAGE_PMD_NR;
>   					mc.moved_charge += HPAGE_PMD_NR;
>   				}
> @@ -5482,7 +5477,7 @@ retry:
>   				goto put;
>   			pc = lookup_page_cgroup(page);
>   			if (!mem_cgroup_move_account(page, 1, pc,
> -						     mc.from, mc.to, false)) {
> +						     mc.from, mc.to)) {
>   				mc.precharge--;
>   				/* we uncharge from mc.from later. */
>   				mc.moved_charge++;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
  2012-04-12 11:28   ` KAMEZAWA Hiroyuki
  (?)
@ 2012-04-12 13:33   ` Glauber Costa
  -1 siblings, 0 replies; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton

On 04/12/2012 08:28 AM, KAMEZAWA Hiroyuki wrote:
> Now, at rmdir, memory cgroup's charge will be moved to
>    - parent if use_hierarchy=1
>    - root   if use_hierarchy=0
> 
> Then, we don't have to have memory reclaim code at destroying memcg.
> 
> This patch divides force_empty to 2 functions as
> 
>   - memory_cgroup_recharge() ... try to move all charges to ancestors.
>   - memory_cgroup_force_empty().. try to reclaim all memory.
> 
> After this patch, memory.force_empty will _not_ move charges to ancestors
> but just reclaim all pages. (This meets documenation.)
> 
> rmdir() will not reclaim any memory but moves charge to other cgroup,
> parent or root.
> 
> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>

Seems fine by me...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 7/7] memcg: remove drain_all_stock_sync.
@ 2012-04-12 13:35     ` Glauber Costa
  0 siblings, 0 replies; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton

On 04/12/2012 08:31 AM, KAMEZAWA Hiroyuki wrote:
> Because a function moving pages to ancestor works asynchronously now,
> drain_all_stock_sync() is unnecessary.
> 
> Signed-off-by: KAMEAZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 7/7] memcg: remove drain_all_stock_sync.
@ 2012-04-12 13:35     ` Glauber Costa
  0 siblings, 0 replies; 67+ messages in thread
From: Glauber Costa @ 2012-04-12 13:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Hugh Dickins,
	Andrew Morton

On 04/12/2012 08:31 AM, KAMEZAWA Hiroyuki wrote:
> Because a function moving pages to ancestor works asynchronously now,
> drain_all_stock_sync() is unnecessary.
> 
> Signed-off-by: KAMEAZAWA Hiroyuki<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Reviewed-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
  2012-04-12 13:22   ` Glauber Costa
@ 2012-04-12 14:30     ` Frederic Weisbecker
  2012-04-13  0:57         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 67+ messages in thread
From: Frederic Weisbecker @ 2012-04-12 14:30 UTC (permalink / raw)
  To: Glauber Costa
  Cc: KAMEZAWA Hiroyuki, linux-mm, cgroups, Michal Hocko,
	Johannes Weiner, Tejun Heo, Hugh Dickins, Andrew Morton

2012/4/12 Glauber Costa <glommer@parallels.com>:
> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
>>
>> This function is used for moving accounting information to its
>> parent in the hierarchy of res_counter.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Frederic has a patch in his fork cgroup series, that allows you to
> uncharge a counter until you reach a specific ancestor.
> You pass the parent as a parameter, and then only you gets uncharged.

I'm missing the referring patchset from Kamezawa. Ok I'm going to
subscribe to the
cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
LKML for cgroup patches?

Some comments below:

>
> I think that is a much better interface than this you are proposing.
> We should probably merge that patch and use it.
>
>> ---
>>   include/linux/res_counter.h |    3 +++
>>   kernel/res_counter.c        |   13 +++++++++++++
>>   2 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index da81af0..8919d3c 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>
>> +/* move resource to parent counter...i.e. just forget accounting in a child */
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>> +
>>   /**
>>    * res_counter_margin - calculate chargeable space of a counter
>>    * @cnt: the counter
>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>> index d508363..fafebf0 100644
>> --- a/kernel/res_counter.c
>> +++ b/kernel/res_counter.c
>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>       local_irq_restore(flags);
>>   }
>>
>> +/*
>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>> + * To move local usage to its parent, just forget current level usage.

The way I understand this comment and the changelog matches the opposite
of what the below function is doing.

The function charges a child and ignore all its parents. The comments says it
charges the parents but not the child.

>> + */
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>> +{
>> +     unsigned long flags;
>> +
>> +     BUG_ON(!counter->parent);
>> +     spin_lock_irqsave(&counter->lock, flags);
>> +     res_counter_uncharge_locked(counter, val);
>> +     spin_unlock_irqrestore(&counter->lock, flags);
>> +}
>>
>>   static inline unsigned long long *
>>   res_counter_member(struct res_counter *counter, int member)
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 16:06   ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-12 16:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Hello, KAMEZAWA.

Thanks a lot for doing this.

On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.

Just to clarify, I'm not intending to ->pre_destroy() per-se but the
retry behavior of it, so ->pre_destroy() will be converted to return
void and called once on rmdir and rmdir will proceed no matter what.
Also, with the deprecated behavior flag set, pre_destroy() doesn't
trigger the warning message.

Other than that, if memcg people are fine with the change, I'll be
happy to route the changes through cgroup/for-3.5 and stack rmdir
simplification patches on top.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 16:06   ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-12 16:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

Hello, KAMEZAWA.

Thanks a lot for doing this.

On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.

Just to clarify, I'm not intending to ->pre_destroy() per-se but the
retry behavior of it, so ->pre_destroy() will be converted to return
void and called once on rmdir and rmdir will proceed no matter what.
Also, with the deprecated behavior flag set, pre_destroy() doesn't
trigger the warning message.

Other than that, if memcg people are fine with the change, I'll be
happy to route the changes through cgroup/for-3.5 and stack rmdir
simplification patches on top.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 18:57     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 67+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-12 18:57 UTC (permalink / raw)
  To: Tejun Heo, KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Tejun Heo <tj@kernel.org> writes:

> Hello, KAMEZAWA.
>
> Thanks a lot for doing this.
>
> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>
> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
> retry behavior of it, so ->pre_destroy() will be converted to return
> void and called once on rmdir and rmdir will proceed no matter what.
> Also, with the deprecated behavior flag set, pre_destroy() doesn't
> trigger the warning message.
>
> Other than that, if memcg people are fine with the change, I'll be
> happy to route the changes through cgroup/for-3.5 and stack rmdir
> simplification patches on top.
>

Any suggestion on how to take HugeTLB memcg extension patches [1]
upstream. Current patch series I have is on top of cgroup/for-3.5
because I need cgroup_add_files equivalent and cgroup/for-3.5 have
changes around that. So if these memcg patches can also go on top of
cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?

Can HugeTLB memcg extension patches also go via this tree ? It
should actually got via -mm. But then how do we take care of these
dependencies ?

[1]  http://thread.gmane.org/gmane.linux.kernel.cgroups/1517

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-12 18:57     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 67+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-12 18:57 UTC (permalink / raw)
  To: Tejun Heo, KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, Michal Hocko,
	Johannes Weiner, Glauber Costa, Hugh Dickins, Andrew Morton

Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:

> Hello, KAMEZAWA.
>
> Thanks a lot for doing this.
>
> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>
> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
> retry behavior of it, so ->pre_destroy() will be converted to return
> void and called once on rmdir and rmdir will proceed no matter what.
> Also, with the deprecated behavior flag set, pre_destroy() doesn't
> trigger the warning message.
>
> Other than that, if memcg people are fine with the change, I'll be
> happy to route the changes through cgroup/for-3.5 and stack rmdir
> simplification patches on top.
>

Any suggestion on how to take HugeTLB memcg extension patches [1]
upstream. Current patch series I have is on top of cgroup/for-3.5
because I need cgroup_add_files equivalent and cgroup/for-3.5 have
changes around that. So if these memcg patches can also go on top of
cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?

Can HugeTLB memcg extension patches also go via this tree ? It
should actually got via -mm. But then how do we take care of these
dependencies ?

[1]  http://thread.gmane.org/gmane.linux.kernel.cgroups/1517

-aneesh

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
  2012-04-12 18:57     ` Aneesh Kumar K.V
  (?)
@ 2012-04-12 23:59     ` KAMEZAWA Hiroyuki
  2012-04-13  8:50         ` Michal Hocko
  -1 siblings, 1 reply; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-12 23:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Tejun Heo, linux-mm, cgroups, Michal Hocko, Johannes Weiner,
	Glauber Costa, Hugh Dickins, Andrew Morton

(2012/04/13 3:57), Aneesh Kumar K.V wrote:

> Tejun Heo <tj@kernel.org> writes:
> 
>> Hello, KAMEZAWA.
>>
>> Thanks a lot for doing this.
>>
>> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
>>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>>
>> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
>> retry behavior of it, so ->pre_destroy() will be converted to return
>> void and called once on rmdir and rmdir will proceed no matter what.
>> Also, with the deprecated behavior flag set, pre_destroy() doesn't
>> trigger the warning message.
>>
>> Other than that, if memcg people are fine with the change, I'll be
>> happy to route the changes through cgroup/for-3.5 and stack rmdir
>> simplification patches on top.
>>
> 
> Any suggestion on how to take HugeTLB memcg extension patches [1]
> upstream. Current patch series I have is on top of cgroup/for-3.5
> because I need cgroup_add_files equivalent and cgroup/for-3.5 have
> changes around that. So if these memcg patches can also go on top of
> cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?
> 
> Can HugeTLB memcg extension patches also go via this tree ? It
> should actually got via -mm. But then how do we take care of these
> dependencies ?
> 


I'm not in hurry. To be honest, I cannot update patches until the next Wednesday.
So, If changes of cgroup tree you required are included in linux-next. Please post
your updated ones. I thought your latest version was near to be merged....

How do you think, Michal ?
Please post (and ask Andrew to pull it.) I'll review when I can.

I know yours and mine has some conflicts. I think my this series will
be onto your series. To do that, I hope your series are merged to linux-next, 1st.


Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-13  0:57         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  0:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Glauber Costa, linux-mm, cgroups, Michal Hocko, Johannes Weiner,
	Tejun Heo, Hugh Dickins, Andrew Morton

(2012/04/12 23:30), Frederic Weisbecker wrote:

> 2012/4/12 Glauber Costa <glommer@parallels.com>:
>> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
>>>
>>> This function is used for moving accounting information to its
>>> parent in the hierarchy of res_counter.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>
>> Frederic has a patch in his fork cgroup series, that allows you to
>> uncharge a counter until you reach a specific ancestor.
>> You pass the parent as a parameter, and then only you gets uncharged.
> 
> I'm missing the referring patchset from Kamezawa. Ok I'm going to
> subscribe to the
> cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
> LKML for cgroup patches?
> 

Ah, sorry. I will do next time.

> Some comments below:
> 
>>
>> I think that is a much better interface than this you are proposing.
>> We should probably merge that patch and use it.
>>
>>> ---
>>>   include/linux/res_counter.h |    3 +++
>>>   kernel/res_counter.c        |   13 +++++++++++++
>>>   2 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>>> index da81af0..8919d3c 100644
>>> --- a/include/linux/res_counter.h
>>> +++ b/include/linux/res_counter.h
>>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>>
>>> +/* move resource to parent counter...i.e. just forget accounting in a child */
>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>>> +
>>>   /**
>>>    * res_counter_margin - calculate chargeable space of a counter
>>>    * @cnt: the counter
>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>>> index d508363..fafebf0 100644
>>> --- a/kernel/res_counter.c
>>> +++ b/kernel/res_counter.c
>>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>>       local_irq_restore(flags);
>>>   }
>>>
>>> +/*
>>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>>> + * To move local usage to its parent, just forget current level usage.
> 
> The way I understand this comment and the changelog matches the opposite
> of what the below function is doing.
> 
> The function charges a child and ignore all its parents. The comments says it
> charges the parents but not the child.
> 


Sure, I'll fix...and look into your code first.
"uncharge a counter until you reach a specific ancestor"...
Is it in linux-next ?

Thanks,
-Kame


>>> + */
>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>>> +{
>>> +     unsigned long flags;
>>> +
>>> +     BUG_ON(!counter->parent);
>>> +     spin_lock_irqsave(&counter->lock, flags);
>>> +     res_counter_uncharge_locked(counter, val);
>>> +     spin_unlock_irqrestore(&counter->lock, flags);
>>> +}
>>>
>>>   static inline unsigned long long *
>>>   res_counter_member(struct res_counter *counter, int member)
>>
> 
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-13  0:57         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  0:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Johannes Weiner,
	Tejun Heo, Hugh Dickins, Andrew Morton

(2012/04/12 23:30), Frederic Weisbecker wrote:

> 2012/4/12 Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>:
>> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
>>>
>>> This function is used for moving accounting information to its
>>> parent in the hierarchy of res_counter.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>>
>> Frederic has a patch in his fork cgroup series, that allows you to
>> uncharge a counter until you reach a specific ancestor.
>> You pass the parent as a parameter, and then only you gets uncharged.
> 
> I'm missing the referring patchset from Kamezawa. Ok I'm going to
> subscribe to the
> cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
> LKML for cgroup patches?
> 

Ah, sorry. I will do next time.

> Some comments below:
> 
>>
>> I think that is a much better interface than this you are proposing.
>> We should probably merge that patch and use it.
>>
>>> ---
>>>   include/linux/res_counter.h |    3 +++
>>>   kernel/res_counter.c        |   13 +++++++++++++
>>>   2 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>>> index da81af0..8919d3c 100644
>>> --- a/include/linux/res_counter.h
>>> +++ b/include/linux/res_counter.h
>>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>>
>>> +/* move resource to parent counter...i.e. just forget accounting in a child */
>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>>> +
>>>   /**
>>>    * res_counter_margin - calculate chargeable space of a counter
>>>    * @cnt: the counter
>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>>> index d508363..fafebf0 100644
>>> --- a/kernel/res_counter.c
>>> +++ b/kernel/res_counter.c
>>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>>       local_irq_restore(flags);
>>>   }
>>>
>>> +/*
>>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>>> + * To move local usage to its parent, just forget current level usage.
> 
> The way I understand this comment and the changelog matches the opposite
> of what the below function is doing.
> 
> The function charges a child and ignore all its parents. The comments says it
> charges the parents but not the child.
> 


Sure, I'll fix...and look into your code first.
"uncharge a counter until you reach a specific ancestor"...
Is it in linux-next ?

Thanks,
-Kame


>>> + */
>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>>> +{
>>> +     unsigned long flags;
>>> +
>>> +     BUG_ON(!counter->parent);
>>> +     spin_lock_irqsave(&counter->lock, flags);
>>> +     res_counter_uncharge_locked(counter, val);
>>> +     spin_unlock_irqrestore(&counter->lock, flags);
>>> +}
>>>
>>>   static inline unsigned long long *
>>>   res_counter_member(struct res_counter *counter, int member)
>>
> 
> 



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account()
  2012-04-12 13:27   ` Glauber Costa
@ 2012-04-13  1:01     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  1:01 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Hugh Dickins, Andrew Morton

(2012/04/12 22:27), Glauber Costa wrote:

> On 04/12/2012 08:24 AM, KAMEZAWA Hiroyuki wrote:
>> Only one call passes 'true'. remove it and handle it in caller.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> I like the change. I won't ack the patch itself, though, because it has
> a dependency with the "need_cancel" thing you introduced in your last
> patch - that I need to think a bit more.
> 


I'll check the logic again. Firstly, maybe name of the variable is wrong..

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-13  1:04           ` Frederic Weisbecker
  0 siblings, 0 replies; 67+ messages in thread
From: Frederic Weisbecker @ 2012-04-13  1:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Glauber Costa, linux-mm, cgroups, Michal Hocko, Johannes Weiner,
	Tejun Heo, Hugh Dickins, Andrew Morton

On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/12 23:30), Frederic Weisbecker wrote:
> 
> > 2012/4/12 Glauber Costa <glommer@parallels.com>:
> >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
> >>>
> >>> This function is used for moving accounting information to its
> >>> parent in the hierarchy of res_counter.
> >>>
> >>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> >>
> >> Frederic has a patch in his fork cgroup series, that allows you to
> >> uncharge a counter until you reach a specific ancestor.
> >> You pass the parent as a parameter, and then only you gets uncharged.
> > 
> > I'm missing the referring patchset from Kamezawa. Ok I'm going to
> > subscribe to the
> > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
> > LKML for cgroup patches?
> > 
> 
> Ah, sorry. I will do next time.
> 
> > Some comments below:
> > 
> >>
> >> I think that is a much better interface than this you are proposing.
> >> We should probably merge that patch and use it.
> >>
> >>> ---
> >>>   include/linux/res_counter.h |    3 +++
> >>>   kernel/res_counter.c        |   13 +++++++++++++
> >>>   2 files changed, 16 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> >>> index da81af0..8919d3c 100644
> >>> --- a/include/linux/res_counter.h
> >>> +++ b/include/linux/res_counter.h
> >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
> >>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> >>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> >>>
> >>> +/* move resource to parent counter...i.e. just forget accounting in a child */
> >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
> >>> +
> >>>   /**
> >>>    * res_counter_margin - calculate chargeable space of a counter
> >>>    * @cnt: the counter
> >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> >>> index d508363..fafebf0 100644
> >>> --- a/kernel/res_counter.c
> >>> +++ b/kernel/res_counter.c
> >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> >>>       local_irq_restore(flags);
> >>>   }
> >>>
> >>> +/*
> >>> + * In hierarchical accounting, child's usage is accounted into ancestors.
> >>> + * To move local usage to its parent, just forget current level usage.
> > 
> > The way I understand this comment and the changelog matches the opposite
> > of what the below function is doing.
> > 
> > The function charges a child and ignore all its parents. The comments says it
> > charges the parents but not the child.
> > 
> 
> 
> Sure, I'll fix...and look into your code first.
> "uncharge a counter until you reach a specific ancestor"...
> Is it in linux-next ?

No, it's part of the task counter so it's still out of tree.
You were Cc'ed but if you can't find it in your inbox I can
resend it:

http://thread.gmane.org/gmane.linux.kernel.containers/22378

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-13  1:04           ` Frederic Weisbecker
  0 siblings, 0 replies; 67+ messages in thread
From: Frederic Weisbecker @ 2012-04-13  1:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Johannes Weiner,
	Tejun Heo, Hugh Dickins, Andrew Morton

On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/12 23:30), Frederic Weisbecker wrote:
> 
> > 2012/4/12 Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>:
> >> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
> >>>
> >>> This function is used for moving accounting information to its
> >>> parent in the hierarchy of res_counter.
> >>>
> >>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> >>
> >> Frederic has a patch in his fork cgroup series, that allows you to
> >> uncharge a counter until you reach a specific ancestor.
> >> You pass the parent as a parameter, and then only you gets uncharged.
> > 
> > I'm missing the referring patchset from Kamezawa. Ok I'm going to
> > subscribe to the
> > cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
> > LKML for cgroup patches?
> > 
> 
> Ah, sorry. I will do next time.
> 
> > Some comments below:
> > 
> >>
> >> I think that is a much better interface than this you are proposing.
> >> We should probably merge that patch and use it.
> >>
> >>> ---
> >>>   include/linux/res_counter.h |    3 +++
> >>>   kernel/res_counter.c        |   13 +++++++++++++
> >>>   2 files changed, 16 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> >>> index da81af0..8919d3c 100644
> >>> --- a/include/linux/res_counter.h
> >>> +++ b/include/linux/res_counter.h
> >>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
> >>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> >>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> >>>
> >>> +/* move resource to parent counter...i.e. just forget accounting in a child */
> >>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
> >>> +
> >>>   /**
> >>>    * res_counter_margin - calculate chargeable space of a counter
> >>>    * @cnt: the counter
> >>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> >>> index d508363..fafebf0 100644
> >>> --- a/kernel/res_counter.c
> >>> +++ b/kernel/res_counter.c
> >>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> >>>       local_irq_restore(flags);
> >>>   }
> >>>
> >>> +/*
> >>> + * In hierarchical accounting, child's usage is accounted into ancestors.
> >>> + * To move local usage to its parent, just forget current level usage.
> > 
> > The way I understand this comment and the changelog matches the opposite
> > of what the below function is doing.
> > 
> > The function charges a child and ignore all its parents. The comments says it
> > charges the parents but not the child.
> > 
> 
> 
> Sure, I'll fix...and look into your code first.
> "uncharge a counter until you reach a specific ancestor"...
> Is it in linux-next ?

No, it's part of the task counter so it's still out of tree.
You were Cc'ed but if you can't find it in your inbox I can
resend it:

http://thread.gmane.org/gmane.linux.kernel.containers/22378

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
  2012-04-13  1:04           ` Frederic Weisbecker
  (?)
@ 2012-04-13  1:05           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-13  1:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Glauber Costa, linux-mm, cgroups, Michal Hocko, Johannes Weiner,
	Tejun Heo, Hugh Dickins, Andrew Morton

(2012/04/13 10:04), Frederic Weisbecker wrote:

> On Fri, Apr 13, 2012 at 09:57:03AM +0900, KAMEZAWA Hiroyuki wrote:
>> (2012/04/12 23:30), Frederic Weisbecker wrote:
>>
>>> 2012/4/12 Glauber Costa <glommer@parallels.com>:
>>>> On 04/12/2012 08:20 AM, KAMEZAWA Hiroyuki wrote:
>>>>>
>>>>> This function is used for moving accounting information to its
>>>>> parent in the hierarchy of res_counter.
>>>>>
>>>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>>>
>>>> Frederic has a patch in his fork cgroup series, that allows you to
>>>> uncharge a counter until you reach a specific ancestor.
>>>> You pass the parent as a parameter, and then only you gets uncharged.
>>>
>>> I'm missing the referring patchset from Kamezawa. Ok I'm going to
>>> subscribe to the
>>> cgroup mailing list. Meanwhile perhaps would it be nice to keep Cc
>>> LKML for cgroup patches?
>>>
>>
>> Ah, sorry. I will do next time.
>>
>>> Some comments below:
>>>
>>>>
>>>> I think that is a much better interface than this you are proposing.
>>>> We should probably merge that patch and use it.
>>>>
>>>>> ---
>>>>>   include/linux/res_counter.h |    3 +++
>>>>>   kernel/res_counter.c        |   13 +++++++++++++
>>>>>   2 files changed, 16 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>>>>> index da81af0..8919d3c 100644
>>>>> --- a/include/linux/res_counter.h
>>>>> +++ b/include/linux/res_counter.h
>>>>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>>>>   void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>>>>   void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>>>>
>>>>> +/* move resource to parent counter...i.e. just forget accounting in a child */
>>>>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>>>>> +
>>>>>   /**
>>>>>    * res_counter_margin - calculate chargeable space of a counter
>>>>>    * @cnt: the counter
>>>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>>>>> index d508363..fafebf0 100644
>>>>> --- a/kernel/res_counter.c
>>>>> +++ b/kernel/res_counter.c
>>>>> @@ -113,6 +113,19 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>>>>       local_irq_restore(flags);
>>>>>   }
>>>>>
>>>>> +/*
>>>>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>>>>> + * To move local usage to its parent, just forget current level usage.
>>>
>>> The way I understand this comment and the changelog matches the opposite
>>> of what the below function is doing.
>>>
>>> The function charges a child and ignore all its parents. The comments says it
>>> charges the parents but not the child.
>>>
>>
>>
>> Sure, I'll fix...and look into your code first.
>> "uncharge a counter until you reach a specific ancestor"...
>> Is it in linux-next ?
> 
> No, it's part of the task counter so it's still out of tree.
> You were Cc'ed but if you can't find it in your inbox I can
> resend it:
> 
> http://thread.gmane.org/gmane.linux.kernel.containers/22378
> 


Ah, ok. I found it. Thank you!.
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-13  8:50         ` Michal Hocko
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Hocko @ 2012-04-13  8:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Aneesh Kumar K.V, Tejun Heo, linux-mm, cgroups, Johannes Weiner,
	Glauber Costa, Hugh Dickins, Andrew Morton

On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote:
> (2012/04/13 3:57), Aneesh Kumar K.V wrote:
> 
> > Tejun Heo <tj@kernel.org> writes:
> > 
> >> Hello, KAMEZAWA.
> >>
> >> Thanks a lot for doing this.
> >>
> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
> >>
> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
> >> retry behavior of it, so ->pre_destroy() will be converted to return
> >> void and called once on rmdir and rmdir will proceed no matter what.
> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't
> >> trigger the warning message.
> >>
> >> Other than that, if memcg people are fine with the change, I'll be
> >> happy to route the changes through cgroup/for-3.5 and stack rmdir
> >> simplification patches on top.
> >>
> > 
> > Any suggestion on how to take HugeTLB memcg extension patches [1]
> > upstream. Current patch series I have is on top of cgroup/for-3.5
> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have
> > changes around that. So if these memcg patches can also go on top of
> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?

I would suggest working on top of memcg-devel tree or on top linux-next.
Just pull the required patch-es from cgroup/for-3.5 tree before your
work (I can include that into memcg-devel tree for you if you want).

Do you think this is a 3.5 material? I would rather wait some more. I
didn't have time to look over it yet and there are still some unresolved
issues so it sounds like too early for merging.

> > Can HugeTLB memcg extension patches also go via this tree ? It
> > should actually got via -mm. But then how do we take care of these
> > dependencies ?

You are not changing anything generic from cgroup so definitely go via
Andrew.

> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday.
> So, If changes of cgroup tree you required are included in linux-next. Please post
> your updated ones. I thought your latest version was near to be merged....
> 
> How do you think, Michal ?
> Please post (and ask Andrew to pull it.) I'll review when I can.

I would wait with pulling the patch after the review.

> I know yours and mine has some conflicts. I think my this series will
> be onto your series. To do that, I hope your series are merged to
> linux-next, 1st.
> 
> 
> Thanks,
> -Kame
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-13  8:50         ` Michal Hocko
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Hocko @ 2012-04-13  8:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Aneesh Kumar K.V, Tejun Heo, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote:
> (2012/04/13 3:57), Aneesh Kumar K.V wrote:
> 
> > Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:
> > 
> >> Hello, KAMEZAWA.
> >>
> >> Thanks a lot for doing this.
> >>
> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
> >>
> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
> >> retry behavior of it, so ->pre_destroy() will be converted to return
> >> void and called once on rmdir and rmdir will proceed no matter what.
> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't
> >> trigger the warning message.
> >>
> >> Other than that, if memcg people are fine with the change, I'll be
> >> happy to route the changes through cgroup/for-3.5 and stack rmdir
> >> simplification patches on top.
> >>
> > 
> > Any suggestion on how to take HugeTLB memcg extension patches [1]
> > upstream. Current patch series I have is on top of cgroup/for-3.5
> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have
> > changes around that. So if these memcg patches can also go on top of
> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?

I would suggest working on top of memcg-devel tree or on top linux-next.
Just pull the required patch-es from cgroup/for-3.5 tree before your
work (I can include that into memcg-devel tree for you if you want).

Do you think this is a 3.5 material? I would rather wait some more. I
didn't have time to look over it yet and there are still some unresolved
issues so it sounds like too early for merging.

> > Can HugeTLB memcg extension patches also go via this tree ? It
> > should actually got via -mm. But then how do we take care of these
> > dependencies ?

You are not changing anything generic from cgroup so definitely go via
Andrew.

> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday.
> So, If changes of cgroup tree you required are included in linux-next. Please post
> your updated ones. I thought your latest version was near to be merged....
> 
> How do you think, Michal ?
> Please post (and ask Andrew to pull it.) I'll review when I can.

I would wait with pulling the patch after the review.

> I know yours and mine has some conflicts. I think my this series will
> be onto your series. To do that, I hope your series are merged to
> linux-next, 1st.
> 
> 
> Thanks,
> -Kame
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-13 22:19           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 67+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-13 22:19 UTC (permalink / raw)
  To: Michal Hocko, KAMEZAWA Hiroyuki
  Cc: Tejun Heo, linux-mm, cgroups, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote:
>> (2012/04/13 3:57), Aneesh Kumar K.V wrote:
>> 
>> > Tejun Heo <tj@kernel.org> writes:
>> > 
>> >> Hello, KAMEZAWA.
>> >>
>> >> Thanks a lot for doing this.
>> >>
>> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
>> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>> >>
>> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
>> >> retry behavior of it, so ->pre_destroy() will be converted to return
>> >> void and called once on rmdir and rmdir will proceed no matter what.
>> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't
>> >> trigger the warning message.
>> >>
>> >> Other than that, if memcg people are fine with the change, I'll be
>> >> happy to route the changes through cgroup/for-3.5 and stack rmdir
>> >> simplification patches on top.
>> >>
>> > 
>> > Any suggestion on how to take HugeTLB memcg extension patches [1]
>> > upstream. Current patch series I have is on top of cgroup/for-3.5
>> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have
>> > changes around that. So if these memcg patches can also go on top of
>> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?
>
> I would suggest working on top of memcg-devel tree or on top linux-next.
> Just pull the required patch-es from cgroup/for-3.5 tree before your
> work (I can include that into memcg-devel tree for you if you want).

I am expecting to have no conflicts with pending memcg changes. But I do
have conflicts with cgroup/for-3.5. That is the reason I decided to
rebase on top of cgroup/for-3.5. 


>
> Do you think this is a 3.5 material? I would rather wait some more. I
> didn't have time to look over it yet and there are still some unresolved
> issues so it sounds like too early for merging.


I would really like to get it merged for 3.5. I am ready to post V6 that
address all review feedback from V5 post. 


>
>> > Can HugeTLB memcg extension patches also go via this tree ? It
>> > should actually got via -mm. But then how do we take care of these
>> > dependencies ?
>
> You are not changing anything generic from cgroup so definitely go via
> Andrew.
>

agreed.


>> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday.
>> So, If changes of cgroup tree you required are included in linux-next. Please post
>> your updated ones. I thought your latest version was near to be merged....
>> 
>> How do you think, Michal ?
>> Please post (and ask Andrew to pull it.) I'll review when I can.
>
> I would wait with pulling the patch after the review.
>

agreed. So I will do a v6 post and if we all agree with the changes it
can be pulled via -mm ?


>> I know yours and mine has some conflicts. I think my this series will
>> be onto your series. To do that, I hope your series are merged to
>> linux-next, 1st.
>> 

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-13 22:19           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 67+ messages in thread
From: Aneesh Kumar K.V @ 2012-04-13 22:19 UTC (permalink / raw)
  To: Michal Hocko, KAMEZAWA Hiroyuki
  Cc: Tejun Heo, linux-mm@kvack.org, cgroups@vger.kernel.org,
	Johannes Weiner, Glauber Costa, Hugh Dickins, Andrew Morton

Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:

> On Fri 13-04-12 08:59:44, KAMEZAWA Hiroyuki wrote:
>> (2012/04/13 3:57), Aneesh Kumar K.V wrote:
>> 
>> > Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:
>> > 
>> >> Hello, KAMEZAWA.
>> >>
>> >> Thanks a lot for doing this.
>> >>
>> >> On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
>> >>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> >>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>> >>
>> >> Just to clarify, I'm not intending to ->pre_destroy() per-se but the
>> >> retry behavior of it, so ->pre_destroy() will be converted to return
>> >> void and called once on rmdir and rmdir will proceed no matter what.
>> >> Also, with the deprecated behavior flag set, pre_destroy() doesn't
>> >> trigger the warning message.
>> >>
>> >> Other than that, if memcg people are fine with the change, I'll be
>> >> happy to route the changes through cgroup/for-3.5 and stack rmdir
>> >> simplification patches on top.
>> >>
>> > 
>> > Any suggestion on how to take HugeTLB memcg extension patches [1]
>> > upstream. Current patch series I have is on top of cgroup/for-3.5
>> > because I need cgroup_add_files equivalent and cgroup/for-3.5 have
>> > changes around that. So if these memcg patches can also go on top of
>> > cgroup/for-3.5 then I can continue to work on top of cgroup/for-3.5 ?
>
> I would suggest working on top of memcg-devel tree or on top linux-next.
> Just pull the required patch-es from cgroup/for-3.5 tree before your
> work (I can include that into memcg-devel tree for you if you want).

I am expecting to have no conflicts with pending memcg changes. But I do
have conflicts with cgroup/for-3.5. That is the reason I decided to
rebase on top of cgroup/for-3.5. 


>
> Do you think this is a 3.5 material? I would rather wait some more. I
> didn't have time to look over it yet and there are still some unresolved
> issues so it sounds like too early for merging.


I would really like to get it merged for 3.5. I am ready to post V6 that
address all review feedback from V5 post. 


>
>> > Can HugeTLB memcg extension patches also go via this tree ? It
>> > should actually got via -mm. But then how do we take care of these
>> > dependencies ?
>
> You are not changing anything generic from cgroup so definitely go via
> Andrew.
>

agreed.


>> I'm not in hurry. To be honest, I cannot update patches until the next Wednesday.
>> So, If changes of cgroup tree you required are included in linux-next. Please post
>> your updated ones. I thought your latest version was near to be merged....
>> 
>> How do you think, Michal ?
>> Please post (and ask Andrew to pull it.) I'll review when I can.
>
> I would wait with pulling the patch after the review.
>

agreed. So I will do a v6 post and if we all agree with the changes it
can be pulled via -mm ?


>> I know yours and mine has some conflicts. I think my this series will
>> be onto your series. To do that, I hope your series are merged to
>> linux-next, 1st.
>> 

-aneesh

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-16 22:19     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> This function is used for moving accounting information to its
> parent in the hierarchy of res_counter.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/res_counter.h |    3 +++
>  kernel/res_counter.c        |   13 +++++++++++++
>  2 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index da81af0..8919d3c 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>  void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>  
> +/* move resource to parent counter...i.e. just forget accounting in a child */

Can we drop this comment and

> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>  
> +/*
> + * In hierarchical accounting, child's usage is accounted into ancestors.
> + * To move local usage to its parent, just forget current level usage.
> + */

make this one proper docbook function comment?

> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> +{
> +	unsigned long flags;
> +
> +	BUG_ON(!counter->parent);

And let's please do "if (WARN_ON(!counter->parent)) return;" instead.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-16 22:19     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> This function is used for moving accounting information to its
> parent in the hierarchy of res_counter.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> ---
>  include/linux/res_counter.h |    3 +++
>  kernel/res_counter.c        |   13 +++++++++++++
>  2 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index da81af0..8919d3c 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>  void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>  
> +/* move resource to parent counter...i.e. just forget accounting in a child */

Can we drop this comment and

> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>  
> +/*
> + * In hierarchical accounting, child's usage is accounted into ancestors.
> + * To move local usage to its parent, just forget current level usage.
> + */

make this one proper docbook function comment?

> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> +{
> +	unsigned long flags;
> +
> +	BUG_ON(!counter->parent);

And let's please do "if (WARN_ON(!counter->parent)) return;" instead.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-16 22:21     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> When memcg->use_hierarchy==true, the parent res_counter includes
> the usage in child's usage. So, it's not necessary to call try_charge()
> in the parent.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
>  1 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fa01106..3215880 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
>  			res_counter_uncharge(&memcg->memsw, bytes);
>  	}
>  }

New line missing here.

> +/*
> + * Moving usage between a child to its parent if use_hierarchy==true.
> + */

Prolly "from a child to its parent"?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-16 22:21     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> When memcg->use_hierarchy==true, the parent res_counter includes
> the usage in child's usage. So, it's not necessary to call try_charge()
> in the parent.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> ---
>  mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
>  1 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fa01106..3215880 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
>  			res_counter_uncharge(&memcg->memsw, bytes);
>  	}
>  }

New line missing here.

> +/*
> + * Moving usage between a child to its parent if use_hierarchy==true.
> + */

Prolly "from a child to its parent"?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/7] memcg: move charges to root at rmdir()
@ 2012-04-16 22:30     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Hello,

On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote:
> As recently discussed, Tejun Heo, the cgroup maintainer, tries to
> remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir().

I'm not trying to remove ->pre_destory() per-se.  I want to remove css
ref draining and ->pre_destroy() vetoing cgroup removal.  Probably
better wording would be "tries to simplify removal path such that
removal always succeeds".

> To do that, in memcg, handling case of use_hierarchy==false is a problem.
> 
> We move memcg's charges to its parent at rmdir(). If use_hierarchy==true,
> it's already accounted in the parent, no problem. If use_hierarchy==false,
> we cannot guarantee we can move all charges to the parent.
> 
> This patch changes the behavior to move all charges to root_mem_cgroup
> if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship.

Maybe better to break the above line?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/7] memcg: move charges to root at rmdir()
@ 2012-04-16 22:30     ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

Hello,

On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote:
> As recently discussed, Tejun Heo, the cgroup maintainer, tries to
> remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir().

I'm not trying to remove ->pre_destory() per-se.  I want to remove css
ref draining and ->pre_destroy() vetoing cgroup removal.  Probably
better wording would be "tries to simplify removal path such that
removal always succeeds".

> To do that, in memcg, handling case of use_hierarchy==false is a problem.
> 
> We move memcg's charges to its parent at rmdir(). If use_hierarchy==true,
> it's already accounted in the parent, no problem. If use_hierarchy==false,
> we cannot guarantee we can move all charges to the parent.
> 
> This patch changes the behavior to move all charges to root_mem_cgroup
> if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship.

Maybe better to break the above line?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
  2012-04-12 11:20   ` KAMEZAWA Hiroyuki
                     ` (2 preceding siblings ...)
  (?)
@ 2012-04-16 22:31   ` Tejun Heo
  2012-04-18  7:04       ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
> +/*
> + * In hierarchical accounting, child's usage is accounted into ancestors.
> + * To move local usage to its parent, just forget current level usage.
> + */
> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> +{
> +	unsigned long flags;
> +
> +	BUG_ON(!counter->parent);
> +	spin_lock_irqsave(&counter->lock, flags);
> +	res_counter_uncharge_locked(counter, val);
> +	spin_unlock_irqrestore(&counter->lock, flags);
> +}

On the second thought, do we need this at all?  It's as good as doing
nothing after all, no?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/7] memcg: remove pre_destroy()
  2012-04-12 11:30   ` KAMEZAWA Hiroyuki
  (?)
@ 2012-04-16 22:38   ` Tejun Heo
  2012-04-18  7:12       ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Hello,

On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote:
> +/*
> + * This function is called after ->destroy(). So, we cannot access cgroup
> + * of this memcg.
> + */
> +static void mem_cgroup_recharge(struct work_struct *work)

So, ->pre_destroy per-se isn't gonna go away.  It's just gonna be this
callback which cgroup core uses to unilaterally notify that the cgroup
is going away, so no need to do this cleanup asynchronously from
->destroy().  It's okay to keep doing it synchronously from
->pre_destroy().  The only thing is that it can't fail.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-16 22:41   ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

Hello, KAMEZAWA.

On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.

I did a pretty shallow review of the patchset and other than the
unnecessary async destruction, my complaints are mostly trivial.  Ooh,
and the patchset doesn't apply cleanly on top of cgroup/for-3.5.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.5

Thank you!

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-16 22:41   ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-16 22:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

Hello, KAMEZAWA.

On Thu, Apr 12, 2012 at 08:17:18PM +0900, KAMEZAWA Hiroyuki wrote:
> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.

I did a pretty shallow review of the patchset and other than the
unnecessary async destruction, my complaints are mostly trivial.  Ooh,
and the patchset doesn't apply cleanly on top of cgroup/for-3.5.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.5

Thank you!

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
  2012-04-12 11:28   ` KAMEZAWA Hiroyuki
  (?)
  (?)
@ 2012-04-17 17:29   ` Ying Han
  2012-04-18  7:14       ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 67+ messages in thread
From: Ying Han @ 2012-04-17 17:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Now, at rmdir, memory cgroup's charge will be moved to
>  - parent if use_hierarchy=1
>  - root   if use_hierarchy=0
>
> Then, we don't have to have memory reclaim code at destroying memcg.
>
> This patch divides force_empty to 2 functions as
>
>  - memory_cgroup_recharge() ... try to move all charges to ancestors.
>  - memory_cgroup_force_empty().. try to reclaim all memory.
>
> After this patch, memory.force_empty will _not_ move charges to ancestors
> but just reclaim all pages. (This meets documenation.)

Not sure why it matches the documentation:
"
memory.force_empty>---->------- # trigger forced move charge to parent
"

and
"
  # echo 0 > memory.force_empty

  Almost all pages tracked by this memory cgroup will be unmapped and freed.
  Some pages cannot be freed because they are locked or in-use. Such pages are
  moved to parent and this cgroup will be empty. This may return -EBUSY if
  VM is too busy to free/move all pages immediately.
"

--Ying

>
> rmdir() will not reclaim any memory but moves charge to other cgroup,
> parent or root.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   59 +++++++++++++++++++++++++++----------------------------
>  1 files changed, 29 insertions(+), 30 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ac7984..22c8faa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3619,10 +3619,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
>  }
>
>  /*
> - * This routine traverse page_cgroup in given list and drop them all.
> - * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
> + * This routine traverse page in given list and move them all.
>  */
> -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> +static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
>                                int node, int zid, enum lru_list lru)
>  {
>        struct mem_cgroup_per_zone *mz;
> @@ -3678,24 +3677,12 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>        return ret;
>  }
>
> -/*
> - * make mem_cgroup's charge to be 0 if there is no task.
> - * This enables deleting this mem_cgroup.
> - */
> -static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
> +
> +static int mem_cgroup_recharge(struct mem_cgroup *memcg)
>  {
> -       int ret;
> -       int node, zid, shrink;
> -       int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
> +       int ret, node, zid;
>        struct cgroup *cgrp = memcg->css.cgroup;
>
> -       css_get(&memcg->css);
> -
> -       shrink = 0;
> -       /* should free all ? */
> -       if (free_all)
> -               goto try_to_free;
> -move_account:
>        do {
>                ret = -EBUSY;
>                if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
> @@ -3712,7 +3699,7 @@ move_account:
>                        for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
>                                enum lru_list lru;
>                                for_each_lru(lru) {
> -                                       ret = mem_cgroup_force_empty_list(memcg,
> +                                       ret = mem_cgroup_recharge_lru(memcg,
>                                                        node, zid, lru);
>                                        if (ret)
>                                                break;
> @@ -3722,24 +3709,33 @@ move_account:
>                                break;
>                }
>                mem_cgroup_end_move(memcg);
> -               memcg_oom_recover(memcg);
>                cond_resched();
>        /* "ret" should also be checked to ensure all lists are empty. */
>        } while (memcg->res.usage > 0 || ret);
>  out:
> -       css_put(&memcg->css);
>        return ret;
> +}
> +
> +
> +/*
> + * make mem_cgroup's charge to be 0 if there is no task. This is only called
> + * by memory.force_empty file, an user request.
> + */
> +static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
> +{
> +       int ret = 0;
> +       int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
> +       struct cgroup *cgrp = memcg->css.cgroup;
> +
> +       css_get(&memcg->css);
>
> -try_to_free:
>        /* returns EBUSY if there is a task or if we come here twice. */
> -       if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children) || shrink) {
> +       if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children)) {
>                ret = -EBUSY;
>                goto out;
>        }
>        /* we call try-to-free pages for make this cgroup empty */
>        lru_add_drain_all();
> -       /* try to free all pages in this cgroup */
> -       shrink = 1;
>        while (nr_retries && memcg->res.usage > 0) {
>                int progress;
>
> @@ -3754,16 +3750,19 @@ try_to_free:
>                        /* maybe some writeback is necessary */
>                        congestion_wait(BLK_RW_ASYNC, HZ/10);
>                }
> -
>        }
> -       lru_add_drain();
> +       if (!nr_retries)
> +               ret = -ENOMEM;
> +out:
> +       memcg_oom_recover(memcg);
> +       css_put(&memcg->css);
>        /* try move_account...there may be some *locked* pages. */
> -       goto move_account;
> +       return ret;
>  }
>
>  int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event)
>  {
> -       return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true);
> +       return mem_cgroup_force_empty(mem_cgroup_from_cont(cont));
>  }
>
>
> @@ -4987,7 +4986,7 @@ static int mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
>        struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> -       return mem_cgroup_force_empty(memcg, false);
> +       return mem_cgroup_recharge(memcg);
>  }
>
>  static void mem_cgroup_destroy(struct cgroup *cont)
> --
> 1.7.4.1
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
  2012-04-12 11:17 [PATCH v1 0/7] memcg remove pre_destroy KAMEZAWA Hiroyuki
                   ` (9 preceding siblings ...)
  2012-04-16 22:41   ` Tejun Heo
@ 2012-04-17 17:35 ` Ying Han
  2012-04-18  7:15     ` KAMEZAWA Hiroyuki
  10 siblings, 1 reply; 67+ messages in thread
From: Ying Han @ 2012-04-17 17:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>
> By pre_destroy(), rmdir of cgroup can return -EBUSY or some error.
> It makes cgroup complicated and unstable. I said O.K. to remove it and
> this patch is modification for memcg.
>
> One of problem in current implementation is that memcg moves all charges to
> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
> hit parent's limit and may return -EBUSY. To fix this problem, this patch
> changes behavior of rmdir() as
>
>  - if use_hierarchy=0, all remaining charges will go to root cgroup.
>  - if use_hierarchy=1, all remaining charges will go to the parent.


We need to update the "4.3 Removing a cgroup" session in Documentation.

--Ying

> By this, rmdir failure will not be caused by parent's limitation. And
> I think this meets meaning of use_hierarchy.
>
> This series does
>  - add above change of behavior
>  - use workqueue to move all pages to parent
>  - remove unnecessary codes.
>
> I'm sorry if my reply is delayed, I'm not sure I can have enough time in
> this weekend. Any comments are welcomed.
>
> Thanks,
> -Kame
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-17 17:47     ` Ying Han
  0 siblings, 0 replies; 67+ messages in thread
From: Ying Han @ 2012-04-17 17:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 4:30 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to
> prevent rmdir() from failure of EBUSY or some.
>
> This patch removes pre_destroy() in memcg. All remaining charges
> will be moved to other cgroup, without any failure,  ->destroy()
> just schedule a work and it will destroy the memcg.
> Then, rmdir will never fail. The kernel will take care of remaining
> resources in the cgroup to be accounted correctly.
>
> After this patch, memcg will be destroyed by workqueue in asynchrnous way.

Is it necessary to change the destroy asynchronously?

Frankly, i don't that that much. It will leave the system in a
deterministic state on admin perspective. The current synchronous
destroy works fine, and admin can rely on that w/ charging change
after the destroy returns.

--Ying

> Then, we can modify 'moving' logic to work asynchrnously, i.e,
> we don't force users to wait for the end of rmdir(), now. We don't
> need to use heavy synchronous calls. This patch modifies logics as
>
>  - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync.
>  - lru_add_drain_all() will be called only when necessary, in a lazy way.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   52 ++++++++++++++++++++++------------------------------
>  1 files changed, 22 insertions(+), 30 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 22c8faa..e466809 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -315,6 +315,8 @@ struct mem_cgroup {
>  #ifdef CONFIG_INET
>        struct tcp_memcontrol tcp_mem;
>  #endif
> +
> +       struct work_struct work_destroy;
>  };
>
>  /* Stuffs for move charges at task migration. */
> @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
>        mutex_unlock(&percpu_charge_mutex);
>  }
>
> -/* This is a synchronous drain interface. */
>  static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
>  {
>        /* called when force_empty is called */
> @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
>                pc = lookup_page_cgroup(page);
>
>                ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
> -               if (ret == -EINTR)
> -                       break;
>
> -               if (ret == -EBUSY || ret == -EINVAL) {
> +               VM_BUG_ON(ret != 0 && ret != -EBUSY);
> +               if (ret) {
>                        /* found lock contention or "pc" is obsolete. */
>                        busy = page;
>                        cond_resched();
> @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
>        return ret;
>  }
>
> -
> -static int mem_cgroup_recharge(struct mem_cgroup *memcg)
> +/*
> + * This function is called after ->destroy(). So, we cannot access cgroup
> + * of this memcg.
> + */
> +static void mem_cgroup_recharge(struct work_struct *work)
>  {
> +       struct mem_cgroup *memcg;
>        int ret, node, zid;
> -       struct cgroup *cgrp = memcg->css.cgroup;
>
> +       memcg = container_of(work, struct mem_cgroup, work_destroy);
> +       /* No task points this memcg. call this only once */
> +       drain_all_stock_async(memcg);
>        do {
> -               ret = -EBUSY;
> -               if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
> -                       goto out;
> -               ret = -EINTR;
> -               if (signal_pending(current))
> -                       goto out;
> -               /* This is for making all *used* pages to be on LRU. */
> -               lru_add_drain_all();
> -               drain_all_stock_sync(memcg);
>                ret = 0;
>                mem_cgroup_start_move(memcg);
>                for_each_node_state(node, N_HIGH_MEMORY) {
> @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg)
>                }
>                mem_cgroup_end_move(memcg);
>                cond_resched();
> -       /* "ret" should also be checked to ensure all lists are empty. */
> -       } while (memcg->res.usage > 0 || ret);
> -out:
> -       return ret;
> +               /* drain LRU only when we canoot find pages on LRU */
> +               if (res_counter_read_u64(&memcg->res, RES_USAGE) &&
> +                   !mem_cgroup_nr_lru_pages(memcg, LRU_ALL))
> +                       lru_add_drain_all();
> +       } while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret);
> +       mem_cgroup_put(memcg);
>  }
>
> -
>  /*
>  * make mem_cgroup's charge to be 0 if there is no task. This is only called
>  * by memory.force_empty file, an user request.
> @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work)
>        memcg = container_of(work, struct mem_cgroup, work_freeing);
>        vfree(memcg);
>  }
> +
>  static void vfree_rcu(struct rcu_head *rcu_head)
>  {
>        struct mem_cgroup *memcg;
> @@ -4982,20 +4981,14 @@ free_out:
>        return ERR_PTR(error);
>  }
>
> -static int mem_cgroup_pre_destroy(struct cgroup *cont)
> -{
> -       struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> -
> -       return mem_cgroup_recharge(memcg);
> -}
> -
>  static void mem_cgroup_destroy(struct cgroup *cont)
>  {
>        struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
>        kmem_cgroup_destroy(cont);
>
> -       mem_cgroup_put(memcg);
> +       INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge);
> +       schedule_work(&memcg->work_destroy);
>  }
>
>  static int mem_cgroup_populate(struct cgroup_subsys *ss,
> @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = {
>        .name = "memory",
>        .subsys_id = mem_cgroup_subsys_id,
>        .create = mem_cgroup_create,
> -       .pre_destroy = mem_cgroup_pre_destroy,
>        .destroy = mem_cgroup_destroy,
>        .populate = mem_cgroup_populate,
>        .can_attach = mem_cgroup_can_attach,
> --
> 1.7.4.1
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-17 17:47     ` Ying Han
  0 siblings, 0 replies; 67+ messages in thread
From: Ying Han @ 2012-04-17 17:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Thu, Apr 12, 2012 at 4:30 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> Tejun Heo, cgroup maintainer, tries to remove ->pre_destroy() to
> prevent rmdir() from failure of EBUSY or some.
>
> This patch removes pre_destroy() in memcg. All remaining charges
> will be moved to other cgroup, without any failure,  ->destroy()
> just schedule a work and it will destroy the memcg.
> Then, rmdir will never fail. The kernel will take care of remaining
> resources in the cgroup to be accounted correctly.
>
> After this patch, memcg will be destroyed by workqueue in asynchrnous way.

Is it necessary to change the destroy asynchronously?

Frankly, i don't that that much. It will leave the system in a
deterministic state on admin perspective. The current synchronous
destroy works fine, and admin can rely on that w/ charging change
after the destroy returns.

--Ying

> Then, we can modify 'moving' logic to work asynchrnously, i.e,
> we don't force users to wait for the end of rmdir(), now. We don't
> need to use heavy synchronous calls. This patch modifies logics as
>
>  - Use mem_cgroup_drain_stock_async rather tan drain_stock_sync.
>  - lru_add_drain_all() will be called only when necessary, in a lazy way.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> ---
>  mm/memcontrol.c |   52 ++++++++++++++++++++++------------------------------
>  1 files changed, 22 insertions(+), 30 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 22c8faa..e466809 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -315,6 +315,8 @@ struct mem_cgroup {
>  #ifdef CONFIG_INET
>        struct tcp_memcontrol tcp_mem;
>  #endif
> +
> +       struct work_struct work_destroy;
>  };
>
>  /* Stuffs for move charges at task migration. */
> @@ -2105,7 +2107,6 @@ static void drain_all_stock_async(struct mem_cgroup *root_memcg)
>        mutex_unlock(&percpu_charge_mutex);
>  }
>
> -/* This is a synchronous drain interface. */
>  static void drain_all_stock_sync(struct mem_cgroup *root_memcg)
>  {
>        /* called when force_empty is called */
> @@ -3661,10 +3662,9 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
>                pc = lookup_page_cgroup(page);
>
>                ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
> -               if (ret == -EINTR)
> -                       break;
>
> -               if (ret == -EBUSY || ret == -EINVAL) {
> +               VM_BUG_ON(ret != 0 && ret != -EBUSY);
> +               if (ret) {
>                        /* found lock contention or "pc" is obsolete. */
>                        busy = page;
>                        cond_resched();
> @@ -3677,22 +3677,19 @@ static int mem_cgroup_recharge_lru(struct mem_cgroup *memcg,
>        return ret;
>  }
>
> -
> -static int mem_cgroup_recharge(struct mem_cgroup *memcg)
> +/*
> + * This function is called after ->destroy(). So, we cannot access cgroup
> + * of this memcg.
> + */
> +static void mem_cgroup_recharge(struct work_struct *work)
>  {
> +       struct mem_cgroup *memcg;
>        int ret, node, zid;
> -       struct cgroup *cgrp = memcg->css.cgroup;
>
> +       memcg = container_of(work, struct mem_cgroup, work_destroy);
> +       /* No task points this memcg. call this only once */
> +       drain_all_stock_async(memcg);
>        do {
> -               ret = -EBUSY;
> -               if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
> -                       goto out;
> -               ret = -EINTR;
> -               if (signal_pending(current))
> -                       goto out;
> -               /* This is for making all *used* pages to be on LRU. */
> -               lru_add_drain_all();
> -               drain_all_stock_sync(memcg);
>                ret = 0;
>                mem_cgroup_start_move(memcg);
>                for_each_node_state(node, N_HIGH_MEMORY) {
> @@ -3710,13 +3707,14 @@ static int mem_cgroup_recharge(struct mem_cgroup *memcg)
>                }
>                mem_cgroup_end_move(memcg);
>                cond_resched();
> -       /* "ret" should also be checked to ensure all lists are empty. */
> -       } while (memcg->res.usage > 0 || ret);
> -out:
> -       return ret;
> +               /* drain LRU only when we canoot find pages on LRU */
> +               if (res_counter_read_u64(&memcg->res, RES_USAGE) &&
> +                   !mem_cgroup_nr_lru_pages(memcg, LRU_ALL))
> +                       lru_add_drain_all();
> +       } while (res_counter_read_u64(&memcg->res, RES_USAGE) || ret);
> +       mem_cgroup_put(memcg);
>  }
>
> -
>  /*
>  * make mem_cgroup's charge to be 0 if there is no task. This is only called
>  * by memory.force_empty file, an user request.
> @@ -4803,6 +4801,7 @@ static void vfree_work(struct work_struct *work)
>        memcg = container_of(work, struct mem_cgroup, work_freeing);
>        vfree(memcg);
>  }
> +
>  static void vfree_rcu(struct rcu_head *rcu_head)
>  {
>        struct mem_cgroup *memcg;
> @@ -4982,20 +4981,14 @@ free_out:
>        return ERR_PTR(error);
>  }
>
> -static int mem_cgroup_pre_destroy(struct cgroup *cont)
> -{
> -       struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> -
> -       return mem_cgroup_recharge(memcg);
> -}
> -
>  static void mem_cgroup_destroy(struct cgroup *cont)
>  {
>        struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
>        kmem_cgroup_destroy(cont);
>
> -       mem_cgroup_put(memcg);
> +       INIT_WORK(&memcg->work_destroy, mem_cgroup_recharge);
> +       schedule_work(&memcg->work_destroy);
>  }
>
>  static int mem_cgroup_populate(struct cgroup_subsys *ss,
> @@ -5589,7 +5582,6 @@ struct cgroup_subsys mem_cgroup_subsys = {
>        .name = "memory",
>        .subsys_id = mem_cgroup_subsys_id,
>        .create = mem_cgroup_create,
> -       .pre_destroy = mem_cgroup_pre_destroy,
>        .destroy = mem_cgroup_destroy,
>        .populate = mem_cgroup_populate,
>        .can_attach = mem_cgroup_can_attach,
> --
> 1.7.4.1
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18  6:59       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  6:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/17 7:19), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
>>
>> This function is used for moving accounting information to its
>> parent in the hierarchy of res_counter.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  include/linux/res_counter.h |    3 +++
>>  kernel/res_counter.c        |   13 +++++++++++++
>>  2 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index da81af0..8919d3c 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>  void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>  
>> +/* move resource to parent counter...i.e. just forget accounting in a child */
> 
> Can we drop this comment and
> 
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>>  
>> +/*
>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>> + * To move local usage to its parent, just forget current level usage.
>> + */
> 
> make this one proper docbook function comment?
> 

Sure. (I'll use Frederic's one.)

Thanks,
-Kame

>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>> +{
>> +	unsigned long flags;
>> +
>> +	BUG_ON(!counter->parent);
> 
> And let's please do "if (WARN_ON(!counter->parent)) return;" instead.
> 
> Thanks.
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18  6:59       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  6:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

(2012/04/17 7:19), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
>>
>> This function is used for moving accounting information to its
>> parent in the hierarchy of res_counter.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>> ---
>>  include/linux/res_counter.h |    3 +++
>>  kernel/res_counter.c        |   13 +++++++++++++
>>  2 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index da81af0..8919d3c 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -135,6 +135,9 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
>>  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
>>  void res_counter_uncharge(struct res_counter *counter, unsigned long val);
>>  
>> +/* move resource to parent counter...i.e. just forget accounting in a child */
> 
> Can we drop this comment and
> 
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val);
>>  
>> +/*
>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>> + * To move local usage to its parent, just forget current level usage.
>> + */
> 
> make this one proper docbook function comment?
> 

Sure. (I'll use Frederic's one.)

Thanks,
-Kame

>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>> +{
>> +	unsigned long flags;
>> +
>> +	BUG_ON(!counter->parent);
> 
> And let's please do "if (WARN_ON(!counter->parent)) return;" instead.
> 
> Thanks.
> 



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-18  7:01       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/17 7:21), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote:
>>
>> When memcg->use_hierarchy==true, the parent res_counter includes
>> the usage in child's usage. So, it's not necessary to call try_charge()
>> in the parent.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
>>  1 files changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index fa01106..3215880 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
>>  			res_counter_uncharge(&memcg->memsw, bytes);
>>  	}
>>  }
> 
> New line missing here.
> 
>> +/*
>> + * Moving usage between a child to its parent if use_hierarchy==true.
>> + */
> 
> Prolly "from a child to its parent"?
> 

Sure. will fix.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/7] memcg: move charge to parent only when necessary.
@ 2012-04-18  7:01       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

(2012/04/17 7:21), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:21:20PM +0900, KAMEZAWA Hiroyuki wrote:
>>
>> When memcg->use_hierarchy==true, the parent res_counter includes
>> the usage in child's usage. So, it's not necessary to call try_charge()
>> in the parent.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>> ---
>>  mm/memcontrol.c |   39 ++++++++++++++++++++++++++++++++-------
>>  1 files changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index fa01106..3215880 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2409,6 +2409,20 @@ static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
>>  			res_counter_uncharge(&memcg->memsw, bytes);
>>  	}
>>  }
> 
> New line missing here.
> 
>> +/*
>> + * Moving usage between a child to its parent if use_hierarchy==true.
>> + */
> 
> Prolly "from a child to its parent"?
> 

Sure. will fix.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/7] memcg: move charges to root at rmdir()
  2012-04-16 22:30     ` Tejun Heo
  (?)
@ 2012-04-18  7:02     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/17 7:30), Tejun Heo wrote:

> Hello,
> 
> On Thu, Apr 12, 2012 at 08:22:42PM +0900, KAMEZAWA Hiroyuki wrote:
>> As recently discussed, Tejun Heo, the cgroup maintainer, tries to
>> remove ->pre_destroy() and cgroup will never return -EBUSY at rmdir().
> 
> I'm not trying to remove ->pre_destory() per-se.  I want to remove css
> ref draining and ->pre_destroy() vetoing cgroup removal.  Probably
> better wording would be "tries to simplify removal path such that
> removal always succeeds".
> 


Ok.

>> To do that, in memcg, handling case of use_hierarchy==false is a problem.
>>
>> We move memcg's charges to its parent at rmdir(). If use_hierarchy==true,
>> it's already accounted in the parent, no problem. If use_hierarchy==false,
>> we cannot guarantee we can move all charges to the parent.
>>
>> This patch changes the behavior to move all charges to root_mem_cgroup
>> if use_hierarchy=false. It seems this matches semantics of use_hierarchy==false,which means parent and child has no hierarchical relationship.
> 
> Maybe better to break the above line?
> 

yes, I'll fix it.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18  7:04       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/17 7:31), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
>> +/*
>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>> + * To move local usage to its parent, just forget current level usage.
>> + */
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>> +{
>> +	unsigned long flags;
>> +
>> +	BUG_ON(!counter->parent);
>> +	spin_lock_irqsave(&counter->lock, flags);
>> +	res_counter_uncharge_locked(counter, val);
>> +	spin_unlock_irqrestore(&counter->lock, flags);
>> +}
> 
> On the second thought, do we need this at all?  It's as good as doing
> nothing after all, no?
> 


I considered that, but I think it may make it hard to debug memcg leakage.
I'd like to confirm res->usage == 0 at removal of memcg.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18  7:04       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

(2012/04/17 7:31), Tejun Heo wrote:

> On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
>> +/*
>> + * In hierarchical accounting, child's usage is accounted into ancestors.
>> + * To move local usage to its parent, just forget current level usage.
>> + */
>> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
>> +{
>> +	unsigned long flags;
>> +
>> +	BUG_ON(!counter->parent);
>> +	spin_lock_irqsave(&counter->lock, flags);
>> +	res_counter_uncharge_locked(counter, val);
>> +	spin_unlock_irqrestore(&counter->lock, flags);
>> +}
> 
> On the second thought, do we need this at all?  It's as good as doing
> nothing after all, no?
> 


I considered that, but I think it may make it hard to debug memcg leakage.
I'd like to confirm res->usage == 0 at removal of memcg.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-18  7:12       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/17 7:38), Tejun Heo wrote:

> Hello,
> 
> On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote:
>> +/*
>> + * This function is called after ->destroy(). So, we cannot access cgroup
>> + * of this memcg.
>> + */
>> +static void mem_cgroup_recharge(struct work_struct *work)
> 
> So, ->pre_destroy per-se isn't gonna go away.  It's just gonna be this
> callback which cgroup core uses to unilaterally notify that the cgroup
> is going away, so no need to do this cleanup asynchronously from
> ->destroy().  It's okay to keep doing it synchronously from
> ->pre_destroy().  The only thing is that it can't fail.
> 


I see. 

Thanks,
-Kame



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/7] memcg: remove pre_destroy()
@ 2012-04-18  7:12       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

(2012/04/17 7:38), Tejun Heo wrote:

> Hello,
> 
> On Thu, Apr 12, 2012 at 08:30:22PM +0900, KAMEZAWA Hiroyuki wrote:
>> +/*
>> + * This function is called after ->destroy(). So, we cannot access cgroup
>> + * of this memcg.
>> + */
>> +static void mem_cgroup_recharge(struct work_struct *work)
> 
> So, ->pre_destroy per-se isn't gonna go away.  It's just gonna be this
> callback which cgroup core uses to unilaterally notify that the cgroup
> is going away, so no need to do this cleanup asynchronously from
> ->destroy().  It's okay to keep doing it synchronously from
> ->pre_destroy().  The only thing is that it can't fail.
> 


I see. 

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
@ 2012-04-18  7:14       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:14 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

(2012/04/18 2:29), Ying Han wrote:

> On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> Now, at rmdir, memory cgroup's charge will be moved to
>>  - parent if use_hierarchy=1
>>  - root   if use_hierarchy=0
>>
>> Then, we don't have to have memory reclaim code at destroying memcg.
>>
>> This patch divides force_empty to 2 functions as
>>
>>  - memory_cgroup_recharge() ... try to move all charges to ancestors.
>>  - memory_cgroup_force_empty().. try to reclaim all memory.
>>
>> After this patch, memory.force_empty will _not_ move charges to ancestors
>> but just reclaim all pages. (This meets documenation.)
> 
> Not sure why it matches the documentation:
> "
> memory.force_empty>---->------- # trigger forced move charge to parent
> "

I missed this...

> 
> and
> "
>   # echo 0 > memory.force_empty
> 
>   Almost all pages tracked by this memory cgroup will be unmapped and freed.
>   Some pages cannot be freed because they are locked or in-use. Such pages are
>   moved to parent and this cgroup will be empty. This may return -EBUSY if
>   VM is too busy to free/move all pages immediately.
> "
> 


The 1st feature is "will be unmapped and freed".

I'll update Documentation. Thank you.

-Kame




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir
@ 2012-04-18  7:14       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:14 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/18 2:29), Ying Han wrote:

> On Thu, Apr 12, 2012 at 4:28 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
>> Now, at rmdir, memory cgroup's charge will be moved to
>>  - parent if use_hierarchy=1
>>  - root   if use_hierarchy=0
>>
>> Then, we don't have to have memory reclaim code at destroying memcg.
>>
>> This patch divides force_empty to 2 functions as
>>
>>  - memory_cgroup_recharge() ... try to move all charges to ancestors.
>>  - memory_cgroup_force_empty().. try to reclaim all memory.
>>
>> After this patch, memory.force_empty will _not_ move charges to ancestors
>> but just reclaim all pages. (This meets documenation.)
> 
> Not sure why it matches the documentation:
> "
> memory.force_empty>---->------- # trigger forced move charge to parent
> "

I missed this...

> 
> and
> "
>   # echo 0 > memory.force_empty
> 
>   Almost all pages tracked by this memory cgroup will be unmapped and freed.
>   Some pages cannot be freed because they are locked or in-use. Such pages are
>   moved to parent and this cgroup will be empty. This may return -EBUSY if
>   VM is too busy to free/move all pages immediately.
> "
> 


The 1st feature is "will be unmapped and freed".

I'll update Documentation. Thank you.

-Kame




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-18  7:15     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:15 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Tejun Heo,
	Glauber Costa, Hugh Dickins, Andrew Morton

(2012/04/18 2:35), Ying Han wrote:

> On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>>
>> By pre_destroy(), rmdir of cgroup can return -EBUSY or some error.
>> It makes cgroup complicated and unstable. I said O.K. to remove it and
>> this patch is modification for memcg.
>>
>> One of problem in current implementation is that memcg moves all charges to
>> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
>> hit parent's limit and may return -EBUSY. To fix this problem, this patch
>> changes behavior of rmdir() as
>>
>>  - if use_hierarchy=0, all remaining charges will go to root cgroup.
>>  - if use_hierarchy=1, all remaining charges will go to the parent.
> 
> 
> We need to update the "4.3 Removing a cgroup" session in Documentation.
> 


Sure, will do.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v1 0/7] memcg remove pre_destroy
@ 2012-04-18  7:15     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 67+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:15 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Tejun Heo, Glauber Costa,
	Hugh Dickins, Andrew Morton

(2012/04/18 2:35), Ying Han wrote:

> On Thu, Apr 12, 2012 at 4:17 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
>> In recent discussion, Tejun Heo, cgroup maintainer, has a plan to remove
>> ->pre_destroy(). And now, in cgroup tree, pre_destroy() failure cause WARNING.
>>
>> By pre_destroy(), rmdir of cgroup can return -EBUSY or some error.
>> It makes cgroup complicated and unstable. I said O.K. to remove it and
>> this patch is modification for memcg.
>>
>> One of problem in current implementation is that memcg moves all charges to
>> parent in pre_destroy(). At doing so, if use_hierarchy=0, pre_destroy() may
>> hit parent's limit and may return -EBUSY. To fix this problem, this patch
>> changes behavior of rmdir() as
>>
>>  - if use_hierarchy=0, all remaining charges will go to root cgroup.
>>  - if use_hierarchy=1, all remaining charges will go to the parent.
> 
> 
> We need to update the "4.3 Removing a cgroup" session in Documentation.
> 


Sure, will do.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18 17:03         ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-18 17:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Michal Hocko, Johannes Weiner, Glauber Costa,
	Hugh Dickins, Andrew Morton

On Wed, Apr 18, 2012 at 04:04:42PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/17 7:31), Tejun Heo wrote:
> 
> > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
> >> +/*
> >> + * In hierarchical accounting, child's usage is accounted into ancestors.
> >> + * To move local usage to its parent, just forget current level usage.
> >> + */
> >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> >> +{
> >> +	unsigned long flags;
> >> +
> >> +	BUG_ON(!counter->parent);
> >> +	spin_lock_irqsave(&counter->lock, flags);
> >> +	res_counter_uncharge_locked(counter, val);
> >> +	spin_unlock_irqrestore(&counter->lock, flags);
> >> +}
> > 
> > On the second thought, do we need this at all?  It's as good as doing
> > nothing after all, no?
> > 
> 
> 
> I considered that, but I think it may make it hard to debug memcg leakage.
> I'd like to confirm res->usage == 0 at removal of memcg.

Hmmm... then let's name it res_counter_reset() or something.  I feel
very confused about the function name.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/7] res_counter: add a function res_counter_move_parent().
@ 2012-04-18 17:03         ` Tejun Heo
  0 siblings, 0 replies; 67+ messages in thread
From: Tejun Heo @ 2012-04-18 17:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Johannes Weiner, Glauber Costa, Hugh Dickins,
	Andrew Morton

On Wed, Apr 18, 2012 at 04:04:42PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/17 7:31), Tejun Heo wrote:
> 
> > On Thu, Apr 12, 2012 at 08:20:06PM +0900, KAMEZAWA Hiroyuki wrote:
> >> +/*
> >> + * In hierarchical accounting, child's usage is accounted into ancestors.
> >> + * To move local usage to its parent, just forget current level usage.
> >> + */
> >> +void res_counter_move_parent(struct res_counter *counter, unsigned long val)
> >> +{
> >> +	unsigned long flags;
> >> +
> >> +	BUG_ON(!counter->parent);
> >> +	spin_lock_irqsave(&counter->lock, flags);
> >> +	res_counter_uncharge_locked(counter, val);
> >> +	spin_unlock_irqrestore(&counter->lock, flags);
> >> +}
> > 
> > On the second thought, do we need this at all?  It's as good as doing
> > nothing after all, no?
> > 
> 
> 
> I considered that, but I think it may make it hard to debug memcg leakage.
> I'd like to confirm res->usage == 0 at removal of memcg.

Hmmm... then let's name it res_counter_reset() or something.  I feel
very confused about the function name.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2012-04-18 17:03 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-12 11:17 [PATCH v1 0/7] memcg remove pre_destroy KAMEZAWA Hiroyuki
2012-04-12 11:20 ` [PATCH 1/7] res_counter: add a function res_counter_move_parent() KAMEZAWA Hiroyuki
2012-04-12 11:20   ` KAMEZAWA Hiroyuki
2012-04-12 13:22   ` Glauber Costa
2012-04-12 14:30     ` Frederic Weisbecker
2012-04-13  0:57       ` KAMEZAWA Hiroyuki
2012-04-13  0:57         ` KAMEZAWA Hiroyuki
2012-04-13  1:04         ` Frederic Weisbecker
2012-04-13  1:04           ` Frederic Weisbecker
2012-04-13  1:05           ` KAMEZAWA Hiroyuki
2012-04-16 22:19   ` Tejun Heo
2012-04-16 22:19     ` Tejun Heo
2012-04-18  6:59     ` KAMEZAWA Hiroyuki
2012-04-18  6:59       ` KAMEZAWA Hiroyuki
2012-04-16 22:31   ` Tejun Heo
2012-04-18  7:04     ` KAMEZAWA Hiroyuki
2012-04-18  7:04       ` KAMEZAWA Hiroyuki
2012-04-18 17:03       ` Tejun Heo
2012-04-18 17:03         ` Tejun Heo
2012-04-12 11:21 ` [PATCH 2/7] memcg: move charge to parent only when necessary KAMEZAWA Hiroyuki
2012-04-12 11:21   ` KAMEZAWA Hiroyuki
2012-04-16 22:21   ` Tejun Heo
2012-04-16 22:21     ` Tejun Heo
2012-04-18  7:01     ` KAMEZAWA Hiroyuki
2012-04-18  7:01       ` KAMEZAWA Hiroyuki
2012-04-12 11:22 ` [PATCH 3/7] memcg: move charges to root at rmdir() KAMEZAWA Hiroyuki
2012-04-12 11:22   ` KAMEZAWA Hiroyuki
2012-04-16 22:30   ` Tejun Heo
2012-04-16 22:30     ` Tejun Heo
2012-04-18  7:02     ` KAMEZAWA Hiroyuki
2012-04-12 11:24 ` [PATCH 4/7] memcg: remove 'uncharge' argument from mem_cgroup_move_account() KAMEZAWA Hiroyuki
2012-04-12 11:24   ` KAMEZAWA Hiroyuki
2012-04-12 13:27   ` Glauber Costa
2012-04-13  1:01     ` KAMEZAWA Hiroyuki
2012-04-12 11:28 ` [PATCH 5/7] memcg: divide force_empty into 2 functions, avoid memory reclaim at rmdir KAMEZAWA Hiroyuki
2012-04-12 11:28   ` KAMEZAWA Hiroyuki
2012-04-12 13:33   ` Glauber Costa
2012-04-17 17:29   ` Ying Han
2012-04-18  7:14     ` KAMEZAWA Hiroyuki
2012-04-18  7:14       ` KAMEZAWA Hiroyuki
2012-04-12 11:30 ` [PATCH 6/7] memcg: remove pre_destroy() KAMEZAWA Hiroyuki
2012-04-12 11:30   ` KAMEZAWA Hiroyuki
2012-04-16 22:38   ` Tejun Heo
2012-04-18  7:12     ` KAMEZAWA Hiroyuki
2012-04-18  7:12       ` KAMEZAWA Hiroyuki
2012-04-17 17:47   ` Ying Han
2012-04-17 17:47     ` Ying Han
2012-04-12 11:31 ` [PATCH 7/7] memcg: remove drain_all_stock_sync KAMEZAWA Hiroyuki
2012-04-12 11:31   ` KAMEZAWA Hiroyuki
2012-04-12 13:35   ` Glauber Costa
2012-04-12 13:35     ` Glauber Costa
2012-04-12 13:20 ` [PATCH v1 0/7] memcg remove pre_destroy Glauber Costa
2012-04-12 13:20   ` Glauber Costa
2012-04-12 16:06 ` Tejun Heo
2012-04-12 16:06   ` Tejun Heo
2012-04-12 18:57   ` Aneesh Kumar K.V
2012-04-12 18:57     ` Aneesh Kumar K.V
2012-04-12 23:59     ` KAMEZAWA Hiroyuki
2012-04-13  8:50       ` Michal Hocko
2012-04-13  8:50         ` Michal Hocko
2012-04-13 22:19         ` Aneesh Kumar K.V
2012-04-13 22:19           ` Aneesh Kumar K.V
2012-04-16 22:41 ` Tejun Heo
2012-04-16 22:41   ` Tejun Heo
2012-04-17 17:35 ` Ying Han
2012-04-18  7:15   ` KAMEZAWA Hiroyuki
2012-04-18  7:15     ` KAMEZAWA Hiroyuki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.