linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware ***
@ 2010-11-09  9:24 Greg Thelen
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

This series aims to:
- Make throttle_vm_writeout() cgroup aware.  Prior to this series, cgroup
  reclaim considered only the global dirty limits when deciding whether to
  throttle.  Now the cgroup's own limits are used if the cgroup being
  reclaimed has dirty limits.

- Part of making throttle_vm_writeout() cgroup aware involves fixing a
  negative-value signaling error in mem_cgroup_page_stat().  Previously,
  mem_cgroup_page_stat() could falsely return a negative value if a per-cpu
  counter sum was negative.  Calling routines treated a negative return as a
  special "cgroup does not have limits" indicator.

Greg Thelen (6):
  memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  memcg: pass mem_cgroup to mem_cgroup_dirty_info()
  memcg: make throttle_vm_writeout() memcg aware
  memcg: simplify mem_cgroup_page_stat()
  memcg: simplify mem_cgroup_dirty_info()
  memcg: make mem_cgroup_page_stat() return value unsigned

 include/linux/memcontrol.h |    8 +++-
 include/linux/writeback.h  |    2 +-
 mm/memcontrol.c            |   92 ++++++++++++++++++++++---------------------
 mm/page-writeback.c        |   40 ++++++++++---------
 mm/vmscan.c                |    2 +-
 5 files changed, 77 insertions(+), 67 deletions(-)

-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 33+ messages in thread
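As background for the per-cpu counter issue the series fixes, the sketch
below is a hedged userspace analogue (not kernel code; all names are
illustrative): summing unsynchronized per-CPU deltas can transiently yield
a negative total even though the true page count is non-negative, so a
negative sum must not be mistaken for the old "cgroup has no limits"
error indicator.

```c
#include <assert.h>

/* Userspace sketch only.  NR_CPUS and the helpers below are
 * hypothetical stand-ins, not the kernel's API. */
#define NR_CPUS 4

static long percpu_count[NR_CPUS];

/* Sum all per-CPU deltas without locking; individual entries may be
 * negative because increments and decrements land on different CPUs. */
static long stat_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += percpu_count[cpu];
	return sum;
}

/* Clamped variant, mirroring what patch 6/6 of this series does:
 * round a transiently negative sum up to zero before returning it
 * as an unsigned page count. */
static unsigned long stat_sum_clamped(void)
{
	long sum = stat_sum();

	return sum < 0 ? 0 : (unsigned long)sum;
}
```

Without the clamp, casting the negative sum to unsigned long would produce
a huge bogus page count, which is exactly the failure mode patch 6/6 guards
against.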

* [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-09 22:53   ` Minchan Kim
                     ` (3 more replies)
  2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

This new parameter can be used to query dirty memory usage
from a given memcg rather than the current task's memcg.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 include/linux/memcontrol.h |    6 ++++--
 mm/memcontrol.c            |   37 +++++++++++++++++++++----------------
 mm/page-writeback.c        |    2 +-
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7a3d915..89a9278 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -157,7 +157,8 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
 bool mem_cgroup_has_dirty_limit(void);
 bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 			   struct dirty_info *info);
-long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item);
+long mem_cgroup_page_stat(struct mem_cgroup *mem,
+			  enum mem_cgroup_nr_pages_item item);
 
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						gfp_t gfp_mask);
@@ -351,7 +352,8 @@ static inline bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 	return false;
 }
 
-static inline long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item)
+static inline long mem_cgroup_page_stat(struct mem_cgroup *mem,
+					enum mem_cgroup_nr_pages_item item)
 {
 	return -ENOSYS;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d8a06d6..1bff7cf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1245,22 +1245,20 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 	unsigned long available_mem;
 	struct mem_cgroup *memcg;
 	long value;
+	bool valid = false;
 
 	if (mem_cgroup_disabled())
 		return false;
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
-	if (!__mem_cgroup_has_dirty_limit(memcg)) {
-		rcu_read_unlock();
-		return false;
-	}
+	if (!__mem_cgroup_has_dirty_limit(memcg))
+		goto done;
 	__mem_cgroup_dirty_param(&dirty_param, memcg);
-	rcu_read_unlock();
 
-	value = mem_cgroup_page_stat(MEMCG_NR_DIRTYABLE_PAGES);
+	value = mem_cgroup_page_stat(memcg, MEMCG_NR_DIRTYABLE_PAGES);
 	if (value < 0)
-		return false;
+		goto done;
 
 	available_mem = min((unsigned long)value, sys_available_mem);
 
@@ -1280,17 +1278,21 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 			(dirty_param.dirty_background_ratio *
 			       available_mem) / 100;
 
-	value = mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
+	value = mem_cgroup_page_stat(memcg, MEMCG_NR_RECLAIM_PAGES);
 	if (value < 0)
-		return false;
+		goto done;
 	info->nr_reclaimable = value;
 
-	value = mem_cgroup_page_stat(MEMCG_NR_WRITEBACK);
+	value = mem_cgroup_page_stat(memcg, MEMCG_NR_WRITEBACK);
 	if (value < 0)
-		return false;
+		goto done;
 	info->nr_writeback = value;
 
-	return true;
+	valid = true;
+
+done:
+	rcu_read_unlock();
+	return valid;
 }
 
 static inline bool mem_cgroup_can_swap(struct mem_cgroup *memcg)
@@ -1361,20 +1363,23 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
 
 /*
  * mem_cgroup_page_stat() - get memory cgroup file cache statistics
- * @item:      memory statistic item exported to the kernel
+ * @mem:	optional memory cgroup to query.  If NULL, use current task's
+ *		cgroup.
+ * @item:	memory statistic item exported to the kernel
  *
  * Return the accounted statistic value or negative value if current task is
  * root cgroup.
  */
-long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item)
+long mem_cgroup_page_stat(struct mem_cgroup *mem,
+			  enum mem_cgroup_nr_pages_item item)
 {
 	struct mem_cgroup *iter;
-	struct mem_cgroup *mem;
 	long value;
 
 	get_online_cpus();
 	rcu_read_lock();
-	mem = mem_cgroup_from_task(current);
+	if (!mem)
+		mem = mem_cgroup_from_task(current);
 	if (__mem_cgroup_has_dirty_limit(mem)) {
 		/*
 		 * If we're looking for dirtyable pages we need to evaluate
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a477f59..dc3dbe3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -135,7 +135,7 @@ static unsigned long dirty_writeback_pages(void)
 {
 	unsigned long ret;
 
-	ret = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
+	ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
 	if ((long)ret < 0)
 		ret = global_page_state(NR_UNSTABLE_NFS) +
 			global_page_state(NR_WRITEBACK);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread
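The convention patch 1/6 introduces — a NULL cgroup argument meaning "use
the current task's cgroup" — can be sketched in userspace as follows.  This
is a hedged analogue; struct group, group_from_current() and
group_page_stat() are made-up stand-ins for the memcg types and helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct group {
	long nr_dirty;
};

static struct group current_group = { .nr_dirty = 7 };

/* Hypothetical stand-in for mem_cgroup_from_task(current). */
static struct group *group_from_current(void)
{
	return &current_group;
}

/* If @g is NULL, fall back to the caller's own group, so existing
 * callers that pass NULL keep the old behavior while new callers
 * can query an arbitrary group. */
static long group_page_stat(struct group *g)
{
	if (!g)
		g = group_from_current();
	return g->nr_dirty;
}
```

This keeps the one existing call site (dirty_writeback_pages(), which
passes NULL) unchanged in behavior while letting later patches in the
series query a specific cgroup.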

* [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info()
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-09 23:09   ` Minchan Kim
                     ` (2 more replies)
  2010-11-09  9:24 ` [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware Greg Thelen
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

Pass mem_cgroup parameter through memcg_dirty_info() into
mem_cgroup_dirty_info().  This allows for querying dirty memory
information from a particular cgroup, rather than just the
current task's cgroup.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 include/linux/memcontrol.h |    2 ++
 mm/memcontrol.c            |    5 +++--
 mm/page-writeback.c        |    9 ++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 89a9278..a81dfda 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -156,6 +156,7 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
 
 bool mem_cgroup_has_dirty_limit(void);
 bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
+			   struct mem_cgroup *memcg,
 			   struct dirty_info *info);
 long mem_cgroup_page_stat(struct mem_cgroup *mem,
 			  enum mem_cgroup_nr_pages_item item);
@@ -347,6 +348,7 @@ static inline bool mem_cgroup_has_dirty_limit(void)
 }
 
 static inline bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
+					 struct mem_cgroup *memcg,
 					 struct dirty_info *info)
 {
 	return false;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1bff7cf..eb621ee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1239,11 +1239,11 @@ static void __mem_cgroup_dirty_param(struct vm_dirty_param *param,
  * "memcg" will not be freed while holding rcu_read_lock().
  */
 bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
+			   struct mem_cgroup *memcg,
 			   struct dirty_info *info)
 {
 	struct vm_dirty_param dirty_param;
 	unsigned long available_mem;
-	struct mem_cgroup *memcg;
 	long value;
 	bool valid = false;
 
@@ -1251,7 +1251,8 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 		return false;
 
 	rcu_read_lock();
-	memcg = mem_cgroup_from_task(current);
+	if (!memcg)
+		memcg = mem_cgroup_from_task(current);
 	if (!__mem_cgroup_has_dirty_limit(memcg))
 		goto done;
 	__mem_cgroup_dirty_param(&dirty_param, memcg);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index dc3dbe3..d717fa9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -461,12 +461,15 @@ void global_dirty_info(struct dirty_info *info)
  * Calculate the background-writeback and dirty-throttling thresholds and dirty
  * usage metrics from the current task's memcg dirty limit parameters.  Returns
  * false if no memcg limits exist.
+ *
+ * @memcg may be NULL if the current task's memcg should be used.
+ * @info is the location where the dirty information is written.
  */
-static bool memcg_dirty_info(struct dirty_info *info)
+static bool memcg_dirty_info(struct mem_cgroup *memcg, struct dirty_info *info)
 {
 	unsigned long available_memory = determine_dirtyable_memory();
 
-	if (!mem_cgroup_dirty_info(available_memory, info))
+	if (!mem_cgroup_dirty_info(available_memory, memcg, info))
 		return false;
 
 	adjust_dirty_info(info);
@@ -534,7 +537,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 
 		global_dirty_info(&sys_info);
 
-		if (!memcg_dirty_info(&memcg_info))
+		if (!memcg_dirty_info(NULL, &memcg_info))
 			memcg_info = sys_info;
 
 		/*
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
  2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-09 23:22   ` Minchan Kim
  2010-11-12  8:17   ` Johannes Weiner
  2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

If called with a mem_cgroup, then throttle_vm_writeout() should
query the given cgroup for its dirty memory usage limits.

dirty_writeback_pages() is no longer used, so delete it.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 include/linux/writeback.h |    2 +-
 mm/page-writeback.c       |   31 ++++++++++++++++---------------
 mm/vmscan.c               |    2 +-
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 335dba1..1bacdda 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -97,7 +97,7 @@ void laptop_mode_timer_fn(unsigned long data);
 #else
 static inline void laptop_sync_completion(void) { }
 #endif
-void throttle_vm_writeout(gfp_t gfp_mask);
+void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup);
 
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d717fa9..bf85062 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -131,18 +131,6 @@ EXPORT_SYMBOL(laptop_mode);
 static struct prop_descriptor vm_completions;
 static struct prop_descriptor vm_dirties;
 
-static unsigned long dirty_writeback_pages(void)
-{
-	unsigned long ret;
-
-	ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
-	if ((long)ret < 0)
-		ret = global_page_state(NR_UNSTABLE_NFS) +
-			global_page_state(NR_WRITEBACK);
-
-	return ret;
-}
-
 /*
  * couple the period to the dirty_ratio:
  *
@@ -703,12 +691,25 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
 
-void throttle_vm_writeout(gfp_t gfp_mask)
+/*
+ * Throttle the current task if it is near dirty memory usage limits.
+ * If @mem_cgroup is NULL or the root_cgroup, then use global dirty memory
+ * information; otherwise use the per-memcg dirty limits.
+ */
+void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup)
 {
 	struct dirty_info dirty_info;
+	unsigned long nr_writeback;
 
         for ( ; ; ) {
-		global_dirty_info(&dirty_info);
+		if (!mem_cgroup || !memcg_dirty_info(mem_cgroup, &dirty_info)) {
+			global_dirty_info(&dirty_info);
+			nr_writeback = global_page_state(NR_UNSTABLE_NFS) +
+				global_page_state(NR_WRITEBACK);
+		} else {
+			nr_writeback = mem_cgroup_page_stat(
+				mem_cgroup, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
+		}
 
                 /*
                  * Boost the allowable dirty threshold a bit for page
@@ -717,7 +718,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
 		dirty_info.dirty_thresh +=
 			dirty_info.dirty_thresh / 10;      /* wheeee... */
 
-		if (dirty_writeback_pages() <= dirty_info.dirty_thresh)
+		if (nr_writeback <= dirty_info.dirty_thresh)
 			break;
                 congestion_wait(BLK_RW_ASYNC, HZ/10);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6d84858..8cc90d5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1844,7 +1844,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	if (inactive_anon_is_low(zone, sc))
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
-	throttle_vm_writeout(sc->gfp_mask);
+	throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup);
 }
 
 /*
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread
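The throttling decision in the reworked throttle_vm_writeout() — boost the
dirty threshold by 10% and only keep throttling while writeback pages
exceed the boosted threshold — can be sketched as a small pure function.
This is an illustrative userspace extract, not the kernel function itself.

```c
/* Returns nonzero if the caller should wait for writeback to drain.
 * Mirrors the loop body of the patched throttle_vm_writeout():
 * the threshold is boosted a bit for page allocators so they don't
 * get DoS'ed by heavy writers. */
static int should_throttle(unsigned long nr_writeback,
			   unsigned long dirty_thresh)
{
	dirty_thresh += dirty_thresh / 10;	/* wheeee... */

	return nr_writeback > dirty_thresh;
}
```

In the real function, nr_writeback and dirty_thresh come from either the
global counters or the per-memcg statistics, depending on whether a
non-root mem_cgroup with dirty limits was passed in.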

* [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
                   ` (2 preceding siblings ...)
  2010-11-09  9:24 ` [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-09 23:36   ` Minchan Kim
                     ` (2 more replies)
  2010-11-09  9:24 ` [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info() Greg Thelen
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
  5 siblings, 3 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

The cgroup given to mem_cgroup_page_stat() is not allowed to be
NULL or the root cgroup, so there is no need to complicate the code
by handling those cases.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/memcontrol.c |   48 ++++++++++++++++++++++--------------------------
 1 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index eb621ee..f8df350 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1364,12 +1364,10 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
 
 /*
  * mem_cgroup_page_stat() - get memory cgroup file cache statistics
- * @mem:	optional memory cgroup to query.  If NULL, use current task's
- *		cgroup.
+ * @mem:	memory cgroup to query
  * @item:	memory statistic item exported to the kernel
  *
- * Return the accounted statistic value or negative value if current task is
- * root cgroup.
+ * Return the accounted statistic value.
  */
 long mem_cgroup_page_stat(struct mem_cgroup *mem,
 			  enum mem_cgroup_nr_pages_item item)
@@ -1377,29 +1375,27 @@ long mem_cgroup_page_stat(struct mem_cgroup *mem,
 	struct mem_cgroup *iter;
 	long value;
 
+	VM_BUG_ON(!mem);
+	VM_BUG_ON(mem_cgroup_is_root(mem));
+
 	get_online_cpus();
-	rcu_read_lock();
-	if (!mem)
-		mem = mem_cgroup_from_task(current);
-	if (__mem_cgroup_has_dirty_limit(mem)) {
-		/*
-		 * If we're looking for dirtyable pages we need to evaluate
-		 * free pages depending on the limit and usage of the parents
-		 * first of all.
-		 */
-		if (item == MEMCG_NR_DIRTYABLE_PAGES)
-			value = memcg_hierarchical_free_pages(mem);
-		else
-			value = 0;
-		/*
-		 * Recursively evaluate page statistics against all cgroup
-		 * under hierarchy tree
-		 */
-		for_each_mem_cgroup_tree(iter, mem)
-			value += mem_cgroup_local_page_stat(iter, item);
-	} else
-		value = -EINVAL;
-	rcu_read_unlock();
+
+	/*
+	 * If we're looking for dirtyable pages we need to evaluate
+	 * free pages depending on the limit and usage of the parents
+	 * first of all.
+	 */
+	if (item == MEMCG_NR_DIRTYABLE_PAGES)
+		value = memcg_hierarchical_free_pages(mem);
+	else
+		value = 0;
+	/*
+	 * Recursively evaluate page statistics against all cgroup
+	 * under hierarchy tree
+	 */
+	for_each_mem_cgroup_tree(iter, mem)
+		value += mem_cgroup_local_page_stat(iter, item);
+
 	put_online_cpus();
 
 	return value;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread
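The hierarchical accumulation that patch 4/6 keeps — add the local
statistic of every cgroup in the queried subtree — can be sketched in
userspace like this.  The single-child chain and names are illustrative
simplifications; the kernel walks a real tree with
for_each_mem_cgroup_tree().

```c
#include <stddef.h>

/* Hypothetical stand-in for a memcg subtree node; a single-child
 * chain keeps the sketch short. */
struct node {
	long local_stat;
	struct node *child;
};

/* Accumulate the local statistic of the node and every descendant,
 * like the for_each_mem_cgroup_tree() loop in
 * mem_cgroup_page_stat(). */
static long subtree_stat(struct node *n)
{
	long value = 0;

	for (; n; n = n->child)
		value += n->local_stat;
	return value;
}
```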

* [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info()
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
                   ` (3 preceding siblings ...)
  2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-10  1:01   ` Minchan Kim
  2010-11-12  8:21   ` Johannes Weiner
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
  5 siblings, 2 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

Because mem_cgroup_page_stat() no longer returns negative numbers
to indicate failure, mem_cgroup_dirty_info() does not need to check
for such failures.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/memcontrol.c |   14 +++-----------
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f8df350..ccdbb7e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1258,8 +1258,6 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 	__mem_cgroup_dirty_param(&dirty_param, memcg);
 
 	value = mem_cgroup_page_stat(memcg, MEMCG_NR_DIRTYABLE_PAGES);
-	if (value < 0)
-		goto done;
 
 	available_mem = min((unsigned long)value, sys_available_mem);
 
@@ -1279,15 +1277,9 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 			(dirty_param.dirty_background_ratio *
 			       available_mem) / 100;
 
-	value = mem_cgroup_page_stat(memcg, MEMCG_NR_RECLAIM_PAGES);
-	if (value < 0)
-		goto done;
-	info->nr_reclaimable = value;
-
-	value = mem_cgroup_page_stat(memcg, MEMCG_NR_WRITEBACK);
-	if (value < 0)
-		goto done;
-	info->nr_writeback = value;
+	info->nr_reclaimable =
+		mem_cgroup_page_stat(memcg, MEMCG_NR_RECLAIM_PAGES);
+	info->nr_writeback = mem_cgroup_page_stat(memcg, MEMCG_NR_WRITEBACK);
 
 	valid = true;
 
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
                   ` (4 preceding siblings ...)
  2010-11-09  9:24 ` [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info() Greg Thelen
@ 2010-11-09  9:24 ` Greg Thelen
  2010-11-09 12:15   ` Wu Fengguang
                     ` (3 more replies)
  5 siblings, 4 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-09  9:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel, Greg Thelen

mem_cgroup_page_stat() used to return a negative page count
value to indicate an error.

mem_cgroup_page_stat() no longer returns errors, so convert
the return value to the traditional page count type
(unsigned long).

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 include/linux/memcontrol.h |    6 +++---
 mm/memcontrol.c            |   12 ++++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a81dfda..3433784 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -158,8 +158,8 @@ bool mem_cgroup_has_dirty_limit(void);
 bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 			   struct mem_cgroup *memcg,
 			   struct dirty_info *info);
-long mem_cgroup_page_stat(struct mem_cgroup *mem,
-			  enum mem_cgroup_nr_pages_item item);
+unsigned long mem_cgroup_page_stat(struct mem_cgroup *mem,
+				   enum mem_cgroup_nr_pages_item item);
 
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						gfp_t gfp_mask);
@@ -354,7 +354,7 @@ static inline bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
 	return false;
 }
 
-static inline long mem_cgroup_page_stat(struct mem_cgroup *mem,
+static inline unsigned long mem_cgroup_page_stat(struct mem_cgroup *mem,
 					enum mem_cgroup_nr_pages_item item)
 {
 	return -ENOSYS;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ccdbb7e..ed070d0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1361,8 +1361,8 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
  *
  * Return the accounted statistic value.
  */
-long mem_cgroup_page_stat(struct mem_cgroup *mem,
-			  enum mem_cgroup_nr_pages_item item)
+unsigned long mem_cgroup_page_stat(struct mem_cgroup *mem,
+				   enum mem_cgroup_nr_pages_item item)
 {
 	struct mem_cgroup *iter;
 	long value;
@@ -1388,6 +1388,14 @@ long mem_cgroup_page_stat(struct mem_cgroup *mem,
 	for_each_mem_cgroup_tree(iter, mem)
 		value += mem_cgroup_local_page_stat(iter, item);
 
+	/*
+	 * The sum of unlocked per-cpu counters may yield a slightly negative
+	 * value.  This function returns an unsigned value, so round it up to
+	 * zero to avoid returning a very large value.
+	 */
+	if (value < 0)
+		value = 0;
+
 	put_online_cpus();
 
 	return value;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
@ 2010-11-09 12:15   ` Wu Fengguang
  2010-11-10  1:04   ` Minchan Kim
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Wu Fengguang @ 2010-11-09 12:15 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Nov 09, 2010 at 05:24:31PM +0800, Greg Thelen wrote:
> mem_cgroup_page_stat() used to return a negative page count
> value to indicate an error.
> 
> mem_cgroup_page_stat() no longer returns errors, so convert
> the return value to the traditional page count type
> (unsigned long).
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---

> +	/*
> +	 * The sum of unlocked per-cpu counters may yield a slightly negative
> +	 * value.  This function returns an unsigned value, so round it up to
> +	 * zero to avoid returning a very large value.
> +	 */
> +	if (value < 0)
> +		value = 0;

nitpick: it's good candidate for unlikely().

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

Sorry, I lost track of the source code after so many patches.
It would help if you can put the patches to a git tree.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
@ 2010-11-09 22:53   ` Minchan Kim
  2010-11-10  0:51   ` Minchan Kim
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-09 22:53 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> This new parameter can be used to query dirty memory usage
> from a given memcg rather than the current task's memcg.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
>  include/linux/memcontrol.h |    6 ++++--
>  mm/memcontrol.c            |   37 +++++++++++++++++++++----------------
>  mm/page-writeback.c        |    2 +-
>  3 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7a3d915..89a9278 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -157,7 +157,8 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
>  bool mem_cgroup_has_dirty_limit(void);
>  bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
>                           struct dirty_info *info);
> -long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item);
> +long mem_cgroup_page_stat(struct mem_cgroup *mem,
> +                         enum mem_cgroup_nr_pages_item item);
>
>  unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
>                                                gfp_t gfp_mask);
> @@ -351,7 +352,8 @@ static inline bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
>        return false;
>  }
>
> -static inline long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item)
> +static inline long mem_cgroup_page_stat(struct mem_cgroup *mem,
> +                                       enum mem_cgroup_nr_pages_item item)
>  {
>        return -ENOSYS;
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d8a06d6..1bff7cf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1245,22 +1245,20 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
>        unsigned long available_mem;
>        struct mem_cgroup *memcg;
>        long value;
> +       bool valid = false;
>
>        if (mem_cgroup_disabled())
>                return false;
>
>        rcu_read_lock();
>        memcg = mem_cgroup_from_task(current);
> -       if (!__mem_cgroup_has_dirty_limit(memcg)) {
> -               rcu_read_unlock();
> -               return false;
> -       }
> +       if (!__mem_cgroup_has_dirty_limit(memcg))
> +               goto done;
>        __mem_cgroup_dirty_param(&dirty_param, memcg);
> -       rcu_read_unlock();
>
> -       value = mem_cgroup_page_stat(MEMCG_NR_DIRTYABLE_PAGES);
> +       value = mem_cgroup_page_stat(memcg, MEMCG_NR_DIRTYABLE_PAGES);
>        if (value < 0)
> -               return false;
> +               goto done;
>
>        available_mem = min((unsigned long)value, sys_available_mem);
>
> @@ -1280,17 +1278,21 @@ bool mem_cgroup_dirty_info(unsigned long sys_available_mem,
>                        (dirty_param.dirty_background_ratio *
>                               available_mem) / 100;
>
> -       value = mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
> +       value = mem_cgroup_page_stat(memcg, MEMCG_NR_RECLAIM_PAGES);
>        if (value < 0)
> -               return false;
> +               goto done;
>        info->nr_reclaimable = value;
>
> -       value = mem_cgroup_page_stat(MEMCG_NR_WRITEBACK);
> +       value = mem_cgroup_page_stat(memcg, MEMCG_NR_WRITEBACK);
>        if (value < 0)
> -               return false;
> +               goto done;
>        info->nr_writeback = value;
>
> -       return true;
> +       valid = true;
> +
> +done:
> +       rcu_read_unlock();
> +       return valid;
>  }
>
>  static inline bool mem_cgroup_can_swap(struct mem_cgroup *memcg)
> @@ -1361,20 +1363,23 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
>
>  /*
>  * mem_cgroup_page_stat() - get memory cgroup file cache statistics
> - * @item:      memory statistic item exported to the kernel
> + * @mem:       optional memory cgroup to query.  If NULL, use current task's
> + *             cgroup.
> + * @item:      memory statistic item exported to the kernel
>  *
>  * Return the accounted statistic value or negative value if current task is
>  * root cgroup.
>  */
> -long mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item)
> +long mem_cgroup_page_stat(struct mem_cgroup *mem,
> +                         enum mem_cgroup_nr_pages_item item)
>  {
>        struct mem_cgroup *iter;
> -       struct mem_cgroup *mem;
>        long value;
>
>        get_online_cpus();
>        rcu_read_lock();
> -       mem = mem_cgroup_from_task(current);
> +       if (!mem)
> +               mem = mem_cgroup_from_task(current);
>        if (__mem_cgroup_has_dirty_limit(mem)) {
>                /*
>                 * If we're looking for dirtyable pages we need to evaluate
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index a477f59..dc3dbe3 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -135,7 +135,7 @@ static unsigned long dirty_writeback_pages(void)
>  {
>        unsigned long ret;
>
> -       ret = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> +       ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
>        if ((long)ret < 0)
>                ret = global_page_state(NR_UNSTABLE_NFS) +
>                        global_page_state(NR_WRITEBACK);
> --
> 1.7.3.1
>

I didn't look at the later patches, so this might already be changed.

Now all callers of mem_cgroup_page_stat() except dirty_writeback_pages()
hold rcu_read_lock, and mem_cgroup_page_stat() itself takes
rcu_read_lock again.  Couldn't we remove the duplicated rcu lock by
adding rcu_read_lock to dirty_writeback_pages() for consistency?


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info()
  2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
@ 2010-11-09 23:09   ` Minchan Kim
  2010-11-16  3:51   ` KAMEZAWA Hiroyuki
  2010-11-22  6:41   ` Balbir Singh
  2 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-09 23:09 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> Pass mem_cgroup parameter through memcg_dirty_info() into
> mem_cgroup_dirty_info().  This allows for querying dirty memory
> information from a particular cgroup, rather than just the
> current task's cgroup.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-09  9:24 ` [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware Greg Thelen
@ 2010-11-09 23:22   ` Minchan Kim
  2010-11-12  8:17   ` Johannes Weiner
  1 sibling, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-09 23:22 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> If called with a mem_cgroup, then throttle_vm_writeout() should
> query the given cgroup for its dirty memory usage limits.
>
> dirty_writeback_pages() is no longer used, so delete it.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

> ---
>  include/linux/writeback.h |    2 +-
>  mm/page-writeback.c       |   31 ++++++++++++++++---------------
>  mm/vmscan.c               |    2 +-
>  3 files changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 335dba1..1bacdda 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -97,7 +97,7 @@ void laptop_mode_timer_fn(unsigned long data);
>  #else
>  static inline void laptop_sync_completion(void) { }
>  #endif
> -void throttle_vm_writeout(gfp_t gfp_mask);
> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup);
>
>  /* These are exported to sysctl. */
>  extern int dirty_background_ratio;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index d717fa9..bf85062 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -131,18 +131,6 @@ EXPORT_SYMBOL(laptop_mode);
>  static struct prop_descriptor vm_completions;
>  static struct prop_descriptor vm_dirties;
>
> -static unsigned long dirty_writeback_pages(void)
> -{
> -       unsigned long ret;
> -
> -       ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> -       if ((long)ret < 0)
> -               ret = global_page_state(NR_UNSTABLE_NFS) +
> -                       global_page_state(NR_WRITEBACK);
> -
> -       return ret;
> -}
> -

Nice cleanup.

>  /*
>  * couple the period to the dirty_ratio:
>  *
> @@ -703,12 +691,25 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
>
> -void throttle_vm_writeout(gfp_t gfp_mask)
> +/*
> + * Throttle the current task if it is near dirty memory usage limits.
> + * If @mem_cgroup is NULL or the root_cgroup, then use global dirty memory
> + * information; otherwise use the per-memcg dirty limits.
> + */
> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup)
>  {
>        struct dirty_info dirty_info;
> +       unsigned long nr_writeback;
>
>         for ( ; ; ) {
> -               global_dirty_info(&dirty_info);
> +               if (!mem_cgroup || !memcg_dirty_info(mem_cgroup, &dirty_info)) {
> +                       global_dirty_info(&dirty_info);
> +                       nr_writeback = global_page_state(NR_UNSTABLE_NFS) +
> +                               global_page_state(NR_WRITEBACK);
> +               } else {
> +                       nr_writeback = mem_cgroup_page_stat(
> +                               mem_cgroup, MEMCG_NR_DIRTY_WRITEBACK_PAGES);

From the point of view of rcu_read_lock removal: the memcg can't be
destroyed here because of mem_cgroup_select_victim's css_tryget, right?
If so, we can remove the unnecessary rcu_read_lock in mem_cgroup_page_stat().



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
@ 2010-11-09 23:36   ` Minchan Kim
  2010-11-12  8:19   ` Johannes Weiner
  2010-11-16  3:58   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-09 23:36 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> The cgroup given to mem_cgroup_page_stat() is not allowed to be
> NULL or the root cgroup.  So there is no need to complicate the code
> handling those cases.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

You already did what I wanted. :)
I should have commented only after seeing all the patches.

Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
  2010-11-09 22:53   ` Minchan Kim
@ 2010-11-10  0:51   ` Minchan Kim
  2010-11-16  3:50   ` KAMEZAWA Hiroyuki
  2010-11-22  6:40   ` Balbir Singh
  3 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-10  0:51 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> This new parameter can be used to query dirty memory usage
> from a given memcg rather than the current task's memcg.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info()
  2010-11-09  9:24 ` [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info() Greg Thelen
@ 2010-11-10  1:01   ` Minchan Kim
  2010-11-12  8:21   ` Johannes Weiner
  1 sibling, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-10  1:01 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> Because mem_cgroup_page_stat() no longer returns negative numbers

I tried for a while to understand why mem_cgroup_page_stat no longer
returns negative numbers. I couldn't find the answer in the patches up
to 5/6; the answer is in 6/6. It would be better to reorder 6/6 before 5/6.


> to indicate failure, mem_cgroup_dirty_info() does not need to check
> for such failures.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
  2010-11-09 12:15   ` Wu Fengguang
@ 2010-11-10  1:04   ` Minchan Kim
  2010-11-12  8:29   ` Johannes Weiner
  2010-11-16  4:00   ` KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2010-11-10  1:04 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Johannes Weiner, Wu Fengguang, linux-mm,
	linux-kernel

On Tue, Nov 9, 2010 at 6:24 PM, Greg Thelen <gthelen@google.com> wrote:
> mem_cgroup_page_stat() used to return a negative page count
> value to indicate value.
>
> mem_cgroup_page_stat() has changed so it never returns
> error so convert the return value to the traditional page
> count type (unsigned long).
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

This review seems to be late, by just one day.
I don't know why Andrew merged it in a hurry without any review.
Anyway, I add my Reviewed-by to this series since I can't find
any big bug.

Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-09  9:24 ` [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware Greg Thelen
  2010-11-09 23:22   ` Minchan Kim
@ 2010-11-12  8:17   ` Johannes Weiner
  2010-11-12 20:39     ` Greg Thelen
  1 sibling, 1 reply; 33+ messages in thread
From: Johannes Weiner @ 2010-11-12  8:17 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Nov 09, 2010 at 01:24:28AM -0800, Greg Thelen wrote:
> If called with a mem_cgroup, then throttle_vm_writeout() should
> query the given cgroup for its dirty memory usage limits.
> 
> dirty_writeback_pages() is no longer used, so delete it.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
>  include/linux/writeback.h |    2 +-
>  mm/page-writeback.c       |   31 ++++++++++++++++---------------
>  mm/vmscan.c               |    2 +-
>  3 files changed, 18 insertions(+), 17 deletions(-)

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index d717fa9..bf85062 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -131,18 +131,6 @@ EXPORT_SYMBOL(laptop_mode);
>  static struct prop_descriptor vm_completions;
>  static struct prop_descriptor vm_dirties;
>  
> -static unsigned long dirty_writeback_pages(void)
> -{
> -	unsigned long ret;
> -
> -	ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> -	if ((long)ret < 0)
> -		ret = global_page_state(NR_UNSTABLE_NFS) +
> -			global_page_state(NR_WRITEBACK);

There are two bugfixes in this patch.  One is getting rid of this
fallback to global numbers that are compared to memcg limits.  The
other one is that reclaim will now throttle writeout based on the
cgroup it runs on behalf of, instead of that of the current task.

Both are undocumented and should arguably not even be in the same
patch...?

> @@ -703,12 +691,25 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
>  
> -void throttle_vm_writeout(gfp_t gfp_mask)
> +/*
> + * Throttle the current task if it is near dirty memory usage limits.
> + * If @mem_cgroup is NULL or the root_cgroup, then use global dirty memory
> + * information; otherwise use the per-memcg dirty limits.
> + */
> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup)
>  {
>  	struct dirty_info dirty_info;
> +	unsigned long nr_writeback;
>  
>          for ( ; ; ) {
> -		global_dirty_info(&dirty_info);
> +		if (!mem_cgroup || !memcg_dirty_info(mem_cgroup, &dirty_info)) {
> +			global_dirty_info(&dirty_info);
> +			nr_writeback = global_page_state(NR_UNSTABLE_NFS) +
> +				global_page_state(NR_WRITEBACK);
> +		} else {
> +			nr_writeback = mem_cgroup_page_stat(
> +				mem_cgroup, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> +		}

Odd branch ordering, but I may be OCDing again.

	if (mem_cgroup && memcg_dirty_info())
		do_mem_cgroup_stuff()
	else
		do_global_stuff()

would be more natural, IMO.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
  2010-11-09 23:36   ` Minchan Kim
@ 2010-11-12  8:19   ` Johannes Weiner
  2010-11-12 20:40     ` Greg Thelen
  2010-11-16  3:58   ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 33+ messages in thread
From: Johannes Weiner @ 2010-11-12  8:19 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Nov 09, 2010 at 01:24:29AM -0800, Greg Thelen wrote:
> The cgroup given to mem_cgroup_page_stat() is not allowed to be
> NULL or the root cgroup.  So there is no need to complicate the code
> handling those cases.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
>  mm/memcontrol.c |   48 ++++++++++++++++++++++--------------------------
>  1 files changed, 22 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index eb621ee..f8df350 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1364,12 +1364,10 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
>  
>  /*
>   * mem_cgroup_page_stat() - get memory cgroup file cache statistics
> - * @mem:	optional memory cgroup to query.  If NULL, use current task's
> - *		cgroup.
> + * @mem:	memory cgroup to query
>   * @item:	memory statistic item exported to the kernel
>   *
> - * Return the accounted statistic value or negative value if current task is
> - * root cgroup.
> + * Return the accounted statistic value.
>   */
>  long mem_cgroup_page_stat(struct mem_cgroup *mem,
>  			  enum mem_cgroup_nr_pages_item item)
> @@ -1377,29 +1375,27 @@ long mem_cgroup_page_stat(struct mem_cgroup *mem,
>  	struct mem_cgroup *iter;
>  	long value;
>  
> +	VM_BUG_ON(!mem);
> +	VM_BUG_ON(mem_cgroup_is_root(mem));
> +
>  	get_online_cpus();
> -	rcu_read_lock();
> -	if (!mem)
> -		mem = mem_cgroup_from_task(current);
> -	if (__mem_cgroup_has_dirty_limit(mem)) {

What about mem->use_hierarchy that is checked in
__mem_cgroup_has_dirty_limit()?  Is it no longer needed?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info()
  2010-11-09  9:24 ` [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info() Greg Thelen
  2010-11-10  1:01   ` Minchan Kim
@ 2010-11-12  8:21   ` Johannes Weiner
  2010-11-12 20:40     ` Greg Thelen
  1 sibling, 1 reply; 33+ messages in thread
From: Johannes Weiner @ 2010-11-12  8:21 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Nov 09, 2010 at 01:24:30AM -0800, Greg Thelen wrote:
> Because mem_cgroup_page_stat() no longer returns negative numbers
> to indicate failure, mem_cgroup_dirty_info() does not need to check
> for such failures.

This is simply not true at this point in time.  Patch ordering is not
optional.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
  2010-11-09 12:15   ` Wu Fengguang
  2010-11-10  1:04   ` Minchan Kim
@ 2010-11-12  8:29   ` Johannes Weiner
  2010-11-12 20:41     ` Greg Thelen
  2010-11-16  4:00   ` KAMEZAWA Hiroyuki
  3 siblings, 1 reply; 33+ messages in thread
From: Johannes Weiner @ 2010-11-12  8:29 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Nov 09, 2010 at 01:24:31AM -0800, Greg Thelen wrote:
> mem_cgroup_page_stat() used to return a negative page count
> value to indicate value.

Whoops :)

> mem_cgroup_page_stat() has changed so it never returns
> error so convert the return value to the traditional page
> count type (unsigned long).

This changelog feels a bit beside the point.

What's really interesting is that we now don't consider negative sums
to be invalid anymore, but just assume zero!  There is a real
semantical change here.

That the return type can then be changed to unsigned long is a nice
follow-up cleanup that happens to be folded into this patch.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-12  8:17   ` Johannes Weiner
@ 2010-11-12 20:39     ` Greg Thelen
  2010-11-16  3:57       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 33+ messages in thread
From: Greg Thelen @ 2010-11-12 20:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Tue, Nov 09, 2010 at 01:24:28AM -0800, Greg Thelen wrote:
>> If called with a mem_cgroup, then throttle_vm_writeout() should
>> query the given cgroup for its dirty memory usage limits.
>> 
>> dirty_writeback_pages() is no longer used, so delete it.
>> 
>> Signed-off-by: Greg Thelen <gthelen@google.com>
>> ---
>>  include/linux/writeback.h |    2 +-
>>  mm/page-writeback.c       |   31 ++++++++++++++++---------------
>>  mm/vmscan.c               |    2 +-
>>  3 files changed, 18 insertions(+), 17 deletions(-)
>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index d717fa9..bf85062 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -131,18 +131,6 @@ EXPORT_SYMBOL(laptop_mode);
>>  static struct prop_descriptor vm_completions;
>>  static struct prop_descriptor vm_dirties;
>>  
>> -static unsigned long dirty_writeback_pages(void)
>> -{
>> -	unsigned long ret;
>> -
>> -	ret = mem_cgroup_page_stat(NULL, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
>> -	if ((long)ret < 0)
>> -		ret = global_page_state(NR_UNSTABLE_NFS) +
>> -			global_page_state(NR_WRITEBACK);
>
> There are two bugfixes in this patch.  One is getting rid of this
> fallback to global numbers that are compared to memcg limits.  The
> other one is that reclaim will now throttle writeout based on the
> cgroup it runs on behalf of, instead of that of the current task.
>
> Both are undocumented and should arguably not even be in the same
> patch...?

I will better document these changes in the commit message and I will
split the change into two patches for clarity.

- sub-patch 1 will change throttle_vm_writeout() to only consider global
  usage and limits.  This would remove memcg consideration from
  throttle_vm_writeout() and thus ensure that only global limits are
  compared to global usage.

- sub-patch 2 will introduce memcg consideration consistently into
  throttle_vm_writeout().  This will allow throttle_vm_writeout() to
  consider memcg usage and limits, but they will be uniformly applied:
  memcg usage will not be compared to global limits.

>> @@ -703,12 +691,25 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
>>  }
>>  EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
>>  
>> -void throttle_vm_writeout(gfp_t gfp_mask)
>> +/*
>> + * Throttle the current task if it is near dirty memory usage limits.
>> + * If @mem_cgroup is NULL or the root_cgroup, then use global dirty memory
>> + * information; otherwise use the per-memcg dirty limits.
>> + */
>> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup)
>>  {
>>  	struct dirty_info dirty_info;
>> +	unsigned long nr_writeback;
>>  
>>          for ( ; ; ) {
>> -		global_dirty_info(&dirty_info);
>> +		if (!mem_cgroup || !memcg_dirty_info(mem_cgroup, &dirty_info)) {
>> +			global_dirty_info(&dirty_info);
>> +			nr_writeback = global_page_state(NR_UNSTABLE_NFS) +
>> +				global_page_state(NR_WRITEBACK);
>> +		} else {
>> +			nr_writeback = mem_cgroup_page_stat(
>> +				mem_cgroup, MEMCG_NR_DIRTY_WRITEBACK_PAGES);
>> +		}
>
> Odd branch ordering, but I may be OCDing again.
>
> 	if (mem_cgroup && memcg_dirty_info())
> 		do_mem_cgroup_stuff()
> 	else
> 		do_global_stuff()
>
> would be more natural, IMO.

I agree.  I will resubmit this series with your improved branch ordering.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-12  8:19   ` Johannes Weiner
@ 2010-11-12 20:40     ` Greg Thelen
  2010-11-19 11:22       ` Johannes Weiner
  0 siblings, 1 reply; 33+ messages in thread
From: Greg Thelen @ 2010-11-12 20:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Tue, Nov 09, 2010 at 01:24:29AM -0800, Greg Thelen wrote:
>> The cgroup given to mem_cgroup_page_stat() is not allowed to be
>> NULL or the root cgroup.  So there is no need to complicate the code
>> handling those cases.
>> 
>> Signed-off-by: Greg Thelen <gthelen@google.com>
>> ---
>>  mm/memcontrol.c |   48 ++++++++++++++++++++++--------------------------
>>  1 files changed, 22 insertions(+), 26 deletions(-)
>> 
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index eb621ee..f8df350 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1364,12 +1364,10 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
>>  
>>  /*
>>   * mem_cgroup_page_stat() - get memory cgroup file cache statistics
>> - * @mem:	optional memory cgroup to query.  If NULL, use current task's
>> - *		cgroup.
>> + * @mem:	memory cgroup to query
>>   * @item:	memory statistic item exported to the kernel
>>   *
>> - * Return the accounted statistic value or negative value if current task is
>> - * root cgroup.
>> + * Return the accounted statistic value.
>>   */
>>  long mem_cgroup_page_stat(struct mem_cgroup *mem,
>>  			  enum mem_cgroup_nr_pages_item item)
>> @@ -1377,29 +1375,27 @@ long mem_cgroup_page_stat(struct mem_cgroup *mem,
>>  	struct mem_cgroup *iter;
>>  	long value;
>>  
>> +	VM_BUG_ON(!mem);
>> +	VM_BUG_ON(mem_cgroup_is_root(mem));
>> +
>>  	get_online_cpus();
>> -	rcu_read_lock();
>> -	if (!mem)
>> -		mem = mem_cgroup_from_task(current);
>> -	if (__mem_cgroup_has_dirty_limit(mem)) {
>
> What about mem->use_hierarchy that is checked in
> __mem_cgroup_has_dirty_limit()?  Is it no longer needed?

It is no longer needed because the callers of mem_cgroup_page_stat()
call __mem_cgroup_has_dirty_limit().  In the current implementation, if
use_hierarchy=1 then the cgroup does not have dirty limits, so calls
into mem_cgroup_page_stat() are avoided.  Specifically the callers of
mem_cgroup_page_stat() are:

1. mem_cgroup_dirty_info() which calls __mem_cgroup_has_dirty_limit()
   and returns false if use_hierarchy=1.

2. throttle_vm_writeout(), which calls memcg_dirty_info() ->
   mem_cgroup_dirty_info() -> __mem_cgroup_has_dirty_limit() and falls
   back to global limits if use_hierarchy=1.
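The calling convention Greg describes can be sketched in simplified
user-space C (the struct and helpers below are hypothetical stand-ins
for the kernel types, not the real implementation): the
__mem_cgroup_has_dirty_limit() check happens in the caller, so
mem_cgroup_page_stat() may assume a valid, non-root memcg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mem_cgroup. */
struct mem_cgroup {
	bool is_root;
	bool use_hierarchy;
	unsigned long nr_dirty;
};

/* Per the thread: a cgroup only has its own dirty limits when it is
 * not the root cgroup and use_hierarchy is off. */
static bool __mem_cgroup_has_dirty_limit(struct mem_cgroup *mem)
{
	return !mem->is_root && !mem->use_hierarchy;
}

/* Callers guarantee a valid, non-root memcg, so no NULL/root handling
 * is needed here (mirroring the VM_BUG_ON()s added in patch 4/6). */
static unsigned long mem_cgroup_page_stat(struct mem_cgroup *mem)
{
	assert(mem != NULL && !mem->is_root);
	return mem->nr_dirty;
}

/* Caller pattern: use the memcg stat only when a per-memcg limit
 * applies, otherwise fall back to the global count. */
static unsigned long dirty_pages(struct mem_cgroup *mem,
				 unsigned long global_dirty)
{
	if (mem && __mem_cgroup_has_dirty_limit(mem))
		return mem_cgroup_page_stat(mem);
	return global_dirty;
}
```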

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info()
  2010-11-12  8:21   ` Johannes Weiner
@ 2010-11-12 20:40     ` Greg Thelen
  0 siblings, 0 replies; 33+ messages in thread
From: Greg Thelen @ 2010-11-12 20:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Tue, Nov 09, 2010 at 01:24:30AM -0800, Greg Thelen wrote:
>> Because mem_cgroup_page_stat() no longer returns negative numbers
>> to indicate failure, mem_cgroup_dirty_info() does not need to check
>> for such failures.
>
> This is simply not true at this point in time.  Patch ordering is not
> optional.

Thanks.  Patch order will be corrected.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-12  8:29   ` Johannes Weiner
@ 2010-11-12 20:41     ` Greg Thelen
  2010-11-19 11:39       ` Johannes Weiner
  0 siblings, 1 reply; 33+ messages in thread
From: Greg Thelen @ 2010-11-12 20:41 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Tue, Nov 09, 2010 at 01:24:31AM -0800, Greg Thelen wrote:
>> mem_cgroup_page_stat() used to return a negative page count
>> value to indicate value.
>
> Whoops :)
>
>> mem_cgroup_page_stat() has changed so it never returns
>> error so convert the return value to the traditional page
>> count type (unsigned long).
>
> This changelog feels a bit beside the point.
>
> What's really interesting is that we now don't consider negative sums
> to be invalid anymore, but just assume zero!  There is a real
> semantical change here.

Prior to this patch series mem_cgroup_page_stat() returned a negative
value (specifically -EINVAL) to indicate that the current task was in
the root cgroup and that the per-cgroup usage and limit counters were
therefore invalid.  Callers treated any negative value as an indication
of the root cgroup.

Unfortunately there was another way that mem_cgroup_page_stat() could
return a negative value even when current was not in the root cgroup.
Negative sums were a possibility due to summing of unsynchronized
per-cpu counters.  These occasional negative sums would fool callers
into thinking that the current task was in the root cgroup.
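The race can be illustrated with a small user-space sketch (hypothetical
code, not the kernel implementation): summing unsynchronized per-cpu
deltas can yield a transiently negative total, which the new semantics
clamp to zero instead of letting it look like an error code.

```c
#include <assert.h>

#define NR_CPUS 4

/* Per-cpu deltas: a page charged on one CPU and uncharged on another
 * leaves one counter at +1 and another at -1; if the summing loop sees
 * the -1 before the +1, the total is transiently negative. */
static long per_cpu_count[NR_CPUS];

/* Unsynchronized sum over all per-cpu counters. */
static long page_stat_sum(void)
{
	long value = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		value += per_cpu_count[cpu];
	return value;
}

/* New semantics: a negative sum is assumed to mean zero pages, so the
 * return value is a plain page count and can never be confused with
 * the old -EINVAL "root cgroup" signal. */
static unsigned long page_stat_clamped(void)
{
	long value = page_stat_sum();

	return value < 0 ? 0 : (unsigned long)value;
}
```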

Would adding this description to the commit message address your
concerns?

> That the return type can then be changed to unsigned long is a nice
> follow-up cleanup that happens to be folded into this patch.

Good point.  I can separate the change into two sub-patches:
1. use zero for a min-value (as described above)
2. change return value to unsigned

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
  2010-11-09 22:53   ` Minchan Kim
  2010-11-10  0:51   ` Minchan Kim
@ 2010-11-16  3:50   ` KAMEZAWA Hiroyuki
  2010-11-22  6:40   ` Balbir Singh
  3 siblings, 0 replies; 33+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-16  3:50 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, Daisuke Nishimura, Johannes Weiner,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue,  9 Nov 2010 01:24:26 -0800
Greg Thelen <gthelen@google.com> wrote:

> This new parameter can be used to query dirty memory usage
> from a given memcg rather than the current task's memcg.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info()
  2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
  2010-11-09 23:09   ` Minchan Kim
@ 2010-11-16  3:51   ` KAMEZAWA Hiroyuki
  2010-11-22  6:41   ` Balbir Singh
  2 siblings, 0 replies; 33+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-16  3:51 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, Daisuke Nishimura, Johannes Weiner,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue,  9 Nov 2010 01:24:27 -0800
Greg Thelen <gthelen@google.com> wrote:

> Pass mem_cgroup parameter through memcg_dirty_info() into
> mem_cgroup_dirty_info().  This allows for querying dirty memory
> information from a particular cgroup, rather than just the
> current task's cgroup.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-12 20:39     ` Greg Thelen
@ 2010-11-16  3:57       ` KAMEZAWA Hiroyuki
  2010-11-19 11:16         ` Johannes Weiner
  0 siblings, 1 reply; 33+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-16  3:57 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Johannes Weiner, Andrew Morton, Balbir Singh, Daisuke Nishimura,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Fri, 12 Nov 2010 12:39:35 -0800
Greg Thelen <gthelen@google.com> wrote:

> > Odd branch ordering, but I may be OCDing again.
> >
> > 	if (mem_cgroup && memcg_dirty_info())
> > 		do_mem_cgroup_stuff()
> > 	else
> > 		do_global_stuff()
> >
> > would be more natural, IMO.
> 
> I agree.  I will resubmit this series with your improved branch ordering.
> 

Hmm. I think this patch is troublesome.

This patch will make memcg's pageout routine _not_ throttled even when the
whole system's vmscan pageout is throttled.

So, one idea is....

Make this change 
==
+++ b/mm/vmscan.c
@@ -1844,7 +1844,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	if (inactive_anon_is_low(zone, sc))
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
-	throttle_vm_writeout(sc->gfp_mask);
+	throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup);
 }
==
as

==
	
if (!sc->mem_cgroup || throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup) == not throttled)
	throttle_vm_writeout(sc->gfp_mask, NULL);

Then, both of memcg and global dirty thresh will be checked.
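Kame's idea could be sketched as follows (a simplified user-space model
with made-up thresholds, not kernel code): the memcg threshold is
consulted first, and the global threshold still applies even when the
memcg is under its own limit.

```c
#include <stdbool.h>

static bool over_thresh(unsigned long dirty, unsigned long thresh)
{
	return dirty > thresh;
}

/* Returns true if the caller should be throttled.  In the kernel the
 * checks would loop with congestion_wait() rather than just return,
 * and the numbers would come from dirty_info / global_page_state. */
static bool throttle_vm_writeout_sim(bool have_memcg,
				     unsigned long memcg_dirty,
				     unsigned long memcg_thresh,
				     unsigned long global_dirty,
				     unsigned long global_thresh)
{
	if (have_memcg && over_thresh(memcg_dirty, memcg_thresh))
		return true;		/* memcg limit exceeded */
	/* Both limits apply: even within the memcg limit, system-wide
	 * writeout pressure must still throttle memcg reclaim. */
	return over_thresh(global_dirty, global_thresh);
}
```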


-Kame


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
  2010-11-09 23:36   ` Minchan Kim
  2010-11-12  8:19   ` Johannes Weiner
@ 2010-11-16  3:58   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 33+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-16  3:58 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, Daisuke Nishimura, Johannes Weiner,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue,  9 Nov 2010 01:24:29 -0800
Greg Thelen <gthelen@google.com> wrote:

> The cgroup given to mem_cgroup_page_stat() is not allowed to be
> NULL or the root cgroup.  So there is no need to complicate the code
> handling those cases.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
                     ` (2 preceding siblings ...)
  2010-11-12  8:29   ` Johannes Weiner
@ 2010-11-16  4:00   ` KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 33+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-16  4:00 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, Daisuke Nishimura, Johannes Weiner,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue,  9 Nov 2010 01:24:31 -0800
Greg Thelen <gthelen@google.com> wrote:

> mem_cgroup_page_stat() used to return a negative page count
> value to indicate value.
> 
> mem_cgroup_page_stat() has changed so it never returns
> error so convert the return value to the traditional page
> count type (unsigned long).
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Okay.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware
  2010-11-16  3:57       ` KAMEZAWA Hiroyuki
@ 2010-11-19 11:16         ` Johannes Weiner
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Weiner @ 2010-11-19 11:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Greg Thelen, Andrew Morton, Balbir Singh, Daisuke Nishimura,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

Hello,

On Tue, Nov 16, 2010 at 12:57:26PM +0900, KAMEZAWA Hiroyuki wrote:
> Hmm. I think this patch is troublesome.
> 
> This patch will make memcg's pageout routine _not_ throttoled even when the whole
> system vmscan's pageout is throttoled.
> 
> So, one idea is....
> 
> Make this change 
> ==
> +++ b/mm/vmscan.c
> @@ -1844,7 +1844,7 @@ static void shrink_zone(int priority, struct zone *zone,
>  	if (inactive_anon_is_low(zone, sc))
>  		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>  
> -	throttle_vm_writeout(sc->gfp_mask);
> +	throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup);
>  }
> ==
> as
> 
> ==
> 	
> if (!sc->mem_cgroup || throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup) == not throttled)
> 	throttle_vm_writeout(sc->gfp_mask, NULL);
> 
> Then, both of memcg and global dirty thresh will be checked.

Good point, both limits should apply.

I'd prefer to stuff it all into throttle_vm_writeout() and not encode
memcg-specific behaviour into the caller, though.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] memcg: simplify mem_cgroup_page_stat()
  2010-11-12 20:40     ` Greg Thelen
@ 2010-11-19 11:22       ` Johannes Weiner
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Weiner @ 2010-11-19 11:22 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Fri, Nov 12, 2010 at 12:40:22PM -0800, Greg Thelen wrote:
> Johannes Weiner <hannes@cmpxchg.org> writes:
> 
> > On Tue, Nov 09, 2010 at 01:24:29AM -0800, Greg Thelen wrote:
> >> The cgroup given to mem_cgroup_page_stat() is not allowed to be
> >> NULL or the root cgroup.  So there is no need to complicate the code
> >> handling those cases.
> >> 
> >> Signed-off-by: Greg Thelen <gthelen@google.com>
> >> ---
> >>  mm/memcontrol.c |   48 ++++++++++++++++++++++--------------------------
> >>  1 files changed, 22 insertions(+), 26 deletions(-)
> >> 
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index eb621ee..f8df350 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -1364,12 +1364,10 @@ memcg_hierarchical_free_pages(struct mem_cgroup *mem)
> >>  
> >>  /*
> >>   * mem_cgroup_page_stat() - get memory cgroup file cache statistics
> >> - * @mem:	optional memory cgroup to query.  If NULL, use current task's
> >> - *		cgroup.
> >> + * @mem:	memory cgroup to query
> >>   * @item:	memory statistic item exported to the kernel
> >>   *
> >> - * Return the accounted statistic value or negative value if current task is
> >> - * root cgroup.
> >> + * Return the accounted statistic value.
> >>   */
> >>  long mem_cgroup_page_stat(struct mem_cgroup *mem,
> >>  			  enum mem_cgroup_nr_pages_item item)
> >> @@ -1377,29 +1375,27 @@ long mem_cgroup_page_stat(struct mem_cgroup *mem,
> >>  	struct mem_cgroup *iter;
> >>  	long value;
> >>  
> >> +	VM_BUG_ON(!mem);
> >> +	VM_BUG_ON(mem_cgroup_is_root(mem));
> >> +
> >>  	get_online_cpus();
> >> -	rcu_read_lock();
> >> -	if (!mem)
> >> -		mem = mem_cgroup_from_task(current);
> >> -	if (__mem_cgroup_has_dirty_limit(mem)) {
> >
> > What about mem->use_hierarchy that is checked in
> > __mem_cgroup_has_dirty_limit()?  Is it no longer needed?
> 
> It is no longer needed because the callers of mem_cgroup_page_stat()
> call __mem_cgroup_has_dirty_limit().  In the current implementation, if
> use_hierarchy=1 then the cgroup does not have dirty limits, so calls
> into mem_cgroup_page_stat() are avoided.  Specifically the callers of
> mem_cgroup_page_stat() are:
> 
> 1. mem_cgroup_dirty_info() which calls __mem_cgroup_has_dirty_limit()
>    and returns false if use_hierarchy=1.
> 
> 2. throttle_vm_writeout() which calls memcg_dirty_info() ->
>    mem_cgroup_dirty_info() -> __mem_cgroup_has_dirty_limit() and will
>    fall back to global limits if use_hierarchy=1.

Thanks for the clarification.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned
  2010-11-12 20:41     ` Greg Thelen
@ 2010-11-19 11:39       ` Johannes Weiner
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Weiner @ 2010-11-19 11:39 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

On Fri, Nov 12, 2010 at 12:41:15PM -0800, Greg Thelen wrote:
> >> mem_cgroup_page_stat() has been changed so that it never returns
> >> an error, so convert the return value to the traditional page
> >> count type (unsigned long).
> >
> > This changelog feels a bit beside the point.
> >
> > What's really interesting is that we now don't consider negative sums
> > to be invalid anymore, but just assume zero!  There is a real
> > semantical change here.
> 
> Prior to this patch series mem_cgroup_page_stat() returned a negative
> value (specifically -EINVAL) to indicate that the current task was in
> the root_cgroup and thus the per-cgroup usage and limit counter were
> invalid.  Callers treated all negative values as an indication that
> the current task was in the root cgroup.
> 
> Unfortunately there was another way that mem_cgroup_page_stat() could
> return a negative value even when current was not in the root cgroup.
> Negative sums were a possibility due to summing of unsynchronized
> per-cpu counters.  These occasional negative sums would fool callers
> into thinking that the current task was in the root cgroup.
> 
> Would adding this description to the commit message address your
> concerns?

I'd just describe that summing per-cpu counters is racy, that we can
end up with negative results, and the only sensible handling of that
is to assume zero.

> > That the return type can then be changed to unsigned long is a nice
> > follow-up cleanup that happens to be folded into this patch.
> 
> Good point.  I can separate the change into two sub-patches:
> 1. use zero for a min-value (as described above)
> 2. change return value to unsigned

Sounds good.  You can just fold the previous patch (adjusting the
callsites) into 2, which should take care of the ordering problem.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat()
  2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
                     ` (2 preceding siblings ...)
  2010-11-16  3:50   ` KAMEZAWA Hiroyuki
@ 2010-11-22  6:40   ` Balbir Singh
  3 siblings, 0 replies; 33+ messages in thread
From: Balbir Singh @ 2010-11-22  6:40 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

* Greg Thelen <gthelen@google.com> [2010-11-09 01:24:26]:

> This new parameter can be used to query dirty memory usage
> from a given memcg rather than the current task's memcg.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
>

How is this useful?  Documenting that in the changelog would be nice.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info()
  2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
  2010-11-09 23:09   ` Minchan Kim
  2010-11-16  3:51   ` KAMEZAWA Hiroyuki
@ 2010-11-22  6:41   ` Balbir Singh
  2 siblings, 0 replies; 33+ messages in thread
From: Balbir Singh @ 2010-11-22  6:41 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Johannes Weiner, Wu Fengguang, Minchan Kim, linux-mm,
	linux-kernel

* Greg Thelen <gthelen@google.com> [2010-11-09 01:24:27]:

> Pass mem_cgroup parameter through memcg_dirty_info() into
> mem_cgroup_dirty_info().  This allows for querying dirty memory
> information from a particular cgroup, rather than just the
> current task's cgroup.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 
-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2010-11-22  6:41 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-09  9:24 [PATCH 0/6] *** memcg: make throttle_vm_writeout() cgroup aware *** Greg Thelen
2010-11-09  9:24 ` [PATCH 1/6] memcg: add mem_cgroup parameter to mem_cgroup_page_stat() Greg Thelen
2010-11-09 22:53   ` Minchan Kim
2010-11-10  0:51   ` Minchan Kim
2010-11-16  3:50   ` KAMEZAWA Hiroyuki
2010-11-22  6:40   ` Balbir Singh
2010-11-09  9:24 ` [PATCH 2/6] memcg: pass mem_cgroup to mem_cgroup_dirty_info() Greg Thelen
2010-11-09 23:09   ` Minchan Kim
2010-11-16  3:51   ` KAMEZAWA Hiroyuki
2010-11-22  6:41   ` Balbir Singh
2010-11-09  9:24 ` [PATCH 3/6] memcg: make throttle_vm_writeout() memcg aware Greg Thelen
2010-11-09 23:22   ` Minchan Kim
2010-11-12  8:17   ` Johannes Weiner
2010-11-12 20:39     ` Greg Thelen
2010-11-16  3:57       ` KAMEZAWA Hiroyuki
2010-11-19 11:16         ` Johannes Weiner
2010-11-09  9:24 ` [PATCH 4/6] memcg: simplify mem_cgroup_page_stat() Greg Thelen
2010-11-09 23:36   ` Minchan Kim
2010-11-12  8:19   ` Johannes Weiner
2010-11-12 20:40     ` Greg Thelen
2010-11-19 11:22       ` Johannes Weiner
2010-11-16  3:58   ` KAMEZAWA Hiroyuki
2010-11-09  9:24 ` [PATCH 5/6] memcg: simplify mem_cgroup_dirty_info() Greg Thelen
2010-11-10  1:01   ` Minchan Kim
2010-11-12  8:21   ` Johannes Weiner
2010-11-12 20:40     ` Greg Thelen
2010-11-09  9:24 ` [PATCH 6/6] memcg: make mem_cgroup_page_stat() return value unsigned Greg Thelen
2010-11-09 12:15   ` Wu Fengguang
2010-11-10  1:04   ` Minchan Kim
2010-11-12  8:29   ` Johannes Weiner
2010-11-12 20:41     ` Greg Thelen
2010-11-19 11:39       ` Johannes Weiner
2010-11-16  4:00   ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).