* [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
@ 2011-06-08  5:05 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-08  5:05 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, Ying Han, hannes, Balbir Singh, nishimura

This patch is made against mainline git tree.
==
>From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 8 Jun 2011 13:51:11 +0900
Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.

For performance, memory cgroup caches some "charge" taken from res_counter
in a per-cpu cache. This works well, but because it is a cache,
it needs to be flushed in some cases. Typical cases are
	1. when someone hits the limit.
	2. when rmdir() is called and the charges need to drop to 0.

But "1" has a problem.

Recently, on large SMP machines, we have seen many kworker/%d:%d threads when
a memcg hits its limit. This is caused by flushing the memcg's per-cpu cache.
The bad points of the current implementation are

a) it is called before calling try_to_free_mem_cgroup_pages(),
   so it is called immediately when a task hits the limit.
   (I thought it was better to avoid running into memory reclaim.
    But that was a wrong decision.)

b) Even if a cpu only caches charges for a memcg unrelated to
   the memcg which hits its limit, the drain code is called.

This patch fixes a) and b) by

A) delaying the flush until one run of try_to_free... has completed.
   This greatly decreases the number of calls.
B) checking whether the per-cpu cache contains useful data before draining it.
plus
C) checking that asynchronous per-cpu draining is not already running on the cpu.
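
A simplified sketch of the intended asynchronous drain flow (illustration
only; the identifiers below match the real code in this patch, but cpu
hotplug locking and the synchronous-flush check are omitted):

	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
		struct mem_cgroup *mem = stock->cached;

		/* B) skip cpus caching nothing, or a memcg unrelated to root_mem */
		if (!mem)
			continue;
		if (mem != root_mem && !css_is_ancestor(&mem->css, &root_mem->css))
			continue;
		/* C) schedule at most one asynchronous drain per cpu at a time */
		if (!test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
			schedule_work_on(cpu, &stock->work);
	}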

Reported-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   44 ++++++++++++++++++++++++++++----------------
 1 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bd9052a..c22c0eb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -359,7 +359,7 @@ enum charge_type {
 static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
-static void drain_all_stock_async(void);
+static void drain_all_stock_async(struct mem_cgroup *mem);
 
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
 			loop++;
-			if (loop >= 1)
-				drain_all_stock_async();
 			if (loop >= 2) {
 				/*
 				 * If we have not been able to reclaim
@@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 				return total;
 		} else if (mem_cgroup_margin(root_mem))
 			return total;
+		drain_all_stock_async(root_mem);
 	}
 	return total;
 }
@@ -1934,9 +1933,11 @@ struct memcg_stock_pcp {
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
 	struct work_struct work;
+	unsigned long flags;
+#define ASYNC_FLUSHING	(0)
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static atomic_t memcg_drain_count;
+static atomic_t memcg_drain_count; /* Indicates there is synchronous flusher */
 
 /*
  * Try to consume stocked charge on this cpu. If success, one page is consumed
@@ -1984,6 +1985,7 @@ static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
 	drain_stock(stock);
+	clear_bit(ASYNC_FLUSHING, &stock->flags);
 }
 
 /*
@@ -2006,28 +2008,38 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
  * Tries to drain stocked charges in other cpus. This function is asynchronous
  * and just put a work per cpu for draining localy on each cpu. Caller can
  * expects some charges will be back to res_counter later but cannot wait for
- * it.
+ * it. This runs only when per-cpu stock contains information of memcg which
+ * is under specified root_mem and no other flush runs.
  */
-static void drain_all_stock_async(void)
+static void drain_all_stock_async(struct mem_cgroup *root_mem)
 {
 	int cpu;
-	/* This function is for scheduling "drain" in asynchronous way.
-	 * The result of "drain" is not directly handled by callers. Then,
-	 * if someone is calling drain, we don't have to call drain more.
-	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
-	 * there is a race. We just do loose check here.
+
+	/*
+	 * If synchronous flushing (which flushes all cpus's cache) runs,
+	 * do nothing.
 	 */
-	if (atomic_read(&memcg_drain_count))
+	if (unlikely(atomic_read(&memcg_drain_count)))
 		return;
-	/* Notify other cpus that system-wide "drain" is running */
-	atomic_inc(&memcg_drain_count);
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		schedule_work_on(cpu, &stock->work);
+		struct mem_cgroup *mem;
+		bool do_flush;
+
+		rcu_read_lock();
+		mem = stock->cached;
+		if (!mem) {
+			rcu_read_unlock();
+			continue;
+		}
+		do_flush = ((mem == root_mem) ||
+		     	css_is_ancestor(&mem->css, &root_mem->css));
+		rcu_read_unlock();
+		if (do_flush && !test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
+			schedule_work_on(cpu, &stock->work);
 	}
  	put_online_cpus();
-	atomic_dec(&memcg_drain_count);
 	/* We don't wait for flush_work */
 }
 
-- 
1.7.4.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
  2011-06-08  5:05 ` KAMEZAWA Hiroyuki
@ 2011-06-08  5:49   ` Daisuke Nishimura
  -1 siblings, 0 replies; 12+ messages in thread
From: Daisuke Nishimura @ 2011-06-08  5:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, Ying Han, hannes, Balbir Singh,
	Daisuke Nishimura

I have a few minor comments.

On Wed, 8 Jun 2011 14:05:18 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> This patch is made against mainline git tree.
> ==
> From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Wed, 8 Jun 2011 13:51:11 +0900
> Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
> 
> For performance, memory cgroup caches some "charge" from res_counter
> into per cpu cache. This works well but because it's cache,
> it needs to be flushed in some cases. Typical cases are
> 	1. when someone hit limit.
> 	2. when rmdir() is called and need to charges to be 0.
> 
> But "1" has problem.
> 
> Recently, with large SMP machines, we see many kworker/%d:%d when
> memcg hit limit. It is because of flushing memcg's percpu cache. 
> Bad things in implementation are
> 
> a) it's called before calling try_to_free_mem_cgroup_pages()
>    so, it's called immidiately when a task hit limit.
>    (I thought it was better to avoid to run into memory reclaim.
>     But it was wrong decision.)
> 
> b) Even if a cpu contains a cache for memcg not related to
>    a memcg which hits limit, drain code is called.
> 
> This patch fixes a) and b) by
> 
> A) delay calling of flushing until one run of try_to_free...
>    Then, the number of calling is much decreased.
> B) check percpu cache contains a useful data or not.
> plus
> C) check asynchronous percpu draining doesn't run on the cpu.
> 
> Reported-by: Ying Han <yinghan@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   44 ++++++++++++++++++++++++++++----------------
>  1 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bd9052a..c22c0eb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -359,7 +359,7 @@ enum charge_type {
>  static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> -static void drain_all_stock_async(void);
> +static void drain_all_stock_async(struct mem_cgroup *mem);
>  
>  static struct mem_cgroup_per_zone *
>  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  		victim = mem_cgroup_select_victim(root_mem);
>  		if (victim == root_mem) {
>  			loop++;
> -			if (loop >= 1)
> -				drain_all_stock_async();
>  			if (loop >= 2) {
>  				/*
>  				 * If we have not been able to reclaim
> @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  				return total;
>  		} else if (mem_cgroup_margin(root_mem))
>  			return total;
> +		drain_all_stock_async(root_mem);
>  	}
>  	return total;
>  }
> @@ -1934,9 +1933,11 @@ struct memcg_stock_pcp {
>  	struct mem_cgroup *cached; /* this never be root cgroup */
>  	unsigned int nr_pages;
>  	struct work_struct work;
> +	unsigned long flags;
> +#define ASYNC_FLUSHING	(0)
>  };
>  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static atomic_t memcg_drain_count;
> +static atomic_t memcg_drain_count; /* Indicates there is synchronous flusher */
>  
>  /*
>   * Try to consume stocked charge on this cpu. If success, one page is consumed
> @@ -1984,6 +1985,7 @@ static void drain_local_stock(struct work_struct *dummy)
>  {
>  	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
>  	drain_stock(stock);
> +	clear_bit(ASYNC_FLUSHING, &stock->flags);
>  }
>  
>  /*
> @@ -2006,28 +2008,38 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
>   * Tries to drain stocked charges in other cpus. This function is asynchronous
>   * and just put a work per cpu for draining localy on each cpu. Caller can
>   * expects some charges will be back to res_counter later but cannot wait for
> - * it.
> + * it. This runs only when per-cpu stock contains information of memcg which
> + * is under specified root_mem and no other flush runs.
>   */
> -static void drain_all_stock_async(void)
> +static void drain_all_stock_async(struct mem_cgroup *root_mem)
>  {
>  	int cpu;
> -	/* This function is for scheduling "drain" in asynchronous way.
> -	 * The result of "drain" is not directly handled by callers. Then,
> -	 * if someone is calling drain, we don't have to call drain more.
> -	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
> -	 * there is a race. We just do loose check here.
> +
> +	/*
> +	 * If synchronous flushing (which flushes all cpus's cache) runs,
> +	 * do nothing.
>  	 */
> -	if (atomic_read(&memcg_drain_count))
> +	if (unlikely(atomic_read(&memcg_drain_count)))
>  		return;
> -	/* Notify other cpus that system-wide "drain" is running */
> -	atomic_inc(&memcg_drain_count);
>  	get_online_cpus();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> -		schedule_work_on(cpu, &stock->work);
> +		struct mem_cgroup *mem;
> +		bool do_flush;
> +
> +		rcu_read_lock();

Should this rcu_read_lock() be placed here? IIUC, it's necessary only for css_is_ancestor().

> +		mem = stock->cached;
> +		if (!mem) {
> +			rcu_read_unlock();
> +			continue;
> +		}
> +		do_flush = ((mem == root_mem) ||
> +		     	css_is_ancestor(&mem->css, &root_mem->css));

Adding a "root_mem->use_hierarchy" check would be better, to avoid flushing the cache as much as possible.
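
That is, something like this (just to illustrate the suggestion):

	do_flush = (mem == root_mem) ||
		   (root_mem->use_hierarchy &&
		    css_is_ancestor(&mem->css, &root_mem->css));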


Thanks,
Daisuke Nishimura.

> +		rcu_read_unlock();
> +		if (do_flush && !test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
> +			schedule_work_on(cpu, &stock->work);
>  	}
>   	put_online_cpus();
> -	atomic_dec(&memcg_drain_count);
>  	/* We don't wait for flush_work */
>  }
>  
> -- 
> 1.7.4.1
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
  2011-06-08  5:49   ` Daisuke Nishimura
@ 2011-06-08  6:29     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-08  6:29 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, linux-kernel, akpm, Ying Han, hannes, Balbir Singh

On Wed, 8 Jun 2011 14:49:34 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> I have a few minor comments.
> 
> On Wed, 8 Jun 2011 14:05:18 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > This patch is made against mainline git tree.
> > ==
> > From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Wed, 8 Jun 2011 13:51:11 +0900
> > Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
> > 
> > For performance, memory cgroup caches some "charge" from res_counter
> > into per cpu cache. This works well but because it's cache,
> > it needs to be flushed in some cases. Typical cases are
> > 	1. when someone hit limit.
> > 	2. when rmdir() is called and need to charges to be 0.
> > 
> > But "1" has problem.
> > 
> > Recently, with large SMP machines, we see many kworker/%d:%d when
> > memcg hit limit. It is because of flushing memcg's percpu cache. 
> > Bad things in implementation are
> > 
> > a) it's called before calling try_to_free_mem_cgroup_pages()
> >    so, it's called immidiately when a task hit limit.
> >    (I thought it was better to avoid to run into memory reclaim.
> >     But it was wrong decision.)
> > 
> > b) Even if a cpu contains a cache for memcg not related to
> >    a memcg which hits limit, drain code is called.
> > 
> > This patch fixes a) and b) by
> > 
> > A) delay calling of flushing until one run of try_to_free...
> >    Then, the number of calling is much decreased.
> > B) check percpu cache contains a useful data or not.
> > plus
> > C) check asynchronous percpu draining doesn't run on the cpu.
> > 
> > Reported-by: Ying Han <yinghan@google.com>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/memcontrol.c |   44 ++++++++++++++++++++++++++++----------------
> >  1 files changed, 28 insertions(+), 16 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bd9052a..c22c0eb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -359,7 +359,7 @@ enum charge_type {
> >  static void mem_cgroup_get(struct mem_cgroup *mem);
> >  static void mem_cgroup_put(struct mem_cgroup *mem);
> >  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> > -static void drain_all_stock_async(void);
> > +static void drain_all_stock_async(struct mem_cgroup *mem);
> >  
> >  static struct mem_cgroup_per_zone *
> >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >  		victim = mem_cgroup_select_victim(root_mem);
> >  		if (victim == root_mem) {
> >  			loop++;
> > -			if (loop >= 1)
> > -				drain_all_stock_async();
> >  			if (loop >= 2) {
> >  				/*
> >  				 * If we have not been able to reclaim
> > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >  				return total;
> >  		} else if (mem_cgroup_margin(root_mem))
> >  			return total;
> > +		drain_all_stock_async(root_mem);
> >  	}
> >  	return total;
> >  }
> > @@ -1934,9 +1933,11 @@ struct memcg_stock_pcp {
> >  	struct mem_cgroup *cached; /* this never be root cgroup */
> >  	unsigned int nr_pages;
> >  	struct work_struct work;
> > +	unsigned long flags;
> > +#define ASYNC_FLUSHING	(0)
> >  };
> >  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > -static atomic_t memcg_drain_count;
> > +static atomic_t memcg_drain_count; /* Indicates there is synchronous flusher */
> >  
> >  /*
> >   * Try to consume stocked charge on this cpu. If success, one page is consumed
> > @@ -1984,6 +1985,7 @@ static void drain_local_stock(struct work_struct *dummy)
> >  {
> >  	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
> >  	drain_stock(stock);
> > +	clear_bit(ASYNC_FLUSHING, &stock->flags);
> >  }
> >  
> >  /*
> > @@ -2006,28 +2008,38 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
> >   * Tries to drain stocked charges in other cpus. This function is asynchronous
> >   * and just put a work per cpu for draining localy on each cpu. Caller can
> >   * expects some charges will be back to res_counter later but cannot wait for
> > - * it.
> > + * it. This runs only when per-cpu stock contains information of memcg which
> > + * is under specified root_mem and no other flush runs.
> >   */
> > -static void drain_all_stock_async(void)
> > +static void drain_all_stock_async(struct mem_cgroup *root_mem)
> >  {
> >  	int cpu;
> > -	/* This function is for scheduling "drain" in asynchronous way.
> > -	 * The result of "drain" is not directly handled by callers. Then,
> > -	 * if someone is calling drain, we don't have to call drain more.
> > -	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
> > -	 * there is a race. We just do loose check here.
> > +
> > +	/*
> > +	 * If synchronous flushing (which flushes all cpus's cache) runs,
> > +	 * do nothing.
> >  	 */
> > -	if (atomic_read(&memcg_drain_count))
> > +	if (unlikely(atomic_read(&memcg_drain_count)))
> >  		return;
> > -	/* Notify other cpus that system-wide "drain" is running */
> > -	atomic_inc(&memcg_drain_count);
> >  	get_online_cpus();
> >  	for_each_online_cpu(cpu) {
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > -		schedule_work_on(cpu, &stock->work);
> > +		struct mem_cgroup *mem;
> > +		bool do_flush;
> > +
> > +		rcu_read_lock();
> 
> Should this rcu_read_lock() be placed here ? IIUC, it's necessary only for css_is_ancestor().
> 

I thought rcu_read_lock() was required before dereferencing the pointer.

But hmm... on second thought:
 1. stock->cached is flushed before the memcg is destroyed.
 2. force_empty() (run before a memcg is destroyed) and this function can
    be made mutually exclusive by some lock.

Then it's safe to access stock->cached. Ok, I'll move this (with some comment).

> > +		mem = stock->cached;
> > +		if (!mem) {
> > +			rcu_read_unlock();
> > +			continue;
> > +		}
> > +		do_flush = ((mem == root_mem) ||
> > +		     	css_is_ancestor(&mem->css, &root_mem->css));
> 
> Adding "root_mem->use_hierarchy" is better to avoid flusing the cache as long as possible.
> 

ok. here is v2.

==
>From e5e0b342be74144f13b6a6aac6f54bce8cf64857 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 8 Jun 2011 13:51:11 +0900
Subject: [PATCH] Fix behavior of per-cpu charge cache draining in memcg.

For performance, memory cgroup caches some "charge" taken from res_counter
in a per-cpu cache. This works well, but because it is a cache,
it needs to be flushed in some cases. Typical cases are
	1. when someone hits the limit.
	2. when rmdir() is called and the charges need to drop to 0.

But "1" has a problem.

Recently, on large SMP machines, we have seen many kworker runs because
of flushing the memcg's per-cpu cache. The bad points of the current implementation are

a) it is called before calling try_to_free_mem_cgroup_pages(),
   so it is called immediately when a task hits the limit.
   (I thought it was better to avoid running into memory reclaim.
    But that was a wrong decision.)

b) Even if a cpu only caches charges for a memcg unrelated to
   the memcg which hits its limit, the drain code is called.

This patch fixes a) and b) by

A) delaying the flush until one run of try_to_free... has completed.
   This decreases the number of calls.
B) checking whether the per-cpu cache contains useful data before draining it.
plus
C) checking that asynchronous per-cpu draining is not already running.

Reported-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Changelog:
 - fixed rcu_read_lock() and added strict mutual exclusion between
   asynchronous and synchronous flushing. It's required for the validity
   of the cached pointer.
 - added a root_mem->use_hierarchy check.
---
 mm/memcontrol.c |   54 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bd9052a..e5609b9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -359,7 +359,7 @@ enum charge_type {
 static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
-static void drain_all_stock_async(void);
+static void drain_all_stock_async(struct mem_cgroup *mem);
 
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
 			loop++;
-			if (loop >= 1)
-				drain_all_stock_async();
 			if (loop >= 2) {
 				/*
 				 * If we have not been able to reclaim
@@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 				return total;
 		} else if (mem_cgroup_margin(root_mem))
 			return total;
+		drain_all_stock_async(root_mem);
 	}
 	return total;
 }
@@ -1934,9 +1933,12 @@ struct memcg_stock_pcp {
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
 	struct work_struct work;
+	unsigned long flags;
+#define ASYNC_FLUSHING	(0)
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static atomic_t memcg_drain_count;
+/* mutex for draining percpu charge for avoid too much kworker works. */
+DEFINE_MUTEX(percpu_charge_mutex);
 
 /*
  * Try to consume stocked charge on this cpu. If success, one page is consumed
@@ -1984,6 +1986,7 @@ static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
 	drain_stock(stock);
+	clear_bit(ASYNC_FLUSHING, &stock->flags);
 }
 
 /*
@@ -2006,38 +2009,51 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
  * Tries to drain stocked charges in other cpus. This function is asynchronous
  * and just put a work per cpu for draining localy on each cpu. Caller can
  * expects some charges will be back to res_counter later but cannot wait for
- * it.
+ * it. This runs only when per-cpu stock contains information of memcg which
+ * is under specified root_mem and no other flush runs.
  */
-static void drain_all_stock_async(void)
+static void drain_all_stock_async(struct mem_cgroup *root_mem)
 {
 	int cpu;
-	/* This function is for scheduling "drain" in asynchronous way.
-	 * The result of "drain" is not directly handled by callers. Then,
-	 * if someone is calling drain, we don't have to call drain more.
-	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
-	 * there is a race. We just do loose check here.
+
+	/*
+	 * If synchronous flushing (which flushes all cpus's cache) runs,
+	 * or some other thread runs already. avoid more calls.
 	 */
-	if (atomic_read(&memcg_drain_count))
+	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
-	/* Notify other cpus that system-wide "drain" is running */
-	atomic_inc(&memcg_drain_count);
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		schedule_work_on(cpu, &stock->work);
+		struct mem_cgroup *mem;
+		bool do_flush;
+		/*
+		 * This stock->cached is cleared before mem cgroup is destroyed.
+		 * And we have mutex. So it's safe to access stock->cached.
+		 */
+
+		mem = stock->cached;
+		if (!mem)
+			continue;
+		rcu_read_lock();
+		do_flush = ((mem == root_mem) ||
+			(root_mem->use_hierarchy &&
+		     	 css_is_ancestor(&mem->css, &root_mem->css)));
+		rcu_read_unlock();
+		if (do_flush && !test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
+			schedule_work_on(cpu, &stock->work);
 	}
  	put_online_cpus();
-	atomic_dec(&memcg_drain_count);
-	/* We don't wait for flush_work */
+	mutex_unlock(&percpu_charge_mutex)
 }
 
 /* This is a synchronous drain interface. */
 static void drain_all_stock_sync(void)
 {
 	/* called when force_empty is called */
-	atomic_inc(&memcg_drain_count);
+	mutex_lock(&percpu_charge_mutex);
 	schedule_on_each_cpu(drain_local_stock);
-	atomic_dec(&memcg_drain_count);
+	mutex_unlock(&percpu_charge_mutex);
 }
 
 /*
-- 
1.7.4.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
  2011-06-08  6:29     ` KAMEZAWA Hiroyuki
@ 2011-06-08  9:15       ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2011-06-08  9:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, linux-mm, linux-kernel, akpm, Ying Han,
	hannes, Balbir Singh

On Wed 08-06-11 15:29:01, KAMEZAWA Hiroyuki wrote:
> On Wed, 8 Jun 2011 14:49:34 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > I have a few minor comments.
> > 
> > On Wed, 8 Jun 2011 14:05:18 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > This patch is made against mainline git tree.
> > > ==
> > > From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Date: Wed, 8 Jun 2011 13:51:11 +0900
> > > Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
> > > 
> > > For performance, memory cgroup caches some "charge" from res_counter
> > > into per cpu cache. This works well but because it's cache,
> > > it needs to be flushed in some cases. Typical cases are
> > > 	1. when someone hit limit.
> > > 	2. when rmdir() is called and need to charges to be 0.
> > > 
> > > But "1" has problem.
> > > 
> > > Recently, with large SMP machines, we see many kworker/%d:%d when
> > > memcg hit limit. It is because of flushing memcg's percpu cache. 
> > > Bad things in implementation are
> > > 
> > > a) it's called before calling try_to_free_mem_cgroup_pages()
> > >    so, it's called immidiately when a task hit limit.
> > >    (I thought it was better to avoid to run into memory reclaim.
> > >     But it was wrong decision.)
> > > 
> > > b) Even if a cpu contains a cache for memcg not related to
> > >    a memcg which hits limit, drain code is called.
> > > 
> > > This patch fixes a) and b) by
> > > 
> > > A) delay calling of flushing until one run of try_to_free...
> > >    Then, the number of calling is much decreased.
> > > B) check percpu cache contains a useful data or not.
> > > plus
> > > C) check asynchronous percpu draining doesn't run on the cpu.
> > > 
> > > Reported-by: Ying Han <yinghan@google.com>
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Looks good to me.
Reviewed-by: Michal Hocko <mhocko@suse.cz>

One minor note though.
AFAICS we can end up having CHARGE_BATCH * (NR_ONLINE_CPU) pages pre-charged
for a group; those would be freed by drain_all_stock_async, so we could get
under the limit and omit direct reclaim, or?
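
Just to put a rough number on it (assuming CHARGE_BATCH is 32 pages and a
4 KiB page size; take these as illustrative figures): on a 64-cpu machine
that is up to 32 * 64 = 2048 pages, i.e. about 8 MiB, which could go back
to the res_counter from the per-cpu caches alone.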

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
  2011-06-08  9:15       ` Michal Hocko
@ 2011-06-08  9:20         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-08  9:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daisuke Nishimura, linux-mm, linux-kernel, akpm, Ying Han,
	hannes, Balbir Singh

On Wed, 8 Jun 2011 11:15:11 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Wed 08-06-11 15:29:01, KAMEZAWA Hiroyuki wrote:
> > On Wed, 8 Jun 2011 14:49:34 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > I have a few minor comments.
> > > 
> > > On Wed, 8 Jun 2011 14:05:18 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > 
> > > > This patch is made against mainline git tree.
> > > > ==
> > > > From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
> > > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > Date: Wed, 8 Jun 2011 13:51:11 +0900
> > > > Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
> > > > 
> > > > For performance, memory cgroup caches some "charge" from res_counter
> > > > into per cpu cache. This works well but because it's cache,
> > > > it needs to be flushed in some cases. Typical cases are
> > > > 	1. when someone hit limit.
> > > > 	2. when rmdir() is called and need to charges to be 0.
> > > > 
> > > > But "1" has problem.
> > > > 
> > > > Recently, with large SMP machines, we see many kworker/%d:%d when
> > > > memcg hit limit. It is because of flushing memcg's percpu cache. 
> > > > Bad things in implementation are
> > > > 
> > > > a) it's called before calling try_to_free_mem_cgroup_pages()
> > > >    so, it's called immidiately when a task hit limit.
> > > >    (I thought it was better to avoid to run into memory reclaim.
> > > >     But it was wrong decision.)
> > > > 
> > > > b) Even if a cpu contains a cache for memcg not related to
> > > >    a memcg which hits limit, drain code is called.
> > > > 
> > > > This patch fixes a) and b) by
> > > > 
> > > > A) delay calling of flushing until one run of try_to_free...
> > > >    Then, the number of calling is much decreased.
> > > > B) check percpu cache contains a useful data or not.
> > > > plus
> > > > C) check asynchronous percpu draining doesn't run on the cpu.
> > > > 
> > > > Reported-by: Ying Han <yinghan@google.com>
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Looks good to me.
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> 
> One minor note though. 
> AFAICS we can end up having CHARGE_BATCH * (NR_ONLINE_CPU) pages pre-charged
> for a group which would be freed by drain_all_stock_async so we could get
> under the limit and so we could omit direct reclaim, or?

If drain_all_stock_async flushes charges, we go under the limit and skip
direct reclaim, yes. That was my initial thought. But in a recent test,
while doing keep-margin work or something similar, we saw too many kworker/%d:%d threads.

So, what I think now is:

 1. if memory can be reclaimed easily, the cost of invoking kworkers is very bad.
 2. if the memory reclaim cost is too high, the benefit of flushing the per-cpu
    cache is very low.

In the future, the situation will be much better.

 a. Our tests show that the async shrinker for keep-margin will reduce memory
    effectively, and processes will not dive into direct reclaim because of the
    limit under not-very-heavy workloads.
 b. dirty-ratio will throttle very heavy workloads before reclaim becomes troublesome.

Hmm,
-Kame





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
  2011-06-08  6:29     ` KAMEZAWA Hiroyuki
@ 2011-06-09  0:13       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-09  0:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, linux-mm, linux-kernel, akpm, Ying Han,
	hannes, Balbir Singh

On Wed, 8 Jun 2011 15:29:01 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 8 Jun 2011 14:49:34 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > I have a few minor comments.
> > 
> > On Wed, 8 Jun 2011 14:05:18 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > This patch is made against mainline git tree.
> > > ==
> > > From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Date: Wed, 8 Jun 2011 13:51:11 +0900
> > > Subject: [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining.
> > > 
> > > For performance, memory cgroup caches some "charge" from res_counter
> > > into per cpu cache. This works well but because it's cache,
> > > it needs to be flushed in some cases. Typical cases are
> > > 	1. when someone hit limit.
> > > 	2. when rmdir() is called and need to charges to be 0.
> > > 
> > > But "1" has problem.
> > > 
> > > Recently, with large SMP machines, we see many kworker/%d:%d when
> > > memcg hit limit. It is because of flushing memcg's percpu cache. 
> > > Bad things in implementation are
> > > 
> > > a) it's called before calling try_to_free_mem_cgroup_pages()
> > >    so, it's called immidiately when a task hit limit.
> > >    (I thought it was better to avoid to run into memory reclaim.
> > >     But it was wrong decision.)
> > > 
> > > b) Even if a cpu contains a cache for memcg not related to
> > >    a memcg which hits limit, drain code is called.
> > > 
> > > This patch fixes a) and b) by
> > > 
> > > A) delay calling of flushing until one run of try_to_free...
> > >    Then, the number of calling is much decreased.
> > > B) check percpu cache contains a useful data or not.
> > > plus
> > > C) check asynchronous percpu draining doesn't run on the cpu.
> > > 
> > > Reported-by: Ying Han <yinghan@google.com>
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > >  mm/memcontrol.c |   44 ++++++++++++++++++++++++++++----------------
> > >  1 files changed, 28 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index bd9052a..c22c0eb 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -359,7 +359,7 @@ enum charge_type {
> > >  static void mem_cgroup_get(struct mem_cgroup *mem);
> > >  static void mem_cgroup_put(struct mem_cgroup *mem);
> > >  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> > > -static void drain_all_stock_async(void);
> > > +static void drain_all_stock_async(struct mem_cgroup *mem);
> > >  
> > >  static struct mem_cgroup_per_zone *
> > >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > >  		victim = mem_cgroup_select_victim(root_mem);
> > >  		if (victim == root_mem) {
> > >  			loop++;
> > > -			if (loop >= 1)
> > > -				drain_all_stock_async();
> > >  			if (loop >= 2) {
> > >  				/*
> > >  				 * If we have not been able to reclaim
> > > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > >  				return total;
> > >  		} else if (mem_cgroup_margin(root_mem))
> > >  			return total;
> > > +		drain_all_stock_async(root_mem);
> > >  	}
> > >  	return total;
> > >  }
> > > @@ -1934,9 +1933,11 @@ struct memcg_stock_pcp {
> > >  	struct mem_cgroup *cached; /* this never be root cgroup */
> > >  	unsigned int nr_pages;
> > >  	struct work_struct work;
> > > +	unsigned long flags;
> > > +#define ASYNC_FLUSHING	(0)
> > >  };
> > >  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > > -static atomic_t memcg_drain_count;
> > > +static atomic_t memcg_drain_count; /* Indicates there is synchronous flusher */
> > >  
> > >  /*
> > >   * Try to consume stocked charge on this cpu. If success, one page is consumed
> > > @@ -1984,6 +1985,7 @@ static void drain_local_stock(struct work_struct *dummy)
> > >  {
> > >  	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
> > >  	drain_stock(stock);
> > > +	clear_bit(ASYNC_FLUSHING, &stock->flags);
> > >  }
> > >  
> > >  /*
> > > @@ -2006,28 +2008,38 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
> > >   * Tries to drain stocked charges in other cpus. This function is asynchronous
> > >   * and just put a work per cpu for draining localy on each cpu. Caller can
> > >   * expects some charges will be back to res_counter later but cannot wait for
> > > - * it.
> > > + * it. This runs only when per-cpu stock contains information of memcg which
> > > + * is under specified root_mem and no other flush runs.
> > >   */
> > > -static void drain_all_stock_async(void)
> > > +static void drain_all_stock_async(struct mem_cgroup *root_mem)
> > >  {
> > >  	int cpu;
> > > -	/* This function is for scheduling "drain" in asynchronous way.
> > > -	 * The result of "drain" is not directly handled by callers. Then,
> > > -	 * if someone is calling drain, we don't have to call drain more.
> > > -	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
> > > -	 * there is a race. We just do loose check here.
> > > +
> > > +	/*
> > > +	 * If synchronous flushing (which flushes every cpu's cache) runs,
> > > +	 * do nothing.
> > >  	 */
> > > -	if (atomic_read(&memcg_drain_count))
> > > +	if (unlikely(atomic_read(&memcg_drain_count)))
> > >  		return;
> > > -	/* Notify other cpus that system-wide "drain" is running */
> > > -	atomic_inc(&memcg_drain_count);
> > >  	get_online_cpus();
> > >  	for_each_online_cpu(cpu) {
> > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > -		schedule_work_on(cpu, &stock->work);
> > > +		struct mem_cgroup *mem;
> > > +		bool do_flush;
> > > +
> > > +		rcu_read_lock();
> > 
> > Should this rcu_read_lock() be placed here? IIUC, it's necessary only for css_is_ancestor().
> > 
> 
> I thought rcu_read_lock() was required before fetching a pointer that will be accessed.
> 
> But hmm... on second thought:
>  1. stock->cached is flushed before the memcg is destroyed.
>  2. force_empty() (called before the memcg is destroyed) and this function
>     exclude each other via some lock.
> 
> So it's safe to access stock->cached. OK, I'll move this (with a comment).
> 
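> As a rough sketch of that direction (not the posted v2/v3; the
> test_and_set_bit() on ASYNC_FLUSHING is only a guess at how fix C is
> wired up), the loop body could take rcu_read_lock() around the
> css_is_ancestor() call only:
> 
> 	for_each_online_cpu(cpu) {
> 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> 		struct mem_cgroup *mem;
> 		bool do_flush;
> 
> 		/*
> 		 * stock->cached is stable: it is flushed before the memcg
> 		 * can be destroyed, so no rcu_read_lock() is needed just
> 		 * to load the pointer.
> 		 */
> 		mem = stock->cached;
> 		if (!mem)
> 			continue;
> 		do_flush = (mem == root_mem);
> 		if (!do_flush) {
> 			/* css_is_ancestor() walks RCU-protected css data. */
> 			rcu_read_lock();
> 			do_flush = css_is_ancestor(&mem->css, &root_mem->css);
> 			rcu_read_unlock();
> 		}
> 		/* Skip cpus whose async drain is already queued (fix C). */
> 		if (do_flush && !test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
> 			schedule_work_on(cpu, &stock->work);
> 	}
> 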
> > > +		mem = stock->cached;
> > > +		if (!mem) {
> > > +			rcu_read_unlock();
> > > +			continue;
> > > +		}
> > > +		do_flush = ((mem == root_mem) ||
> > > +		     	css_is_ancestor(&mem->css, &root_mem->css));
> > 
> > Adding a "root_mem->use_hierarchy" check is better, to avoid flushing the cache as much as possible.
> > 
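> For illustration, the guarded check could look like this (a rough sketch
> only, not the actual v2 hunk):
> 
> 	do_flush = (mem == root_mem);
> 	if (!do_flush && root_mem->use_hierarchy) {
> 		/* Ancestry only matters when hierarchical accounting is on. */
> 		rcu_read_lock();
> 		do_flush = css_is_ancestor(&mem->css, &root_mem->css);
> 		rcu_read_unlock();
> 	}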
> 
> ok. here is v2.
> 

I found a typo ;(.
I'll post v3 in a new thread. Very sorry.

Thanks,
-Kame

>


^ permalink raw reply	[flat|nested] 12+ messages in thread


end of thread, other threads:[~2011-06-09  0:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-08  5:05 [BUGFIX][PATCH] memcg: fix behavior of per cpu charge cache draining KAMEZAWA Hiroyuki
2011-06-08  5:05 ` KAMEZAWA Hiroyuki
2011-06-08  5:49 ` Daisuke Nishimura
2011-06-08  5:49   ` Daisuke Nishimura
2011-06-08  6:29   ` KAMEZAWA Hiroyuki
2011-06-08  6:29     ` KAMEZAWA Hiroyuki
2011-06-08  9:15     ` Michal Hocko
2011-06-08  9:15       ` Michal Hocko
2011-06-08  9:20       ` KAMEZAWA Hiroyuki
2011-06-08  9:20         ` KAMEZAWA Hiroyuki
2011-06-09  0:13     ` KAMEZAWA Hiroyuki
2011-06-09  0:13       ` KAMEZAWA Hiroyuki
