* [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko
@ 2011-07-21  7:38 ` Michal Hocko
  2011-07-21 10:12     ` KAMEZAWA Hiroyuki
  2011-07-21  7:50 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2011-07-21  7:38 UTC (permalink / raw)
  To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel

drain_all_stock_async tries to optimize the work done on the work
queue by excluding the current CPU, because it assumes that the context
we are called from has already tried to charge from that cache and
failed, so the cache must be empty already.
While the assumption is correct, we can achieve the same by checking the
current number of pages in the cache. This also avoids scheduling work
on other CPUs whose stock is empty.
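
For reference, the per-cpu stock being checked here looks roughly like
this (paraphrased from mm/memcontrol.c of this era; the field comments
are mine):

struct memcg_stock_pcp {
	struct mem_cgroup *cached;	/* memcg the pages are pre-charged to */
	unsigned int nr_pages;		/* pre-charged pages left in the stock */
	struct work_struct work;	/* per-cpu drain worker */
	unsigned long flags;
#define FLUSHING_CACHED_CHARGE	(0)
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);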

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f11f198..786bffb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2140,7 +2140,7 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
  */
 static void drain_all_stock_async(struct mem_cgroup *root_mem)
 {
-	int cpu, curcpu;
+	int cpu;
 	/*
 	 * If someone calls draining, avoid adding more kworker runs.
 	 */
@@ -2148,22 +2148,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 		return;
 	/* Notify other cpus that system-wide "drain" is running */
 	get_online_cpus();
-	/*
-	 * Get a hint for avoiding draining charges on the current cpu,
-	 * which must be exhausted by our charging.  It is not required that
-	 * this be a precise check, so we use raw_smp_processor_id() instead of
-	 * getcpu()/putcpu().
-	 */
-	curcpu = raw_smp_processor_id();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *mem;
 
-		if (cpu == curcpu)
-			continue;
-
 		mem = stock->cached;
-		if (!mem)
+		if (!mem || !stock->nr_pages)
 			continue;
 		if (mem != root_mem) {
 			if (!root_mem->use_hierarchy)
-- 
1.7.5.4




* [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining
  2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko
  2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
@ 2011-07-21  7:50 ` Michal Hocko
  2011-07-21 10:25     ` KAMEZAWA Hiroyuki
  2011-07-21  7:58 ` [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
  2011-07-21  8:28 ` [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges Michal Hocko
  3 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2011-07-21  7:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel

Currently we have two ways to drain the per-CPU caches for charges.
drain_all_stock_sync will synchronously drain all caches while
drain_all_stock_async will asynchronously drain only those that refer to
a given memory cgroup or its subtree in the hierarchy.
Targeted async draining was introduced by 26fe6168 (memcg: fix percpu
cached charge draining frequency) to reduce the number of cpu workers.

Sync draining is currently triggered only from mem_cgroup_force_empty,
which is invoked either from userspace (mem_cgroup_force_empty_write) or
when a cgroup is removed (mem_cgroup_pre_destroy). Although these are
not usually frequent operations, it still makes sense to do targeted
draining there as well, especially if the box has many CPUs.

This patch unifies both methods into a single function (drain_all_stock)
which relies on the original async implementation and, for the sync
mode, just adds flush_work to wait on all caches that are still being
drained.
We use the FLUSHING_CACHED_CHARGE bit check to avoid waiting on work
that we haven't triggered.
Please note that both the sync and async functions are currently
protected by percpu_charge_mutex so we cannot race with other drainers.
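
For reference, the work item scheduled via schedule_work_on() is
drain_local_stock, which clears the bit once the cache has been drained;
roughly (paraphrased from mm/memcontrol.c of this era):

static void drain_local_stock(struct work_struct *dummy)
{
	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);

	drain_stock(stock);	/* give cached charges back to the res_counter */
	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
}

This is what makes the FLUSHING_CACHED_CHARGE test a safe condition for
flush_work: the bit is set before the work is scheduled and cleared only
after the stock has been drained.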

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   48 ++++++++++++++++++++++++++++++++++--------------
 1 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 786bffb..8180cd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2133,19 +2133,14 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
 }
 
 /*
- * Tries to drain stocked charges in other cpus. This function is asynchronous
- * and just put a work per cpu for draining localy on each cpu. Caller can
- * expects some charges will be back to res_counter later but cannot wait for
- * it.
+ * Drains all per-CPU charge caches for the given root_mem or any
+ * memcg in the subtree under it in the hierarchy. The sync flag says
+ * whether we should block until the work is done.
  */
-static void drain_all_stock_async(struct mem_cgroup *root_mem)
+static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
 {
 	int cpu;
-	/*
-	 * If someone calls draining, avoid adding more kworker runs.
-	 */
-	if (!mutex_trylock(&percpu_charge_mutex))
-		return;
+
 	/* Notify other cpus that system-wide "drain" is running */
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
@@ -2165,17 +2160,42 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
 			schedule_work_on(cpu, &stock->work);
 	}
+
+	if (!sync)
+		goto out;
+
+	for_each_online_cpu(cpu) {
+		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
+			flush_work(&stock->work);
+	}
+out:
  	put_online_cpus();
+}
+
+/*
+ * Tries to drain stocked charges on other cpus. This function is
+ * asynchronous and just schedules a work item per cpu to drain locally
+ * on each cpu. The caller can expect that some charges will go back to
+ * the res_counter later, but it cannot wait for that.
+ */
+static void drain_all_stock_async(struct mem_cgroup *root_mem)
+{
+	/*
+	 * If someone calls draining, avoid adding more kworker runs.
+	 */
+	if (!mutex_trylock(&percpu_charge_mutex))
+		return;
+	drain_all_stock(root_mem, false);
 	mutex_unlock(&percpu_charge_mutex);
-	/* We don't wait for flush_work */
 }
 
 /* This is a synchronous drain interface. */
-static void drain_all_stock_sync(void)
+static void drain_all_stock_sync(struct mem_cgroup *root_mem)
 {
 	/* called when force_empty is called */
 	mutex_lock(&percpu_charge_mutex);
-	schedule_on_each_cpu(drain_local_stock);
+	drain_all_stock(root_mem, true);
 	mutex_unlock(&percpu_charge_mutex);
 }
 
@@ -3824,7 +3844,7 @@ move_account:
 			goto out;
 		/* This is for making all *used* pages to be on LRU. */
 		lru_add_drain_all();
-		drain_all_stock_sync();
+		drain_all_stock_sync(mem);
 		ret = 0;
 		mem_cgroup_start_move(mem);
 		for_each_node_state(node, N_HIGH_MEMORY) {
-- 
1.7.5.4




* [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko
  2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
  2011-07-21  7:50 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
@ 2011-07-21  7:58 ` Michal Hocko
  2011-07-21 10:30     ` KAMEZAWA Hiroyuki
  2011-07-21  8:28 ` [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges Michal Hocko
  3 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2011-07-21  7:58 UTC (permalink / raw)
  To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel

percpu_charge_mutex protects against multiple simultaneous drainings of
the per-cpu charge caches, because without it we might end up queueing
too many work items. At least this was the case until 26fe6168 (memcg:
fix percpu cached charge draining frequency), which introduced more
targeted draining for the async mode.
Now that sync draining is targeted as well, we can safely remove the
mutex because we will never queue more work items than the current
number of CPUs.
FLUSHING_CACHED_CHARGE protects against queueing the same work multiple
times, and the stock->nr_pages == 0 check avoids pointlessly queueing
work when there is obviously nothing to be done. This is of course racy,
but we can live with it as the race window is really small (we would
have to observe FLUSHING_CACHED_CHARGE cleared while nr_pages is still
non-zero).
The only remaining place where we can race is the synchronous mode,
where we rely on the FLUSHING_CACHED_CHARGE test which might have been
set by another drainer on the same group, but we should wait in that
case as well.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   12 ++----------
 1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8180cd9..9d49a12 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2065,7 +2065,6 @@ struct memcg_stock_pcp {
 #define FLUSHING_CACHED_CHARGE	(0)
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static DEFINE_MUTEX(percpu_charge_mutex);
 
 /*
  * Try to consume stocked charge on this cpu. If success, one page is consumed
@@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
 
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
+		if (root_mem == stock->cached &&
+				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
 			flush_work(&stock->work);
 	}
 out:
@@ -2181,22 +2181,14 @@ out:
  */
 static void drain_all_stock_async(struct mem_cgroup *root_mem)
 {
-	/*
-	 * If someone calls draining, avoid adding more kworker runs.
-	 */
-	if (!mutex_trylock(&percpu_charge_mutex))
-		return;
 	drain_all_stock(root_mem, false);
-	mutex_unlock(&percpu_charge_mutex);
 }
 
 /* This is a synchronous drain interface. */
 static void drain_all_stock_sync(struct mem_cgroup *root_mem)
 {
 	/* called when force_empty is called */
-	mutex_lock(&percpu_charge_mutex);
 	drain_all_stock(root_mem, true);
-	mutex_unlock(&percpu_charge_mutex);
 }
 
 /*
-- 
1.7.5.4




* [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko
                   ` (2 preceding siblings ...)
  2011-07-21  7:58 ` [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
@ 2011-07-21  8:28 ` Michal Hocko
  2011-07-21 10:54     ` KAMEZAWA Hiroyuki
  3 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2011-07-21  8:28 UTC (permalink / raw)
  To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel

If we fail to charge an allocation for a cgroup we usually have to fall
back into direct reclaim (mem_cgroup_hierarchical_reclaim).
The charging code, however, currently doesn't care about the per-cpu
charge caches, which might hold up to (nr_cpus - 1) * CHARGE_BATCH
pre-charged pages (the current cpu's cache is already drained, otherwise
we wouldn't get to mem_cgroup_do_charge).
That can be quite a lot on boxes with many CPUs, so we can end up
reclaiming even though there are charges that could be used. This will
typically happen with multi-threaded applications pinned to many CPUs
which allocate memory heavily.
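
For illustration (assuming CHARGE_BATCH is 32 pages, as defined in
mm/memcontrol.c at this time): on a 64-CPU box up to 63 * 32 = 2016
pages, i.e. almost 8MB with 4KB pages, could sit pre-charged in the
other CPUs' caches while we go off and reclaim.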

Currently we drain the caches during reclaim
(mem_cgroup_hierarchical_reclaim), but this can already be too late as
we could have reclaimed from other groups in the hierarchy by then.

The solution is to synchronously drain the charges early, when we fail
to charge, and then retry the charge once more.
I think it still makes sense to keep the async draining in the reclaim
path as it is used from other code paths as well (e.g. limit resize). It
will not do any work if we drained previously anyway.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9d49a12..59bcb01 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2265,11 +2265,12 @@ static int mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 				unsigned int nr_pages, bool oom_check)
 {
 	unsigned long csize = nr_pages * PAGE_SIZE;
-	struct mem_cgroup *mem_over_limit;
+	struct mem_cgroup *mem_over_limit, *drained = NULL;
 	struct res_counter *fail_res;
 	unsigned long flags = 0;
 	int ret;
 
+retry:
 	ret = res_counter_charge(&mem->res, csize, &fail_res);
 
 	if (likely(!ret)) {
@@ -2282,8 +2283,14 @@ static int mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 		res_counter_uncharge(&mem->res, csize);
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, memsw);
 		flags |= MEM_CGROUP_RECLAIM_NOSWAP;
-	} else
+	} else {
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
+		if (!drained) {
+			drained = mem_over_limit;
+			drain_all_stock_sync(drained);
+			goto retry;
+		}
+	}
 	/*
 	 * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
 	 * of regular pages (CHARGE_BATCH), or a single regular page (1).
-- 
1.7.5.4



* [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges
@ 2011-07-21  9:41 Michal Hocko
  2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21  9:41 UTC (permalink / raw)
  To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel

Hi,
this patchset cleans up the per-cpu charge cache draining code and fixes
an issue where we could reclaim from a group even though there are
caches with charges on other CPUs that could be used. Although the
problem is far from critical, it can bite us on large machines with many
CPUs.
I am sorry to mix those two things, but the fix depends on the work done
in the cleanup patches, so I hope it won't be confusing.
The reason is that I needed a sane implementation of the sync draining
code and wanted to avoid code duplication.

The first two patches should be quite straightforward. Checking
stock->nr_pages is more general than excluding just the local CPU, and
targeted sync draining also makes good sense to me.

The third one might require some discussion. AFAIU it should be safe,
but others might see some issues. Anyway, I have no problem dropping it
because the fix doesn't depend on it. I have put it before the fix only
because I wanted to keep all the cleanups up front.

Finally, the fourth patch is the already mentioned fix. I do not think
I have ever seen any sane application (i.e. not an artificially created
usecase) trigger the behavior in such a way that performance would
suffer, but I have already seen pointless reclaim while we had caches on
other CPUs. As the number of CPUs grows, I think the change makes good
sense.

The patchset is on top of the current Linus tree but it should apply on
top of the current mmotm tree as well.

Any thoughts/comments?

Michal Hocko (4):
  memcg: do not try to drain per-cpu caches without pages
  memcg: unify sync and async per-cpu charge cache draining
  memcg: get rid of percpu_charge_mutex lock
  memcg: prevent from reclaiming if there are per-cpu cached charges

 mm/memcontrol.c |   73 +++++++++++++++++++++++++++++++------------------------
 1 files changed, 41 insertions(+), 32 deletions(-)

-- 
1.7.5.4



* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
@ 2011-07-21 10:12     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 10:12 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 09:38:00 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> drain_all_stock_async tries to optimize the work done on the work
> queue by excluding the current CPU, because it assumes that the context
> we are called from has already tried to charge from that cache and
> failed, so the cache must be empty already.
> While the assumption is correct, we can achieve the same by checking
> the current number of pages in the cache. This also avoids scheduling
> work on other CPUs whose stock is empty.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>


At first look, when a charge against a transparent huge page goes into
the reclaim routine, stock->nr_pages != 0 on the current cpu (huge page
charges bypass the per-cpu stock), so this will now schedule an
additional kworker there.

The nr_pages check itself seems good.

Thanks,
-Kame

> ---
>  mm/memcontrol.c |   14 ++------------
>  1 files changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f11f198..786bffb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2140,7 +2140,7 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages)
>   */
>  static void drain_all_stock_async(struct mem_cgroup *root_mem)
>  {
> -	int cpu, curcpu;
> +	int cpu;
>  	/*
>  	 * If someone calls draining, avoid adding more kworker runs.
>  	 */
> @@ -2148,22 +2148,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
>  		return;
>  	/* Notify other cpus that system-wide "drain" is running */
>  	get_online_cpus();
> -	/*
> -	 * Get a hint for avoiding draining charges on the current cpu,
> -	 * which must be exhausted by our charging.  It is not required that
> -	 * this be a precise check, so we use raw_smp_processor_id() instead of
> -	 * getcpu()/putcpu().
> -	 */
> -	curcpu = raw_smp_processor_id();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *mem;
>  
> -		if (cpu == curcpu)
> -			continue;
> -
>  		mem = stock->cached;
> -		if (!mem)
> +		if (!mem || !stock->nr_pages)
>  			continue;
>  		if (mem != root_mem) {
>  			if (!root_mem->use_hierarchy)
> -- 
> 1.7.5.4
> 
> 
> 



* Re: [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining
  2011-07-21  7:50 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
@ 2011-07-21 10:25     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 10:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 09:50:00 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Currently we have two ways to drain the per-CPU caches for charges.
> drain_all_stock_sync will synchronously drain all caches while
> drain_all_stock_async will asynchronously drain only those that refer
> to a given memory cgroup or its subtree in the hierarchy.
> Targeted async draining was introduced by 26fe6168 (memcg: fix percpu
> cached charge draining frequency) to reduce the number of cpu workers.
> 
> Sync draining is currently triggered only from mem_cgroup_force_empty,
> which is invoked either from userspace (mem_cgroup_force_empty_write)
> or when a cgroup is removed (mem_cgroup_pre_destroy). Although these
> are not usually frequent operations, it still makes sense to do
> targeted draining there as well, especially if the box has many CPUs.
> 
> This patch unifies both methods into a single function
> (drain_all_stock) which relies on the original async implementation
> and, for the sync mode, just adds flush_work to wait on all caches that
> are still being drained.
> We use the FLUSHING_CACHED_CHARGE bit check to avoid waiting on work
> that we haven't triggered.
> Please note that both the sync and async functions are currently
> protected by percpu_charge_mutex so we cannot race with other drainers.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

hmm..maybe good.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>




* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21  7:58 ` [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
@ 2011-07-21 10:30     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 10:30 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 09:58:24 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> percpu_charge_mutex protects against multiple simultaneous drainings
> of the per-cpu charge caches, because without it we might end up
> queueing too many work items. At least this was the case until
> 26fe6168 (memcg: fix percpu cached charge draining frequency), which
> introduced more targeted draining for the async mode.
> Now that sync draining is targeted as well, we can safely remove the
> mutex because we will never queue more work items than the current
> number of CPUs.
> FLUSHING_CACHED_CHARGE protects against queueing the same work multiple
> times, and the stock->nr_pages == 0 check avoids pointlessly queueing
> work when there is obviously nothing to be done. This is of course
> racy, but we can live with it as the race window is really small (we
> would have to observe FLUSHING_CACHED_CHARGE cleared while nr_pages is
> still non-zero).
> The only remaining place where we can race is the synchronous mode,
> where we rely on the FLUSHING_CACHED_CHARGE test which might have been
> set by another drainer on the same group, but we should wait in that
> case as well.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

A concern.

> ---
>  mm/memcontrol.c |   12 ++----------
>  1 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8180cd9..9d49a12 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2065,7 +2065,6 @@ struct memcg_stock_pcp {
>  #define FLUSHING_CACHED_CHARGE	(0)
>  };
>  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>  
>  /*
>   * Try to consume stocked charge on this cpu. If success, one page is consumed
> @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>  
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> +		if (root_mem == stock->cached &&
> +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
>  			flush_work(&stock->work);

This new check doesn't handle hierarchy, does it?
css_is_ancestor() will be required if you do this check.

BTW, this change should be in a separate patch, I think.

Thanks,
-Kame



* Re: [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21  8:28 ` [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges Michal Hocko
@ 2011-07-21 10:54     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 10:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 10:28:10 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> If we fail to charge an allocation for a cgroup we usually have to
> fall back into direct reclaim (mem_cgroup_hierarchical_reclaim).
> The charging code, however, currently doesn't care about the per-cpu
> charge caches, which might hold up to (nr_cpus - 1) * CHARGE_BATCH
> pre-charged pages (the current cpu's cache is already drained,
> otherwise we wouldn't get to mem_cgroup_do_charge).
> That can be quite a lot on boxes with many CPUs, so we can end up
> reclaiming even though there are charges that could be used. This will
> typically happen with multi-threaded applications pinned to many CPUs
> which allocate memory heavily.
> 
> 

Do you have an example and scores/numbers from your tests?

> Currently we drain the caches during reclaim
> (mem_cgroup_hierarchical_reclaim), but this can already be too late as
> we could have reclaimed from other groups in the hierarchy by then.
> 
> The solution is to synchronously drain the charges early, when we fail
> to charge, and then retry the charge once more.
> I think it still makes sense to keep the async draining in the reclaim
> path as it is used from other code paths as well (e.g. limit resize).
> It will not do any work if we drained previously anyway.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

I don't like this solution at all.

Assume a 2 cpu SMP machine (a special case) and 2 applications running
under a memcg.

 - one is running in SCHED_FIFO.
 - the other runs into mem_cgroup_do_charge() and calls
   drain_all_stock_sync().

Then, the second application stops until the SCHED_FIFO application
releases the cpu.

In general, I don't think waiting for schedule_work() against multiple
cpus is quicker than a short memory reclaim. Adding flush_work() here
means that a context switch is required before calling direct reclaim.
That's bad. (At least, please check __GFP_NOWAIT.)

Please find another way; I think calling a synchronous drain here is
overkill. In most cases there are no important file caches and reclaim
is quick. (And the async draining runs anyway.)

How about automatically adjusting CHARGE_BATCH and making it smaller
when the system is near the limit? Or flushing ->stock periodically?


Thanks,
-Kame



* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-21 10:12     ` KAMEZAWA Hiroyuki
@ 2011-07-21 11:36       ` Michal Hocko
  -1 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21 11:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 09:38:00 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > drain_all_stock_async tries to optimize the work done on the work
> > queue by excluding the current CPU, because it assumes that the
> > context we are called from has already tried to charge from that
> > cache and failed, so the cache must be empty already.
> > While the assumption is correct, we can achieve the same by checking
> > the current number of pages in the cache. This also avoids scheduling
> > work on other CPUs whose stock is empty.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> 
> At first look, when a charge against a transparent huge page goes into
> the reclaim routine, stock->nr_pages != 0 on the current cpu (huge
> page charges bypass the per-cpu stock), so this will now schedule an
> additional kworker there.

True. We will drain a charge which could have been used by other
allocations in the meantime, so we have a good chance to reclaim less.
But how big a problem is that?
I mean, I could add a new parameter that forces the current cpu check,
but it doesn't look nice. I cannot add that condition unconditionally
because the code will be shared with the sync path in the next patch,
and that one needs to drain _all_ cpus.

What would you suggest?
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining
  2011-07-21 10:25     ` KAMEZAWA Hiroyuki
@ 2011-07-21 11:36       ` Michal Hocko
  -1 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21 11:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu 21-07-11 19:25:25, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 09:50:00 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Currently we have two ways to drain the per-CPU caches for charges.
> > drain_all_stock_sync will synchronously drain all caches while
> > drain_all_stock_async will asynchronously drain only those that refer
> > to a given memory cgroup or its subtree in the hierarchy.
> > Targeted async draining was introduced by 26fe6168 (memcg: fix percpu
> > cached charge draining frequency) to reduce the number of cpu
> > workers.
> > 
> > Sync draining is currently triggered only from
> > mem_cgroup_force_empty, which is invoked either from userspace
> > (mem_cgroup_force_empty_write) or when a cgroup is removed
> > (mem_cgroup_pre_destroy). Although these are not usually frequent
> > operations, it still makes sense to do targeted draining there as
> > well, especially if the box has many CPUs.
> > 
> > This patch unifies both methods into a single function
> > (drain_all_stock) which relies on the original async implementation
> > and, for the sync mode, just adds flush_work to wait on all caches
> > that are still being drained.
> > We use the FLUSHING_CACHED_CHARGE bit check to avoid waiting on work
> > that we haven't triggered.
> > Please note that both the sync and async functions are currently
> > protected by percpu_charge_mutex so we cannot race with other
> > drainers.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> hmm..maybe good.
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21 10:30     ` KAMEZAWA Hiroyuki
@ 2011-07-21 11:47       ` Michal Hocko
  -1 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21 11:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 09:58:24 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > percpu_charge_mutex protects against multiple simultaneous drainings
> > of the per-cpu charge caches, because without it we might end up
> > queueing too many work items. At least this was the case until
> > 26fe6168 (memcg: fix percpu cached charge draining frequency), which
> > introduced more targeted draining for the async mode.
> > Now that sync draining is targeted as well, we can safely remove the
> > mutex because we will never queue more work items than the current
> > number of CPUs.
> > FLUSHING_CACHED_CHARGE protects against queueing the same work
> > multiple times, and the stock->nr_pages == 0 check avoids pointlessly
> > queueing work when there is obviously nothing to be done. This is of
> > course racy, but we can live with it as the race window is really
> > small (we would have to observe FLUSHING_CACHED_CHARGE cleared while
> > nr_pages is still non-zero).
> > The only remaining place where we can race is the synchronous mode,
> > where we rely on the FLUSHING_CACHED_CHARGE test which might have
> > been set by another drainer on the same group, but we should wait in
> > that case as well.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> A concern.
> 
> > ---
> >  mm/memcontrol.c |   12 ++----------
> >  1 files changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8180cd9..9d49a12 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2065,7 +2065,6 @@ struct memcg_stock_pcp {
> >  #define FLUSHING_CACHED_CHARGE	(0)
> >  };
> >  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > -static DEFINE_MUTEX(percpu_charge_mutex);
> >  
> >  /*
> >   * Try to consume stocked charge on this cpu. If success, one page is consumed
> > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> >  
> >  	for_each_online_cpu(cpu) {
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > +		if (root_mem == stock->cached &&
> > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> >  			flush_work(&stock->work);
> 
> This new check doesn't handle hierarchy, does it?
> css_is_ancestor() will be required if you do this check.

Yes, you are right. I will fix it and add a helper for the check.
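
Something along these lines, perhaps (untested sketch; the helper name
is made up):

/* untested sketch: is mem the same as root_mem or within its subtree? */
static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_mem,
				       struct mem_cgroup *mem)
{
	if (root_mem == mem)
		return true;
	if (!root_mem->use_hierarchy)
		return false;
	return css_is_ancestor(&mem->css, &root_mem->css);
}

The flush_work condition in drain_all_stock could then test
mem_cgroup_same_or_subtree(root_mem, stock->cached) instead of the
plain equality.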

> BTW, this change should be in a separate patch, I think.

I have put the change here intentionally because previously we were
protected by the lock, so we couldn't race with anybody else and the
check was not necessary.

> 
> Thanks,
> -Kame

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21 10:54     ` KAMEZAWA Hiroyuki
@ 2011-07-21 12:30       ` Michal Hocko
  -1 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21 12:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu 21-07-11 19:54:11, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 10:28:10 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > If we fail to charge an allocation for a cgroup we usually have to
> > fall back into direct reclaim (mem_cgroup_hierarchical_reclaim).
> > The charging code, however, currently doesn't care about the per-cpu
> > charge caches, which might hold up to (nr_cpus - 1) * CHARGE_BATCH
> > pre-charged pages (the current cpu's cache is already drained,
> > otherwise we wouldn't get to mem_cgroup_do_charge).
> > That can be quite a lot on boxes with many CPUs, so we can end up
> > reclaiming even though there are charges that could be used. This
> > will typically happen with multi-threaded applications pinned to many
> > CPUs which allocate memory heavily.
> > 
> > 
> 
> Do you have an example and scores/numbers from your tests?

As I said, I haven't seen anything that would visibly affect
performance, but I have seen situations where we reclaimed even though
there were pre-charges on other CPUs.

> > Currently we drain the caches during reclaim
> > (mem_cgroup_hierarchical_reclaim), but this can already be too late
> > as we could have reclaimed from other groups in the hierarchy by
> > then.
> > 
> > The solution is to synchronously drain the charges early, when we
> > fail to charge, and then retry the charge once more.
> > I think it still makes sense to keep the async draining in the
> > reclaim path as it is used from other code paths as well (e.g. limit
> > resize). It will not do any work if we drained previously anyway.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> I don't like this solution at all.
> 
> Assume a 2 cpu SMP machine (a special case) and 2 applications running
> under a memcg.
> 
>  - one is running in SCHED_FIFO.
>  - the other runs into mem_cgroup_do_charge() and calls drain_all_stock_sync().
> 
> Then, the second application stops until the SCHED_FIFO application releases the cpu.

It would have to back off during reclaim anyway (because we check
cond_resched during reclaim), right?

> In general, I don't think waiting for schedule_work() against multiple
> cpus is quicker than a short memory reclaim.

You are right, but if you consider small groups then the reclaim can
make the situation much worse.

> Adding flush_work() here means that a context switch is required before
> calling direct reclaim.

Is that really a problem? We would context switch during reclaim anyway
if there is something else that wants the CPU.
Maybe we could drain only if we get a reasonable number of pages back?
This would require two passes over the per-cpu caches to find that
number - not nice. Or we could drain only those caches that hold at
least some threshold of pages.
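
The last idea would be a minimal change to the loop in drain_all_stock,
something like this (untested sketch; DRAIN_THRESHOLD is a made-up
name):

		mem = stock->cached;
		/* DRAIN_THRESHOLD would be a new tunable, e.g. CHARGE_BATCH / 4 */
		if (!mem || stock->nr_pages < DRAIN_THRESHOLD)
			continue;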

> That's bad. (At least, please check __GFP_NOWAIT.)

Definitely a good idea. Fixed.
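
In mem_cgroup_do_charge that could look like this (untested sketch;
__GFP_WAIT is the "caller may sleep" bit of the gfp flags in this era,
which GFP_NOWAIT lacks):

		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
		/* only callers that may sleep should block on the drain */
		if (!drained && (gfp_mask & __GFP_WAIT)) {
			drained = mem_over_limit;
			drain_all_stock_sync(drained);
			goto retry;
		}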

> Please find another way; I think calling a synchronous drain here is
> overkill. In most cases there are no important file caches and reclaim
> is quick.

This is, however, really hard to know in advance. If there are used-once
unmapped file pages then they are certainly much easier to reclaim.
Maybe I could check the statistics and decide whether to drain according
to the pages we have in the group. Let me think about that.

> (And the async draining runs anyway.)
> 
> How about automatically adjusting CHARGE_BATCH and making it smaller
> when the system is near the limit?

Hmm, we are already bypassing batching if we are close to the limit,
aren't we? If we get to the reclaim we fall back to an nr_pages allocation
and so we do not refill the stock.
Maybe we could check how much we have reclaimed and update the batch
size accordingly.
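
For reference, the fallback I mean looks roughly like this (names assumed,
not copied from the real call site):

	/* try to charge a full batch; once that fails we are close to the
	 * limit, so retry with exactly nr_pages and leave no surplus that
	 * would refill the per-cpu stock */
	unsigned long batch = CHARGE_BATCH;
	struct res_counter *fail_res;

	if (res_counter_charge(&mem->res, batch * PAGE_SIZE, &fail_res))
		batch = nr_pages;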

> or flushing ->stock periodically ?
> 
> 
> Thanks,
> -Kame

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21 11:47       ` Michal Hocko
@ 2011-07-21 12:42         ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-21 12:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu 21-07-11 13:47:04, Michal Hocko wrote:
> On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Jul 2011 09:58:24 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
[...]
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > >  
> > >  	for_each_online_cpu(cpu) {
> > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > +		if (root_mem == stock->cached &&
> > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > >  			flush_work(&stock->work);
> > 
> > Doesn't this new check handle hierarchy ?
> > css_is_ancestor() will be required if you do this check.
> 
> Yes you are right. Will fix it. I will add a helper for the check.

Here is the patch with the helper. The above will then read 
	if (mem_cgroup_same_or_subtree(root_mem, stock->cached))

---
From b963a9f4dac61044daac49700f84b7819d7c2f53 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Thu, 21 Jul 2011 13:54:13 +0200
Subject: [PATCH] memcg: add mem_cgroup_same_or_subtree helper

We check whether two given groups are the same or at least in the same
subtree of a hierarchy in several places. Let's add a helper for it to
make the code easier to read.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8180cd9..8dbb9d6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1046,6 +1046,21 @@ void mem_cgroup_move_lists(struct page *page,
 	mem_cgroup_add_lru_list(page, to);
 }
 
+/*
+ * Checks whether given mem is same or in the root_mem's
+ * hierarchy subtree
+ */
+static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_mem,
+		struct mem_cgroup *mem)
+{
+	if (root_mem != mem) {
+		return (root_mem->use_hierarchy &&
+			css_is_ancestor(&mem->css, &root_mem->css));
+	}
+
+	return true;
+}
+
 int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 {
 	int ret;
@@ -1065,10 +1080,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
 	 * hierarchy(even if use_hierarchy is disabled in "mem").
 	 */
-	if (mem->use_hierarchy)
-		ret = css_is_ancestor(&curr->css, &mem->css);
-	else
-		ret = (curr == mem);
+	ret = mem_cgroup_same_or_subtree(mem, curr);
 	css_put(&curr->css);
 	return ret;
 }
@@ -2150,13 +2162,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
 		mem = stock->cached;
 		if (!mem || !stock->nr_pages)
 			continue;
-		if (mem != root_mem) {
-			if (!root_mem->use_hierarchy)
-				continue;
-			/* check whether "mem" is under tree of "root_mem" */
-			if (!css_is_ancestor(&mem->css, &root_mem->css))
-				continue;
-		}
+		if (!mem_cgroup_same_or_subtree(root_mem, mem))
+			continue;
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
 			schedule_work_on(cpu, &stock->work);
 	}
-- 
1.7.5.4


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-21 11:36       ` Michal Hocko
@ 2011-07-21 23:44         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 23:44 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 13:36:06 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Jul 2011 09:38:00 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > drain_all_stock_async tries to optimize the work to be done on the work
> > > queue by excluding any work for the current CPU because it assumes that
> > > the context we are called from has already tried to charge from that cache
> > > and failed, so it must be empty already.
> > > While the assumption is correct we can do it by checking the current
> > > number of pages in the cache. This will also reduce work on other CPUs
> > > with an empty stock.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > 
> > 
> > At first look, when a charge against TransParentHugepage() goes
> > into the reclaim routine, stock->nr_pages != 0 and this will
> > call an additional kworker.
> 
> True. We will drain a charge which could be used by other allocations
> in the meantime so we have a good chance to reclaim less. But how big
> a problem is that?
> I mean I can add a new parameter that would force checking the current
> cpu but it doesn't look nice. I cannot add that condition
> unconditionally because the code will be shared with the sync path in
> the next patch and that one needs to drain _all_ cpus.
> 
> What would you suggest?
By 2 methods

 - just check nr_pages. 
 - drain "local stock" without calling schedule_work(). It's fast.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21 12:42         ` Michal Hocko
@ 2011-07-21 23:49           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 23:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 14:42:23 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 21-07-11 13:47:04, Michal Hocko wrote:
> > On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 21 Jul 2011 09:58:24 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> [...]
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > > >  
> > > >  	for_each_online_cpu(cpu) {
> > > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > +		if (root_mem == stock->cached &&
> > > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > >  			flush_work(&stock->work);
> > > 
> > > Doesn't this new check handle hierarchy ?
> > > css_is_ancestor() will be required if you do this check.
> > 
> > Yes you are right. Will fix it. I will add a helper for the check.
> 
> Here is the patch with the helper. The above will then read 
> 	if (mem_cgroup_same_or_subtree(root_mem, stock->cached))
> 
> ---
> From b963a9f4dac61044daac49700f84b7819d7c2f53 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 21 Jul 2011 13:54:13 +0200
> Subject: [PATCH] memcg: add mem_cgroup_same_or_subtree helper
> 
> We check whether two given groups are the same or at least in the same
> subtree of a hierarchy in several places. Let's add a helper for it to
> make the code easier to read.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21 12:30       ` Michal Hocko
@ 2011-07-21 23:56         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-21 23:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Thu, 21 Jul 2011 14:30:12 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 21-07-11 19:54:11, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Jul 2011 10:28:10 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > If we fail to charge an allocation for a cgroup we usually have to fall
> > > back into direct reclaim (mem_cgroup_hierarchical_reclaim).
> > > The charging code, however, currently doesn't care about per-cpu charge
> > > caches which might have up to (nr_cpus - 1) * CHARGE_BATCH pre-charged
> > > pages (the current cache is already drained, otherwise we wouldn't get
> > > to mem_cgroup_do_charge).
> > > That can be quite a lot on boxes with large numbers of CPUs, so we can end
> > > up reclaiming even though there are charges that could be used. This
> > > will typically happen in multi-threaded applications pinned to many CPUs
> > > which allocate memory heavily.
> > > 
> > 
> > Do you have example and score, numbers on your test ?
> 
> As I said, I haven't seen anything that would visibly affect performance,
> but I have seen situations where we reclaimed even though there were
> pre-charges on other CPUs.
> 
> > > Currently we are draining caches during reclaim
> > > (mem_cgroup_hierarchical_reclaim) but this can be already late as we
> > > could have already reclaimed from other groups in the hierarchy.
> > > 
> > > The solution for this would be to synchronously drain charges early when
> > > we fail to charge and retry the charge once more.
> > > I think it still makes sense to keep async draining in the reclaim path
> > > as it is used from other code paths as well (e.g. limit resize). It will
> > > not do any work if we drained previously anyway.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > 
> > I don't like this solution, at all.
> > 
> > Assume 2 cpu SMP, (a special case), and 2 applications running under
> > a memcg.
> > 
> >  - one is running in SCHED_FIFO.
> >  - another is running into mem_cgroup_do_charge() and calls drain_all_stock_sync().
> > 
> > Then, the application stops until the SCHED_FIFO application releases the cpu.
> 
> It would have to back off during reclaim anyway (because we check
> cond_resched during reclaim), right?
> 

cond_resched() only yields on the cpu which runs the reclaim itself. It will not help here.


> > In general, I don't think waiting for schedule_work() against multiple cpus
> > is quicker than short memory reclaim.
> 
> You are right, but if you consider small groups then the reclaim can
> make the situation much worse.
> 

If the system has a lot of memory and the container has many cgroups, memory
is not small, because to use the cpus properly you need memory. Otherwise
it's a misconfiguration.



> > Adding flush_work() here means that a context switch is required before
> > calling direct reclaim.
> 
> Is that really a problem? We would context switch during reclaim if
> there is something else that wants CPU anyway.
> Maybe we could drain only if we get a reasonable number of pages back?
> This would require two passes over per-cpu caches to find the number -
> not nice. Or we could drain only those caches that have at least some
> threshold of pages.
> 
> > That's bad. (At least, please check __GFP_NOWAIT.)
> 
> Definitely a good idea. Fixed.
> 
> > Please find another way, I think calling synchronous drain here is overkill.
> > There are no important file caches in most cases and reclaim is quick.
> 
> This is, however, really hard to know in advance. If there are used-once
> unmapped file pages then it is much easier to reclaim them for sure.
> Maybe I could check the statistics and decide whether to drain according
> to the pages we have in the group. Let me think about that.
> 
> > (And async draining runs.)
> > 
> > How about automatically adjusting CHARGE_BATCH and making it small when the
> > system is near the limit?
> 
> Hmm, we are already bypassing batching if we are close to the limit,
> aren't we? If we get to the reclaim we fall back to an nr_pages allocation
> and so we do not refill the stock.
> Maybe we could check how much we have reclaimed and update the batch
> size accordingly.
> 

Please wait for the "background reclaim" stuff. I am not stopping it, and it
will make this cpu-caching stuff better because we can drain before hitting
the limit.

If you cannot wait....

One idea is to have a threshold to call the async "drain". For example,

 threshold = limit_of_memory - nr_online_cpu() * (BATCH_SIZE + 1)

 if (usage > threshold)
	drain_all_stock_async().

Then, the situation will be much better.
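
Spelled out with the res_counter API of this era (a sketch; the placement of
the check is assumed):

	u64 limit = res_counter_read_u64(&mem->res, RES_LIMIT);
	u64 usage = res_counter_read_u64(&mem->res, RES_USAGE);
	u64 slack = (u64)num_online_cpus() * (CHARGE_BATCH + 1) * PAGE_SIZE;

	/* kick the async drain once usage gets within one batch per
	 * online cpu of the hard limit */
	if (usage + slack > limit)
		drain_all_stock_async(mem);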



Thanks,
-Kame




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21 23:56         ` KAMEZAWA Hiroyuki
@ 2011-07-22  0:18           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-22  0:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michal Hocko, linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri, 22 Jul 2011 08:56:52 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 21 Jul 2011 14:30:12 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
 
> Please wait for the "background reclaim" stuff. I am not stopping it, and it
> will make this cpu-caching stuff better because we can drain before hitting
> the limit.
> 
> If you cannot wait....
> 
> One idea is to have a threshold to call the async "drain". For example,
> 
>  threshold = limit_of_memory - nr_online_cpu() * (BATCH_SIZE + 1)
> 
>  if (usage > threshold)
> 	drain_all_stock_async().
> 
> Then, the situation will be much better.
> 

Of course, the frequency of this call can be controlled by an event counter.
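
For example (purely illustrative; both names below are made up):

	static DEFINE_PER_CPU(unsigned int, memcg_drain_events);
	#define DRAIN_CHECK_EVERY	64

	/* look at the drain threshold only once per N charge events */
	if (this_cpu_inc_return(memcg_drain_events) % DRAIN_CHECK_EVERY)
		return;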

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21 12:42         ` Michal Hocko
@ 2011-07-22  0:27           ` Daisuke Nishimura
  0 siblings, 0 replies; 47+ messages in thread
From: Daisuke Nishimura @ 2011-07-22  0:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, linux-mm, Balbir Singh, Daisuke Nishimura,
	linux-kernel

On Thu, 21 Jul 2011 14:42:23 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 21-07-11 13:47:04, Michal Hocko wrote:
> > On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 21 Jul 2011 09:58:24 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> [...]
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > > >  
> > > >  	for_each_online_cpu(cpu) {
> > > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > +		if (root_mem == stock->cached &&
> > > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > >  			flush_work(&stock->work);
> > > 
> > > Doesn't this new check handle hierarchy ?
> > > css_is_ancestor() will be required if you do this check.
> > 
> > Yes you are right. Will fix it. I will add a helper for the check.
> 
> Here is the patch with the helper. The above will then read 
> 	if (mem_cgroup_same_or_subtree(root_mem, stock->cached))
> 
I welcome this new helper function, but it can be used in
memcg_oom_wake_function() and mem_cgroup_under_move() too, can't it ?

Thanks,
Daisuke Nishimura.

> ---
> From b963a9f4dac61044daac49700f84b7819d7c2f53 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 21 Jul 2011 13:54:13 +0200
> Subject: [PATCH] memcg: add mem_cgroup_same_or_subtree helper
> 
> We check whether two given groups are the same or at least in the same
> subtree of a hierarchy in several places. Let's add a helper for it to
> make the code easier to read.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/memcontrol.c |   29 ++++++++++++++++++-----------
>  1 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8180cd9..8dbb9d6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1046,6 +1046,21 @@ void mem_cgroup_move_lists(struct page *page,
>  	mem_cgroup_add_lru_list(page, to);
>  }
>  
> +/*
> + * Checks whether given mem is same or in the root_mem's
> + * hierarchy subtree
> + */
> +static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_mem,
> +		struct mem_cgroup *mem)
> +{
> +	if (root_mem != mem) {
> +		return (root_mem->use_hierarchy &&
> +			css_is_ancestor(&mem->css, &root_mem->css));
> +	}
> +
> +	return true;
> +}
> +
>  int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
>  {
>  	int ret;
> @@ -1065,10 +1080,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
>  	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
>  	 * hierarchy(even if use_hierarchy is disabled in "mem").
>  	 */
> -	if (mem->use_hierarchy)
> -		ret = css_is_ancestor(&curr->css, &mem->css);
> -	else
> -		ret = (curr == mem);
> +	ret = mem_cgroup_same_or_subtree(mem, curr);
>  	css_put(&curr->css);
>  	return ret;
>  }
> @@ -2150,13 +2162,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>  		mem = stock->cached;
>  		if (!mem || !stock->nr_pages)
>  			continue;
> -		if (mem != root_mem) {
> -			if (!root_mem->use_hierarchy)
> -				continue;
> -			/* check whether "mem" is under tree of "root_mem" */
> -			if (!css_is_ancestor(&mem->css, &root_mem->css))
> -				continue;
> -		}
> +		if (!mem_cgroup_same_or_subtree(root_mem, mem))
> +			continue;
>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
>  			schedule_work_on(cpu, &stock->work);
>  	}
> -- 
> 1.7.5.4
> 
> 
> -- 
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9    
> Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-21 23:44         ` KAMEZAWA Hiroyuki
@ 2011-07-22  9:19           ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22  9:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 13:36:06 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 21 Jul 2011 09:38:00 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > > 
> > > > drain_all_stock_async tries to optimize the work to be done on the work
> > > > queue by excluding any work for the current CPU because it assumes that
> > > > the context we are called from has already tried to charge from that cache
> > > > and failed, so it must be empty already.
> > > > While the assumption is correct we can do it by checking the current
> > > > number of pages in the cache. This will also reduce work on other CPUs
> > > > with an empty stock.
> > > > 
> > > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > 
> > > 
> > > At first look, when a charge against TransParentHugepage() goes
> > > into the reclaim routine, stock->nr_pages != 0 and this will
> > > call an additional kworker.
> > 
> > True. We will drain a charge which could be used by other allocations
> > in the meantime so we have a good chance to reclaim less. But how big
> > a problem is that?
> > I mean I can add a new parameter that would force checking the current
> > cpu but it doesn't look nice. I cannot add that condition
> > unconditionally because the code will be shared with the sync path in
> > the next patch and that one needs to drain _all_ cpus.
> > 
> > What would you suggest?
> By 2 methods
> 
>  - just check nr_pages. 

Not sure I understand which nr_pages you mean. The one that comes from
the charging path or stock->nr_pages?
If you mean the first one then we do not have it in the reclaim path where
we call drain_all_stock_async.

>  - drain "local stock" without calling schedule_work(). It's fast.

but there is nothing to be drained locally in the paths where we call
drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
use the work queue at all?

> 
> Thanks,
> -Kame

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-21 23:49           ` KAMEZAWA Hiroyuki
@ 2011-07-22  9:21             ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22  9:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 08:49:27, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 14:42:23 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 21-07-11 13:47:04, Michal Hocko wrote:
> > > On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 21 Jul 2011 09:58:24 +0200
> > > > Michal Hocko <mhocko@suse.cz> wrote:
> > [...]
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > > > >  
> > > > >  	for_each_online_cpu(cpu) {
> > > > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > > +		if (root_mem == stock->cached &&
> > > > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > >  			flush_work(&stock->work);
> > > > 
> > > > Doesn't this new check handle hierarchy ?
> > > > css_is_ancestor() will be required if you do this check.
> > > 
> > > Yes you are right. Will fix it. I will add a helper for the check.
> > 
> > Here is the patch with the helper. The above will then read 
> > 	if (mem_cgroup_same_or_subtree(root_mem, stock->cached))
> > 
> > ---
> > From b963a9f4dac61044daac49700f84b7819d7c2f53 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Thu, 21 Jul 2011 13:54:13 +0200
> > Subject: [PATCH] memcg: add mem_cgroup_same_or_subtree helper
> > 
> > We check whether two given groups are the same or at least in the same
> > subtree of a hierarchy in several places. Let's add a helper for it to
> > make the code easier to read.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-22  9:19           ` Michal Hocko
@ 2011-07-22  9:28             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 47+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-22  9:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri, 22 Jul 2011 11:19:36 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Jul 2011 13:36:06 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 21 Jul 2011 09:38:00 +0200
> > > > Michal Hocko <mhocko@suse.cz> wrote:
> > > > 
> > > > > drain_all_stock_async tries to optimize the work to be done on the work
> > > > > queue by excluding any work for the current CPU because it assumes that
> > > > > the context we are called from has already tried to charge from that cache
> > > > > and failed, so it must be empty already.
> > > > > While the assumption is correct we can do it by checking the current
> > > > > number of pages in the cache. This will also reduce work on other CPUs
> > > > > with an empty stock.
> > > > > 
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > > 
> > > > 
> > > > At first look, when a charge against TransParentHugepage() goes
> > > > into the reclaim routine, stock->nr_pages != 0 and this will
> > > > call an additional kworker.
> > > 
> > > True. We will drain a charge which could be used by other allocations
> > > in the meantime so we have a good chance to reclaim less. But how big
> > > a problem is that?
> > > I mean I can add a new parameter that would force checking the current
> > > cpu but it doesn't look nice. I cannot add that condition
> > > unconditionally because the code will be shared with the sync path in
> > > the next patch and that one needs to drain _all_ cpus.
> > > 
> > > What would you suggest?
> > By 2 methods
> > 
> >  - just check nr_pages. 
> 
> Not sure I understand which nr_pages you mean. The one that comes from
> the charging path or stock->nr_pages?
> If you mean the first one then we do not have it in the reclaim path where
> we call drain_all_stock_async.
> 

stock->nr_pages.

> >  - drain "local stock" without calling schedule_work(). It's fast.
> 
> but there is nothing to be drained locally in the paths where we call
> drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
> use the work queue at all?

I mean calling schedule_work against the local cpu is just a waste of time.
Then, drain it directly and move the local cpu's stock->nr_pages back to the res_counter.
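
In code, that is roughly (drain_stock() is the existing helper in
mm/memcontrol.c):

	/* return the local cpu's cached charge to the res_counter directly,
	 * without bouncing through schedule_work() */
	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
	drain_stock(stock);
	put_cpu_var(memcg_stock);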

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock
  2011-07-22  0:27           ` Daisuke Nishimura
@ 2011-07-22  9:41             ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22  9:41 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: KAMEZAWA Hiroyuki, linux-mm, Balbir Singh, linux-kernel

On Fri 22-07-11 09:27:59, Daisuke Nishimura wrote:
> On Thu, 21 Jul 2011 14:42:23 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 21-07-11 13:47:04, Michal Hocko wrote:
> > > On Thu 21-07-11 19:30:51, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 21 Jul 2011 09:58:24 +0200
> > > > Michal Hocko <mhocko@suse.cz> wrote:
> > [...]
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -2166,7 +2165,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > > > >  
> > > > >  	for_each_online_cpu(cpu) {
> > > > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > > +		if (root_mem == stock->cached &&
> > > > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > >  			flush_work(&stock->work);
> > > > 
> > > > Doesn't this new check handle hierarchy ?
> > > > css_is_ancestor() will be required if you do this check.
> > > 
> > > Yes you are right. Will fix it. I will add a helper for the check.
> > 
> > Here is the patch with the helper. The above will then read 
> > 	if (mem_cgroup_same_or_subtree(root_mem, stock->cached))
> > 
> I welcome this new helper function, but it can be used in
> memcg_oom_wake_function() and mem_cgroup_under_move() too, can't it?

Sure. Incremental patch (I will fold it into the one above):
--- 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dbb9d6..64569c7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1416,10 +1416,9 @@ static bool mem_cgroup_under_move(struct mem_cgroup *mem)
 	to = mc.to;
 	if (!from)
 		goto unlock;
-	if (from == mem || to == mem
-	    || (mem->use_hierarchy && css_is_ancestor(&from->css, &mem->css))
-	    || (mem->use_hierarchy && css_is_ancestor(&to->css,	&mem->css)))
-		ret = true;
+
+	ret = mem_cgroup_same_or_subtree(mem, from)
+		|| mem_cgroup_same_or_subtree(mem, to);
 unlock:
 	spin_unlock(&mc.lock);
 	return ret;
@@ -1906,25 +1905,20 @@ struct oom_wait_info {
 static int memcg_oom_wake_function(wait_queue_t *wait,
 	unsigned mode, int sync, void *arg)
 {
-	struct mem_cgroup *wake_mem = (struct mem_cgroup *)arg;
+	struct mem_cgroup *wake_mem = (struct mem_cgroup *)arg,
+			  *oom_wait_mem;
 	struct oom_wait_info *oom_wait_info;
 
 	oom_wait_info = container_of(wait, struct oom_wait_info, wait);
+	oom_wait_mem = oom_wait_info->mem;
 
-	if (oom_wait_info->mem == wake_mem)
-		goto wakeup;
-	/* if no hierarchy, no match */
-	if (!oom_wait_info->mem->use_hierarchy || !wake_mem->use_hierarchy)
-		return 0;
 	/*
 	 * Both of oom_wait_info->mem and wake_mem are stable under us.
 	 * Then we can use css_is_ancestor without taking care of RCU.
 	 */
-	if (!css_is_ancestor(&oom_wait_info->mem->css, &wake_mem->css) &&
-	    !css_is_ancestor(&wake_mem->css, &oom_wait_info->mem->css))
+	if (!mem_cgroup_same_or_subtree(oom_wait_mem, wake_mem)
+			&& !mem_cgroup_same_or_subtree(wake_mem, oom_wait_mem))
 		return 0;
-
-wakeup:
 	return autoremove_wake_function(wait, mode, sync, arg);
 }
 
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 47+ messages in thread
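
The helper referenced above is introduced by the patch this message folds
into; its body is not part of this excerpt. Judging from the open-coded
css_is_ancestor() checks it replaces, it presumably looks like the following
reconstruction (names and details may differ from the posted patch):

/*
 * Reconstruction: true if mem is root_mem itself or lives anywhere
 * below root_mem in the hierarchy.
 */
static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_mem,
				       struct mem_cgroup *mem)
{
	if (root_mem == mem)
		return true;
	if (!root_mem->use_hierarchy || !mem)
		return false;
	return css_is_ancestor(&mem->css, &root_mem->css);
}

With such a helper, the flush_work() condition in drain_all_stock() covers
hierarchical children of root_mem, as requested in the review, instead of
only the exact root_mem match.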

* Re: [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges
  2011-07-21 23:56         ` KAMEZAWA Hiroyuki
@ 2011-07-22  9:54           ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22  9:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 08:56:52, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Jul 2011 14:30:12 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 21-07-11 19:54:11, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 21 Jul 2011 10:28:10 +0200
[...]
> > > Assume a 2-cpu SMP system (a special case) and 2 applications running under
> > > a memcg.
> > > 
> > >  - one is running in SCHED_FIFO.
> > >  - another is running into mem_cgroup_do_charge() and calls drain_all_stock_sync().
> > > 
> > > Then, the application stops until the SCHED_FIFO application releases the cpu.
> > 
> > It would have to back off during reclaim anyway (because we check
> > cond_resched during reclaim), right? 
> > 
> 
> Just having cond_resched() on a cpu which calls some reclaim stuff will not help.

I do not understand what you are saying here. What I meant to say is
that the above example is not a big issue because SCHED_FIFO would throw
us away from the CPU during reclaim anyway, so waiting for other CPUs
during draining will not add too much overhead, although it definitely
adds some.

> > > In general, I don't think waiting for schedule_work() against multiple cpus
> > > is quicker than a short memory reclaim.
> > 
> > You are right, but if you consider small groups then the reclaim can
> > make the situation much worse.
> > 
> 
> If the system has a lot of memory and the container has many cgroups, memory is not
> small, because to use the cpu properly you need memory. It's a misconfiguration.

I don't think so. You might have small groups well suited for
specific workloads.

> > > Adding flush_work() here means that a context switch is required before
> > > calling direct reclaim.
> > 
> > Is that really a problem? We would context switch during reclaim if
> > there is something else that wants the CPU anyway.
> > Maybe we could drain only if we get a reasonable number of pages back?
> > This would require two passes over per-cpu caches to find the number -
> > not nice. Or we could drain only those caches that have at least some
> > threshold of pages.
> > 
> > > That's bad. (At least, please check __GFP_NOWAIT.)
> > 
> > Definitely a good idea. Fixed.
> > 
> > > Please find another way; I think calling a synchronous drain here is overkill.
> > > There are no important file caches in most cases, and reclaim is quick.
> > 
> > This is, however, really hard to know in advance. If there are used-once
> > unmapped file pages then it is much easier to reclaim them for sure.
> > Maybe I could check the statistics and decide whether to drain according
> > to the pages we have in the group. Let me think about that.
> > 
> > > (And async draining runs.)
> > > 
> > > How about automatically adjusting CHARGE_BATCH and making it small when the
> > > system is near the limit?
> > 
> > Hmm, we are already bypassing batching if we are close to the limit,
> > aren't we? If we get to the reclaim we fall back to an nr_pages allocation
> > and so we do not refill the stock.
> > Maybe we could check how much we have reclaimed and update the batch
> > size accordingly.
> > 
> 
> Please wait for the "background reclaim" stuff. I haven't stopped it, and it will
> make this cpu-caching stuff better because we can drain before hitting
> the limit.

As I said, I haven't seen this hurting us, so this can definitely wait.
I will drop the patch for now and keep just the cleanup stuff. I will
repost it when I have some numbers in hand or if I am able to
work around the current issue of waiting too much.

> 
> If you cannot wait....
> 
> One idea is to have a threshold to call async "drain". For example,
> 
>  threshold = limit_of_memory - nr_online_cpu() * (BATCH_SIZE + 1)
> 
>  if (usage > threshold)
> 	drain_all_stock_async().
> 
> Then, the situation will be much better.

Will think about it. I am not sure whether this is too rough.

> Thanks,
> -Kame

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread
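
To make the quoted threshold idea concrete, here is a sketch under stated
assumptions: BATCH_SIZE stands for the per-cpu CHARGE_BATCH (in pages),
nr_online_cpu() is taken to mean num_online_cpus(), and usage/limit are the
usual res_counter reads. This illustrates the proposal only; no such patch
was posted in this thread.

/*
 * Sketch of the proposed early drain: kick the async drain once usage
 * gets within "all cpus' worth of cached charges" of the limit.
 * CHARGE_BATCH is in pages; res_counter values are in bytes.
 */
static void drain_if_near_limit(struct mem_cgroup *mem)
{
	u64 limit = res_counter_read_u64(&mem->res, RES_LIMIT);
	u64 usage = res_counter_read_u64(&mem->res, RES_USAGE);
	u64 slack = (u64)num_online_cpus() * (CHARGE_BATCH + 1) * PAGE_SIZE;

	if (limit > slack && usage > limit - slack)
		drain_all_stock_async(mem);
}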

* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-22  9:28             ` KAMEZAWA Hiroyuki
@ 2011-07-22  9:58               ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22  9:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 18:28:22, KAMEZAWA Hiroyuki wrote:
> On Fri, 22 Jul 2011 11:19:36 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 21 Jul 2011 13:36:06 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > > 
> > > > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> > > > > On Thu, 21 Jul 2011 09:38:00 +0200
> > > > > Michal Hocko <mhocko@suse.cz> wrote:
> > > > > 
> > > > > > drain_all_stock_async tries to optimize the work to be done on the work
> > > > > > queue by excluding any work for the current CPU because it assumes that
> > > > > > the context we are called from has already tried to charge from that cache
> > > > > > and failed, so it must be empty already.
> > > > > > While the assumption is correct, we can do it by checking the current
> > > > > > number of pages in the cache. This will also reduce work on other CPUs
> > > > > > with an empty stock.
> > > > > > 
> > > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > > > 
> > > > > 
> > > > > At first look, when a charge against TransParentHugepage() goes
> > > > > into the reclaim routine, stock->nr_pages != 0 and this will
> > > > > schedule an additional kworker.
> > > > 
> > > > True. We will drain a charge which could be used by other allocations
> > > > in the meantime, so we have a good chance to reclaim less. But how big
> > > > a problem is that?
> > > > I mean I can add a new parameter that would force checking the current
> > > > cpu but it doesn't look nice. I cannot add that condition
> > > > unconditionally because the code will be shared with the sync path in
> > > > the next patch and that one needs to drain _all_ cpus.
> > > > 
> > > > What would you suggest?
> > > By 2 methods
> > > 
> > >  - just check nr_pages. 
> > 
> > Not sure I understand which nr_pages you mean. The one that comes from
> > the charging path or stock->nr_pages?
> > If you mean the first one then we do not have it in the reclaim path
> > where we call drain_all_stock_async.
> > 
> 
> stock->nr_pages.
> 
> > >  - drain "local stock" without calling schedule_work(). It's fast.
> > 
> > but there is nothing to be drained locally in the paths where we call
> > drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
> > use the work queue at all?
> 
> I mean calling schedule_work against the local cpu is just a waste of time.
> Then, drain it directly and move the local cpu's stock->nr_pages to the res_counter.

Got it. Thanks for the clarification. Will repost the updated version.
 
> Thanks,
> -Kame

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  2011-07-22  9:58               ` Michal Hocko
@ 2011-07-22 10:23                 ` Michal Hocko
  0 siblings, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2011-07-22 10:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 11:58:15, Michal Hocko wrote:
> On Fri 22-07-11 18:28:22, KAMEZAWA Hiroyuki wrote:
> > On Fri, 22 Jul 2011 11:19:36 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 21 Jul 2011 13:36:06 +0200
> > > > By 2 methods
> > > > 
> > > >  - just check nr_pages. 
> > > 
> > > Not sure I understand which nr_pages you mean. The one that comes from
> > > the charging path or stock->nr_pages?
> > > If you mean the first one then we do not have it in the reclaim path
> > > where we call drain_all_stock_async.
> > > 
> > 
> > stock->nr_pages.
> > 
> > > >  - drain "local stock" without calling schedule_work(). It's fast.
> > > 
> > > but there is nothing to be drained locally in the paths where we call
> > > drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
> > > use the work queue at all?
> > 
> > I mean calling schedule_work against the local cpu is just a waste of time.
> > Then, drain it directly and move the local cpu's stock->nr_pages to the res_counter.
> 
> Got it. Thanks for the clarification. Will repost the updated version.
---
From 2f17df54db6661c39a05669d08a9e6257435b898 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Thu, 21 Jul 2011 09:38:00 +0200
Subject: [PATCH] memcg: do not try to drain per-cpu caches without pages

drain_all_stock_async tries to optimize the work to be done on the work
queue by excluding any work for the current CPU because it assumes that
the context we are called from has already tried to charge from that cache
and failed, so it must be empty already.
While the assumption is correct, we can optimize it even more by checking
the current number of pages in the cache. This will also reduce work
on other CPUs with an empty stock.
For the current CPU we can simply call drain_local_stock rather than
deferring it to the work queue.

[KAMEZAWA Hiroyuki - use drain_local_stock for current CPU optimization]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f11f198..c012ffe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2159,11 +2159,8 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *mem;
 
-		if (cpu == curcpu)
-			continue;
-
 		mem = stock->cached;
-		if (!mem)
+		if (!mem || !stock->nr_pages)
 			continue;
 		if (mem != root_mem) {
 			if (!root_mem->use_hierarchy)
@@ -2172,8 +2169,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 			if (!css_is_ancestor(&mem->css, &root_mem->css))
 				continue;
 		}
-		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
-			schedule_work_on(cpu, &stock->work);
+		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
+			if (cpu == curcpu)
+				drain_local_stock(&stock->work);
+			else
+				schedule_work_on(cpu, &stock->work);
+		}
 	}
  	put_online_cpus();
 	mutex_unlock(&percpu_charge_mutex);
-- 
1.7.5.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 47+ messages in thread
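
One detail worth spelling out about the patch above: drain_local_stock()
takes a work_struct pointer only because it doubles as the workqueue
callback, and in kernels of this vintage it ignores the argument, which is
why passing &stock->work for the direct call is harmless. A sketch, again
reconstructed rather than quoted, reusing the drain_stock() helper sketched
earlier in this thread:

static void drain_local_stock(struct work_struct *dummy)
{
	/* the argument is unused; this always operates on the local cpu's stock */
	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);

	drain_stock(stock);
	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
}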

Thread overview: 47+ messages
2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko
2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
2011-07-21 10:12   ` KAMEZAWA Hiroyuki
2011-07-21 11:36     ` Michal Hocko
2011-07-21 23:44       ` KAMEZAWA Hiroyuki
2011-07-22  9:19         ` Michal Hocko
2011-07-22  9:28           ` KAMEZAWA Hiroyuki
2011-07-22  9:58             ` Michal Hocko
2011-07-22 10:23               ` Michal Hocko
2011-07-21  7:50 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
2011-07-21 10:25   ` KAMEZAWA Hiroyuki
2011-07-21 11:36     ` Michal Hocko
2011-07-21  7:58 ` [PATCH 3/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
2011-07-21 10:30   ` KAMEZAWA Hiroyuki
2011-07-21 11:47     ` Michal Hocko
2011-07-21 12:42       ` Michal Hocko
2011-07-21 23:49         ` KAMEZAWA Hiroyuki
2011-07-22  9:21           ` Michal Hocko
2011-07-22  0:27         ` Daisuke Nishimura
2011-07-22  9:41           ` Michal Hocko
2011-07-21  8:28 ` [PATCH 4/4] memcg: prevent from reclaiming if there are per-cpu cached charges Michal Hocko
2011-07-21 10:54   ` KAMEZAWA Hiroyuki
2011-07-21 12:30     ` Michal Hocko
2011-07-21 23:56       ` KAMEZAWA Hiroyuki
2011-07-22  0:18         ` KAMEZAWA Hiroyuki
2011-07-22  9:54         ` Michal Hocko