* [PATCH v2 1/2] memcg: add res_counter_usage_safe()
@ 2012-07-04  2:56 Kamezawa Hiroyuki
  2012-07-04  2:58 ` [PATCH v2 2/2] memcg: remove -ENOMEM at page migration Kamezawa Hiroyuki
  2012-07-04  9:14 ` [PATCH v2 1/2] memcg: add res_counter_usage_safe() Johannes Weiner
  0 siblings, 2 replies; 10+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-04  2:56 UTC (permalink / raw)
  To: linux-mm
  Cc: David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo, Johannes Weiner

I think usage > limit is a sign of a bug. But, sometimes,
res_counter_charge_nofail() is very convenient. tcp_memcg uses it,
and I'd like to use it to help page migration.

This patch adds res_counter_usage_safe(), which returns min(usage, limit).
With this, we can use res_counter_charge_nofail() without breaking
the user experience.
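
As a rough sketch of how the two ends are meant to pair up (the helper
name below is made up, and the res_counter_charge_nofail() signature is
assumed from its existing callers):

	static u64 charge_and_report(struct res_counter *cnt)
	{
		struct res_counter *fail_at;	/* charge cannot fail; not inspected */

		/* temporary charge that may push usage past the limit */
		res_counter_charge_nofail(cnt, PAGE_SIZE, &fail_at);

		/* what userspace sees is clamped to the limit */
		return res_counter_usage_safe(cnt);
	}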

Changelog:
 - read res_counter directly under the lock.
 - fixed comment.

Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 kernel/res_counter.c        |   18 ++++++++++++++++++
 net/ipv4/tcp_memcontrol.c   |    2 +-
 3 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 7d7fbe2..a6f8cc5 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -226,4 +226,6 @@ res_counter_set_soft_limit(struct res_counter *cnt,
 	return 0;
 }
 
+u64 res_counter_usage_safe(struct res_counter *cnt);
+
 #endif
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index ad581aa..f0507cd 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -171,6 +171,24 @@ u64 res_counter_read_u64(struct res_counter *counter, int member)
 }
 #endif
 
+/*
+ * Returns usage. If usage > limit, limit is returned.
+ * This is useful to avoid breaking the user experience when
+ * the excess is temporary.
+ */
+u64 res_counter_usage_safe(struct res_counter *counter)
+{
+	unsigned long flags;
+	u64 usage, limit;
+
+	spin_lock_irqsave(&counter->lock, flags);
+	limit = counter->limit;
+	usage = counter->usage;
+	spin_unlock_irqrestore(&counter->lock, flags);
+
+	return min(usage, limit);
+}
+
 int res_counter_memparse_write_strategy(const char *buf,
 					unsigned long long *res)
 {
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index b6f3583..a73dce6 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -180,7 +180,7 @@ static u64 tcp_read_usage(struct mem_cgroup *memcg)
 		return atomic_long_read(&tcp_memory_allocated) << PAGE_SHIFT;
 
 	tcp = tcp_from_cgproto(cg_proto);
-	return res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
+	return res_counter_usage_safe(&tcp->tcp_memory_allocated);
 }
 
 static u64 tcp_cgroup_read(struct cgroup *cont, struct cftype *cft)
-- 
1.7.4.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04  2:56 [PATCH v2 1/2] memcg: add res_counter_usage_safe() Kamezawa Hiroyuki
@ 2012-07-04  2:58 ` Kamezawa Hiroyuki
  2012-07-04  8:30   ` Johannes Weiner
  2012-07-04  9:14 ` [PATCH v2 1/2] memcg: add res_counter_usage_safe() Johannes Weiner
  1 sibling, 1 reply; 10+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-04  2:58 UTC (permalink / raw)
  To: linux-mm
  Cc: David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo, Johannes Weiner



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04  2:58 ` [PATCH v2 2/2] memcg: remove -ENOMEM at page migration Kamezawa Hiroyuki
@ 2012-07-04  8:30   ` Johannes Weiner
  2012-07-04  8:39     ` Kamezawa Hiroyuki
  2012-07-04 12:04     ` Michal Hocko
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Weiner @ 2012-07-04  8:30 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo

On Wed, Jul 04, 2012 at 11:58:22AM +0900, Kamezawa Hiroyuki wrote:
> >From 257a1e6603aab8c6a3bd25648872a11e8b85ef64 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 28 Jun 2012 19:07:24 +0900
> Subject: [PATCH 2/2] 
> 
> To handle many kinds of races, memcg adds an extra charge to the
> page's memcg at page migration. But this affects page compaction
> and makes it fail if the memcg is under OOM.
>
> This patch uses res_counter_charge_nofail() in the page migration path
> and removes the -ENOMEM return. With this, page migration will not fail
> because of the state of the memcg.
>
> Even though res_counter_charge_nofail() can silently go over the memcg
> limit, mem_cgroup_usage() compensates for that, so userspace does not
> see the excess.
>
> The excessive charges are only temporary and amount to a single page per
> CPU in the worst case. This sounds tolerable and actually consumes fewer
> charges than the current per-cpu memcg_stock.

But it still means we end up going into reclaim on charges, limit
resizing etc. which we wouldn't without a bunch of pages under
migration.

Can we instead not charge the new page, just commit it while holding
on to a css refcount, and have end_migration call a version of
__mem_cgroup_uncharge_common() that updates the stats but leaves the
res counters alone?

oldpage will not get uncharged because of the page lock and
PageCgroupMigration, so the charge is stable during migration.
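
In other words, the flow boils down to the following (locking and the
page_cgroup handling elided; this is just a condensed sketch of the
diff below):

	/* prepare_migration: old page's memcg, css reference already held */
	*memcgp = memcg;
	/* account newpage in the stats/page_cgroup only, no res_counter charge */
	__mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);

	/* end_migration: drop whichever page ended up unused; the new
	 * end_migration flag makes the uncharge skip the res_counter,
	 * since oldpage's charge simply carries over to newpage */
	__mem_cgroup_uncharge_common(unused, ctype, true);
	css_put(&memcg->css);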

Patch below

> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3168,6 +3168,7 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
>  {
>  	struct mem_cgroup *memcg = NULL;
> +	struct res_counter *dummy;
>  	struct page_cgroup *pc;
>  	enum charge_type ctype;
>  	int ret = 0;
> @@ -3222,29 +3223,16 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	 */
>  	if (!memcg)
>  		return 0;
> -
> -	*memcgp = memcg;
> -	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
> -	css_put(&memcg->css);/* drop extra refcnt */

css_get() now unbalanced?

---

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83e7ba9..17a09e8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -100,7 +100,7 @@ int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
 
 extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
 
-extern int
+extern void
 mem_cgroup_prepare_migration(struct page *page,
 	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask);
 extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
@@ -279,11 +279,10 @@ static inline struct cgroup_subsys_state
 	return NULL;
 }
 
-static inline int
+static inline void
 mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
 	struct mem_cgroup **memcgp, gfp_t gfp_mask)
 {
-	return 0;
 }
 
 static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f72b5e5..c5161f0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2911,7 +2911,8 @@ static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg,
  * uncharge if !page_mapped(page)
  */
 static struct mem_cgroup *
-__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
+__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype,
+			     bool end_migration)
 {
 	struct mem_cgroup *memcg = NULL;
 	unsigned int nr_pages = 1;
@@ -2955,7 +2956,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 		/* fallthrough */
 	case MEM_CGROUP_CHARGE_TYPE_DROP:
 		/* See mem_cgroup_prepare_migration() */
-		if (page_mapped(page) || PageCgroupMigration(pc))
+		if (page_mapped(page))
+			goto unlock_out;
+		if (page_mapped(page) || (!end_migration &&
+					  PageCgroupMigration(pc)))
 			goto unlock_out;
 		break;
 	case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
@@ -2989,7 +2993,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 		mem_cgroup_swap_statistics(memcg, true);
 		mem_cgroup_get(memcg);
 	}
-	if (!mem_cgroup_is_root(memcg))
+	if (!end_migration && !mem_cgroup_is_root(memcg))
 		mem_cgroup_do_uncharge(memcg, nr_pages, ctype);
 
 	return memcg;
@@ -3005,14 +3009,14 @@ void mem_cgroup_uncharge_page(struct page *page)
 	if (page_mapped(page))
 		return;
 	VM_BUG_ON(page->mapping && !PageAnon(page));
-	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
+	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED, false);
 }
 
 void mem_cgroup_uncharge_cache_page(struct page *page)
 {
 	VM_BUG_ON(page_mapped(page));
 	VM_BUG_ON(page->mapping);
-	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
+	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE, false);
 }
 
 /*
@@ -3076,7 +3080,7 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
 	if (!swapout) /* this was a swap cache but the swap is unused ! */
 		ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
 
-	memcg = __mem_cgroup_uncharge_common(page, ctype);
+	memcg = __mem_cgroup_uncharge_common(page, ctype, false);
 
 	/*
 	 * record memcg information,  if swapout && memcg != NULL,
@@ -3166,19 +3170,18 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
  * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
  * page belongs to.
  */
-int mem_cgroup_prepare_migration(struct page *page,
+void mem_cgroup_prepare_migration(struct page *page,
 	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
 {
 	struct mem_cgroup *memcg = NULL;
 	struct page_cgroup *pc;
 	enum charge_type ctype;
-	int ret = 0;
 
 	*memcgp = NULL;
 
 	VM_BUG_ON(PageTransHuge(page));
 	if (mem_cgroup_disabled())
-		return 0;
+		return;
 
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
@@ -3223,24 +3226,9 @@ int mem_cgroup_prepare_migration(struct page *page,
 	 * we return here.
 	 */
 	if (!memcg)
-		return 0;
+		return;
 
 	*memcgp = memcg;
-	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
-	css_put(&memcg->css);/* drop extra refcnt */
-	if (ret) {
-		if (PageAnon(page)) {
-			lock_page_cgroup(pc);
-			ClearPageCgroupMigration(pc);
-			unlock_page_cgroup(pc);
-			/*
-			 * The old page may be fully unmapped while we kept it.
-			 */
-			mem_cgroup_uncharge_page(page);
-		}
-		/* we'll need to revisit this error code (we have -EINTR) */
-		return -ENOMEM;
-	}
 	/*
 	 * We charge new page before it's used/mapped. So, even if unlock_page()
 	 * is called before end_migration, we can catch all events on this new
@@ -3254,7 +3242,7 @@ int mem_cgroup_prepare_migration(struct page *page,
 	else
 		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
 	__mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);
-	return ret;
+	return;
 }
 
 /* remove redundant charge if migration failed*/
@@ -3276,6 +3264,14 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
 		used = newpage;
 		unused = oldpage;
 	}
+
+	anon = PageAnon(used);
+	__mem_cgroup_uncharge_common(unused,
+		anon ? MEM_CGROUP_CHARGE_TYPE_MAPPED
+		     : MEM_CGROUP_CHARGE_TYPE_CACHE,
+		true);
+	css_put(&memcg->css);
+
 	/*
 	 * We disallowed uncharge of pages under migration because mapcount
 	 * of the page goes down to zero, temporarly.
@@ -3285,10 +3281,6 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
 	lock_page_cgroup(pc);
 	ClearPageCgroupMigration(pc);
 	unlock_page_cgroup(pc);
-	anon = PageAnon(used);
-	__mem_cgroup_uncharge_common(unused,
-		anon ? MEM_CGROUP_CHARGE_TYPE_MAPPED
-		     : MEM_CGROUP_CHARGE_TYPE_CACHE);
 
 	/*
 	 * If a page is a file cache, radix-tree replacement is very atomic
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..2db060b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -682,7 +682,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 {
 	int rc = -EAGAIN;
 	int remap_swapcache = 1;
-	int charge = 0;
 	struct mem_cgroup *mem;
 	struct anon_vma *anon_vma = NULL;
 
@@ -724,12 +723,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	}
 
 	/* charge against new page */
-	charge = mem_cgroup_prepare_migration(page, newpage, &mem, GFP_KERNEL);
-	if (charge == -ENOMEM) {
-		rc = -ENOMEM;
-		goto unlock;
-	}
-	BUG_ON(charge);
+	mem_cgroup_prepare_migration(page, newpage, &mem, GFP_KERNEL);
 
 	if (PageWriteback(page)) {
 		/*
@@ -819,8 +813,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		put_anon_vma(anon_vma);
 
 uncharge:
-	if (!charge)
-		mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+	mem_cgroup_end_migration(mem, page, newpage, rc == 0);
 unlock:
 	unlock_page(page);
 out:


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04  8:30   ` Johannes Weiner
@ 2012-07-04  8:39     ` Kamezawa Hiroyuki
  2012-07-04 12:04     ` Michal Hocko
  1 sibling, 0 replies; 10+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-04  8:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo

(2012/07/04 17:30), Johannes Weiner wrote:
> On Wed, Jul 04, 2012 at 11:58:22AM +0900, Kamezawa Hiroyuki wrote:
>> >From 257a1e6603aab8c6a3bd25648872a11e8b85ef64 Mon Sep 17 00:00:00 2001
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Date: Thu, 28 Jun 2012 19:07:24 +0900
>> Subject: [PATCH 2/2]
>>
>> To handle many kinds of races, memcg adds an extra charge to the
>> page's memcg at page migration. But this affects page compaction
>> and makes it fail if the memcg is under OOM.
>>
>> This patch uses res_counter_charge_nofail() in the page migration path
>> and removes the -ENOMEM return. With this, page migration will not fail
>> because of the state of the memcg.
>>
>> Even though res_counter_charge_nofail() can silently go over the memcg
>> limit, mem_cgroup_usage() compensates for that, so userspace does not
>> see the excess.
>>
>> The excessive charges are only temporary and amount to a single page per
>> CPU in the worst case. This sounds tolerable and actually consumes fewer
>> charges than the current per-cpu memcg_stock.
>
> But it still means we end up going into reclaim on charges, limit
> resizing etc. which we wouldn't without a bunch of pages under
> migration.
>
> Can we instead not charge the new page, just commit it while holding
> on to a css refcount, and have end_migration call a version of
> __mem_cgroup_uncharge_common() that updates the stats but leaves the
> res counters alone?
>
> oldpage will not get uncharged because of the page lock and
> PageCgroupMigration, so the charge is stable during migration.
>
> Patch below
>

Hm, your idea is better.

>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3168,6 +3168,7 @@ int mem_cgroup_prepare_migration(struct page *page,
>>   	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
>>   {
>>   	struct mem_cgroup *memcg = NULL;
>> +	struct res_counter *dummy;
>>   	struct page_cgroup *pc;
>>   	enum charge_type ctype;
>>   	int ret = 0;
>> @@ -3222,29 +3223,16 @@ int mem_cgroup_prepare_migration(struct page *page,
>>   	 */
>>   	if (!memcg)
>>   		return 0;
>> -
>> -	*memcgp = memcg;
>> -	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
>> -	css_put(&memcg->css);/* drop extra refcnt */
>
> css_get() now unbalanced?
>
Ah, yes. I needed to drop it.




> ---
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 83e7ba9..17a09e8 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -100,7 +100,7 @@ int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
>
>   extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
>
> -extern int
> +extern void
>   mem_cgroup_prepare_migration(struct page *page,
>   	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask);
>   extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
> @@ -279,11 +279,10 @@ static inline struct cgroup_subsys_state
>   	return NULL;
>   }
>
> -static inline int
> +static inline void
>   mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
>   	struct mem_cgroup **memcgp, gfp_t gfp_mask)
>   {
> -	return 0;
>   }
>
>   static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f72b5e5..c5161f0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2911,7 +2911,8 @@ static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg,
>    * uncharge if !page_mapped(page)
>    */
>   static struct mem_cgroup *
> -__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> +__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype,
> +			     bool end_migration)
>   {
>   	struct mem_cgroup *memcg = NULL;
>   	unsigned int nr_pages = 1;
> @@ -2955,7 +2956,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>   		/* fallthrough */
>   	case MEM_CGROUP_CHARGE_TYPE_DROP:
>   		/* See mem_cgroup_prepare_migration() */
> -		if (page_mapped(page) || PageCgroupMigration(pc))
> +		if (page_mapped(page))
> +			goto unlock_out;
> +		if (page_mapped(page) || (!end_migration &&
> +					  PageCgroupMigration(pc)))
>   			goto unlock_out;
>   		break;
>   	case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
> @@ -2989,7 +2993,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>   		mem_cgroup_swap_statistics(memcg, true);
>   		mem_cgroup_get(memcg);
>   	}
> -	if (!mem_cgroup_is_root(memcg))
> +	if (!end_migration && !mem_cgroup_is_root(memcg))
>   		mem_cgroup_do_uncharge(memcg, nr_pages, ctype);
>
>   	return memcg;
> @@ -3005,14 +3009,14 @@ void mem_cgroup_uncharge_page(struct page *page)
>   	if (page_mapped(page))
>   		return;
>   	VM_BUG_ON(page->mapping && !PageAnon(page));
> -	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> +	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED, false);
>   }
>
>   void mem_cgroup_uncharge_cache_page(struct page *page)
>   {
>   	VM_BUG_ON(page_mapped(page));
>   	VM_BUG_ON(page->mapping);
> -	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
> +	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE, false);
>   }
>
>   /*
> @@ -3076,7 +3080,7 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
>   	if (!swapout) /* this was a swap cache but the swap is unused ! */
>   		ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
>
> -	memcg = __mem_cgroup_uncharge_common(page, ctype);
> +	memcg = __mem_cgroup_uncharge_common(page, ctype, false);
>
>   	/*
>   	 * record memcg information,  if swapout && memcg != NULL,
> @@ -3166,19 +3170,18 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>    * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
>    * page belongs to.
>    */
> -int mem_cgroup_prepare_migration(struct page *page,
> +void mem_cgroup_prepare_migration(struct page *page,
>   	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
>   {
>   	struct mem_cgroup *memcg = NULL;
>   	struct page_cgroup *pc;
>   	enum charge_type ctype;
> -	int ret = 0;
>
>   	*memcgp = NULL;
>
>   	VM_BUG_ON(PageTransHuge(page));
>   	if (mem_cgroup_disabled())
> -		return 0;
> +		return;
>
>   	pc = lookup_page_cgroup(page);
>   	lock_page_cgroup(pc);
> @@ -3223,24 +3226,9 @@ int mem_cgroup_prepare_migration(struct page *page,
>   	 * we return here.
>   	 */
>   	if (!memcg)
> -		return 0;
> +		return;
>
>   	*memcgp = memcg;
> -	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
> -	css_put(&memcg->css);/* drop extra refcnt */
> -	if (ret) {
> -		if (PageAnon(page)) {
> -			lock_page_cgroup(pc);
> -			ClearPageCgroupMigration(pc);
> -			unlock_page_cgroup(pc);
> -			/*
> -			 * The old page may be fully unmapped while we kept it.
> -			 */
> -			mem_cgroup_uncharge_page(page);
> -		}
> -		/* we'll need to revisit this error code (we have -EINTR) */
> -		return -ENOMEM;
> -	}
>   	/*
>   	 * We charge new page before it's used/mapped. So, even if unlock_page()
>   	 * is called before end_migration, we can catch all events on this new
> @@ -3254,7 +3242,7 @@ int mem_cgroup_prepare_migration(struct page *page,
>   	else
>   		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>   	__mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);
> -	return ret;
> +	return;
>   }
>
>   /* remove redundant charge if migration failed*/
> @@ -3276,6 +3264,14 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
>   		used = newpage;
>   		unused = oldpage;
>   	}
> +
> +	anon = PageAnon(used);
> +	__mem_cgroup_uncharge_common(unused,
> +		anon ? MEM_CGROUP_CHARGE_TYPE_MAPPED
> +		     : MEM_CGROUP_CHARGE_TYPE_CACHE,
> +		true);
> +	css_put(&memcg->css);
> +
>   	/*
>   	 * We disallowed uncharge of pages under migration because mapcount
>   	 * of the page goes down to zero, temporarly.
> @@ -3285,10 +3281,6 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
>   	lock_page_cgroup(pc);
>   	ClearPageCgroupMigration(pc);
>   	unlock_page_cgroup(pc);
> -	anon = PageAnon(used);
> -	__mem_cgroup_uncharge_common(unused,
> -		anon ? MEM_CGROUP_CHARGE_TYPE_MAPPED
> -		     : MEM_CGROUP_CHARGE_TYPE_CACHE);
>
>   	/*

Ah, ok. Then the Migration flag is not cleared before uncharge() is called.

I think yours is better. Could you post a patch with a description?
I'll drop mine. BTW, what do you think about patch 1/2?

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] memcg: add res_counter_usage_safe()
  2012-07-04  2:56 [PATCH v2 1/2] memcg: add res_counter_usage_safe() Kamezawa Hiroyuki
  2012-07-04  2:58 ` [PATCH v2 2/2] memcg: remove -ENOMEM at page migration Kamezawa Hiroyuki
@ 2012-07-04  9:14 ` Johannes Weiner
  2012-07-04  9:30   ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2012-07-04  9:14 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo

On Wed, Jul 04, 2012 at 11:56:28AM +0900, Kamezawa Hiroyuki wrote:
> I think usage > limit is a sign of a bug. But, sometimes,
> res_counter_charge_nofail() is very convenient. tcp_memcg uses it,
> and I'd like to use it to help page migration.
>
> This patch adds res_counter_usage_safe(), which returns min(usage, limit).
> With this, we can use res_counter_charge_nofail() without breaking
> the user experience.
>
> Changelog:
>  - read res_counter directly under the lock.
>  - fixed comment.
> 
> Acked-by: Glauber Costa <glommer@parallels.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/res_counter.h |    2 ++
>  kernel/res_counter.c        |   18 ++++++++++++++++++
>  net/ipv4/tcp_memcontrol.c   |    2 +-
>  3 files changed, 21 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 7d7fbe2..a6f8cc5 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -226,4 +226,6 @@ res_counter_set_soft_limit(struct res_counter *cnt,
>  	return 0;
>  }
>  
> +u64 res_counter_usage_safe(struct res_counter *cnt);
> +
>  #endif
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index ad581aa..f0507cd 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -171,6 +171,24 @@ u64 res_counter_read_u64(struct res_counter *counter, int member)
>  }
>  #endif
>  
> +/*
> + * Returns usage. If usage > limit, limit is returned.
> + * This is useful to avoid breaking the user experience when
> + * the excess is temporary.
> + */
> +u64 res_counter_usage_safe(struct res_counter *counter)
> +{
> +	unsigned long flags;
> +	u64 usage, limit;
> +
> +	spin_lock_irqsave(&counter->lock, flags);
> +	limit = counter->limit;
> +	usage = counter->usage;
> +	spin_unlock_irqrestore(&counter->lock, flags);
> +
> +	return min(usage, limit);
> +}
> +
>  int res_counter_memparse_write_strategy(const char *buf,
>  					unsigned long long *res)
>  {
> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
> index b6f3583..a73dce6 100644
> --- a/net/ipv4/tcp_memcontrol.c
> +++ b/net/ipv4/tcp_memcontrol.c
> @@ -180,7 +180,7 @@ static u64 tcp_read_usage(struct mem_cgroup *memcg)
>  		return atomic_long_read(&tcp_memory_allocated) << PAGE_SHIFT;
>  
>  	tcp = tcp_from_cgproto(cg_proto);
> -	return res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
> +	return res_counter_usage_safe(&tcp->tcp_memory_allocated);
>  }

Hm, it depends on what you consider more important.

Personally, I think it's more useful to report the truth rather than
pretending we'd enforce an invariant that we actually don't.  And I
think it can just be documented that we have to charge memory over the
limit in certain contexts, so people/scripts should expect usage to
exceed the limit.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] memcg: add res_counter_usage_safe()
  2012-07-04  9:14 ` [PATCH v2 1/2] memcg: add res_counter_usage_safe() Johannes Weiner
@ 2012-07-04  9:30   ` Kamezawa Hiroyuki
  2012-07-04  9:40     ` Glauber Costa
  0 siblings, 1 reply; 10+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-04  9:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, David Rientjes, Andrew Morton, Michal Hocko, Tejun Heo

(2012/07/04 18:14), Johannes Weiner wrote:
> On Wed, Jul 04, 2012 at 11:56:28AM +0900, Kamezawa Hiroyuki wrote:
>> I think usage > limit is a sign of a bug. But, sometimes,
>> res_counter_charge_nofail() is very convenient. tcp_memcg uses it,
>> and I'd like to use it to help page migration.
>>
>> This patch adds res_counter_usage_safe(), which returns min(usage, limit).
>> With this, we can use res_counter_charge_nofail() without breaking
>> the user experience.
>>
>> Changelog:
>>   - read res_counter directly under the lock.
>>   - fixed comment.
>>
>> Acked-by: Glauber Costa <glommer@parallels.com>
>> Acked-by: David Rientjes <rientjes@google.com>
>> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>   include/linux/res_counter.h |    2 ++
>>   kernel/res_counter.c        |   18 ++++++++++++++++++
>>   net/ipv4/tcp_memcontrol.c   |    2 +-
>>   3 files changed, 21 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index 7d7fbe2..a6f8cc5 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -226,4 +226,6 @@ res_counter_set_soft_limit(struct res_counter *cnt,
>>   	return 0;
>>   }
>>
>> +u64 res_counter_usage_safe(struct res_counter *cnt);
>> +
>>   #endif
>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>> index ad581aa..f0507cd 100644
>> --- a/kernel/res_counter.c
>> +++ b/kernel/res_counter.c
>> @@ -171,6 +171,24 @@ u64 res_counter_read_u64(struct res_counter *counter, int member)
>>   }
>>   #endif
>>
>> +/*
>> + * Returns usage. If usage > limit, limit is returned.
>> + * This is useful to avoid breaking the user experience when
>> + * the excess is temporary.
>> + */
>> +u64 res_counter_usage_safe(struct res_counter *counter)
>> +{
>> +	unsigned long flags;
>> +	u64 usage, limit;
>> +
>> +	spin_lock_irqsave(&counter->lock, flags);
>> +	limit = counter->limit;
>> +	usage = counter->usage;
>> +	spin_unlock_irqrestore(&counter->lock, flags);
>> +
>> +	return min(usage, limit);
>> +}
>> +
>>   int res_counter_memparse_write_strategy(const char *buf,
>>   					unsigned long long *res)
>>   {
>> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
>> index b6f3583..a73dce6 100644
>> --- a/net/ipv4/tcp_memcontrol.c
>> +++ b/net/ipv4/tcp_memcontrol.c
>> @@ -180,7 +180,7 @@ static u64 tcp_read_usage(struct mem_cgroup *memcg)
>>   		return atomic_long_read(&tcp_memory_allocated) << PAGE_SHIFT;
>>
>>   	tcp = tcp_from_cgproto(cg_proto);
>> -	return res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE);
>> +	return res_counter_usage_safe(&tcp->tcp_memory_allocated);
>>   }
>
> Hm, it depends on what you consider more important.
>
> Personally, I think it's more useful to report the truth rather than
> pretending we'd enforce an invariant that we actually don't.  And I
> think it can just be documented that we have to charge memory over the
> limit in certain contexts, so people/scripts should expect usage to
> exceed the limit.
>

I think asking applications to handle the usage > limit case will cause
trouble, and we can keep the interface simple by lying here. Then
applications don't need to handle this case at all.

From the viewpoint of our enterprise service, it's better to keep
usage <= limit to avoid unnecessary, unimportant trouble.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] memcg: add res_counter_usage_safe()
  2012-07-04  9:30   ` Kamezawa Hiroyuki
@ 2012-07-04  9:40     ` Glauber Costa
  0 siblings, 0 replies; 10+ messages in thread
From: Glauber Costa @ 2012-07-04  9:40 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Johannes Weiner, linux-mm, David Rientjes, Andrew Morton,
	Michal Hocko, Tejun Heo

On 07/04/2012 01:30 PM, Kamezawa Hiroyuki wrote:
>>
> 
> I think asking applications to handle the usage > limit case will cause
> trouble, and we can keep the interface simple by lying here. Then
> applications don't need to handle this case at all.
>
> From the viewpoint of our enterprise service, it's better to keep
> usage <= limit to avoid unnecessary, unimportant trouble.
> 
> Thanks,
> -Kame

One thing to keep in mind is that usage is already a lie, mostly
because of batching.
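
Roughly what that batching does, as a much simplified sketch of the
per-cpu stock in mm/memcontrol.c (the names and details here are not
the real code):

	#define CHARGE_BATCH	32U	/* pages charged to the res_counter at once */

	struct pcp_stock {
		struct mem_cgroup *cached;	/* memcg the stock was charged to */
		unsigned int nr_pages;		/* already in usage, not yet handed out */
	};
	static DEFINE_PER_CPU(struct pcp_stock, pcp_stock);

	static bool consume_from_stock(struct mem_cgroup *memcg)
	{
		struct pcp_stock *stock = &get_cpu_var(pcp_stock);
		bool ret = false;

		if (stock->cached == memcg && stock->nr_pages) {
			stock->nr_pages--;	/* no res_counter traffic at all */
			ret = true;
		}
		put_cpu_var(pcp_stock);
		return ret;
	}

So the counter's usage can already run ahead of the pages actually in
use by up to CHARGE_BATCH - 1 pages per CPU.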


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04  8:30   ` Johannes Weiner
  2012-07-04  8:39     ` Kamezawa Hiroyuki
@ 2012-07-04 12:04     ` Michal Hocko
  2012-07-04 13:13       ` Johannes Weiner
  1 sibling, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2012-07-04 12:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Kamezawa Hiroyuki, linux-mm, David Rientjes, Andrew Morton, Tejun Heo

On Wed 04-07-12 10:30:19, Johannes Weiner wrote:
> On Wed, Jul 04, 2012 at 11:58:22AM +0900, Kamezawa Hiroyuki wrote:
> > >From 257a1e6603aab8c6a3bd25648872a11e8b85ef64 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 28 Jun 2012 19:07:24 +0900
> > Subject: [PATCH 2/2] 
> > 
> > To handle many kinds of races, memcg adds an extra charge to the
> > page's memcg at page migration. But this affects page compaction
> > and makes it fail if the memcg is under OOM.
> >
> > This patch uses res_counter_charge_nofail() in the page migration path
> > and removes the -ENOMEM return. With this, page migration will not fail
> > because of the state of the memcg.
> >
> > Even though res_counter_charge_nofail() can silently go over the memcg
> > limit, mem_cgroup_usage() compensates for that, so userspace does not
> > see the excess.
> >
> > The excessive charges are only temporary and amount to a single page per
> > CPU in the worst case. This sounds tolerable and actually consumes fewer
> > charges than the current per-cpu memcg_stock.
> 
> But it still means we end up going into reclaim on charges, limit
> resizing etc. which we wouldn't without a bunch of pages under
> migration.
> 
> Can we instead not charge the new page, just commit it while holding
> on to a css refcount, and have end_migration call a version of
> __mem_cgroup_uncharge_common() that updates the stats but leaves the
> res counters alone?

Yes, this is also a way to go. Both approaches have to lie a bit and
both have a discrepancy between stat and usage_in_bytes. I guess we can
live with that.
Kame's solution seems easier, but yours prevents a corner case where
reclaim is triggered due to the artificial charges, so I guess it is
better to go with yours.
A few (trivial) comments on the patch below.

> oldpage will not get uncharged because of the page lock and
> PageCgroupMigration, so the charge is stable during migration.
> 
> Patch below
[...]
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 83e7ba9..17a09e8 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -100,7 +100,7 @@ int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
>  
>  extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
>  
> -extern int
> +extern void
>  mem_cgroup_prepare_migration(struct page *page,
>  	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask);
>  extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
> @@ -279,11 +279,10 @@ static inline struct cgroup_subsys_state
>  	return NULL;
>  }
>  
> -static inline int
> +static inline void
>  mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
>  	struct mem_cgroup **memcgp, gfp_t gfp_mask)
>  {
> -	return 0;
>  }
>  
>  static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f72b5e5..c5161f0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2911,7 +2911,8 @@ static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg,
>   * uncharge if !page_mapped(page)
>   */
>  static struct mem_cgroup *
> -__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> +__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype,
> +			     bool end_migration)
>  {
>  	struct mem_cgroup *memcg = NULL;
>  	unsigned int nr_pages = 1;
> @@ -2955,7 +2956,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  		/* fallthrough */
>  	case MEM_CGROUP_CHARGE_TYPE_DROP:
>  		/* See mem_cgroup_prepare_migration() */
> -		if (page_mapped(page) || PageCgroupMigration(pc))
> +		if (page_mapped(page))
> +			goto unlock_out;

We don't need that test, or else the duplicate page_mapped() check in the
one below should go (splitting them up seems easier to read because those
cases are really different things).

> +		if (page_mapped(page) || (!end_migration &&
> +					  PageCgroupMigration(pc)))
>  			goto unlock_out;
>  		break;
>  	case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
[...]
> @@ -3166,19 +3170,18 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>   * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
>   * page belongs to.
>   */
> -int mem_cgroup_prepare_migration(struct page *page,
> +void mem_cgroup_prepare_migration(struct page *page,
>  	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)

gfp_mask is not needed anymore


>  {
>  	struct mem_cgroup *memcg = NULL;
>  	struct page_cgroup *pc;
>  	enum charge_type ctype;
> -	int ret = 0;
>  
>  	*memcgp = NULL;
>  
>  	VM_BUG_ON(PageTransHuge(page));
>  	if (mem_cgroup_disabled())
> -		return 0;
> +		return;
>  
>  	pc = lookup_page_cgroup(page);
>  	lock_page_cgroup(pc);
> @@ -3223,24 +3226,9 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	 * we return here.
>  	 */
>  	if (!memcg)
> -		return 0;
> +		return;
>  
>  	*memcgp = memcg;
> -	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
> -	css_put(&memcg->css);/* drop extra refcnt */
> -	if (ret) {
> -		if (PageAnon(page)) {
> -			lock_page_cgroup(pc);
> -			ClearPageCgroupMigration(pc);
> -			unlock_page_cgroup(pc);
> -			/*
> -			 * The old page may be fully unmapped while we kept it.
> -			 */
> -			mem_cgroup_uncharge_page(page);
> -		}
> -		/* we'll need to revisit this error code (we have -EINTR) */
> -		return -ENOMEM;
> -	}
>  	/*
>  	 * We charge new page before it's used/mapped. So, even if unlock_page()
>  	 * is called before end_migration, we can catch all events on this new
> @@ -3254,7 +3242,7 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	else
>  		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>  	__mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);

Perhaps a comment that we are doing the commit without a charge, because
this is only temporary, would be good?

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04 12:04     ` Michal Hocko
@ 2012-07-04 13:13       ` Johannes Weiner
  2012-07-04 13:38         ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2012-07-04 13:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kamezawa Hiroyuki, linux-mm, David Rientjes, Andrew Morton, Tejun Heo

On Wed, Jul 04, 2012 at 02:04:45PM +0200, Michal Hocko wrote:
> On Wed 04-07-12 10:30:19, Johannes Weiner wrote:
> > On Wed, Jul 04, 2012 at 11:58:22AM +0900, Kamezawa Hiroyuki wrote:
> > > >From 257a1e6603aab8c6a3bd25648872a11e8b85ef64 Mon Sep 17 00:00:00 2001
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Date: Thu, 28 Jun 2012 19:07:24 +0900
> > > Subject: [PATCH 2/2] 
> > > 
> > > To handle many kinds of races, memcg adds an extra charge to the
> > > page's memcg at page migration. But this affects page compaction
> > > and makes it fail if the memcg is under OOM.
> > >
> > > This patch uses res_counter_charge_nofail() in the page migration path
> > > and removes the -ENOMEM return. With this, page migration will not fail
> > > because of the state of the memcg.
> > >
> > > Even though res_counter_charge_nofail() can silently go over the memcg
> > > limit, mem_cgroup_usage() compensates for that, so userspace does not
> > > see the excess.
> > >
> > > The excessive charges are only temporary and amount to a single page per
> > > CPU in the worst case. This sounds tolerable and actually consumes fewer
> > > charges than the current per-cpu memcg_stock.
> > 
> > But it still means we end up going into reclaim on charges, limit
> > resizing etc. which we wouldn't without a bunch of pages under
> > migration.
> > 
> > Can we instead not charge the new page, just commit it while holding
> > on to a css refcount, and have end_migration call a version of
> > __mem_cgroup_uncharge_common() that updates the stats but leaves the
> > res counters alone?
> 
> Yes, this is also a way to go. Both approaches have to lie a bit and
> both have a discrepancy between stat and usage_in_bytes. I guess we can
> live with that.
> Kame's solution seems easier, but yours prevents a corner case where
> reclaim is triggered due to the artificial charges, so I guess it is
> better to go with yours.
> A few (trivial) comments on the patch below.

It's true that the cache/rss statistics still account for both pages.
But they don't have behavioural impact and so I didn't bother.  We
could still fix this up later, but it's less urgent, I think.

> > @@ -2955,7 +2956,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> >  		/* fallthrough */
> >  	case MEM_CGROUP_CHARGE_TYPE_DROP:
> >  		/* See mem_cgroup_prepare_migration() */
> > -		if (page_mapped(page) || PageCgroupMigration(pc))
> > +		if (page_mapped(page))
> > +			goto unlock_out;
> 
> We don't need that test, or else the duplicate page_mapped() check in the
> one below should go (splitting them up seems easier to read because those
> cases are really different things).
> 
> > +		if (page_mapped(page) || (!end_migration &&
> > +					  PageCgroupMigration(pc)))

My bad, I meant to remove this second page_mapped() and forgot.  Will
fix.  I take it

		if (page_mapped(page))
			goto unlock_out;
		if (!end_migration && PageCgroupMigration(pc))
			goto unlock_out;

is what you had in mind?

> > @@ -3166,19 +3170,18 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> >   * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
> >   * page belongs to.
> >   */
> > -int mem_cgroup_prepare_migration(struct page *page,
> > +void mem_cgroup_prepare_migration(struct page *page,
> >  	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
> 
> gfp_mask is not needed anymore

Good catch, will fix.

> > @@ -3254,7 +3242,7 @@ int mem_cgroup_prepare_migration(struct page *page,
> >  	else
> >  		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> >  	__mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);
> 
> Perhaps a comment that we are doing the commit without a charge, because
> this is only temporary, would be good?

Yes, I'll add something.

Thanks for the review!


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] memcg: remove -ENOMEM at page migration.
  2012-07-04 13:13       ` Johannes Weiner
@ 2012-07-04 13:38         ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2012-07-04 13:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Kamezawa Hiroyuki, linux-mm, David Rientjes, Andrew Morton, Tejun Heo

On Wed 04-07-12 15:13:02, Johannes Weiner wrote:
> On Wed, Jul 04, 2012 at 02:04:45PM +0200, Michal Hocko wrote:
> > On Wed 04-07-12 10:30:19, Johannes Weiner wrote:
[...]
> > > Can we instead not charge the new page, just commit it while holding
> > > on to a css refcount, and have end_migration call a version of
> > > __mem_cgroup_uncharge_common() that updates the stats but leaves the
> > > res counters alone?
> > 
> > Yes, this is also a way to go. Both approaches have to lie a bit and
> > both have a discrepancy between stat and usage_in_bytes. I guess we can
> > live with that.
> > Kame's solution seems easier, but yours prevents a corner case where
> > reclaim is triggered due to the artificial charges, so I guess it is
> > better to go with yours.
> > A few (trivial) comments on the patch below.
> 
> It's true that the cache/rss statistics still account for both pages.
> But they don't have behavioural impact and so I didn't bother.  

Only if somebody watches those numbers and blows up if rss+cache >
limit_in_bytes. I can imagine an LTP test like that. But the test would
need to trigger migration in the background...
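
Something along these lines, I guess (sketch only; the memory.stat key
names are from the v1 memcg interface, the rest is my assumption about
how such a test would be written):

	#include <stdio.h>
	#include <string.h>

	/* rss + cache in bytes, parsed from the group's memory.stat; 0 on error */
	static unsigned long long rss_plus_cache(const char *memcg_dir)
	{
		char path[512], key[64];
		unsigned long long val, rss = 0, cache = 0;
		FILE *f;

		snprintf(path, sizeof(path), "%s/memory.stat", memcg_dir);
		f = fopen(path, "r");
		if (!f)
			return 0;
		while (fscanf(f, "%63s %llu", key, &val) == 2) {
			if (!strcmp(key, "rss"))
				rss = val;
			else if (!strcmp(key, "cache"))
				cache = val;
		}
		fclose(f);
		return rss + cache;
	}

	/* the test would fail if rss_plus_cache(dir) ever exceeds the value
	 * read from memory.limit_in_bytes while migration runs in the
	 * background */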

> We could still fix this up later, but it's less urgent, I think.

Yes, I guess so

> 
> > > @@ -2955,7 +2956,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> > >  		/* fallthrough */
> > >  	case MEM_CGROUP_CHARGE_TYPE_DROP:
> > >  		/* See mem_cgroup_prepare_migration() */
> > > -		if (page_mapped(page) || PageCgroupMigration(pc))
> > > +		if (page_mapped(page))
> > > +			goto unlock_out;
> > 
> > Don't need that test or remove the one below (seems easier to read
> > because those cases are really different things).
> > 
> > > +		if (page_mapped(page) || (!end_migration &&
> > > +					  PageCgroupMigration(pc)))
> 
> My bad, I meant to remove this second page_mapped() and forgot.  Will
> fix.  I take it
> 
> 		if (page_mapped(page))
> 			goto unlock_out;
> 		if (!end_migration && PageCgroupMigration(pc))
> 			goto unlock_out;
> 
> is what you had in mind?

Yes

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-04 13:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-04  2:56 [PATCH v2 1/2] memcg: add res_counter_usage_safe() Kamezawa Hiroyuki
2012-07-04  2:58 ` [PATCH v2 2/2] memcg: remove -ENOMEM at page migration Kamezawa Hiroyuki
2012-07-04  8:30   ` Johannes Weiner
2012-07-04  8:39     ` Kamezawa Hiroyuki
2012-07-04 12:04     ` Michal Hocko
2012-07-04 13:13       ` Johannes Weiner
2012-07-04 13:38         ` Michal Hocko
2012-07-04  9:14 ` [PATCH v2 1/2] memcg: add res_counter_usage_safe() Johannes Weiner
2012-07-04  9:30   ` Kamezawa Hiroyuki
2012-07-04  9:40     ` Glauber Costa
