* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
@ 2008-12-12  8:29 KAMEZAWA Hiroyuki
  2008-12-12  9:43 ` Daisuke Nishimura
  0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-12  8:29 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, balbir, nishimura, akpm


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix swap-page-fault charge leak of memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused. That check depends on the contents of
struct page: if PageAnon(page) && page_mapped(page), the page is
recognized as still in use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

	(Page fault with WRITE)
	Assume the page is SwapCache "on memory (still charged)"
	try_charge() (charge += PAGESIZE)
	commit_charge()
	   => (Check page_cgroup and found PCG_USED bit, charge-=PAGE_SIZE
	       because it seems already charged.)
	reuse_swap_page()
		-> delete_from_swapcache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......

too much uncharge.....

To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,
        Assume the page is SwapCache "on memory"
	try_charge()		(charge += PAGESIZE)
	reuse_swap_page()	(may charge -= PAGESIZE if PCG_USED is set)
	commit_charge()		(Only if page_cgroup is marked as PCG_USED,
				 charge -= PAGESIZE)
Accounting will be correct.
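The two orderings can be checked with a small user-space model (an editorial sketch, not kernel code: PAGE_SIZE is abstracted to one unit and all names are simplified stand-ins for the memcg internals described above):

```c
#include <assert.h>
#include <stdbool.h>

/* One already-charged SwapCache page, about to be faulted in for WRITE. */
static long charge;       /* memcg usage, in page-sized units */
static bool pcg_used;     /* PCG_USED bit in page_cgroup */
static bool page_mapped;  /* has rmap been established yet? */

static void try_charge(void) { charge += 1; }

static void commit_charge(void)
{
	if (pcg_used)
		charge -= 1;     /* already charged: roll the new charge back */
	else
		pcg_used = true; /* keep the new charge */
}

/* mem_cgroup_uncharge_swapcache(), reached via reuse_swap_page() ->
 * delete_from_swap_cache(): drops the charge of an unmapped SwapCache. */
static void uncharge_swapcache(void)
{
	if (!page_mapped && pcg_used) {
		charge -= 1;
		pcg_used = false;
	}
}

static long buggy_order(void)  /* commit before reuse_swap_page() */
{
	charge = 1; pcg_used = true; page_mapped = false;
	try_charge();
	commit_charge();      /* rolls back: page looks already charged */
	uncharge_swapcache(); /* ...and then the old charge is dropped too */
	page_mapped = true;
	return charge;        /* an in-use page ends up with no charge */
}

static long fixed_order(void)  /* commit after rmap is established */
{
	charge = 1; pcg_used = true; page_mapped = false;
	try_charge();
	uncharge_swapcache(); /* old charge dropped, PCG_USED cleared */
	page_mapped = true;
	commit_charge();      /* PCG_USED clear: new charge is kept */
	return charge;
}
```

In this model buggy_order() leaves a still-mapped page with zero units charged (the "too much uncharge" above), while fixed_order() leaves exactly one.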

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memory.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: mmotm-2.6.28-Dec11/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec11.orig/mm/memory.c
+++ mmotm-2.6.28-Dec11/mm/memory.c
@@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
 	 * which may delete_from_swap_cache().
 	 */
 
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))



* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
  2008-12-12  8:29 [BUGFIX][PATCH mmotm] memcg fix swap accounting leak KAMEZAWA Hiroyuki
@ 2008-12-12  9:43 ` Daisuke Nishimura
  2008-12-12 11:16     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Daisuke Nishimura @ 2008-12-12  9:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, akpm, Hugh Dickins, nishimura

(add CC: Hugh Dickins <hugh@veritas.com>)

On Fri, 12 Dec 2008 17:29:30 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Fix swap-page-fault charge leak of memcg.
> 
> Now, memcg has hooks into the swap-out operation and checks whether a
> SwapCache page is really unused. That check depends on the contents of
> struct page: if PageAnon(page) && page_mapped(page), the page is
> recognized as still in use.
> 
> Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
> established. Then, in the following sequence
> 
> 	(Page fault with WRITE)
> 	Assume the page is SwapCache "on memory (still charged)"
> 	try_charge() (charge += PAGESIZE)
> 	commit_charge()
> 	   => (Check page_cgroup and found PCG_USED bit, charge-=PAGE_SIZE
> 	       because it seems already charged.)
> 	reuse_swap_page()
> 		-> delete_from_swapcache()
> 			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> 	......
> 
> too much uncharge.....
> 
> To avoid this, move commit_charge() after page_mapcount() goes up to 1.
> By this,
>         Assume the page is SwapCache "on memory"
> 	try_charge()		(charge += PAGESIZE)
> 	reuse_swap_page()	(may charge -= PAGESIZE if PCG_USED is set)
> 	commit_charge()		(Only if page_cgroup is marked as PCG_USED,
> 				 charge -= PAGESIZE)
> Accounting will be correct.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I've confirmed that the problem you reported offline is fixed, but...

> ---
>  mm/memory.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: mmotm-2.6.28-Dec11/mm/memory.c
> ===================================================================
> --- mmotm-2.6.28-Dec11.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec11/mm/memory.c
> @@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
>  	 * which may delete_from_swap_cache().
>  	 */
>  
The comment here says:

        /*
         * The page isn't present yet, go ahead with the fault.
         *
         * Be careful about the sequence of operations here.
         * To get its accounting right, reuse_swap_page() must be called
         * while the page is counted on swap but not yet in mapcount i.e.
         * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
         * must be called after the swap_free(), or it will never succeed.
         * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
         * in page->private, must be called before reuse_swap_page(),
         * which may delete_from_swap_cache().
         */

Hmm.. should we save page->private before calling reuse_swap_page() and
pass it to mem_cgroup_commit_charge_swapin()? (I think it cannot be
changed before that point because the page is locked.)


Thanks,
Daisuke Nishimura. 

> -	mem_cgroup_commit_charge_swapin(page, ptr);
>  	inc_mm_counter(mm, anon_rss);
>  	pte = mk_pte(page, vma->vm_page_prot);
>  	if (write_access && reuse_swap_page(page)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		write_access = 0;
>  	}
> -
>  	flush_icache_page(vma, page);
>  	set_pte_at(mm, address, page_table, pte);
>  	page_add_anon_rmap(page, vma, address);
> +	/* It's better to call commit-charge after rmap is established */
> +	mem_cgroup_commit_charge_swapin(page, ptr);
>  
>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> 


* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
  2008-12-12  9:43 ` Daisuke Nishimura
@ 2008-12-12 11:16     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-12 11:16 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, balbir, akpm,
	Hugh Dickins, nishimura

Daisuke Nishimura said:
>
>         /*
>          * The page isn't present yet, go ahead with the fault.
>          *
>          * Be careful about the sequence of operations here.
>          * To get its accounting right, reuse_swap_page() must be called
>          * while the page is counted on swap but not yet in mapcount i.e.
>          * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
>          * must be called after the swap_free(), or it will never succeed.
>          * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
>          * in page->private, must be called before reuse_swap_page(),
>          * which may delete_from_swap_cache().
>          */
>
> Hmm.. should we save page->private before calling reuse_swap_page() and
> pass it to mem_cgroup_commit_charge_swapin()? (I think it cannot be
> changed before that point because the page is locked.)
>
It seems not necessary (see below). I'll fix the comment when I can use my PC tomorrow..

Considering 2 cases:
 A. the SwapCache was already charged before try_charge_swapin()
 B. the SwapCache is brand new and not charged before try_charge_swapin()

Case A.
   0. We have a charge of PAGE_SIZE to this page before reaching here.
   1. try_charge_swapin() is called and charge += PAGE_SIZE.
   2. reuse_swap_page() is called.
          When delete_from_swap_cache() is called:
          2-a. if already mapped, no change in charges.
          2-b. if not mapped, charge -= PAGE_SIZE and the PCG_USED bit is
               cleared, and the charge-record is written into swap_cgroup.
          When delete_from_swap_cache() is not called:
          2-c. no changes in charge.
   3. commit_charge() is called.
          3-a. the PCG_USED bit is set, so charge -= PAGE_SIZE.
          3-b. the PCG_USED bit was cleared, so we set the PCG_USED bit;
               no changes in charge.
          3-c. no changes in charge.
   4-b. swap_free() will clear the record in swap_cgroup.

   Then, finally we have PAGE_SIZE of charge to this page.

Case B.
   0. We have no charges to this page.
   1. try_charge_swapin() is called and charge += PAGE_SIZE.
   2. reuse_swap_page() is called.
         2-a. if delete_from_swap_cache() is called: the page is not
              mapped, but the PCG_USED bit is not set either, so there is
              no change in charges (the entry is just recorded in
              swap_cgroup).
         2-b. if it is not called: no changes in charge.
   3. commit_charge() is called and sets the PCG_USED bit; no changes in charge.
   4. swap_free() is called and clears the record in swap_cgroup.

   Then, finally we have PAGE_SIZE of charge to this page.
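Both cases, with and without the swap-cache deletion, can be walked through the fixed call order with the same kind of editorial model (simplified counters standing in for the memcg internals, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Walk one page through the fixed call order.  `precharged` selects
 * case A vs. case B; `cache_deleted` selects whether reuse_swap_page()
 * calls delete_from_swap_cache(). */
static long run_case(bool precharged, bool cache_deleted)
{
	long charge = precharged ? 1 : 0;
	bool pcg_used = precharged;

	charge += 1;                   /* 1. try_charge_swapin() */
	if (cache_deleted && pcg_used) {
		charge -= 1;           /* 2-b. unmapped, charged SwapCache */
		pcg_used = false;      /*      record moves to swap_cgroup */
	}                              /* 2-a/2-c. otherwise no change */
	if (pcg_used)
		charge -= 1;           /* 3-a. commit rolls the charge back */
	else
		pcg_used = true;       /* 3-b. commit keeps the new charge */
	/* 4. swap_free() clears any swap_cgroup record (not modeled) */
	return charge;
}
```

All four combinations should end with exactly PAGE_SIZE (one unit) of charge.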



Thanks,
-Kame






* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-12 11:16     ` KAMEZAWA Hiroyuki
@ 2008-12-13  7:03     ` KAMEZAWA Hiroyuki
  2008-12-13  9:49       ` Hugh Dickins
  -1 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-13  7:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm, Hugh Dickins

Updated explanation and fixed comment.
==

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix swapin charge operation of memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused. That check depends on the contents of
struct page: if PageAnon(page) && page_mapped(page), the page is
recognized as still in use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

	(Page fault with WRITE)
	try_charge() (charge += PAGESIZE)
	commit_charge() (Check page_cgroup is used or not..)
	reuse_swap_page()
		-> delete_from_swapcache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......
New charge is uncharged soon....
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

	try_charge()		(usage += PAGESIZE)
	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
	commit_charge()		(If page_cgroup is not marked as PCG_USED,
				 add new charge.)
Accounting will be correct.

Changelog (v1) -> (v2)
  - fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/controllers/memcg_test.txt |   50 +++++++++++++++++++++++++++++--
 mm/memory.c                              |   11 +++---
 2 files changed, 54 insertions(+), 7 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find the swp_entry
+	 * in page->private. In this case, a record in swap_cgroup is silently
+	 * discarded at swap_free().
 	 */
 
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg)  Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/13
+Base Kernel Version: based on 2.6.28-rc8-mm.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -115,6 +115,52 @@ Under below explanation, we assume CONFI
 	(But racy state between (a) and (b) exists. We do check it.)
 	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
 
+	In case (a), reuse_swap_page() may call delete_from_swap_cache() if
+	the page can drop swp_entry and be reused for "WRITE".
+	Note: The page may already be accounted before this point, if it
+	      wasn't kicked out to disk before the page fault.
+
+	case A) the page is not accounted as SwapCache and SwapCache is deleted
+		by reuse_swap_page().
+		1. try_charge_swapin() is called and
+			- charge_for_memory +=1.
+			- charge_for_memsw  +=1.
+		2. reuse_swap_page -> delete_from_swap_cache() is called.
+			because the page is not accounted as SwapCache,
+			no changes in accounting.
+		3. commit_charge_swapin() finds PCG_USED bit is not set and
+		   set PCG_USED bit.
+		   Because page->private is empty by 2. no changes in charge.
+		4. swap_free(entry) is called.
+			- charge_for_memsw -= 1.
+
+		Finally, charge_for_memory +=1, charge_for_memsw = +-0.
+
+	case B) the page is accounted as SwapCache and SwapCache is deleted
+		by reuse_swap_page.
+		1. try_charge_swapin() is called.
+			- charge_for_memory += 1.
+			- charge_for_memsw += 1.
+		2. reuse_swap_page -> delete_from_swap_cache() is called.
+			PCG_USED bit is found and cleared.
+			- charge_for_memory -= 1. (swap_cgroup is recorded.)
+		3. commit_charge_swapin() finds PCG_USED bit is not set.
+		4. swap_free(entry) is called and
+			- charge_for_memsw -= 1.
+
+		Finally, charge_for_memory = +-0, charge_for_memsw = +-0.
+
+	case C) the page is not accounted as SwapCache and reuse_swap_page
+		doesn't call delete_from_swap_cache()
+		1. try_charge_swapin() is called.
+			- charge_for_memory += 1.
+			- charge_for_memsw += 1.
+		2. commit_charge_swapin() finds PCG_USED bit is not set
+		   and finds swap_cgroup records this entry.
+			- charge_for_memsw -= 1.
+
+		Finally, charge_for_memory +=1, charge_for_memsw = +-0
+
 	4.2 Swap-out.
 	At swap-out, typical state transition is below.
 



* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-13  7:03     ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
@ 2008-12-13  9:49       ` Hugh Dickins
  2008-12-13 10:27         ` KAMEZAWA Hiroyuki
  2008-12-13 10:38         ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 20+ messages in thread
From: Hugh Dickins @ 2008-12-13  9:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm

On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec12/mm/memory.c
>  
> -	mem_cgroup_commit_charge_swapin(page, ptr);
>  	inc_mm_counter(mm, anon_rss);
>  	pte = mk_pte(page, vma->vm_page_prot);
>  	if (write_access && reuse_swap_page(page)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		write_access = 0;
>  	}
> -
>  	flush_icache_page(vma, page);
>  	set_pte_at(mm, address, page_table, pte);
>  	page_add_anon_rmap(page, vma, address);
> +	/* It's better to call commit-charge after rmap is established */
> +	mem_cgroup_commit_charge_swapin(page, ptr);
>  
>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))

That ordering is back to how it was before I adjusted it
for reuse_swap_page()'s delete_from_swap_cache(), isn't it?

So I don't understand how you've fixed the bug I hit (not an
accounting imbalance but an oops or BUG, I forget) with this
ordering, without making some other change elsewhere.

mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
swp_type 0, but the ctrl->map for type 0 may have been freed
ages ago (we do always start from 0, but maybe we swapped on
type 1 and swapped off type 0 meanwhile).  I'm guessing that
by looking at the code, not by retesting it, so I may have the
details wrong; but I didn't reorder your code just for fun.

Perhaps your restored ordering works if you check PageSwapCache
in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
but I don't see that in yesterday's mmotm, nor in this patch.
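Hugh's point about the bogus entry can be seen from the generic swp_entry encoding, restated here in user space from swapops.h (the shift is arch- and config-dependent in real kernels, so treat the constants as illustrative): a cleared page->private, val == 0, decodes to swp_type 0, swp_offset 0.

```c
#include <assert.h>

/* User-space restatement of the generic encoding in swapops.h:
 * the swap type lives in the top bits, the offset in the rest. */
#define MAX_SWAPFILES_SHIFT 5
#define SWP_TYPE_SHIFT ((int)(sizeof(unsigned long) * 8 - MAX_SWAPFILES_SHIFT))
#define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1)

typedef struct { unsigned long val; } swp_entry_t;

static unsigned int swp_type(swp_entry_t entry)
{
	return (unsigned int)(entry.val >> SWP_TYPE_SHIFT);
}

static unsigned long swp_offset(swp_entry_t entry)
{
	return entry.val & SWP_OFFSET_MASK;
}
```

So a zeroed entry looks like a valid slot on swap type 0, and recording against it would index that type's ctrl->map even after swapoff has freed it.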

(And I should admit, I've not even attempted to follow your
accounting justification: I'll leave that to you memcg guys.)

An alternative could be not to clear page->private when deleting
from swap cache, that's only done for tidiness and to force notice
of races like this; but I'd want a much stronger reason to change that.

Or am I making this up?  As I say, I've not tested it this time around.

Hugh


* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-13  9:49       ` Hugh Dickins
@ 2008-12-13 10:27         ` KAMEZAWA Hiroyuki
  2008-12-15  7:07             ` KAMEZAWA Hiroyuki
  2008-12-13 10:38         ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-13 10:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm, linux-kernel,
	balbir, akpm

Hugh Dickins said:
> On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
>> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
>> +++ mmotm-2.6.28-Dec12/mm/memory.c
>>
>> -	mem_cgroup_commit_charge_swapin(page, ptr);
>>  	inc_mm_counter(mm, anon_rss);
>>  	pte = mk_pte(page, vma->vm_page_prot);
>>  	if (write_access && reuse_swap_page(page)) {
>>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>>  		write_access = 0;
>>  	}
>> -
>>  	flush_icache_page(vma, page);
>>  	set_pte_at(mm, address, page_table, pte);
>>  	page_add_anon_rmap(page, vma, address);
>> +	/* It's better to call commit-charge after rmap is established */
>> +	mem_cgroup_commit_charge_swapin(page, ptr);
>>
>>  	swap_free(entry);
>>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) ||
>> PageMlocked(page))
>
> That ordering is back to how it was before I adjusted it
> for reuse_swap_page()'s delete_from_swap_cache(), isn't it?
>
> So I don't understand how you've fixed the bug I hit (not an
> accounting imbalance but an oops or BUG, I forget) with this
> ordering, without making some other change elsewhere.
>
Ah, this is a fix for the bug caused by that ordering of calls.
==


> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile).  I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
Ah, sorry. commit_charge_swapin() should check that the page is still
SwapCache. I'll update this on Monday.

> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
Yes, I'm wrong on that point.
I'll add a PageSwapCache check to the "commit" ops.

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
> An alternative could be not to clear page->private when deleting
> from swap cache, that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
Hmm, wouldn't that change add new, unnecessary complexity?

> Or am I making this up?  As I say, I've not tested it this time around.
>
I'll revisit this on Monday and think about the swp_entry==0 problem.

Thank you for pointing out.

-Kame

> Hugh
>




* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-13  9:49       ` Hugh Dickins
  2008-12-13 10:27         ` KAMEZAWA Hiroyuki
@ 2008-12-13 10:38         ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-13 10:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm, linux-kernel,
	balbir, akpm

Hugh Dickins said:
> On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
>> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
>> +++ mmotm-2.6.28-Dec12/mm/memory.c
>>
>> -	mem_cgroup_commit_charge_swapin(page, ptr);
>>  	inc_mm_counter(mm, anon_rss);
>>  	pte = mk_pte(page, vma->vm_page_prot);
>>  	if (write_access && reuse_swap_page(page)) {
>>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>>  		write_access = 0;
>>  	}
>> -
>>  	flush_icache_page(vma, page);
>>  	set_pte_at(mm, address, page_table, pte);
>>  	page_add_anon_rmap(page, vma, address);
>> +	/* It's better to call commit-charge after rmap is established */
>> +	mem_cgroup_commit_charge_swapin(page, ptr);
>>
>>  	swap_free(entry);
>>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) ||
>> PageMlocked(page))
>
> That ordering is back to how it was before I adjusted it
> for reuse_swap_page()'s delete_from_swap_cache(), isn't it?
>
> So I don't understand how you've fixed the bug I hit (not an
> accounting imbalance but an oops or BUG, I forget) with this
> ordering, without making some other change elsewhere.
>
Ah, this is a fix for the bug caused by this ordering:
==
    try_charge()
    commit_charge()
    reuse_swap_page()
         -> delete_from_swapcache() -> uncharge_swapcache().
    increase mapcount here.
==
Because uncharge_swapcache() assumes the following:
  a. if mapcount == 0, this swap cache is of no use and will be discarded.
  b. if mapcount > 0, this swap cache is in use.
a charge committed by commit_charge() is discarded by reuse_swap_page().

By delaying the commit (which checks the flag of page_cgroup):
==
  try_charge()
  reuse_swap_page()
  commit_charge()
==
the leak of charge doesn't happen.
(reuse_swap_page() may drop the page from the swap cache, but that's no
 problem for the commit. But as you say, this has the swp_entry==0 bug.)

> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile).  I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
Ahhhh, sorry. OK, swp_entry==0 is valid... sigh...
I'll revisit this and check how commit_charge() works.
I think checking PageSwapCache() is enough, but if not, I'll do something
else. (Maybe Nishimura's suggestion to pass the swp_entry directly to
commit_charge() is one way.)
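A toy of the PageSwapCache() guard mentioned here (a hypothetical user-space model; `recorded` stands in for the swap_cgroup record, and the bodies are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page { bool swapcache; unsigned long private; };

static unsigned long recorded = (unsigned long)-1; /* sentinel: untouched */

/* Guarded commit: consult page->private only while the page is still
 * SwapCache, so a cleared entry (val == 0) never reaches swap_cgroup. */
static void toy_commit_charge_swapin(struct toy_page *page)
{
	if (!page->swapcache)
		return;              /* entry already gone: record nothing */
	recorded = page->private;    /* still a live swp_entry */
}
```

With the guard, a page already deleted from the swap cache never causes a record against the bogus entry 0.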

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
Sorry for the complication ;(

> An alternative could be not to clear page->private when deleting
> from swap cache, that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
It seems that would add more complexity or unexpected behavior..
I think I can find some workaround.

> Or am I making this up?  As I say, I've not tested it this time around.
>
I'll ask you if I find I can't do anything ;(

Thank you for pointing out!
I'll revisit this on Monday.

-Kame





* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-13 10:27         ` KAMEZAWA Hiroyuki
@ 2008-12-15  7:07             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-15  7:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix swapin charge operation of memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused. That check depends on the contents of
struct page: if PageAnon(page) && page_mapped(page), the page is
recognized as still in use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

	(Page fault with WRITE)
	try_charge() (charge += PAGESIZE)
	commit_charge() (Check page_cgroup is used or not..)
	reuse_swap_page()
		-> delete_from_swapcache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......
New charge is uncharged soon....
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

	try_charge()		(usage += PAGESIZE)
	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
	commit_charge()		(If page_cgroup is not marked as PCG_USED,
				 add new charge.)
Accounting will be correct.

Changelog (v2) -> (v3)
  - fixed invalid charge to swp_entry==0.
  - updated documentation.
Changelog (v1) -> (v2)
  - fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
 mm/memcontrol.c                          |    7 +++--
 mm/memory.c                              |   11 ++++----
 3 files changed, 46 insertions(+), 13 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find the swp_entry
+	 * in page->private. In this case, a record in swap_cgroup is silently
+	 * discarded at swap_free().
 	 */
 
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg)  Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
 
-	In case (a), we charge it. In case (b), we don't charge it.
-	(But racy state between (a) and (b) exists. We do check it.)
-	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+	This swap-in is one of the most complicated operations. In
+	do_swap_page(), the following events occur when the pte is unchanged.
+
+	(1) the page (SwapCache) is looked up.
+	(2) lock_page()
+	(3) try_charge_swapin()
+	(4) reuse_swap_page() (may call delete_from_swap_cache())
+	(5) commit_charge_swapin()
+	(6) swap_free().
+
+	Consider the following situations, for example.
+
+	(A) The page has not been charged before (2) and reuse_swap_page()
+	    doesn't call delete_from_swap_cache().
+	(B) The page has not been charged before (2) and reuse_swap_page()
+	    calls delete_from_swap_cache().
+	(C) The page has been charged before (2) and reuse_swap_page() doesn't
+	    call delete_from_swap_cache().
+	(D) The page has been charged before (2) and reuse_swap_page() calls
+	    delete_from_swap_cache().
+
+	    Changes to memory.usage/memsw.usage for this page/swp_entry will be:
+	 Case          (A)      (B)       (C)     (D)
+         Event
+       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
+          ===========================================
+          (3)        +1/+1    +1/+1     +1/+1   +1/+1
+          (4)          -       0/ 0       -     -1/ 0
+          (5)         0/ 1     0/-1     -1/-1    0/ 0
+          (6)          -        -         -      0/-1
+          ===========================================
+       Result         1/ 1     1/ 1      1/ 1    1/ 1
+
+       In all cases, the charges to this page should end up as 1/ 1.
 
 	4.2 Swap-out.
 	At swap-out, typical state transition is below.
Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Dec12/mm/memcontrol.c
@@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
-	 * Fix it by uncharging from memsw. This SwapCache is stable
-	 * because we're still under lock_page().
+	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
+	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
+	 * may call delete_from_swap_cache() before we reach here.
 	 */
-	if (do_swap_account) {
+	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
 		struct mem_cgroup *memcg;
 		memcg = swap_cgroup_record(ent, NULL);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
@ 2008-12-15  7:07             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-15  7:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix swapin charge operation of memcg.

Currently, memcg hooks into the swap-out operation and checks whether the
SwapCache is really unused. That check depends on the contents of struct
page: i.e. if PageAnon(page) && page_mapped(page), the page is recognized
as still in use.

However, reuse_swap_page() calls delete_from_swap_cache() before any rmap
is established. Then, in the following sequence:

	(Page fault with WRITE)
	try_charge() (charge += PAGESIZE)
	commit_charge() (Check page_cgroup is used or not..)
	reuse_swap_page()
		-> delete_from_swapcache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......
The new charge is uncharged soon afterwards.
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

	try_charge()		(usage += PAGESIZE)
	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
	commit_charge()		(If page_cgroup is not marked as PCG_USED,
				 add new charge.)
Accounting will be correct.

Changelog (v2) -> (v3)
  - fixed invalid charge to swp_entry==0.
  - updated documentation.
Changelog (v1) -> (v2)
  - fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
 mm/memcontrol.c                          |    7 +++--
 mm/memory.c                              |   11 ++++----
 3 files changed, 46 insertions(+), 13 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find the
+	 * swp_entry in page->private. In this case, a record in swap_cgroup
+	 * is silently discarded at swap_free().
 	 */
 
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg)  Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
 
-	In case (a), we charge it. In case (b), we don't charge it.
-	(But racy state between (a) and (b) exists. We do check it.)
-	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+	This swap-in is one of the most complicated operations. In
+	do_swap_page(), the following events occur when the pte is unchanged.
+
+	(1) the page (SwapCache) is looked up.
+	(2) lock_page()
+	(3) try_charge_swapin()
+	(4) reuse_swap_page() (may call delete_from_swap_cache())
+	(5) commit_charge_swapin()
+	(6) swap_free().
+
+	Consider the following situations, for example.
+
+	(A) The page has not been charged before (2) and reuse_swap_page()
+	    doesn't call delete_from_swap_cache().
+	(B) The page has not been charged before (2) and reuse_swap_page()
+	    calls delete_from_swap_cache().
+	(C) The page has been charged before (2) and reuse_swap_page() doesn't
+	    call delete_from_swap_cache().
+	(D) The page has been charged before (2) and reuse_swap_page() calls
+	    delete_from_swap_cache().
+
+	    Changes to memory.usage/memsw.usage for this page/swp_entry will be:
+	 Case          (A)      (B)       (C)     (D)
+         Event
+       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
+          ===========================================
+          (3)        +1/+1    +1/+1     +1/+1   +1/+1
+          (4)          -       0/ 0       -     -1/ 0
+          (5)         0/ 1     0/-1     -1/-1    0/ 0
+          (6)          -        -         -      0/-1
+          ===========================================
+       Result         1/ 1     1/ 1      1/ 1    1/ 1
+
+       In all cases, the charges to this page should end up as 1/ 1.
 
 	4.2 Swap-out.
 	At swap-out, typical state transition is below.
Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Dec12/mm/memcontrol.c
@@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
-	 * Fix it by uncharging from memsw. This SwapCache is stable
-	 * because we're still under lock_page().
+	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
+	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
+	 * may call delete_from_swap_cache() before we reach here.
 	 */
-	if (do_swap_account) {
+	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
 		struct mem_cgroup *memcg;
 		memcg = swap_cgroup_record(ent, NULL);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-15  7:07             ` KAMEZAWA Hiroyuki
@ 2008-12-15  8:37               ` Balbir Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Balbir Singh @ 2008-12-15  8:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, akpm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-15 16:07:51]:

> 
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Fix swapin charge operation of memcg.
> 
> Now, memcg has hooks to swap-out operation and checks SwapCache is really
> unused or not. That check depends on contents of struct page.
> I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
> still-in-use.
> 
> Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
> of any rmap. Then, in followinig sequence
> 
> 	(Page fault with WRITE)
> 	try_charge() (charge += PAGESIZE)
> 	commit_charge() (Check page_cgroup is used or not..)
> 	reuse_swap_page()
> 		-> delete_from_swapcache()
> 			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> 	......
> New charge is uncharged soon....
> To avoid this,  move commit_charge() after page_mapcount() goes up to 1.
> By this,
> 
> 	try_charge()		(usage += PAGESIZE)
> 	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
> 	commit_charge()		(If page_cgroup is not marked as PCG_USED,
> 				 add new charge.)
> Accounting will be correct.
> 
> Changelog (v2) -> (v3)
>   - fixed invalid charge to swp_entry==0.
>   - updated documentation.
> Changelog (v1) -> (v2)
>   - fixed comment.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
>  mm/memcontrol.c                          |    7 +++--
>  mm/memory.c                              |   11 ++++----
>  3 files changed, 46 insertions(+), 13 deletions(-)
> 
> Index: mmotm-2.6.28-Dec12/mm/memory.c
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec12/mm/memory.c
> @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
>  	 * while the page is counted on swap but not yet in mapcount i.e.
>  	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
>  	 * must be called after the swap_free(), or it will never succeed.
> -	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> -	 * in page->private, must be called before reuse_swap_page(),
> -	 * which may delete_from_swap_cache().
> +	 * Because delete_from_swap_page() may be called by reuse_swap_page(),
> +	 * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
> +	 * in page->private. In this case, a record in swap_cgroup  is silently
> +	 * discarded at swap_free().
>  	 */
> 
> -	mem_cgroup_commit_charge_swapin(page, ptr);
>  	inc_mm_counter(mm, anon_rss);
>  	pte = mk_pte(page, vma->vm_page_prot);
>  	if (write_access && reuse_swap_page(page)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		write_access = 0;
>  	}
> -

Removal of unassociated lines, not sure if that is a good practice.

>  	flush_icache_page(vma, page);
>  	set_pte_at(mm, address, page_table, pte);
>  	page_add_anon_rmap(page, vma, address);
> +	/* It's better to call commit-charge after rmap is established */
> +	mem_cgroup_commit_charge_swapin(page, ptr);
> 

Yes, it does make sense

>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
> +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> @@ -1,6 +1,6 @@
>  Memory Resource Controller(Memcg)  Implementation Memo.
> -Last Updated: 2008/12/10
> -Base Kernel Version: based on 2.6.28-rc7-mm.
> +Last Updated: 2008/12/15
> +Base Kernel Version: based on 2.6.28-rc8-mm.
> 
>  Because VM is getting complex (one of reasons is memcg...), memcg's behavior
>  is complex. This is a document for memcg's internal behavior.
> @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
>  	(b) If the SwapCache has been mapped by processes, it has been
>  	    charged already.
> 
> -	In case (a), we charge it. In case (b), we don't charge it.
> -	(But racy state between (a) and (b) exists. We do check it.)
> -	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> +	This swap-in is one of the most complicated work. In do_swap_page(),
> +	following events occur when pte is unchanged.
> +
> +	(1) the page (SwapCache) is looked up.
> +	(2) lock_page()
> +	(3) try_charge_swapin()
> +	(4) reuse_swap_page() (may call delete_swap_cache())
> +	(5) commit_charge_swapin()
> +	(6) swap_free().
> +
> +	Considering following situation for example.
> +
> +	(A) The page has not been charged before (2) and reuse_swap_page()
> +	    doesn't call delete_from_swap_cache().
> +	(B) The page has not been charged before (2) and reuse_swap_page()
> +	    calls delete_from_swap_cache().
> +	(C) The page has been charged before (2) and reuse_swap_page() doesn't
> +	    call delete_from_swap_cache().
> +	(D) The page has been charged before (2) and reuse_swap_page() calls
> +	    delete_from_swap_cache().
> +
> +	    memory.usage/memsw.usage changes to this page/swp_entry will be
> +	 Case          (A)      (B)       (C)     (D)
> +         Event
> +       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
> +          ===========================================
> +          (3)        +1/+1    +1/+1     +1/+1   +1/+1
> +          (4)          -       0/ 0       -     -1/ 0
> +          (5)         0/ 1     0/-1     -1/-1    0/ 0
> +          (6)          -        -         -      0/-1
> +          ===========================================
> +       Result         1/ 1     1/1       1/ 1    1/ 1
> +
> +       In any cases, charges to this page should be 1/ 1.
> 

The documentation patch failed to apply for me

>  	4.2 Swap-out.
>  	At swap-out, typical state transition is below.
> Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
> +++ mmotm-2.6.28-Dec12/mm/memcontrol.c
> @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> -	 * Fix it by uncharging from memsw. This SwapCache is stable
> -	 * because we're still under lock_page().
> +	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> +	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> +	 * may call delete_from_swap_cache() before reach here.
>  	 */
> -	if (do_swap_account) {
> +	if (do_swap_account && PageSwapCache(page)) {
>  		swp_entry_t ent = {.val = page_private(page)};
>  		struct mem_cgroup *memcg;
>  		memcg = swap_cgroup_record(ent, NULL);
> 
>


Looks good to me

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> 
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 20+ messages in thread


* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-15  8:37               ` Balbir Singh
@ 2008-12-15  8:40                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-15  8:40 UTC (permalink / raw)
  To: balbir; +Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, akpm

On Mon, 15 Dec 2008 14:07:26 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-15 16:07:51]:
> 
> > 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Fix swapin charge operation of memcg.
> > 
> > Now, memcg has hooks to swap-out operation and checks SwapCache is really
> > unused or not. That check depends on contents of struct page.
> > I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
> > still-in-use.
> > 
> > Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
> > of any rmap. Then, in followinig sequence
> > 
> > 	(Page fault with WRITE)
> > 	try_charge() (charge += PAGESIZE)
> > 	commit_charge() (Check page_cgroup is used or not..)
> > 	reuse_swap_page()
> > 		-> delete_from_swapcache()
> > 			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> > 	......
> > New charge is uncharged soon....
> > To avoid this,  move commit_charge() after page_mapcount() goes up to 1.
> > By this,
> > 
> > 	try_charge()		(usage += PAGESIZE)
> > 	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
> > 	commit_charge()		(If page_cgroup is not marked as PCG_USED,
> > 				 add new charge.)
> > Accounting will be correct.
> > 
> > Changelog (v2) -> (v3)
> >   - fixed invalid charge to swp_entry==0.
> >   - updated documentation.
> > Changelog (v1) -> (v2)
> >   - fixed comment.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
> >  mm/memcontrol.c                          |    7 +++--
> >  mm/memory.c                              |   11 ++++----
> >  3 files changed, 46 insertions(+), 13 deletions(-)
> > 
> > Index: mmotm-2.6.28-Dec12/mm/memory.c
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> > +++ mmotm-2.6.28-Dec12/mm/memory.c
> > @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
> >  	 * while the page is counted on swap but not yet in mapcount i.e.
> >  	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
> >  	 * must be called after the swap_free(), or it will never succeed.
> > -	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> > -	 * in page->private, must be called before reuse_swap_page(),
> > -	 * which may delete_from_swap_cache().
> > +	 * Because delete_from_swap_page() may be called by reuse_swap_page(),
> > +	 * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
> > +	 * in page->private. In this case, a record in swap_cgroup  is silently
> > +	 * discarded at swap_free().
> >  	 */
> > 
> > -	mem_cgroup_commit_charge_swapin(page, ptr);
> >  	inc_mm_counter(mm, anon_rss);
> >  	pte = mk_pte(page, vma->vm_page_prot);
> >  	if (write_access && reuse_swap_page(page)) {
> >  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> >  		write_access = 0;
> >  	}
> > -
> 
> Removal of unassociated lines, not sure if that is a good practice.
> 
my mistake...

> >  	flush_icache_page(vma, page);
> >  	set_pte_at(mm, address, page_table, pte);
> >  	page_add_anon_rmap(page, vma, address);
> > +	/* It's better to call commit-charge after rmap is established */
> > +	mem_cgroup_commit_charge_swapin(page, ptr);
> > 
> 
> Yes, it does make sense
> 
> >  	swap_free(entry);
> >  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> > Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
> > +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > @@ -1,6 +1,6 @@
> >  Memory Resource Controller(Memcg)  Implementation Memo.
> > -Last Updated: 2008/12/10
> > -Base Kernel Version: based on 2.6.28-rc7-mm.
> > +Last Updated: 2008/12/15
> > +Base Kernel Version: based on 2.6.28-rc8-mm.
> > 
> >  Because VM is getting complex (one of reasons is memcg...), memcg's behavior
> >  is complex. This is a document for memcg's internal behavior.
> > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> >  	(b) If the SwapCache has been mapped by processes, it has been
> >  	    charged already.
> > 
> > -	In case (a), we charge it. In case (b), we don't charge it.
> > -	(But racy state between (a) and (b) exists. We do check it.)
> > -	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> > +	This swap-in is one of the most complicated work. In do_swap_page(),
> > +	following events occur when pte is unchanged.
> > +
> > +	(1) the page (SwapCache) is looked up.
> > +	(2) lock_page()
> > +	(3) try_charge_swapin()
> > +	(4) reuse_swap_page() (may call delete_swap_cache())
> > +	(5) commit_charge_swapin()
> > +	(6) swap_free().
> > +
> > +	Considering following situation for example.
> > +
> > +	(A) The page has not been charged before (2) and reuse_swap_page()
> > +	    doesn't call delete_from_swap_cache().
> > +	(B) The page has not been charged before (2) and reuse_swap_page()
> > +	    calls delete_from_swap_cache().
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
@ 2008-12-15  8:40                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-15  8:40 UTC (permalink / raw)
  To: balbir; +Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, akpm

On Mon, 15 Dec 2008 14:07:26 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-15 16:07:51]:
> 
> > 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Fix swapin charge operation of memcg.
> > 
> > Now, memcg has hooks to swap-out operation and checks SwapCache is really
> > unused or not. That check depends on contents of struct page.
> > I.e. if PageAnon(page) && page_mapped(page), the page is recognized as
> > still-in-use.
> > 
> > Now, reuse_swap_page() calls delete_from_swap_cache() before establishment
> > of any rmap. Then, in the following sequence
> > 
> > 	(Page fault with WRITE)
> > 	try_charge() (charge += PAGESIZE)
> > 	commit_charge() (Check page_cgroup is used or not..)
> > 	reuse_swap_page()
> > 		-> delete_from_swapcache()
> > 			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> > 	......
> > New charge is uncharged soon....
> > To avoid this,  move commit_charge() after page_mapcount() goes up to 1.
> > By this,
> > 
> > 	try_charge()		(usage += PAGESIZE)
> > 	reuse_swap_page()	(may usage -= PAGESIZE if PCG_USED is set)
> > 	commit_charge()		(If page_cgroup is not marked as PCG_USED,
> > 				 add new charge.)
> > Accounting will be correct.
> > 
> > Changelog (v2) -> (v3)
> >   - fixed invalid charge to swp_entry==0.
> >   - updated documentation.
> > Changelog (v1) -> (v2)
> >   - fixed comment.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
> >  mm/memcontrol.c                          |    7 +++--
> >  mm/memory.c                              |   11 ++++----
> >  3 files changed, 46 insertions(+), 13 deletions(-)
> > 
> > Index: mmotm-2.6.28-Dec12/mm/memory.c
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> > +++ mmotm-2.6.28-Dec12/mm/memory.c
> > @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
> >  	 * while the page is counted on swap but not yet in mapcount i.e.
> >  	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
> >  	 * must be called after the swap_free(), or it will never succeed.
> > -	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> > -	 * in page->private, must be called before reuse_swap_page(),
> > -	 * which may delete_from_swap_cache().
> > +	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
> > +	 * mem_cgroup_commit_charge_swapin() may not be able to find the swp_entry
> > +	 * in page->private. In this case, a record in swap_cgroup is silently
> > +	 * discarded at swap_free().
> >  	 */
> > 
> > -	mem_cgroup_commit_charge_swapin(page, ptr);
> >  	inc_mm_counter(mm, anon_rss);
> >  	pte = mk_pte(page, vma->vm_page_prot);
> >  	if (write_access && reuse_swap_page(page)) {
> >  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> >  		write_access = 0;
> >  	}
> > -
> 
> Removal of unassociated lines, not sure if that is a good practice.
> 
my mistake...

> >  	flush_icache_page(vma, page);
> >  	set_pte_at(mm, address, page_table, pte);
> >  	page_add_anon_rmap(page, vma, address);
> > +	/* It's better to call commit-charge after rmap is established */
> > +	mem_cgroup_commit_charge_swapin(page, ptr);
> > 
> 
> Yes, it does make sense
> 
> >  	swap_free(entry);
> >  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> > Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
> > +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > @@ -1,6 +1,6 @@
> >  Memory Resource Controller(Memcg)  Implementation Memo.
> > -Last Updated: 2008/12/10
> > -Base Kernel Version: based on 2.6.28-rc7-mm.
> > +Last Updated: 2008/12/15
> > +Base Kernel Version: based on 2.6.28-rc8-mm.
> > 
> >  Because VM is getting complex (one of reasons is memcg...), memcg's behavior
> >  is complex. This is a document for memcg's internal behavior.
> > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> >  	(b) If the SwapCache has been mapped by processes, it has been
> >  	    charged already.
> > 
> > -	In case (a), we charge it. In case (b), we don't charge it.
> > -	(But racy state between (a) and (b) exists. We do check it.)
> > -	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> > +	This swap-in is one of the most complicated operations. In do_swap_page(),
> > +	the following events occur when the pte is unchanged.
> > +
> > +	(1) the page (SwapCache) is looked up.
> > +	(2) lock_page()
> > +	(3) try_charge_swapin()
> > +	(4) reuse_swap_page() (may call delete_from_swap_cache())
> > +	(5) commit_charge_swapin()
> > +	(6) swap_free().
> > +
> > +	Consider the following situations, for example.
> > +
> > +	(A) The page has not been charged before (2) and reuse_swap_page()
> > +	    doesn't call delete_from_swap_cache().
> > +	(B) The page has not been charged before (2) and reuse_swap_page()
> > +	    calls delete_from_swap_cache().
> > +	(C) The page has been charged before (2) and reuse_swap_page() doesn't
> > +	    call delete_from_swap_cache().
> > +	(D) The page has been charged before (2) and reuse_swap_page() calls
> > +	    delete_from_swap_cache().
> > +
> > +	    memory.usage/memsw.usage changes to this page/swp_entry will be
> > +	 Case          (A)      (B)       (C)     (D)
> > +         Event
> > +       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
> > +          ===========================================
> > +          (3)        +1/+1    +1/+1     +1/+1   +1/+1
> > +          (4)          -       0/ 0       -     -1/ 0
> > +          (5)         0/ 1     0/-1     -1/-1    0/ 0
> > +          (6)          -        -         -      0/-1
> > +          ===========================================
> > +       Result         1/ 1     1/1       1/ 1    1/ 1
> > +
> > +       In any cases, charges to this page should be 1/ 1.
> > 
> 
> The documentation patch failed to apply for me
> 
Hmm... I'll check my queue again.

Thanks,
-Kame

> >  	4.2 Swap-out.
> >  	At swap-out, typical state transition is below.
> > Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
> > +++ mmotm-2.6.28-Dec12/mm/memcontrol.c
> > @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
> >  	/*
> >  	 * Now swap is on-memory. This means this page may be
> >  	 * counted both as mem and swap....double count.
> > -	 * Fix it by uncharging from memsw. This SwapCache is stable
> > -	 * because we're still under lock_page().
> > +	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> > +	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> > +	 * may call delete_from_swap_cache() before reach here.
> >  	 */
> > -	if (do_swap_account) {
> > +	if (do_swap_account && PageSwapCache(page)) {
> >  		swp_entry_t ent = {.val = page_private(page)};
> >  		struct mem_cgroup *memcg;
> >  		memcg = swap_cgroup_record(ent, NULL);
> > 
> >
> 
> 
> Looks good to me
> 
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> 
> Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> 
> 
> -- 
> 	Balbir
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-15  7:07             ` KAMEZAWA Hiroyuki
@ 2008-12-15 10:34               ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2008-12-15 10:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm

On Mon, 15 Dec 2008, KAMEZAWA Hiroyuki wrote:
> 
> Fix swapin charge operation of memcg.
> 
> @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> -	 * Fix it by uncharging from memsw. This SwapCache is stable
> -	 * because we're still under lock_page().
> +	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> +	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> +	 * may call delete_from_swap_cache() before reach here.
>  	 */
> -	if (do_swap_account) {
> +	if (do_swap_account && PageSwapCache(page)) {
>  		swp_entry_t ent = {.val = page_private(page)};
>  		struct mem_cgroup *memcg;
>  		memcg = swap_cgroup_record(ent, NULL);

Yes, that addition looks good to me.

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread


* [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3))
  2008-12-15  7:07             ` KAMEZAWA Hiroyuki
@ 2008-12-16  4:02               ` Daisuke Nishimura
  -1 siblings, 0 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2008-12-16  4:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-mm, linux-kernel, balbir, akpm, nishimura

Sorry for late reply.

> @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
>  	(b) If the SwapCache has been mapped by processes, it has been
>  	    charged already.
>  
> -	In case (a), we charge it. In case (b), we don't charge it.
> -	(But racy state between (a) and (b) exists. We do check it.)
> -	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> +	This swap-in is one of the most complicated work. In do_swap_page(),
> +	following events occur when pte is unchanged.
> +
> +	(1) the page (SwapCache) is looked up.
> +	(2) lock_page()
> +	(3) try_charge_swapin()
> +	(4) reuse_swap_page() (may call delete_swap_cache())
> +	(5) commit_charge_swapin()
> +	(6) swap_free().
> +
> +	Considering following situation for example.
> +
> +	(A) The page has not been charged before (2) and reuse_swap_page()
> +	    doesn't call delete_from_swap_cache().
> +	(B) The page has not been charged before (2) and reuse_swap_page()
> +	    calls delete_from_swap_cache().
> +	(C) The page has been charged before (2) and reuse_swap_page() doesn't
> +	    call delete_from_swap_cache().
> +	(D) The page has been charged before (2) and reuse_swap_page() calls
> +	    delete_from_swap_cache().
> +
> +	    memory.usage/memsw.usage changes to this page/swp_entry will be
> +	 Case          (A)      (B)       (C)     (D)
> +         Event
> +       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
> +          ===========================================
> +          (3)        +1/+1    +1/+1     +1/+1   +1/+1
> +          (4)          -       0/ 0       -     -1/ 0
> +          (5)         0/ 1     0/-1     -1/-1    0/ 0
> +          (6)          -        -         -      0/-1
> +          ===========================================
> +       Result         1/ 1     1/1       1/ 1    1/ 1
> +
> +       In any cases, charges to this page should be 1/ 1.
>  
I've verified that the charges result in valid values by tracing the source code
in all of these cases, but in case (B) I don't think commit_charge_swapin
does memsw-- because PageSwapCache has already been cleared; swap_free does
memsw-- in this case.

I attached a fix patch.

Thanks,
Daisuke Nishimura.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

fix for documentation.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 Documentation/controllers/memcg_test.txt |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
index 3c1458a..08d4d3e 100644
--- a/Documentation/controllers/memcg_test.txt
+++ b/Documentation/controllers/memcg_test.txt
@@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
           ===========================================
           (3)        +1/+1    +1/+1     +1/+1   +1/+1
           (4)          -       0/ 0       -     -1/ 0
-          (5)         0/ 1     0/-1     -1/-1    0/ 0
-          (6)          -        -         -      0/-1
+          (5)         0/-1     0/ 0     -1/-1    0/ 0
+          (6)          -       0/-1       -      0/-1
           ===========================================
-       Result         1/ 1     1/1       1/ 1    1/ 1
+       Result         1/ 1     1/ 1      1/ 1    1/ 1
 
        In any cases, charges to this page should be 1/ 1.
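The corrected behaviour — in case (B) memsw is released at swap_free(), not at commit_charge_swapin() — can be cross-checked with a small Python model of the four cases. The state names (`pcg_used`, `record`) are illustrative stand-ins for PCG_USED and the swap_cgroup record, not the real memcg API:

```python
class State:
    """(mem, memsw) accounting for one SwapCache page / swp_entry."""
    def __init__(self, charged):
        self.mem = 1 if charged else 0   # memory.usage before event (2)
        self.memsw = 1                   # memsw.usage: swap is accounted
        self.pcg_used = charged          # PCG_USED on the page_cgroup
        self.swapcache = True
        self.record = True               # swap_cgroup still records a charge

def try_charge_swapin(s):                # (3): +1/+1
    s.mem += 1
    s.memsw += 1

def reuse_swap_page(s, deletes):         # (4)
    if deletes:                          # delete_from_swap_cache()
        s.swapcache = False
        if s.pcg_used:                   # mem_cgroup_uncharge_swapcache()
            s.mem -= 1
            s.pcg_used = False

def commit_charge_swapin(s):             # (5)
    if s.pcg_used:
        s.mem -= 1                       # page was already charged
    else:
        s.pcg_used = True
    if s.swapcache and s.record:         # do_swap_account && PageSwapCache
        s.memsw -= 1                     # move the charge off memsw
        s.record = False

def swap_free(s):                        # (6)
    if s.record:                         # record silently discarded here
        s.memsw -= 1
        s.record = False

def swapin_fault(charged, deletes):
    s = State(charged)
    try_charge_swapin(s)
    reuse_swap_page(s, deletes)
    commit_charge_swapin(s)
    swap_free(s)
    return (s.mem, s.memsw)

# Cases (A)-(D): (charged before (2)?, reuse deletes from swap cache?)
for charged in (False, True):
    for deletes in (False, True):
        assert swapin_fault(charged, deletes) == (1, 1)
```

In case (B) (`charged=False, deletes=True`), `commit_charge_swapin` sees `swapcache` already cleared, so the memsw decrement happens in `swap_free`, matching the corrected table rows for (5) and (6).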
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread


* Re: [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3))
  2008-12-16  4:02               ` Daisuke Nishimura
@ 2008-12-16  4:53                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-16  4:53 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: Hugh Dickins, linux-mm, linux-kernel, balbir, akpm

On Tue, 16 Dec 2008 13:02:30 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Sorry for late reply.
> 
> > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> >  	(b) If the SwapCache has been mapped by processes, it has been
> >  	    charged already.
> >  
> > -	In case (a), we charge it. In case (b), we don't charge it.
> > -	(But racy state between (a) and (b) exists. We do check it.)
> > -	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> > +	This swap-in is one of the most complicated work. In do_swap_page(),
> > +	following events occur when pte is unchanged.
> > +
> > +	(1) the page (SwapCache) is looked up.
> > +	(2) lock_page()
> > +	(3) try_charge_swapin()
> > +	(4) reuse_swap_page() (may call delete_swap_cache())
> > +	(5) commit_charge_swapin()
> > +	(6) swap_free().
> > +
> > +	Considering following situation for example.
> > +
> > +	(A) The page has not been charged before (2) and reuse_swap_page()
> > +	    doesn't call delete_from_swap_cache().
> > +	(B) The page has not been charged before (2) and reuse_swap_page()
> > +	    calls delete_from_swap_cache().
> > +	(C) The page has been charged before (2) and reuse_swap_page() doesn't
> > +	    call delete_from_swap_cache().
> > +	(D) The page has been charged before (2) and reuse_swap_page() calls
> > +	    delete_from_swap_cache().
> > +
> > +	    memory.usage/memsw.usage changes to this page/swp_entry will be
> > +	 Case          (A)      (B)       (C)     (D)
> > +         Event
> > +       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
> > +          ===========================================
> > +          (3)        +1/+1    +1/+1     +1/+1   +1/+1
> > +          (4)          -       0/ 0       -     -1/ 0
> > +          (5)         0/ 1     0/-1     -1/-1    0/ 0
> > +          (6)          -        -         -      0/-1
> > +          ===========================================
> > +       Result         1/ 1     1/1       1/ 1    1/ 1
> > +
> > +       In any cases, charges to this page should be 1/ 1.
> >  
> I've verified that charges will result in valid values by tracing source code
> in all of these cases, but in case of (B) I don't think commit_charge_swapin
> does memsw-- because PageSwapCache has been cleared already. swap_free does
> memsw-- in this case.
> 
> I attached a fix patch.
> 
You're right. It comes from this version's new change... sorry.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


> Thanks,
> Daisuke Nishimura.
> 
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> fix for documentation.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  Documentation/controllers/memcg_test.txt |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
> index 3c1458a..08d4d3e 100644
> --- a/Documentation/controllers/memcg_test.txt
> +++ b/Documentation/controllers/memcg_test.txt
> @@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
>            ===========================================
>            (3)        +1/+1    +1/+1     +1/+1   +1/+1
>            (4)          -       0/ 0       -     -1/ 0
> -          (5)         0/ 1     0/-1     -1/-1    0/ 0
> -          (6)          -        -         -      0/-1
> +          (5)         0/-1     0/ 0     -1/-1    0/ 0
> +          (6)          -       0/-1       -      0/-1
>            ===========================================
> -       Result         1/ 1     1/1       1/ 1    1/ 1
> +       Result         1/ 1     1/ 1      1/ 1    1/ 1
>  
>         In any cases, charges to this page should be 1/ 1.
>  
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread


end of thread, other threads:[~2008-12-16  4:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-12  8:29 [BUGFIX][PATCH mmotm] memcg fix swap accounting leak KAMEZAWA Hiroyuki
2008-12-12  9:43 ` Daisuke Nishimura
2008-12-12 11:16   ` KAMEZAWA Hiroyuki
2008-12-12 11:16     ` KAMEZAWA Hiroyuki
2008-12-13  7:03     ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
2008-12-13  9:49       ` Hugh Dickins
2008-12-13 10:27         ` KAMEZAWA Hiroyuki
2008-12-15  7:07           ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki
2008-12-15  7:07             ` KAMEZAWA Hiroyuki
2008-12-15  8:37             ` Balbir Singh
2008-12-15  8:37               ` Balbir Singh
2008-12-15  8:40               ` KAMEZAWA Hiroyuki
2008-12-15  8:40                 ` KAMEZAWA Hiroyuki
2008-12-15 10:34             ` Hugh Dickins
2008-12-15 10:34               ` Hugh Dickins
2008-12-16  4:02             ` [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) Daisuke Nishimura
2008-12-16  4:02               ` Daisuke Nishimura
2008-12-16  4:53               ` KAMEZAWA Hiroyuki
2008-12-16  4:53                 ` KAMEZAWA Hiroyuki
2008-12-13 10:38         ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
