All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: nishimura@mxp.nes.nec.co.jp,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 8/10] memcg: clean up charge/uncharge anon
Date: Tue, 29 Sep 2009 11:18:28 +0900	[thread overview]
Message-ID: <20090929111828.6f9148d6.nishimura@mxp.nes.nec.co.jp> (raw)
In-Reply-To: <20090929102653.612cc2a4.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, 29 Sep 2009 10:26:53 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> Thank you for testing.
> 
> On Tue, 29 Sep 2009 09:24:13 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > On Fri, 25 Sep 2009 17:28:50 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > This may need careful review.
> > > 
> > > ==
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > In old codes, this function was used for other purposes rather
> > > than charginc new anon pages. But now, this function is (ranamed) and
> > > used only for new pages.
> > > 
> > > For the same kind of reason, ucharge_page() should use VM_BUG_ON().
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > >  mm/memcontrol.c |   27 +++++++++++++--------------
> > >  1 file changed, 13 insertions(+), 14 deletions(-)
> > > 
> > > Index: temp-mmotm/mm/memcontrol.c
> > > ===================================================================
> > > --- temp-mmotm.orig/mm/memcontrol.c
> > > +++ temp-mmotm/mm/memcontrol.c
> > > @@ -1638,15 +1638,8 @@ int mem_cgroup_newpage_charge(struct pag
> > >  		return 0;
> > >  	if (PageCompound(page))
> > >  		return 0;
> > > -	/*
> > > -	 * If already mapped, we don't have to account.
> > > -	 * If page cache, page->mapping has address_space.
> > > -	 * But page->mapping may have out-of-use anon_vma pointer,
> > > -	 * detecit it by PageAnon() check. newly-mapped-anon's page->mapping
> > > -	 * is NULL.
> > > -  	 */
> > > -	if (page_mapped(page) || (page->mapping && !PageAnon(page)))
> > > -		return 0;
> > > +	/* This function is "newpage_charge" and called right after alloc */
> > > +	VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));
> > >  	if (unlikely(!mm))
> > >  		mm = &init_mm;
> > >  	return mem_cgroup_charge_common(page, mm, gfp_mask,
> > I think this VM_BUG_ON() is vaild.
> > 
> > > @@ -1901,11 +1894,11 @@ unlock_out:
> > >  
> > >  void mem_cgroup_uncharge_page(struct page *page)
> > >  {
> > > -	/* early check. */
> > > -	if (page_mapped(page))
> > > -		return;
> > > -	if (page->mapping && !PageAnon(page))
> > > -		return;
> > > +	/*
> > > + 	 * Called when anonymous page's page->mapcount goes down to zero,
> > > + 	 * or cancel a charge gotten by newpage_charge().
> > > +	 */
> > > +	VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));
> > >  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> > >  }
> > >  
> > > 
> > I hit this VM_BUG_ON() when I'm testing mmotm-2009-09-25-14-35 + these patches
> > + my latest recharge-at-task-move patches(not posted yet).
> > 
> 
> Ouch.
> 
> 
> > [ 2966.031815] kernel BUG at mm/memcontrol.c:2014!
> > [ 2966.031815] invalid opcode: 0000 [#1] SMP
> > [ 2966.031815] last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:09:00.2/0000:0b:
> > 04.1/irq
> > [ 2966.031815] CPU 2
> > [ 2966.031815] Modules linked in: autofs4 lockd sunrpc iscsi_tcp libiscsi_tcp libiscsi scs
> > i_transport_iscsi dm_mirror dm_multipath video output lp sg ide_cd_mod cdrom serio_raw but
> > ton parport_pc parport e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash
> >  dm_log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd u
> > hci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> > [ 2966.031815] Pid: 19677, comm: mmapstress10 Not tainted 2.6.31-mmotm-2009-09-25-14-35-35
> > ace741 #1 Express5800/140Rd-4 [N8100-1065]
> > [ 2966.031815] RIP: 0010:[<ffffffff800efbe2>]  [<ffffffff800efbe2>] mem_cgroup_uncharge_pa
> > ge+0x18/0x28
> > [ 2966.031815] RSP: 0000:ffff880381713da8  EFLAGS: 00010246
> > [ 2966.031815] RAX: 0000000000000000 RBX: ffffea001668fef0 RCX: ffffea001763afe0
> > [ 2966.031815] RDX: 0000000000000080 RSI: 0000000000000008 RDI: ffffea001668fef0
> > [ 2966.031815] RBP: ffff880381713da8 R08: ffffea001763afe0 R09: ffff88039dfafa28
> > [ 2966.031815] R10: ffff880381617b80 R11: 0000000000000000 R12: ffffea001668fef0
> > [ 2966.031815] R13: ffffea001668fef0 R14: 0000000000000008 R15: ffff880381617b80
> > [ 2966.031815] FS:  00007f9a617fa6e0(0000) GS:ffff88002a000000(0000) knlGS:000000000000000
> > 0
> > [ 2966.031815] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 2966.031815] CR2: 000000385021c4b8 CR3: 0000000372107000 CR4: 00000000000006e0
> > [ 2966.031815] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 2966.031815] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 2966.031815] Process mmapstress10 (pid: 19677, threadinfo ffff880381712000, task ffff880
> > 3817c2920)
> > [ 2966.031815] Stack:
> > [ 2966.031815]  ffff880381713dc8 ffffffff800d6bff 00000003992ec067 00000003992ec067
> > [ 2966.031815] <0> ffff880381713e58 ffffffff800ce2f3 ffff880381713e18 ffff88038a90c408
> > [ 2966.031815] <0> 000000385021c4b8 ffff88039dfaf6c0 ffffea000168fef0 ffffea0016c91508
> > [ 2966.031815] Call Trace:
> > [ 2966.031815]  [<ffffffff800d6bff>] page_remove_rmap+0x28/0x44
> > [ 2966.031815]  [<ffffffff800ce2f3>] do_wp_page+0x626/0x73c
> > [ 2966.031815]  [<ffffffff8005e9e3>] ? __wake_up_bit+0x2c/0x2e
> > [ 2966.031815]  [<ffffffff800cfbfc>] handle_mm_fault+0x712/0x824
> > [ 2966.031815]  [<ffffffff8034e6d7>] do_page_fault+0x255/0x2e5
> > [ 2966.031815]  [<ffffffff8034c59f>] page_fault+0x1f/0x30
> > [ 2966.031815] Code: 83 7f 18 00 74 04 0f 0b eb fe 31 f6 e8 75 fe ff ff c9 c3 8b 47 0c 55
> > 48 89 e5 85 c0 79 0d 48 8b 47 18 48 85 c0 74 08 a8 01 75 04 <0f> 0b eb fe be 01 00 00 00 e
> > 8 4d fe ff ff c9 c3 55 65 48 8b 04
> > [ 2966.031815] RIP  [<ffffffff800efbe2>] mem_cgroup_uncharge_page+0x18/0x28
> > [ 2966.031815]  RSP <ffff880381713da8>
> > 
> > I don't think my patch is the guilt because this also happens even if I don't
> > set "recharge_at_immigrate".
> > 
> > It might be better that I test w/o my patches, but IIUC, this can happen
> > in following scenario like:
> > 
> > Assume process A and B has the same swap pte.
> > 
> >           process A                       process B
> >   do_swap_page()                    do_swap_page()
> >     read_swap_cache_async()
> >     lock_page()                       lookup_swap_cache()
> >     page_add_anon_rmap()
> >     unlock_page()
> >                                       lock_page()
> >     do_wp_page()
> I think do_wp_page() do lock_page() here.
> Because the page is ANON.
> 
hmm, but it calls unlock_page() soon.
So, I think this can happen if lock_page() of B gets the lock after do_wp_page()
of A release the lock.

> >       page_remove_rmap()
> >         atomic_add_negative()
> >           -> page->_mapcount = -1     page_add_anon_rmap()
> >                                         atomic_inc_and_test()
> >                                           -> page->_mapcount = 0
> >         mem_cgroup_uncharge_page()
> >           -> hit the VM_BUG_ON()
> > 
> > 
> > So, I think this should be like:
> > 
> > void mem_cgroup_uncharge_page(struct page *page)
> > {
> > 	/*
> > 	 * Called when anonymous page's page->mapcount goes down to zero,
> > 	 * or cancel a charge gotten by newpage_charge().
> > 	 * But there is a small race between page_remove_rmap() and
> > 	 * page_add_anon_rmap(), so we can reach here with page_mapped().
> > 	 */
> > 	VM_BUG_ON(page->mapping && !PageAnon(page))
> >         if (unlikely(page_mapped(page))
> > 		return;
> > 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> > }
> > 
> 
> From backtrace, mem_cgroup_uncharge_page() is called via page_remove_rmap().
> Then, page_mapped() should be false.
> 
> Hmm.
>  VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));
> 
> mem_cgroup_uncharge_page() is called by page_remove_rmap() only when
>  !page_mapped() and PageAnon(page) == true.
> Could you show me disassemble of (head lines of) mem_cgroup_uncharge_page() ?
> 
here.

===
(gdb) disas mem_cgroup_uncharge_page
Dump of assembler code for function mem_cgroup_uncharge_page:
0xffffffff800efbca <mem_cgroup_uncharge_page+0>:        mov    0xc(%rdi),%eax
0xffffffff800efbcd <mem_cgroup_uncharge_page+3>:        push   %rbp
0xffffffff800efbce <mem_cgroup_uncharge_page+4>:        mov    %rsp,%rbp
0xffffffff800efbd1 <mem_cgroup_uncharge_page+7>:        test   %eax,%eax
0xffffffff800efbd3 <mem_cgroup_uncharge_page+9>:        jns    0xffffffff800efbe2 <mem_cgroup_uncharge_page+24>
0xffffffff800efbd5 <mem_cgroup_uncharge_page+11>:       mov    0x18(%rdi),%rax
0xffffffff800efbd9 <mem_cgroup_uncharge_page+15>:       test   %rax,%rax
0xffffffff800efbdc <mem_cgroup_uncharge_page+18>:       je     0xffffffff800efbe6 <mem_cgroup_uncharge_page+28>
0xffffffff800efbde <mem_cgroup_uncharge_page+20>:       test   $0x1,%al
0xffffffff800efbe0 <mem_cgroup_uncharge_page+22>:       jne    0xffffffff800efbe6 <mem_cgroup_uncharge_page+28>
0xffffffff800efbe2 <mem_cgroup_uncharge_page+24>:       ud2a
0xffffffff800efbe4 <mem_cgroup_uncharge_page+26>:       jmp    0xffffffff800efbe4 <mem_cgroup_uncharge_page+26>
0xffffffff800efbe6 <mem_cgroup_uncharge_page+28>:       mov    $0x1,%esi
0xffffffff800efbeb <mem_cgroup_uncharge_page+33>:       callq  0xffffffff800efa3d <__mem_cgroup_uncharge_common>
0xffffffff800efbf0 <mem_cgroup_uncharge_page+38>:       leaveq
0xffffffff800efbf1 <mem_cgroup_uncharge_page+39>:       retq
End of assembler dump.
===

> Maybe there is something I don't understand..
> IIUC, when page_remove_rmap() is called by do_wp_page(),
> there must be pte(s) which points to the page and a pte is guarded by
> page table lock. So, I think page_mapcount() > 0 before calling page_remove_rmap()
> because there must be a valid pte, at least.
> 
> Can this scenario happen ?
I think so. I intended to mention this case :)
I'm sorry for my vague explanation.

> ==
>     Thread A.                                      Thread B.
> 
>     do_wp_page()                                 do_swap_page()
>        PageAnon(oldpage)                         
>          lock_page()                             lock_page()=> wait.
>          reuse = false.
>          unlock_page()                           get lock.      
>        do copy-on-write
>        pte_same() == true
>          page_remove_rmap(oldpage) (mapcount goes to -1)
>                                                  page_set_anon_rmap() (new anon rmap again)
> ==
> Then, oldpage's mapcount goes down to 0 and up to 1 immediately.
> 
> About charge itself, Thread B. executes swap's charge sequence and maybe
> no problem. Hmm.
> 
I think so too.


Thanks,
Daisuke Nishimura.

> I'll apply your change. Thank you for report.
> 
> Regards,
> -Kame
> 
> 
> 
> > 
> > 
> > Thanks,
> > Daisuke Nishimura.
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> > 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-09-29  2:13 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-25  8:17 [RFC][PATCH 0/10] memcg clean up and some fixes for softlimit (Sep25) KAMEZAWA Hiroyuki
2009-09-25  8:18 ` [RFC][PATCH 1/10] memcg : modifications for softlimit uncharge KAMEZAWA Hiroyuki
2009-09-25  8:20 ` [RFC][PATCH 2/10] memcg : clean up in softlimit charge KAMEZAWA Hiroyuki
2009-09-25  8:21 ` [RFC][PATCH 3/10] memcg: reorganize memcontrol.c KAMEZAWA Hiroyuki
2009-09-25  8:22 ` [RFC][PATCH 4/10] memcg: add memcg charge cancel KAMEZAWA Hiroyuki
2009-09-25  8:24 ` [RFC][PATCH 5/10] memcg: clean up percpu statistics access KAMEZAWA Hiroyuki
2009-09-25  8:25 ` [RFC][PATCH 6/10] memcg: remove unsued macros KAMEZAWA Hiroyuki
2009-09-25  8:25 ` [RFC][PATCH 0/10] memcg clean up and some fixes for softlimit (Sep25) Balbir Singh
2009-09-25  8:27 ` [RFC][PATCH 7/10] memcg: replace cont with cgroup KAMEZAWA Hiroyuki
2009-09-25  8:28 ` [RFC][PATCH 8/10] memcg: clean up charge/uncharge anon KAMEZAWA Hiroyuki
2009-09-29  0:24   ` Daisuke Nishimura
2009-09-29  1:26     ` KAMEZAWA Hiroyuki
2009-09-29  2:18       ` Daisuke Nishimura [this message]
2009-09-29  3:03         ` Daisuke Nishimura
2009-09-29  3:14           ` KAMEZAWA Hiroyuki
2009-09-25  8:29 ` [RFC][PATCH 9/10] memcg: clean up perzone stat KAMEZAWA Hiroyuki
2009-09-25  8:30 ` [RFC][PATCH 10/10] memcg: add commentary KAMEZAWA Hiroyuki
2009-09-30  2:21   ` Daisuke Nishimura
2009-09-30  2:41     ` KAMEZAWA Hiroyuki
2009-09-30  4:36       ` Daisuke Nishimura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090929111828.6f9148d6.nishimura@mxp.nes.nec.co.jp \
    --to=nishimura@mxp.nes.nec.co.jp \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.