From mboxrd@z Thu Jan 1 00:00:00 1970
From: MaoXiaoyun
Subject: RE: mem_sharing: summarized problems when domain is dying
Date: Mon, 24 Jan 2011 21:14:18 +0800
Message-ID:
References: ,
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1620054647=="
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen devel
Cc: george.dunlap@eu.citrix.com, tim.deegan@citrix.com, juihaochiang@gmail.com
List-Id: xen-devel@lists.xenproject.org

--===============1620054647==
Content-Type: multipart/alternative; boundary="_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_"

--_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: quoted-printable

Hi:

I found another bug while testing memory sharing.

In this test I start 24 Linux HVM guests, each of which is rebooted
through "xm reboot" every 30 minutes. After several hours, some of the
guests crash. All of the crashed guests stopped during boot.

The bug still exists even when I disable page sharing by making tapdisk
believe that xc_memshr_nominate_gref() returns failure.

Nothing unusual shows up in the logs.

I was able to dump the crash stack, shown below.

What could be happening? Thanks.

PID: 2307   TASK: ffff810014166100  CPU: 0   COMMAND: "setfont"
 #0 [ffff8100123cd900] xen_panic_event at ffffffff88001d28
 #1 [ffff8100123cd920] notifier_call_chain at ffffffff80066eaa
 #2 [ffff8100123cd940] panic at ffffffff8009094a
 #3 [ffff8100123cda30] oops_end at ffffffff80064fca
 #4 [ffff8100123cda40] do_page_fault at ffffffff80066dc0
 #5 [ffff8100123cdb30] error_exit at ffffffff8005dde9
    [exception RIP: vgacon_do_font_op+363]
    RIP: ffffffff800515e5  RSP: ffff8100123cdbe8  RFLAGS: 00010203
    RAX: 0000000000000000  RBX: ffffffff804b3740  RCX: ffff8100000a03fc
    RDX: 00000000000003fd  RSI: ffff810011cec000  RDI: ffffffff803244c4
    RBP: ffff810011cec000   R8: d0d6999996000000   R9: 0000009090b0b0ff
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004
    R13: 0000000000000001  R14: 0000000000000001  R15: 000000000000000e
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8100123cdc20] vgacon_font_set at ffffffff8016bec5
 #7 [ffff8100123cdc60] con_font_op at ffffffff801aa86b
 #8 [ffff8100123cdcd0] vt_ioctl at ffffffff801a5af4
 #9 [ffff8100123cdd70] tty_ioctl at ffffffff80038a2c
#10 [ffff8100123cdeb0] do_ioctl at ffffffff800420d9
#11 [ffff8100123cded0] vfs_ioctl at ffffffff800302ce
#12 [ffff8100123cdf40] sys_ioctl at ffffffff8004c766
#13 [ffff8100123cdf80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000039294cc557  RSP: 00007fff54c4aec8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fff54c4aee0  RSI: 0000000000004b72  RDI: 0000000000000003
    RBP: 000000001d747ab0   R8: 0000000000000010   R9: 0000000000800000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000010
    R13: 0000000000000200  R14: 0000000000000008  R15: 0000000000000008
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

> Date: Fri, 21 Jan 2011 14:45:14 -0500
> Subject: Re: mem_sharing: summarized problems when domain is dying
> From: juihaochiang@gmail.com
> To: Tim.Deegan@citrix.com
> CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>
> Hi
>
> On Fri, Jan 21, 2011 at 11:19 AM, Jui-Hao Chiang wrote:
> > Hi, Tim:
> >
> > From tinnycloud's result, here I summarize the current problem and
> > findings of mem_sharing due to domain dying.
> > (1) When domain is dying, alloc_domheap_page() and
> > set_shared_p2m_entry() would just fail. So the shr_lock is not enough
> > to ensure that the domain won't die in the middle of mem_sharing code.
> > As tinnycloud's code shows, is that better to use
> > rcu_lock_domain_by_id before calling the above two functions?
> >
>
> There seems no good locking to protect a domain from changing the
> is_dying state. So the unshare function could fail in the middle in
> several points, e.g., alloc_domheap_page and set_shared_p2m_entry.
> If that's the case, we need to add some checking, and probably revert
> the things we have done when is_dying is changed in the middle.
>
> Any comments?
>
> Jui-Hao

--_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_
Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: quoted-printable

Hi:
 
--_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_--

--===============1620054647==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1620054647==--
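
To make the "add some checking, and probably revert" idea from the quoted message concrete, here is a minimal sketch of that pattern. It is not Xen code: every name in it (struct mock_domain, mock_alloc_page, mock_set_p2m_entry, mock_unshare_page, the dies_midway flag) is a hypothetical stand-in, and the allocation and p2m update are only modelled as steps that refuse to proceed once is_dying is set. The sketch assumes that each step can fail independently and that whatever was done before the failing step has to be undone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a domain; NOT Xen's struct domain. */
struct mock_domain {
    bool  is_dying;        /* may flip at any time in the real system */
    int   pages_allocated; /* bookkeeping so a leaked page is observable */
    void *p2m_entry;       /* the single guest mapping modelled here */
};

/* Models an allocation that refuses to work once the domain is dying. */
static void *mock_alloc_page(struct mock_domain *d)
{
    if (d->is_dying)
        return NULL;
    void *page = malloc(4096);
    if (page != NULL)
        d->pages_allocated++;
    return page;
}

static void mock_free_page(struct mock_domain *d, void *page)
{
    free(page);
    d->pages_allocated--;
}

/* Models a p2m update; returns 1 on success, 0 on failure (assumed). */
static int mock_set_p2m_entry(struct mock_domain *d, void *page)
{
    if (d->is_dying)
        return 0;
    d->p2m_entry = page;
    return 1;
}

/*
 * Unshare with a check after every step that can fail, undoing the
 * earlier steps on failure. dies_midway simulates the domain starting
 * to die between the allocation and the p2m update.
 * Returns 0 on success, -1 if the operation had to be abandoned.
 */
static int mock_unshare_page(struct mock_domain *d, bool dies_midway)
{
    assert(d != NULL);

    void *saved_entry  = d->p2m_entry;
    void *private_page = mock_alloc_page(d);
    if (private_page == NULL)
        return -1;                       /* nothing to undo yet */

    if (dies_midway)
        d->is_dying = true;              /* simulated race */

    if (!mock_set_p2m_entry(d, private_page)) {
        mock_free_page(d, private_page); /* revert the allocation */
        d->p2m_entry = saved_entry;      /* leave the mapping as it was */
        return -1;
    }
    return 0;
}
```

Calling mock_unshare_page() on a healthy mock domain returns 0 and leaves one page mapped. On a domain that is already dying, or that starts dying between the two steps, it returns -1 with no page left allocated and the mapping unchanged. Whether rollback of this kind, or holding a reference such as rcu_lock_domain_by_id across both calls, is the better fix in the real mem_sharing code is exactly the open question in the thread.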