From mboxrd@z Thu Jan  1 00:00:00 1970
From: MaoXiaoyun <tinnycloud@hotmail.com>
Subject: RE: mem_sharing: summarized problems when domain is
	dying
Date: Mon, 24 Jan 2011 21:14:18 +0800
Message-ID: <BLU157-w27B4B745F95C5525F45568DAFD0@phx.gbl>
References: <AANLkTi=wimfV7Wc6aEd2cYS-=dOb2V5Xy97eSCgW-gKh@mail.gmail.com>,
	<AANLkTi=nhs9edNB2-A700RKcGrj52EgcvVUPaK_qiM0N@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1620054647=="
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <AANLkTi=nhs9edNB2-A700RKcGrj52EgcvVUPaK_qiM0N@mail.gmail.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen devel <xen-devel@lists.xensource.com>
Cc: george.dunlap@eu.citrix.com, tim.deegan@citrix.com, juihaochiang@gmail.com
List-Id: xen-devel@lists.xenproject.org

--===============1620054647==
Content-Type: multipart/alternative;
	boundary="_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_"

--_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: quoted-printable


Hi:
=20
       I found another bug while testing memory sharing.
       In this test I start 24 Linux HVM guests and reboot each of them
       with "xm reboot" every 30 minutes.
       After several hours, some of the HVM guests crash. All of the
       crashed guests stopped during boot.
       The bug still occurs even when I disable page sharing by making
       tapdisk believe that xc_memshr_nominate_gref() returns failure.
=20
       No relevant log messages were found.
=20
       I was able to dump the crash stack (below).
       What could be happening?
       Thanks.
=20
=20
PID: 2307   TASK: ffff810014166100  CPU: 0   COMMAND: "setfont"
 #0 [ffff8100123cd900] xen_panic_event at ffffffff88001d28
 #1 [ffff8100123cd920] notifier_call_chain at ffffffff80066eaa
 #2 [ffff8100123cd940] panic at ffffffff8009094a
 #3 [ffff8100123cda30] oops_end at ffffffff80064fca
 #4 [ffff8100123cda40] do_page_fault at ffffffff80066dc0
 #5 [ffff8100123cdb30] error_exit at ffffffff8005dde9
    [exception RIP: vgacon_do_font_op+363]
    RIP: ffffffff800515e5  RSP: ffff8100123cdbe8  RFLAGS: 00010203
    RAX: 0000000000000000  RBX: ffffffff804b3740  RCX: ffff8100000a03fc
    RDX: 00000000000003fd  RSI: ffff810011cec000  RDI: ffffffff803244c4
    RBP: ffff810011cec000   R8: d0d6999996000000   R9: 0000009090b0b0ff
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004
    R13: 0000000000000001  R14: 0000000000000001  R15: 000000000000000e
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8100123cdc20] vgacon_font_set at ffffffff8016bec5
 #7 [ffff8100123cdc60] con_font_op at ffffffff801aa86b
 #8 [ffff8100123cdcd0] vt_ioctl at ffffffff801a5af4
 #9 [ffff8100123cdd70] tty_ioctl at ffffffff80038a2c
#10 [ffff8100123cdeb0] do_ioctl at ffffffff800420d9
#11 [ffff8100123cded0] vfs_ioctl at ffffffff800302ce
#12 [ffff8100123cdf40] sys_ioctl at ffffffff8004c766
#13 [ffff8100123cdf80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000039294cc557  RSP: 00007fff54c4aec8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fff54c4aee0  RSI: 0000000000004b72  RDI: 0000000000000003
    RBP: 000000001d747ab0   R8: 0000000000000010   R9: 0000000000800000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000010
    R13: 0000000000000200  R14: 0000000000000008  R15: 0000000000000008
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

=20
> Date: Fri, 21 Jan 2011 14:45:14 -0500
> Subject: Re: mem_sharing: summarized problems when domain is dying
> From: juihaochiang@gmail.com
> To: Tim.Deegan@citrix.com
> CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>=20
> Hi
>=20
> On Fri, Jan 21, 2011 at 11:19 AM, Jui-Hao Chiang
> <juihaochiang@gmail.com> wrote:
> > Hi, Tim:
> >
> > From tinnycloud's result, here I summarize the current problem and
> > findings of mem_sharing due to domain dying.
> > (1) When domain is dying, alloc_domheap_page() and
> > set_shared_p2m_entry() would just fail. So the shr_lock is not enough
> > to ensure that the domain won't die in the middle of mem_sharing
> > code.
> > As tinnycloud's code shows, is that better to use
> > rcu_lock_domain_by_id before calling the above two functions?
> >
>=20
> There seems no good locking to protect a domain from changing the
> is_dying state. So the unshare function could fail in the middle in
> several points, e.g., alloc_domheap_page and set_shared_p2m_entry.
> If that's the case, we need to add some checking, and probably revert
> the things we have done when is_dying is changed in the middle.
>=20
> Any comments?
>=20
> Jui-Hao
--_666cf96a-7af5-4bbc-8286-db5fdc6fab7d_--


--===============1620054647==--