From: George Dunlap
Subject: Re: RE: mem_sharing: summarized problems when domain is dying
Date: Mon, 24 Jan 2011 14:08:01 +0000
To: MaoXiaoyun
Cc: xen devel, tim.deegan@citrix.com, juihaochiang@gmail.com
List-Id: xen-devel@lists.xenproject.org

I think it would be best if each separate issue you're facing had its own thread.

This looks like a Linux crash -- please include the kernel version you're using, and whatever other information might be appropriate.

 -George

2011/1/24 MaoXiaoyun:
> Hi:
>
> I found another bug while testing memory sharing.
> In this test, I start 24 Linux HVMs, each of which reboots via "xm
> reboot" every 30 minutes.
> After several hours, some of the HVMs crash. All of the crashed HVMs
> stop during boot.
> The bug still occurs even when I prevent page sharing by making tapdisk
> believe that xc_memshr_nominate_gref() returned failure.
>
> No unusual log messages were found.
>
> I was able to dump the crash stack.
> What could be happening?
> Thanks.
>
> PID: 2307   TASK: ffff810014166100  CPU: 0   COMMAND: "setfont"
>  #0 [ffff8100123cd900] xen_panic_event at ffffffff88001d28
>  #1 [ffff8100123cd920] notifier_call_chain at ffffffff80066eaa
>  #2 [ffff8100123cd940] panic at ffffffff8009094a
>  #3 [ffff8100123cda30] oops_end at ffffffff80064fca
>  #4 [ffff8100123cda40] do_page_fault at ffffffff80066dc0
>  #5 [ffff8100123cdb30] error_exit at ffffffff8005dde9
>     [exception RIP: vgacon_do_font_op+363]
>     RIP: ffffffff800515e5  RSP: ffff8100123cdbe8  RFLAGS: 00010203
>     RAX: 0000000000000000  RBX: ffffffff804b3740  RCX: ffff8100000a03fc
>     RDX: 00000000000003fd  RSI: ffff810011cec000  RDI: ffffffff803244c4
>     RBP: ffff810011cec000   R8: d0d6999996000000   R9: 0000009090b0b0ff
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004
>     R13: 0000000000000001  R14: 0000000000000001  R15: 000000000000000e
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff8100123cdc20] vgacon_font_set at ffffffff8016bec5
>  #7 [ffff8100123cdc60] con_font_op at ffffffff801aa86b
>  #8 [ffff8100123cdcd0] vt_ioctl at ffffffff801a5af4
>  #9 [ffff8100123cdd70] tty_ioctl at ffffffff80038a2c
> #10 [ffff8100123cdeb0] do_ioctl at ffffffff800420d9
> #11 [ffff8100123cded0] vfs_ioctl at ffffffff800302ce
> #12 [ffff8100123cdf40] sys_ioctl at ffffffff8004c766
> #13 [ffff8100123cdf80] tracesys at ffffffff8005d28d (via system_call)
>     RIP: 00000039294cc557  RSP: 00007fff54c4aec8  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
>     RDX: 00007fff54c4aee0  RSI: 0000000000004b72  RDI: 0000000000000003
>     RBP: 000000001d747ab0   R8: 0000000000000010   R9: 0000000000800000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000010
>     R13: 0000000000000200  R14: 0000000000000008  R15: 0000000000000008
>     ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
>
>> Date: Fri, 21 Jan 2011 14:45:14 -0500
>> Subject: Re: mem_sharing: summarized problems when domain is dying
>> From: juihaochiang@gmail.com
>> To: Tim.Deegan@citrix.com
>> CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>>
>> Hi
>>
>> On Fri, Jan 21, 2011 at 11:19 AM, Jui-Hao Chiang wrote:
>> > Hi, Tim:
>> >
>> > Based on tinnycloud's results, here is a summary of the current problems
>> > and findings in mem_sharing when a domain is dying.
>> > (1) When a domain is dying, alloc_domheap_page() and
>> > set_shared_p2m_entry() simply fail, so holding shr_lock is not enough
>> > to ensure that the domain won't die in the middle of the mem_sharing code.
>> > As tinnycloud's code shows, would it be better to call
>> > rcu_lock_domain_by_id() before calling these two functions?
>> >
>>
>> There seems to be no good lock that prevents a domain's is_dying state
>> from changing, so the unshare function could fail midway at several
>> points, e.g., in alloc_domheap_page() and set_shared_p2m_entry().
>> If that's the case, we need to add some checks, and probably revert
>> whatever has already been done when is_dying changes midway.
>>
>> Any comments?
>>
>> Jui-Hao
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
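Jui-Hao's suggestion (check at each step whether the domain has started dying, and revert partial work on failure) can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not Xen code: model_domain, model_alloc_page(), model_set_p2m_entry() and model_unshare() are invented stand-ins for struct domain, alloc_domheap_page(), set_shared_p2m_entry() and the unshare path, and the dies_after_alloc knob exists only to simulate is_dying flipping between the two steps. It shows the error-path shape: each step that can fail because the domain is dying undoes the steps before it, so the p2m slot is left pointing at the shared page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* HYPOTHETICAL model for illustration only: these types and functions
 * are not Xen's real structures or API. */
struct model_domain {
    bool is_dying;          /* can flip at any time, e.g. during "xm reboot" */
    bool dies_after_alloc;  /* test knob: flip is_dying right after allocating */
};

struct model_page {
    int data;
};

static int live_private_pages;  /* pages currently allocated by the model */

/* Stand-in for alloc_domheap_page(): fails once the domain is dying. */
static struct model_page *model_alloc_page(struct model_domain *d)
{
    if (d->is_dying)
        return NULL;
    struct model_page *pg = calloc(1, sizeof(*pg));
    if (pg != NULL)
        live_private_pages++;
    if (d->dies_after_alloc)
        d->is_dying = true;     /* simulate the race between the two steps */
    return pg;
}

/* Release a page previously returned by model_alloc_page(). */
static void model_free_page(struct model_page *pg)
{
    free(pg);
    live_private_pages--;
}

/* Stand-in for set_shared_p2m_entry(): fails once the domain is dying. */
static bool model_set_p2m_entry(struct model_domain *d,
                                struct model_page **slot,
                                struct model_page *pg)
{
    if (d->is_dying)
        return false;
    *slot = pg;
    return true;
}

/* Give the domain a private copy of a shared page.
 * Returns 0 on success. On failure returns -1 after undoing any partial
 * work, so *slot still points at the shared page. */
static int model_unshare(struct model_domain *d,
                         struct model_page **slot,
                         const struct model_page *shared)
{
    assert(slot != NULL && shared != NULL);

    struct model_page *priv = model_alloc_page(d);
    if (priv == NULL)
        return -1;              /* step 1 failed: nothing to undo */

    priv->data = shared->data;  /* copy the shared contents */

    if (!model_set_p2m_entry(d, slot, priv)) {
        model_free_page(priv);  /* step 2 failed: release step 1's page */
        return -1;
    }
    return 0;
}
```

In real hypervisor code the rollback would also have to handle locking and the shared page's handle and reference bookkeeping, which this model deliberately leaves out.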