From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ian Pratt"
Subject: RE: Live migration fails under heavy network use
Date: Tue, 20 Feb 2007 22:38:47 -0000
Message-ID: <8A87A9A84C201449A0C56B728ACF491E0B9AD2@liverpoolst.ad.cl.cam.ac.uk>
References: <20070220215039.GA28903@totally.trollied.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-class: urn:content-classes:message
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: John Levon, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

> I've observed this with both a Solaris and a FC6 domU (up to date as of
> bash-3.00# while xm migrate --live fedora64 localhost ; do echo done ; done
> (XEN) memory.c:188:d2 Dom2 freeing in-use page 9f40f (pseudophys 1d007):
> count=2 type=e8000000
> (XEN) memory.c:188:d2 Dom2 freeing in-use page 9f409 (pseudophys 1d00b):
> count=2 type=e8000000
> (XEN) /export/johnlev/xen/xen-work/xen.hg/xen/include/asm/mm.h:184:d0 Error
> pfn 9f738: rd=ffff830000fe0100, od=ffff830000000002, caf=00000000,
> taf=0000000000000002
> (XEN) mm.c:590:d0 Error getting mfn 9f738 (pfn 12026) from L1 entry
> 000000009f738705 for dom2
> Error: /usr/lib/xen/bin/xc_save 27 2 0 0 1 failed
>
> Some experimentation has revealed that this only happens if a vif is
> configured and used, which seems like it's related to giving away pages
> (as rd != od would indicate too...). Anybody else seeing this? I've only
> tested on a Solaris dom0 so far, though I can't think of anything that
> would affect this.

These guests are using rx-flip rather than rx-copy, right? This has
certainly worked reliably in the past (e.g. 3.0.3), but is now getting
little testing as current guests use rx-copy by default.

The "freeing in-use page" messages may be unrelated to the actual
problem -- AFAIK that's a relatively new printk that could occur
benignly during a live migrate of an rx-flip guest. Even get_page can
fail benignly under certain circumstances during a live migrate. It's
worth finding out where the actual error in xc_linux_save is.

Ian
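
As a rough illustration of the triage suggested above, the sketch below sets aside the "freeing in-use page" lines and lists whatever remains for investigation. The sample lines are taken from the quoted console output with the mail wrapping undone. The `triage` helper and the match pattern are assumptions made for this example only, not part of Xen or its tools, and the mm.c lines stay in the "investigate" bucket because the log alone does not show whether a given get_page failure was benign.

```python
import re

# Sample lines from the quoted console output (mail line wrapping undone).
LOG = """\
(XEN) memory.c:188:d2 Dom2 freeing in-use page 9f40f (pseudophys 1d007): count=2 type=e8000000
(XEN) memory.c:188:d2 Dom2 freeing in-use page 9f409 (pseudophys 1d00b): count=2 type=e8000000
(XEN) mm.c:590:d0 Error getting mfn 9f738 (pfn 12026) from L1 entry 000000009f738705 for dom2
Error: /usr/lib/xen/bin/xc_save 27 2 0 0 1 failed
"""

# Assumption: a plain substring heuristic, not a format defined by Xen.
POSSIBLY_BENIGN = re.compile(r"freeing in-use page")


def triage(log_text):
    """Split log lines into (possibly_benign, needs_attention) lists."""
    benign, attention = [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if POSSIBLY_BENIGN.search(line):
            benign.append(line)
        else:
            attention.append(line)
    return benign, attention


benign, attention = triage(LOG)
print(len(benign), "possibly benign;", len(attention), "to investigate")
for line in attention:
    print("  " + line)
```

This only narrows down which messages to read first; locating the failing call inside xc_linux_save still means looking at the xc_save output itself.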