From: Ian Campbell
Subject: Re: pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
Date: Tue, 24 Nov 2009 14:27:04 +0000
Message-ID: <1259072824.7590.389.camel@zakaz.uk.xensource.com>
In-Reply-To: <4AFC9BFF.9030707@goop.org>
References: <20091108154153.GM1434@reaktio.net>
 <693ea516-aa5b-4f82-ad48-1bd51cfa3480@default>
 <20091108172747.GR1434@reaktio.net>
 <20091110100806.GE16033@reaktio.net>
 <4AFC9BFF.9030707@goop.org>
To: Jeremy Fitzhardinge
Cc: Dan Magenheimer, "Xen-Devel (E-mail)"

On Thu, 2009-11-12 at 23:36 +0000, Jeremy Fitzhardinge wrote:
> On 11/10/09 02:08, Pasi Kärkkäinen wrote:
> > Hello,
> >
> > Jeremy: Here's summary about these save/restore problems
> > using upstream Linux 2.6.31.5 PV guest.
> >
> > For me:
> > - I can "xm save" + "xm restore" UP guest, but I get non-fatal
> >   BUG in the guest kernel, see [1].
> > - "xm save" fails for SMP guest with "failed to get the suspend evtchn port", see [2].
> >
> > For Dan:
> > - "xm save" works for UP guest, but "xm restore" doesn't, giving
> >   infinite xen_sched_clock related dumps in the guest kernel, see [3].
> > - "xm save" for SMP guest fails, it never ends. I suspect this
> >   is the same problem I'm seeing.
> >
> >
> > [1] non-fatal BUG on the guest kernel after "xm restore":
> > http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt
> >
>
> Does this help:

It does for me. There's another dpm_resume_noirq(PMSG_RESUME) a little
later in do_suspend() which I think needs to be dropped as well.

I'm still seeing other problems with resume: the system is hung on
restore and the RCU stall detection logic is triggering. Unfortunately
arch_trigger_all_cpu_backtrace is not Xen compatible (it uses the APIC
directly), so I don't get much useful info out of it. The RCU stall is
most likely a symptom of the actual problem rather than a problem with
RCU per se anyhow.

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 10d03d7..7b69a1a 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -43,7 +43,6 @@ static int xen_suspend(void *data)
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
 			err);
-		dpm_resume_noirq(PMSG_RESUME);
 		return err;
 	}
 
@@ -69,7 +68,6 @@ static int xen_suspend(void *data)
 	}
 
 	sysdev_resume();
-	dpm_resume_noirq(PMSG_RESUME);
 
 	return 0;
 }
@@ -108,6 +106,9 @@ static void do_suspend(void)
 	}
 
 	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
+
+	dpm_resume_noirq(PMSG_RESUME);
+
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
 		goto out;
@@ -119,8 +120,6 @@ static void do_suspend(void)
 	} else
 		xs_suspend_cancel();
 
-	dpm_resume_noirq(PMSG_RESUME);
-
 resume_devices:
 	dpm_resume_end(PMSG_RESUME);
 
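
For illustration only (this is not part of the patch): the call ordering
the diff above aims for can be modelled in user space. Everything below
is a hypothetical stand-in. The model_* functions, the call log and its
helpers are invented for this sketch, a plain function call stands in
for stop_machine(), and the real suspend step is elided. It only shows
that dpm_resume_noirq() ends up being called exactly once, after
xen_suspend() has returned, on both the success and the failure path.

```c
#include <assert.h>
#include <string.h>

/* User-space stand-ins only; nothing here is real kernel code. */
#define MAX_CALLS 16

static const char *call_log[MAX_CALLS];
static int n_calls;

/* Append one call name to the log. */
static void record(const char *name)
{
	assert(n_calls < MAX_CALLS);
	call_log[n_calls++] = name;
}

/* Empty the log before modelling another run. */
static void reset_log(void)
{
	n_calls = 0;
}

/* Number of times `name` appears in the log. */
static int count_calls(const char *name)
{
	int i, n = 0;

	for (i = 0; i < n_calls; i++)
		if (strcmp(call_log[i], name) == 0)
			n++;
	return n;
}

/* Position of the first occurrence of `name`, or -1 if absent. */
static int index_of(const char *name)
{
	int i;

	for (i = 0; i < n_calls; i++)
		if (strcmp(call_log[i], name) == 0)
			return i;
	return -1;
}

/* Models xen_suspend() after the patch: no dpm_resume_noirq() inside. */
static int model_xen_suspend(int sysdev_err)
{
	record("sysdev_suspend");
	if (sysdev_err)
		return sysdev_err;
	/* The actual suspend would happen here; elided in this model. */
	record("sysdev_resume");
	return 0;
}

/* Models the patched part of do_suspend(); a direct call replaces stop_machine(). */
static int model_do_suspend(int sysdev_err)
{
	int err = model_xen_suspend(sysdev_err);

	record("dpm_resume_noirq");	/* once, on success and failure alike */
	if (err)
		return err;		/* the real code does "goto out" here */
	record("dpm_resume_end");
	return 0;
}
```

Calling reset_log() and then model_do_suspend(0) or model_do_suspend(-1)
should leave exactly one "dpm_resume_noirq" entry in the log in either
case, which count_calls() can confirm.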