* hang on restore in 3.3.1
@ 2009-02-10  1:45 James Harper
  2009-02-11  8:45 ` James Harper
  0 siblings, 1 reply; 8+ messages in thread
From: James Harper @ 2009-02-10  1:45 UTC (permalink / raw)
  To: xen-devel

I am having problems with save/restore under 3.3.1 in the GPLPV drivers.
I call hvm_shutdown(xpdd, SHUTDOWN_suspend), but as soon as I lower IRQL
(enabling interrupts), qemu goes to 100% CPU and the DomU load goes
right up too.

Xentrace is showing a whole lot of this going on:


CPU0  200130258143212 (+     770)  hypercall  [ rip = 0x000000008020632a, eax = 0xffffffff ]
CPU0  200130258151107 (+    7895)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258156293 (+    5186)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258161233 (+    4940)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258165467 (+    4234)  hypercall  [ rip = 0x000000008020640a, eax = 0xffffffff ]
CPU0  200130258167202 (+    1735)  domain_wake       [ domid = 0x00000062, edomid = 0x00000000 ]
CPU0  200130258168511 (+    1309)  switch_infprev    [ old_domid = 0x00000000, runtime = 31143 ]
CPU0  200130258168716 (+     205)  switch_infnext    [ new_domid = 0x00000062, time = 786, r_time = 30000000 ]
CPU0  200130258169338 (+     622)  __enter_scheduler [ prev<domid:edomid> = 0x00000000 : 0x00000000, next<domid:edomid> = 0x00000062 : 0x00000000 ]
CPU0  200130258175532 (+    6194)  VMENTRY     [ dom:vcpu = 0x00000062 ]
CPU0  200130258179633 (+    4101)  VMEXIT      [ dom:vcpu = 0x00000062, exitcode = 0x0000004e, rIP  = 0x0000000080a562b9 ]
CPU0  0 (+       0)  MMIO_AST_WR [ address = 0xfee000b0, data = 0x00000000 ]
CPU0  0 (+       0)  PF_XEN      [ dom:vcpu = 0x00000062, errorcode = 0x0b, virt = 0xfffe00b0 ]
CPU0  0 (+       0)  INJ_VIRQ    [ dom:vcpu = 0x00000062, vector = 0x00, fake = 1 ]
CPU0  200130258185932 (+    6299)  VMENTRY     [ dom:vcpu = 0x00000062 ]
CPU0  200130258189737 (+    3805)  VMEXIT      [ dom:vcpu = 0x00000062, exitcode = 0x00000064, rIP  = 0x0000000080a560ad ]
CPU0  0 (+       0)  INJ_VIRQ    [ dom:vcpu = 0x00000062, vector = 0x83, fake = 0 ]
CPU0  200130258190990 (+    1253)  VMENTRY     [ dom:vcpu = 0x00000062 ]
CPU0  200130258194791 (+    3801)  VMEXIT      [ dom:vcpu = 0x00000062, exitcode = 0x0000007b, rIP  = 0x0000000080a5a29e ]
CPU0  0 (+       0)  IO_ASSIST   [ dom:vcpu = 0x0000c202, data = 0x0000 ]
CPU0  200130258198944 (+    4153)  switch_infprev    [ old_domid = 0x00000062, runtime = 17087 ]
CPU0  200130258199132 (+     188)  switch_infnext    [ new_domid = 0x00000000, time = 17087, r_time = 30000000 ]
CPU0  200130258199702 (+     570)  __enter_scheduler [ prev<domid:edomid> = 0x00000062 : 0x00000000, next<domid:edomid> = 0x00000000 : 0x00000000 ]
CPU0  200130258206470 (+    6768)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258210964 (+    4494)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258214767 (+    3803)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258218019 (+    3252)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258227419 (+    9400)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]

It kind of looks like vector 0x83 is being fired over and over, which
would explain why things hang once I enable interrupts again. I will
look into what vector 0x83 is attached to, but does anyone have any
ideas?
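
One way to check the "fired over and over" impression is to tally the injected vectors in a longer capture. The following is only a sketch: it parses the formatted text shown above rather than any xentrace binary format, the function name and regular expression are made up for illustration, and it assumes that records with fake = 1 are not real deliveries and can be skipped.

```python
import re
from collections import Counter

# Matches one INJ_VIRQ record, even when a mail client has wrapped it
# across lines (the character class [^\]] also matches newlines).
INJ_VIRQ = re.compile(
    r"INJ_VIRQ\s*\[[^\]]*?vector\s*=\s*(0x[0-9a-fA-F]+)[^\]]*?fake\s*=\s*(\d+)"
)

def count_injected_vectors(trace_text):
    """Return a Counter mapping vector number to count of non-fake injections."""
    counts = Counter()
    for vector, fake in INJ_VIRQ.findall(trace_text):
        # Assumption: fake = 1 marks a record that is not a real delivery.
        if fake == "0":
            counts[int(vector, 16)] += 1
    return counts

# Two records copied from the trace above, wrapped as they appeared in the mail.
SAMPLE = """\
CPU0  0 (+       0)  INJ_VIRQ    [ dom:vcpu = 0x00000062, vector = 0x00,
fake = 1 ]
CPU0  0 (+       0)  INJ_VIRQ    [ dom:vcpu = 0x00000062, vector = 0x83,
fake = 0 ]
"""

if __name__ == "__main__":
    for vector, n in count_injected_vectors(SAMPLE).most_common():
        print(f"vector 0x{vector:02x}: {n} injection(s)")
```

Run over a few seconds of trace, a single vector dominating the tally would support the stuck-interrupt theory.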

Thanks

James

* RE: hang on restore in 3.3.1
  2009-02-10  1:45 hang on restore in 3.3.1 James Harper
@ 2009-02-11  8:45 ` James Harper
  2009-02-11  9:45   ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: James Harper @ 2009-02-11  8:45 UTC (permalink / raw)
  To: James Harper, xen-devel

Now I'm seeing the same thing but on vector 0x93 instead. There is
nothing on that vector. It appears that when xen is restoring my domain,
an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
as I enable interrupts after doing the restore... any suggestions?

Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
0x83, fake = 0 ]" does actually imply that an interrupt is being set in
my DomU, and that the vector is the actual offset into the vector table?

Thanks

James

* Re: hang on restore in 3.3.1
  2009-02-11  8:45 ` James Harper
@ 2009-02-11  9:45   ` Keir Fraser
  2009-02-11  9:49     ` James Harper
  2009-02-11  9:51     ` James Harper
  0 siblings, 2 replies; 8+ messages in thread
From: Keir Fraser @ 2009-02-11  9:45 UTC (permalink / raw)
  To: James Harper, xen-devel

On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Now I'm seeing the same thing but on vector 0x93 instead. There is
> nothing on that vector. It appears that when xen is restoring my domain,
> an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
> as I enable interrupts after doing the restore... any suggestions?

Not for a line that isn't connected up. Usually this is due to bad restore
of the evtchn callback irq, or bad restore of irqs from qemu. With 3.3 you
could of course try reverting to the in-tree qemu (CONFIG_QEMU=ioemu) and
see if that makes the problem go away.

> Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
> 0x83, fake = 0 ]" does actually imply that an interrupt is being set in
> my DomU, and that the vector is the actual offset into the vector table?

Yes that's right.
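
For reference, a small sketch of what "offset into the vector table" works out to in bytes, assuming the vector table is the x86 IDT. Whether the guest runs in 32-bit protected mode or 64-bit long mode is not stated in the thread, so both gate sizes are covered; the function names are made up for illustration.

```python
# Bytes per IDT gate descriptor on x86.
GATE_SIZE = {"protected32": 8, "long64": 16}

def idt_gate_offset(vector, mode="protected32"):
    """Byte offset of the gate for `vector`, relative to the IDT base."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must be in 0..255")
    return vector * GATE_SIZE[mode]

def priority_class(vector):
    """Local APIC priority class: the upper four bits of the vector."""
    return vector >> 4

if __name__ == "__main__":
    for v in (0x83, 0x93):
        print(f"vector 0x{v:02x}: gate at IDT base + 0x{idt_gate_offset(v):x}, "
              f"priority class {priority_class(v)}")
```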

 -- Keir

* RE: hang on restore in 3.3.1
  2009-02-11  9:45   ` Keir Fraser
@ 2009-02-11  9:49     ` James Harper
  2009-02-11  9:51     ` James Harper
  1 sibling, 0 replies; 8+ messages in thread
From: James Harper @ 2009-02-11  9:49 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my domain,
> > an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
> > as I enable interrupts after doing the restore... any suggestions?
> 
> Not for a line that isn't connected up. Usually this is due to bad restore
> of the evtchn callback irq, or bad restore of irqs from qemu. With 3.3 you
> could of course try reverting to the in-tree qemu (CONFIG_QEMU=ioemu) and
> see if that makes the problem go away.

I actually turn off the evtchn callback IRQ by setting it to 0. Even
when I don't do this, though, the problem still occurs.

When I analyse the IRR in Windows, before enabling interrupts again, I
can definitely see that the bit for vector 0x93 is set.
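
For anyone repeating this check, here is a sketch of where a vector sits in the local APIC IRR. It relies on the standard xAPIC register layout (eight 32-bit IRR registers starting at offset 0x200, spaced 0x10 apart, 32 vectors each); the helper names are hypothetical and nothing here is taken from the GPLPV source.

```python
APIC_IRR_BASE = 0x200    # offset of the first IRR register in the APIC page
APIC_REG_STRIDE = 0x10   # APIC registers are spaced 16 bytes apart

def irr_location(vector):
    """Return (register offset, bit number) of `vector` within the IRR."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must be in 0..255")
    return APIC_IRR_BASE + APIC_REG_STRIDE * (vector // 32), vector % 32

def irr_bit_set(irr_words, vector):
    """irr_words: the eight 32-bit IRR values, lowest vectors first."""
    return bool((irr_words[vector // 32] >> (vector % 32)) & 1)

if __name__ == "__main__":
    offset, bit = irr_location(0x93)
    print(f"vector 0x93: IRR register at APIC offset 0x{offset:x}, bit {bit}")
```

With the default APIC base of 0xfee00000, the same layout puts the EOI register at 0xfee000b0, which is the address written in the MMIO_AST_WR record of the original trace.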

Time to go digging... any suggestions for places to look?

Thanks

James

* RE: hang on restore in 3.3.1
  2009-02-11  9:45   ` Keir Fraser
  2009-02-11  9:49     ` James Harper
@ 2009-02-11  9:51     ` James Harper
  2009-02-11 10:21       ` Keir Fraser
  1 sibling, 1 reply; 8+ messages in thread
From: James Harper @ 2009-02-11  9:51 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my domain,
> > an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
> > as I enable interrupts after doing the restore... any suggestions?
> 
> Not for a line that isn't connected up. Usually this is due to bad restore
> of the evtchn callback irq, or bad restore of irqs from qemu. With 3.3 you
> could of course try reverting to the in-tree qemu (CONFIG_QEMU=ioemu) and
> see if that makes the problem go away.
> 

What do you think the chances are of it being a qemu problem? The
xentrace output would indicate that it was the hypervisor asserting the
interrupt, but that wouldn't preclude qemu from being the originator of
the interrupt, would it?

Thanks

James

* Re: hang on restore in 3.3.1
  2009-02-11  9:51     ` James Harper
@ 2009-02-11 10:21       ` Keir Fraser
  2009-02-11 11:26         ` James Harper
  2009-02-11 13:08         ` James Harper
  0 siblings, 2 replies; 8+ messages in thread
From: Keir Fraser @ 2009-02-11 10:21 UTC (permalink / raw)
  To: James Harper, xen-devel

On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au> wrote:

> What do you think the chances are of it being a qemu problem? the
> xentrace code would indicate that it was the hypervisor asserting the
> interrupt, but that wouldn't preclude qemu from being the originator of
> the interrupt would it?

Most interrupts come from qemu, since it emulates most devices. Switching
your qemu is a pretty easy test (build internal old version rather than
out-of-tree new version).

 -- Keir

* RE: hang on restore in 3.3.1
  2009-02-11 10:21       ` Keir Fraser
@ 2009-02-11 11:26         ` James Harper
  2009-02-11 13:08         ` James Harper
  1 sibling, 0 replies; 8+ messages in thread
From: James Harper @ 2009-02-11 11:26 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > What do you think the chances are of it being a qemu problem? the
> > xentrace code would indicate that it was the hypervisor asserting the
> > interrupt, but that wouldn't preclude qemu from being the originator of
> > the interrupt would it?
> 
> Most interrupts come from qemu, since it emulates most devices. Switching
> your qemu is a pretty easy test (build internal old version rather than
> out-of-tree new version).
> 

I just rebooted with my GPLPV drivers inactive (i.e. not hiding qemu
devices, leaving the PV network device as 'disconnected' and not
enumerating block devices), and I found that an NDIS device is using
vector 0x93, which will be the qemu realtek device. I hide it on boot,
but I forgot to hide it again after the restore, which is probably the
cause of my problems... hopefully hiding it again on restore will
stop it generating interrupts!

James

* RE: hang on restore in 3.3.1
  2009-02-11 10:21       ` Keir Fraser
  2009-02-11 11:26         ` James Harper
@ 2009-02-11 13:08         ` James Harper
  1 sibling, 0 replies; 8+ messages in thread
From: James Harper @ 2009-02-11 13:08 UTC (permalink / raw)
  To: James Harper, Keir Fraser, xen-devel

> > On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au> wrote:
> >
> > > What do you think the chances are of it being a qemu problem? the
> > > xentrace code would indicate that it was the hypervisor asserting the
> > > interrupt, but that wouldn't preclude qemu from being the originator of
> > > the interrupt would it?
> >
> > Most interrupts come from qemu, since it emulates most devices. Switching
> > your qemu is a pretty easy test (build internal old version rather than
> > out-of-tree new version).
> >
> 
> I just rebooted with my GPLPV drivers inactive (eg not hiding qemu
> devices, leaving the PV network device with 'disconnected' and not
> enumerating block devices), and I found that an NDIS device is using
> vector 0x93, which will be the qemu realtek device. I hide it on boot, but
> I forgot to hide it again after the restore which will probably be the
> cause of my problems... hopefully hiding it on restore again will stop it
> generating interrupts!

Well, it's not the qemu realtek device like I thought it was: I tried it
on a domain with no network interfaces at all. The other thing that uses
that vector is the USB interface.

I have noticed that after a restore, the restored computer has reverted
to the 'mouse' driver rather than the 'tablet' driver... maybe there is
something in that? I'll keep looking.

James
