xen-devel.lists.xenproject.org archive mirror
* [BUG] Emulation issues
@ 2015-07-29 10:17 Roger Pau Monné
  2015-07-29 10:27 ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-29 10:17 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper, Paul Durrant

Hello,

While trying to debug a hotplug scripts issue, I came across what seems
to be an emulation bug inside of Xen. The result of this is a bunch of
repeated messages on the serial console:

[...]
(XEN) io.c:165:d56v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) io.c:165:d54v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
[... the same two messages repeat continuously for d54v0 and d56v0,
sometimes interleaved ...]
[...]

This is quite spammy. I've been able to reliably trigger it using
the following config file:

boot="d"
disk = [
'file:/root/test/FreeBSD-10.2-RC1-amd64-bootonly.iso,hdc:cdrom,r',
	]
memory=256
vcpus=1
name = "freebsd@i@"
builder='hvm'
on_crash="preserve"
on_reboot="preserve"
vnc=1
vnclisten="0.0.0.0"
serial='pty'

And launching them in a loop using the following script:

for i in `seq 1 20`; do
	sed s/@i@/$i/g freebsd.cfg > freebsd$i.cfg
	xl -vvv create freebsd$i.cfg
done

The bug happens when all 20 domains are running and I run `xl shutdown
-w -a`; some of them (not all) then start triggering these emulation
issues. For reference, the FreeBSD image can be obtained from:

ftp://ftp.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/10.2/FreeBSD-10.2-RC1-amd64-bootonly.iso

Also, I'm using the latest staging changeset, 44313a.
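Side note: the @i@ templating in the launch loop can be sanity-checked on
any box, independent of Xen. A minimal sketch, using a hypothetical
one-line stand-in for freebsd.cfg:

```shell
# Hypothetical one-line stand-in for freebsd.cfg, just to exercise the
# @i@ substitution performed by the launch loop.
printf 'name = "freebsd@i@"\n' > /tmp/freebsd.cfg
for i in `seq 1 3`; do
	sed s/@i@/$i/g /tmp/freebsd.cfg
done
# prints name = "freebsd1" ... name = "freebsd3"
```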

Roger.


* Re: [BUG] Emulation issues
  2015-07-29 10:17 [BUG] Emulation issues Roger Pau Monné
@ 2015-07-29 10:27 ` Paul Durrant
  2015-07-29 10:36   ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-29 10:27 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 29 July 2015 11:17
> To: xen-devel; Andrew Cooper; Paul Durrant
> Subject: [BUG] Emulation issues
> 
> Hello,
> 
> While trying to debug a hotplug scripts issue, I came across what seems
> to be an emulation bug inside of Xen. The result of this is a bunch of
> repeated messages on the serial console:
> 

Was there anything of interest before this? You got an 'unhandleable' emulation, which generally should not happen. I guess there may be a shutdown race between tearing down the ioreq server list and sending emulation requests, which could cause hvm_send_ioreq() to return X86EMUL_UNHANDLEABLE. It would be good to better understand the sequence of events.

  Paul

> [...]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [BUG] Emulation issues
  2015-07-29 10:27 ` Paul Durrant
@ 2015-07-29 10:36   ` Roger Pau Monné
  2015-07-29 10:37     ` Paul Durrant
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-29 10:36 UTC (permalink / raw)
  To: Paul Durrant, xen-devel, Andrew Cooper

El 29/07/15 a les 12.27, Paul Durrant ha escrit:
>> [...]
> 
> Was there anything of interest before this? You got an 'unhandleable' emulation which generally should not happen, but I guess there may be a shutdown race in tearing down the ioreq server list and sending emulation requests which may cause hvm_send_ioreq() to return X86EMUL_UNHANDLEABLE. It would be good to better understand the sequence of events.

I don't think there's anything relevant before the messages I've posted;
here is a more complete log:

(XEN) irq.c:386: Dom91 callback via changed to Direct Vector 0x93
(XEN) irq.c:386: Dom92 callback via changed to Direct Vector 0x93
(XEN) irq.c:276: Dom91 PCI link 0 changed 5 -> 0
(XEN) irq.c:276: Dom91 PCI link 1 changed 10 -> 0
(XEN) irq.c:276: Dom91 PCI link 2 changed 11 -> 0
(XEN) irq.c:276: Dom91 PCI link 3 changed 5 -> 0
(XEN) irq.c:276: Dom92 PCI link 0 changed 5 -> 0
(XEN) irq.c:276: Dom92 PCI link 1 changed 10 -> 0
(XEN) irq.c:276: Dom92 PCI link 2 changed 11 -> 0
(XEN) irq.c:276: Dom92 PCI link 3 changed 5 -> 0
INIT: Id "T0" respawning too fast: disabled for 5 minutes
(XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
[... the same message pair repeats ...]

If you can provide a debug/trace patch I can run the same workload with
it in order to trace the sequence of events.

Roger.




* Re: [BUG] Emulation issues
  2015-07-29 10:36   ` Roger Pau Monné
@ 2015-07-29 10:37     ` Paul Durrant
  2015-07-29 12:08     ` Andrew Cooper
  2015-07-29 12:41     ` Paul Durrant
  2 siblings, 0 replies; 26+ messages in thread
From: Paul Durrant @ 2015-07-29 10:37 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 29 July 2015 11:37
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> [...]
> 
> If you can provide a debug/trace patch I can run the same workload with
> it in order to trace the sequence of events.
>

Ok, thanks. I'll create a patch with some relevant printks.

  Paul

 
> Roger.



* Re: [BUG] Emulation issues
  2015-07-29 10:36   ` Roger Pau Monné
  2015-07-29 10:37     ` Paul Durrant
@ 2015-07-29 12:08     ` Andrew Cooper
  2015-07-29 12:41     ` Paul Durrant
  2 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-07-29 12:08 UTC (permalink / raw)
  To: Roger Pau Monné, Paul Durrant, xen-devel

On 29/07/15 11:36, Roger Pau Monné wrote:
> [...]
> If you can provide a debug/trace patch I can run the same workload with
> it in order to trace the sequence of events.

Sander reported the same sequence, although from a different starting point.

Please can you try this ad-hoc debugging to start with?

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index d3b9cae..7560d08 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
-        domain_crash(curr->domain);
+        show_execution_state(&curr->arch.user_regs);
+        dump_execution_state();
+        domain_crash_synchronous();
         break;
     }

~Andrew




* Re: [BUG] Emulation issues
  2015-07-29 10:36   ` Roger Pau Monné
  2015-07-29 10:37     ` Paul Durrant
  2015-07-29 12:08     ` Andrew Cooper
@ 2015-07-29 12:41     ` Paul Durrant
  2015-07-29 13:54       ` Roger Pau Monné
  2 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-29 12:41 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 29 July 2015 11:37
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> [...]
> 
> If you can provide a debug/trace patch I can run the same workload with
> it in order to trace the sequence of events.
> 

Could you try this?

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 30acb78..1bc3cc9 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -145,6 +145,8 @@ static int hvmemul_do_io(
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
     default:
+        gprintk(XENLOG_ERR, "weird emulation state %u\n",
+                vio->io_req.state);
         return X86EMUL_UNHANDLEABLE;
     }

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec1d797..38d6d99 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *pr
         }
     }

+    gprintk(XENLOG_ERR, "unable to contact device model\n");
     return X86EMUL_UNHANDLEABLE;
 }

Paul

> Roger.



* Re: [BUG] Emulation issues
  2015-07-29 12:41     ` Paul Durrant
@ 2015-07-29 13:54       ` Roger Pau Monné
  2015-07-30 10:12         ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-29 13:54 UTC (permalink / raw)
  To: Paul Durrant, xen-devel, Andrew Cooper

El 29/07/15 a les 14.41, Paul Durrant ha escrit:
>> [...]
> 
> Could you try this?
> 
> [...]

I've applied your patch and the one from Andrew, so my current diff is:

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 30acb78..1bc3cc9 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -145,6 +145,8 @@ static int hvmemul_do_io(
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
     default:
+        gprintk(XENLOG_ERR, "weird emulation state %u\n",
+                vio->io_req.state);
         return X86EMUL_UNHANDLEABLE;
     }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec1d797..38d6d99 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
         }
     }
 
+    gprintk(XENLOG_ERR, "unable to contact device model\n");
     return X86EMUL_UNHANDLEABLE;
 }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index d3b9cae..12d50c2 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
-        domain_crash(curr->domain);
+        show_execution_state(&curr->arch.user_regs);
+        dump_execution_state();
+        domain_crash_synchronous();
         break;
     }
 
I then got the following panic while doing `xl shutdown -w -a` on 20 HVM
guests:

(XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
(XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
(XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
(XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
(XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
(XEN) d10v0 weird emulation state 1
(XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
(XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
(XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
(XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
(XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
(XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
(XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
(XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
(XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
(XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046 ffff8300dfaf7ac8
(XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0 ffff82d08012cfd3
(XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046 0000000000002001
(XEN)    0000000000002001 fffff80002089e28 0000000000000001 fffffe00003829c0
(XEN)    000000000000b004 0000000000000000 0000000000000014 0000000000000002
(XEN)    000000000000b004 0000000000002001 000000000000b005 000000000000b004
(XEN)    0000000000002001 000000000000b004 0000beef0000beef<G><0>d15v0 weird emulation state 1
(XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status 1.
(XEN)
(XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
(XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN)  0000000000000046CPU:    6
(XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef show_registers+0x60/0x32f
(XEN)
(XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor 0000000000000000 (d15v0) 0000000000000000
(XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
(XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi: ffff83019d297bb8
(XEN)
(XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
(XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11: 0000000000000001
(XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14: ffff82c000299000
(XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
(XEN)
(XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
(XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001 ffff830198bf9000 0000000000000002 00007cfe62d68527
(XEN)    ffff82d08023b132 ffff8300dfaf7c38
(XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf 0000000000002001 ffff83019d297b70
(XEN)    0000000000000200 ffff8300cc7da000
(XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000 0000000000000000 ffff8300cc7da250 0000000000000001
(XEN)    0000000000002001 ffff8300dfaf7db8
(XEN)    ffff82d0801c5934 0000000000002001 8000000000000000 fffff80002089e28 ffff8300cc7da000 0000000000000001
(XEN)    fffffe00003829c0 ffff8300dfaf0000
(XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8 0000000000000000 00000000000cc277 0000000000000014
(XEN)    0000000000000002 0000000000000000
(XEN)    0000000000000001 000000000000b004 00000000000feff0 0000000000002001 ffff8300ccfec820 000000000000b005
(XEN)    000000000000b004 ffff8300dfaf7d08
(XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004 ffffffffffffffff 0000beef0000beef
(XEN)    ffffffff8036fa45 00000000000001f0
(XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000 0000000000000046 0000000000000000 fffffe00003829c0
(XEN)    000000000000beef ffff8300ccfec820
(XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820 0000000000000000 ffff8300cc7da000 0000000000000000
(XEN)    0000000000000000 ffff8300dfaf7da8
(XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8 ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
(XEN)  ffff82c000299000Xen call trace:
(XEN)
(XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
(XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>] show_execution_state+0x11/0x20
(XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
(XEN)  0000000000000001   [<ffff82d0801c5934>] hvm_do_resume+0x258/0x33e
(XEN)  0000000000000002   [<ffff82d0801e3166>] vmx_do_resume+0x12b/0x142
(XEN)
(XEN)      [<ffff82d080164adc>] context_switch+0xf0c/0xf63
(XEN)  ffff83019d297c38   [<ffff82d0801299e0>] schedule+0x5b9/0x612
(XEN)  ffff82d0801caff0   [<ffff82d08012c765>] __do_softirq+0x82/0x8d
(XEN)  ffff83019d297c38   [<ffff82d08012c7bd>] do_softirq+0x13/0x15
(XEN)  0000000000002001   [<ffff82d08023ace1>] process_softirqs+0x21/0x30
(XEN)
(XEN)
(XEN)  ffff83007b637000
(XEN) ****************************************
(XEN)  ffff83007b60aed0Panic on CPU 0:
(XEN)  ffff83007b4c4000Assertion 'diff < STACK_SIZE' failed at traps.c:91
(XEN)  ffff83007b637250****************************************
(XEN)
(XEN)
(XEN)   Reboot in five seconds...
(XEN)  ffff83019d297db8
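[Editorial note: the `Assertion 'diff < STACK_SIZE'` panic above fires in Xen's `guest_mode()` check, which requires the register frame handed to `show_registers()` to live on the current CPU's hypervisor stack. The debug patch in this thread passes `&curr->arch.user_regs`, which lives inside `struct vcpu` rather than on the stack, which would explain the assertion. The following is a standalone sketch of that distance check under those assumptions; the names mirror Xen's but none of this is the real code.]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the guest_mode() precondition: the regs frame must lie
 * within the current CPU's hypervisor stack.  STACK_SIZE matches the
 * 32KiB Xen uses on x86_64; the arrays merely stand in for the real
 * stack and the off-stack struct vcpu. */
#define STACK_SIZE (32 * 1024)

static char cpu_stack[STACK_SIZE];   /* stands in for the Xen stack */
static char vcpu_struct[4096];       /* stands in for struct vcpu   */

/* Top-of-stack frame, as guest_cpu_user_regs() would return it. */
static char *guest_cpu_user_regs(void)
{
    return cpu_stack + STACK_SIZE - 512;
}

/* Returns 1 when the frame is on the stack; the real macro instead
 * does ASSERT(diff < STACK_SIZE), i.e. it panics on a 0 here. */
static int frame_on_stack(const char *regs)
{
    unsigned long diff =
        (unsigned long)(uintptr_t)guest_cpu_user_regs() -
        (unsigned long)(uintptr_t)regs;

    return diff < STACK_SIZE;
}
```

A frame pointer taken from anywhere outside the stack (such as a field of `struct vcpu`) yields a huge unsigned `diff`, so the check fails regardless of where the allocation happens to sit in memory.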

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG] Emulation issues
  2015-07-29 13:54       ` Roger Pau Monné
@ 2015-07-30 10:12         ` Paul Durrant
  2015-07-30 10:16           ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-30 10:12 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 29 July 2015 14:54
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> El 29/07/15 a les 14.41, Paul Durrant ha escrit:
> >> -----Original Message-----
> >> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> >> Sent: 29 July 2015 11:37
> >> To: Paul Durrant; xen-devel; Andrew Cooper
> >> Subject: Re: [BUG] Emulation issues
> >>
> >> El 29/07/15 a les 12.27, Paul Durrant ha escrit:
> >>>> -----Original Message-----
> >>>> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> >>>> Sent: 29 July 2015 11:17
> >>>> To: xen-devel; Andrew Cooper; Paul Durrant
> >>>> Subject: [BUG] Emulation issues
> >>>>
> >>>> Hello,
> >>>>
> >>>> While trying to debug a hotplug scripts issue, I came across what seems
> >>>> to be an emulation bug inside of Xen. The result of this is a bunch of
> >>>> repeated messages on the serial console:
> >>>>
> >>>
> >>> Was there anything of interest before this? You got an 'unhandleable'
> >> emulation which generally should not happen, but I guess there may be a
> >> shutdown race in tearing down the ioreq server list and sending
> emulation
> >> requests which may cause hvm_send_ioreq() to return
> >> X86EMUL_UNHANDLEABLE. It would be good to better understand the
> >> sequence of events.
> >>
> >> I don't think there's anything relevant before the messages I've posted,
> >> here is a more complete log:
> >>
> >> (XEN) irq.c:386: Dom91 callback via changed to Direct Vector 0x93
> >> (XEN) irq.c:386: Dom92 callback via changed to Direct Vector 0x93
> >> (XEN) irq.c:276: Dom91 PCI link 0 changed 5 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 1 changed 10 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 2 changed 11 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 3 changed 5 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 0 changed 5 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 1 changed 10 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 2 changed 11 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 3 changed 5 -> 0
> >> INIT: Id "T0" respawning too fast: disabled for 5 minutes
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >>
> >> If you can provide a debug/trace patch I can run the same workload with
> >> it in order to trace the sequence of events.
> >>
> >
> > Could you try this?
> >
> > diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> > index 30acb78..1bc3cc9 100644
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -145,6 +145,8 @@ static int hvmemul_do_io(
> >              return X86EMUL_UNHANDLEABLE;
> >          goto finish_access;
> >      default:
> > +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> > +                vio->io_req.state);
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index ec1d797..38d6d99 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
> ioreq_t *pr
> >          }
> >      }
> >
> > +    gprintk(XENLOG_ERR, "unable to contact device model\n");
> >      return X86EMUL_UNHANDLEABLE;
> >  }
> 
> I've applied your patch and the one from Andrew, so my current diff is:
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 30acb78..1bc3cc9 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>              return X86EMUL_UNHANDLEABLE;
>          goto finish_access;
>      default:
> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> +                vio->io_req.state);
>          return X86EMUL_UNHANDLEABLE;
>      }
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index ec1d797..38d6d99 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
> ioreq_t *proto_p,
>          }
>      }
> 
> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>      return X86EMUL_UNHANDLEABLE;
>  }
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index d3b9cae..12d50c2 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
>          break;
>      default:
>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
> -        domain_crash(curr->domain);
> +        show_execution_state(&curr->arch.user_regs);
> +        dump_execution_state();
> +        domain_crash_synchronous();
>          break;
>      }
> 
> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
> guests:
> 
> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
> (XEN) d10v0 weird emulation state 1
> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
> ffff8300dfaf7ac8
> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
> ffff82d08012cfd3
> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
> 0000000000002001
> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
> fffffe00003829c0
> (XEN)    000000000000b004 0000000000000000 0000000000000014
> 0000000000000002
> (XEN)    000000000000b004 0000000000002001 000000000000b005
> 000000000000b004
> (XEN)    0000000000002001 000000000000b004
> 0000beef0000beef<G><0>d15v0 weird emulation state 1
> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status
> 1.
> (XEN)
> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
> C ]----
> (XEN)  0000000000000046CPU:    6
> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef
> show_registers+0x60/0x32f
> (XEN)
> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor
> 0000000000000000 (d15v0) 0000000000000000
> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
> ffff83019d297bb8
> (XEN)
> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11:
> 0000000000000001
> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14:
> ffff82c000299000
> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
> 00000000000026e0
> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
> (XEN)
> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
> ffff830198bf9000 0000000000000002 00007cfe62d68527
> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf
> 0000000000002001 ffff83019d297b70
> (XEN)    0000000000000200 ffff8300cc7da000
> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
> 0000000000000000 ffff8300cc7da250 0000000000000001
> (XEN)    0000000000002001 ffff8300dfaf7db8
> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
> fffff80002089e28 ffff8300cc7da000 0000000000000001
> (XEN)    fffffe00003829c0 ffff8300dfaf0000
> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
> 0000000000000000 00000000000cc277 0000000000000014
> (XEN)    0000000000000002 0000000000000000
> (XEN)    0000000000000001 000000000000b004 00000000000feff0
> 0000000000002001 ffff8300ccfec820 000000000000b005
> (XEN)    000000000000b004 ffff8300dfaf7d08
> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004
> ffffffffffffffff 0000beef0000beef
> (XEN)    ffffffff8036fa45 00000000000001f0
> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
> 0000000000000046 0000000000000000 fffffe00003829c0
> (XEN)    000000000000beef ffff8300ccfec820
> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
> 0000000000000000 ffff8300cc7da000 0000000000000000
> (XEN)    0000000000000000 ffff8300dfaf7da8
> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
> (XEN)  ffff82c000299000Xen call trace:
> (XEN)
> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
> show_execution_state+0x11/0x20
> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
> (XEN)  0000000000000001   [<ffff82d0801c5934>]
> hvm_do_resume+0x258/0x33e
> (XEN)  0000000000000002   [<ffff82d0801e3166>]
> vmx_do_resume+0x12b/0x142

Ok, so this is a handle_pio() that's being called to pick up the results of an I/O that was sent to an external emulator. Did you manage to apply my patch too? I'd really like to know what part of the emulation handling is actually returning X86EMUL_UNHANDLEABLE.

  Paul
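[Editorial note: the "weird emulation state 1" messages come from the state check at the top of `hvmemul_do_io()`, where state 1 is `STATE_IOREQ_READY`. The sketch below models that re-entry check under stated assumptions; the enum names follow Xen's public ioreq header, but the dispatch function is illustrative, not the real `hvmemul_do_io()`.]

```c
#include <assert.h>

/* ioreq lifecycle states, as in xen/include/public/hvm/ioreq.h. */
enum ioreq_state {
    STATE_IOREQ_NONE = 0,      /* no I/O in flight                   */
    STATE_IOREQ_READY = 1,     /* request handed to the device model */
    STATE_IOREQ_INPROCESS = 2, /* device model is servicing it       */
    STATE_IORESP_READY = 3,    /* response ready for Xen to consume  */
};

enum emul_rc { X86EMUL_OKAY, X86EMUL_RETRY, X86EMUL_UNHANDLEABLE };

/* On (re-)entry to the emulation path only two states are legal:
 * NONE (start a fresh I/O) and IORESP_READY (pick up a completion).
 * Seeing READY or INPROCESS means the vcpu resumed while the request
 * was still with the emulator -- the "weird state 1" in the logs. */
static enum emul_rc check_reentry_state(enum ioreq_state s)
{
    switch ( s )
    {
    case STATE_IOREQ_NONE:
    case STATE_IORESP_READY:
        return X86EMUL_OKAY;
    default:
        return X86EMUL_UNHANDLEABLE;
    }
}
```

This is why `handle_pio()`, called from `hvm_do_resume()` to collect the result of an externally emulated PIO, can surface `X86EMUL_UNHANDLEABLE`: the completion attempt finds the request still marked in-flight.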

> (XEN)
> (XEN)      [<ffff82d080164adc>] context_switch+0xf0c/0xf63
> (XEN)  ffff83019d297c38   [<ffff82d0801299e0>] schedule+0x5b9/0x612
> (XEN)  ffff82d0801caff0   [<ffff82d08012c765>] __do_softirq+0x82/0x8d
> (XEN)  ffff83019d297c38   [<ffff82d08012c7bd>] do_softirq+0x13/0x15
> (XEN)  0000000000002001   [<ffff82d08023ace1>] process_softirqs+0x21/0x30
> (XEN)
> (XEN)
> (XEN)  ffff83007b637000
> (XEN) ****************************************
> (XEN)  ffff83007b60aed0Panic on CPU 0:
> (XEN)  ffff83007b4c4000Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN)  ffff83007b637250****************************************
> (XEN)
> (XEN)
> (XEN)   Reboot in five seconds...
> (XEN)  ffff83019d297db8

* Re: [BUG] Emulation issues
  2015-07-30 10:12         ` Paul Durrant
@ 2015-07-30 10:16           ` Roger Pau Monné
  2015-07-30 10:21             ` Paul Durrant
  2015-07-30 10:24             ` Andrew Cooper
  0 siblings, 2 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-30 10:16 UTC (permalink / raw)
  To: Paul Durrant, xen-devel, Andrew Cooper

El 30/07/15 a les 12.12, Paul Durrant ha escrit:
>> -----Original Message-----
>> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
>> Sent: 29 July 2015 14:54
>> To: Paul Durrant; xen-devel; Andrew Cooper
>> Subject: Re: [BUG] Emulation issues
>> I've applied your patch and the one from Andrew, so my current diff is:
>>
>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>> index 30acb78..1bc3cc9 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>>              return X86EMUL_UNHANDLEABLE;
>>          goto finish_access;
>>      default:
>> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
>> +                vio->io_req.state);
>>          return X86EMUL_UNHANDLEABLE;
>>      }
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index ec1d797..38d6d99 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
>> ioreq_t *proto_p,
>>          }
>>      }
>>
>> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>>      return X86EMUL_UNHANDLEABLE;
>>  }
>>
>> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
>> index d3b9cae..12d50c2 100644
>> --- a/xen/arch/x86/hvm/io.c
>> +++ b/xen/arch/x86/hvm/io.c
>> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
>>          break;
>>      default:
>>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
>> -        domain_crash(curr->domain);
>> +        show_execution_state(&curr->arch.user_regs);
>> +        dump_execution_state();
>> +        domain_crash_synchronous();
>>          break;
>>      }
>>
>> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
>> guests:
>>
>> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
>> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
>> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
>> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
>> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
>> (XEN) d10v0 weird emulation state 1
>> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
>> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
>> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
>> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
>> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
>> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
>> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
>> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
>> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
>> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
>> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
>> ffff8300dfaf7ac8
>> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
>> ffff82d08012cfd3
>> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
>> 0000000000002001
>> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
>> fffffe00003829c0
>> (XEN)    000000000000b004 0000000000000000 0000000000000014
>> 0000000000000002
>> (XEN)    000000000000b004 0000000000002001 000000000000b005
>> 000000000000b004
>> (XEN)    0000000000002001 000000000000b004
>> 0000beef0000beef<G><0>d15v0 weird emulation state 1
>> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status
>> 1.
>> (XEN)
>> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
>> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
>> C ]----
>> (XEN)  0000000000000046CPU:    6
>> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef
>> show_registers+0x60/0x32f
>> (XEN)
>> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor
>> 0000000000000000 (d15v0) 0000000000000000
>> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
>> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
>> ffff83019d297bb8
>> (XEN)
>> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
>> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11:
>> 0000000000000001
>> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14:
>> ffff82c000299000
>> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
>> 00000000000026e0
>> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
>> (XEN)
>> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
>> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
>> ffff830198bf9000 0000000000000002 00007cfe62d68527
>> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
>> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf
>> 0000000000002001 ffff83019d297b70
>> (XEN)    0000000000000200 ffff8300cc7da000
>> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
>> 0000000000000000 ffff8300cc7da250 0000000000000001
>> (XEN)    0000000000002001 ffff8300dfaf7db8
>> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
>> fffff80002089e28 ffff8300cc7da000 0000000000000001
>> (XEN)    fffffe00003829c0 ffff8300dfaf0000
>> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
>> 0000000000000000 00000000000cc277 0000000000000014
>> (XEN)    0000000000000002 0000000000000000
>> (XEN)    0000000000000001 000000000000b004 00000000000feff0
>> 0000000000002001 ffff8300ccfec820 000000000000b005
>> (XEN)    000000000000b004 ffff8300dfaf7d08
>> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004
>> ffffffffffffffff 0000beef0000beef
>> (XEN)    ffffffff8036fa45 00000000000001f0
>> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
>> 0000000000000046 0000000000000000 fffffe00003829c0
>> (XEN)    000000000000beef ffff8300ccfec820
>> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
>> 0000000000000000 ffff8300cc7da000 0000000000000000
>> (XEN)    0000000000000000 ffff8300dfaf7da8
>> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
>> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
>> (XEN)  ffff82c000299000Xen call trace:
>> (XEN)
>> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
>> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
>> show_execution_state+0x11/0x20
>> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
>> (XEN)  0000000000000001   [<ffff82d0801c5934>]
>> hvm_do_resume+0x258/0x33e
>> (XEN)  0000000000000002   [<ffff82d0801e3166>]
>> vmx_do_resume+0x12b/0x142
> 
> Ok, so this is a handle_pio() that's being called to pick up the results of an I/O that was sent to an external emulator. Did you manage to apply my patch too? I'd really like to know what part of the emulation handling is actually returning X86EMUL_UNHANDLEABLE.

Yes, I've applied both patches at the same time and this is the output I
got. You can see my diff above the trace. Do you want me to apply only
your patch?

Roger.



* Re: [BUG] Emulation issues
  2015-07-30 10:16           ` Roger Pau Monné
@ 2015-07-30 10:21             ` Paul Durrant
  2015-07-30 10:59               ` Paul Durrant
  2015-07-30 10:24             ` Andrew Cooper
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-30 10:21 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 30 July 2015 11:17
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> El 30/07/15 a les 12.12, Paul Durrant ha escrit:
> >> -----Original Message-----
> >> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> >> Sent: 29 July 2015 14:54
> >> To: Paul Durrant; xen-devel; Andrew Cooper
> >> Subject: Re: [BUG] Emulation issues
> >> I've applied your patch and the one from Andrew, so my current diff is:
> >>
> >> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> >> index 30acb78..1bc3cc9 100644
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
> >>              return X86EMUL_UNHANDLEABLE;
> >>          goto finish_access;
> >>      default:
> >> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> >> +                vio->io_req.state);
> >>          return X86EMUL_UNHANDLEABLE;
> >>      }
> >>
> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >> index ec1d797..38d6d99 100644
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server
> *s,
> >> ioreq_t *proto_p,
> >>          }
> >>      }
> >>
> >> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
> >>      return X86EMUL_UNHANDLEABLE;
> >>  }
> >>
> >> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> >> index d3b9cae..12d50c2 100644
> >> --- a/xen/arch/x86/hvm/io.c
> >> +++ b/xen/arch/x86/hvm/io.c
> >> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int
> dir)
> >>          break;
> >>      default:
> >>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
> >> -        domain_crash(curr->domain);
> >> +        show_execution_state(&curr->arch.user_regs);
> >> +        dump_execution_state();
> >> +        domain_crash_synchronous();
> >>          break;
> >>      }
> >>
> >> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
> >> guests:
> >>
> >> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
> >> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
> >> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
> >> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
> >> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
> >> (XEN) d10v0 weird emulation state 1
> >> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
> >> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
> >> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> >> (XEN) CPU:    0
> >> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
> >> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
> >> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx:
> 0000000000000000
> >> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
> >> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
> >> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11:
> 0000000000000001
> >> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
> >> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
> 00000000000026e0
> >> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
> >> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
> >> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
> >> ffff8300dfaf7ac8
> >> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
> >> ffff82d08012cfd3
> >> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
> >> 0000000000002001
> >> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
> >> fffffe00003829c0
> >> (XEN)    000000000000b004 0000000000000000 0000000000000014
> >> 0000000000000002
> >> (XEN)    000000000000b004 0000000000002001 000000000000b005
> >> 000000000000b004
> >> (XEN)    0000000000002001 000000000000b004
> >> 0000beef0000beef<G><0>d15v0 weird emulation state 1
> >> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation
> status
> >> 1.
> >> (XEN)
> >> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
> >> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
> >> C ]----
> >> (XEN)  0000000000000046CPU:    6
> >> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>]
> 000000000000beef
> >> show_registers+0x60/0x32f
> >> (XEN)
> >> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT:
> hypervisor
> >> 0000000000000000 (d15v0) 0000000000000000
> >> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx:
> 0000000000000000
> >> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
> >> ffff83019d297bb8
> >> (XEN)
> >> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:
> ffff83019d270000
> >> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001
> r11:
> >> 0000000000000001
> >> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000
> r14:
> >> ffff82c000299000
> >> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b
> cr4:
> >> 00000000000026e0
> >> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
> >> (XEN)
> >> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
> >> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
> >> ffff830198bf9000 0000000000000002 00007cfe62d68527
> >> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
> >> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38
> ffff82d0802685bf
> >> 0000000000002001 ffff83019d297b70
> >> (XEN)    0000000000000200 ffff8300cc7da000
> >> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
> >> 0000000000000000 ffff8300cc7da250 0000000000000001
> >> (XEN)    0000000000002001 ffff8300dfaf7db8
> >> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
> >> fffff80002089e28 ffff8300cc7da000 0000000000000001
> >> (XEN)    fffffe00003829c0 ffff8300dfaf0000
> >> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
> >> 0000000000000000 00000000000cc277 0000000000000014
> >> (XEN)    0000000000000002 0000000000000000
> >> (XEN)    0000000000000001 000000000000b004 00000000000feff0
> >> 0000000000002001 ffff8300ccfec820 000000000000b005
> >> (XEN)    000000000000b004 ffff8300dfaf7d08
> >> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff
> 000000000000b004
> >> ffffffffffffffff 0000beef0000beef
> >> (XEN)    ffffffff8036fa45 00000000000001f0
> >> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
> >> 0000000000000046 0000000000000000 fffffe00003829c0
> >> (XEN)    000000000000beef ffff8300ccfec820
> >> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
> >> 0000000000000000 ffff8300cc7da000 0000000000000000
> >> (XEN)    0000000000000000 ffff8300dfaf7da8
> >> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
> >> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
> >> (XEN)  ffff82c000299000Xen call trace:
> >> (XEN)
> >> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
> >> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
> >> show_execution_state+0x11/0x20
> >> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
> >> (XEN)  0000000000000001   [<ffff82d0801c5934>]
> >> hvm_do_resume+0x258/0x33e
> >> (XEN)  0000000000000002   [<ffff82d0801e3166>]
> >> vmx_do_resume+0x12b/0x142
> >
> > Ok, so this is a handle_pio() that's being called to pick up the results of an
> > I/O that was sent to an external emulator. Did you manage to apply my patch
> > too? I'd really like to know what part of the emulation handling is actually
> > returning X86EMUL_UNHANDLEABLE.
> 
> Yes, I've applied both patches at the same time and this is the output I
> got. You can see my diff above the trace. Do you want me to apply only
> your patch?

Sorry, missed that in the noise. So, the problem is that there is no in-flight I/O even though pio completion is being attempted. Something has got out of sync.

Thanks,

  Paul

> 
> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [BUG] Emulation issues
  2015-07-30 10:16           ` Roger Pau Monné
  2015-07-30 10:21             ` Paul Durrant
@ 2015-07-30 10:24             ` Andrew Cooper
  2015-07-30 10:27               ` Andrew Cooper
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2015-07-30 10:24 UTC (permalink / raw)
  To: Roger Pau Monné, Paul Durrant, xen-devel

On 30/07/15 11:16, Roger Pau Monné wrote:
> On 30/07/15 at 12:12, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
>>> Sent: 29 July 2015 14:54
>>> To: Paul Durrant; xen-devel; Andrew Cooper
>>> Subject: Re: [BUG] Emulation issues
>>> I've applied your patch and the one from Andrew, so my current diff is:
>>>
>>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>>> index 30acb78..1bc3cc9 100644
>>> --- a/xen/arch/x86/hvm/emulate.c
>>> +++ b/xen/arch/x86/hvm/emulate.c
>>> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>>>              return X86EMUL_UNHANDLEABLE;
>>>          goto finish_access;
>>>      default:
>>> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
>>> +                vio->io_req.state);
>>>          return X86EMUL_UNHANDLEABLE;
>>>      }
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index ec1d797..38d6d99 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
>>> ioreq_t *proto_p,
>>>          }
>>>      }
>>>
>>> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>>>      return X86EMUL_UNHANDLEABLE;
>>>  }
>>>
>>> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
>>> index d3b9cae..12d50c2 100644
>>> --- a/xen/arch/x86/hvm/io.c
>>> +++ b/xen/arch/x86/hvm/io.c
>>> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
>>>          break;
>>>      default:
>>>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
>>> -        domain_crash(curr->domain);
>>> +        show_execution_state(&curr->arch.user_regs);
>>> +        dump_execution_state();
>>> +        domain_crash_synchronous();
>>>          break;
>>>      }
>>>
>>> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
>>> guests:
>>>
>>> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
>>> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
>>> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
>>> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
>>> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
>>> (XEN) d10v0 weird emulation state 1
>>> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
>>> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
>>> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
>>> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
>>> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
>>> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
>>> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
>>> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
>>> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
>>> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
>>> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
>>> ffff8300dfaf7ac8
>>> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
>>> ffff82d08012cfd3
>>> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
>>> 0000000000002001
>>> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
>>> fffffe00003829c0
>>> (XEN)    000000000000b004 0000000000000000 0000000000000014
>>> 0000000000000002
>>> (XEN)    000000000000b004 0000000000002001 000000000000b005
>>> 000000000000b004
>>> (XEN)    0000000000002001 000000000000b004
>>> 0000beef0000beef<G><0>d15v0 weird emulation state 1
>>> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status
>>> 1.
>>> (XEN)
>>> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
>>> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
>>> C ]----
>>> (XEN)  0000000000000046CPU:    6
>>> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef
>>> show_registers+0x60/0x32f
>>> (XEN)
>>> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor
>>> 0000000000000000 (d15v0) 0000000000000000
>>> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
>>> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
>>> ffff83019d297bb8
>>> (XEN)
>>> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
>>> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11:
>>> 0000000000000001
>>> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14:
>>> ffff82c000299000
>>> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
>>> 00000000000026e0
>>> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
>>> (XEN)
>>> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
>>> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
>>> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
>>> ffff830198bf9000 0000000000000002 00007cfe62d68527
>>> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
>>> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf
>>> 0000000000002001 ffff83019d297b70
>>> (XEN)    0000000000000200 ffff8300cc7da000
>>> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
>>> 0000000000000000 ffff8300cc7da250 0000000000000001
>>> (XEN)    0000000000002001 ffff8300dfaf7db8
>>> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
>>> fffff80002089e28 ffff8300cc7da000 0000000000000001
>>> (XEN)    fffffe00003829c0 ffff8300dfaf0000
>>> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
>>> 0000000000000000 00000000000cc277 0000000000000014
>>> (XEN)    0000000000000002 0000000000000000
>>> (XEN)    0000000000000001 000000000000b004 00000000000feff0
>>> 0000000000002001 ffff8300ccfec820 000000000000b005
>>> (XEN)    000000000000b004 ffff8300dfaf7d08
>>> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004
>>> ffffffffffffffff 0000beef0000beef
>>> (XEN)    ffffffff8036fa45 00000000000001f0
>>> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
>>> 0000000000000046 0000000000000000 fffffe00003829c0
>>> (XEN)    000000000000beef ffff8300ccfec820
>>> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
>>> 0000000000000000 ffff8300cc7da000 0000000000000000
>>> (XEN)    0000000000000000 ffff8300dfaf7da8
>>> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
>>> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
>>> (XEN)  ffff82c000299000Xen call trace:
>>> (XEN)
>>> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
>>> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
>>> show_execution_state+0x11/0x20
>>> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
>>> (XEN)  0000000000000001   [<ffff82d0801c5934>]
>>> hvm_do_resume+0x258/0x33e
>>> (XEN)  0000000000000002   [<ffff82d0801e3166>]
>>> vmx_do_resume+0x12b/0x142
>> Ok, so this is a handle_pio() that's being called to pick up the results of an I/O that was sent to an external emulator. Did you manage to apply my patch too? I'd really like to know what part of the emulation handling is actually returning X86EMUL_UNHANDLEABLE.
> Yes, I've applied both patches at the same time and this is the output I
> got. You can see my diff above the trace. Do you want me to apply only
> your patch?

This patch is in my queue for 4.7, and should prevent the interleaving
of the messages.

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 58ba4ea..c384d4d 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -417,12 +417,30 @@ void show_stack_overflow(unsigned int cpu, const
struct cpu_user_regs *regs)
 
 void show_execution_state(const struct cpu_user_regs *regs)
 {
+    /*
+     * Avoid interleaving the concurrent results from two cpus, but fault
+     * handlers need to take priority.
+     */
+    static DEFINE_SPINLOCK(lock);
+
+    spin_lock_recursive(&lock);
+
     show_registers(regs);
     show_stack(regs);
+
+    spin_unlock_recursive(&lock);
 }
 
 void vcpu_show_execution_state(struct vcpu *v)
 {
+    /*
+     * Avoid interleaving the concurrent results from two cpus, but fault
+     * handlers need to take priority.
+     */
+    static DEFINE_SPINLOCK(lock);
+
+    spin_lock_recursive(&lock);
+
     printk("*** Dumping Dom%d vcpu#%d state: ***\n",
            v->domain->domain_id, v->vcpu_id);
 
@@ -439,6 +457,8 @@ void vcpu_show_execution_state(struct vcpu *v)
         show_guest_stack(v, &v->arch.user_regs);
 
     vcpu_unpause(v);
+
+    spin_unlock_recursive(&lock);
 }
 
 static const char *trapstr(unsigned int trapnr)



* Re: [BUG] Emulation issues
  2015-07-30 10:24             ` Andrew Cooper
@ 2015-07-30 10:27               ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-07-30 10:27 UTC (permalink / raw)
  To: Roger Pau Monné, Paul Durrant, xen-devel

On 30/07/15 11:24, Andrew Cooper wrote:
> On 30/07/15 11:16, Roger Pau Monné wrote:
>> On 30/07/15 at 12:12, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
>>>> Sent: 29 July 2015 14:54
>>>> To: Paul Durrant; xen-devel; Andrew Cooper
>>>> Subject: Re: [BUG] Emulation issues
>>>> I've applied your patch and the one from Andrew, so my current diff is:
>>>>
>>>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>>>> index 30acb78..1bc3cc9 100644
>>>> --- a/xen/arch/x86/hvm/emulate.c
>>>> +++ b/xen/arch/x86/hvm/emulate.c
>>>> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>>>>              return X86EMUL_UNHANDLEABLE;
>>>>          goto finish_access;
>>>>      default:
>>>> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
>>>> +                vio->io_req.state);
>>>>          return X86EMUL_UNHANDLEABLE;
>>>>      }
>>>>
>>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>>> index ec1d797..38d6d99 100644
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
>>>> ioreq_t *proto_p,
>>>>          }
>>>>      }
>>>>
>>>> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>>>>      return X86EMUL_UNHANDLEABLE;
>>>>  }
>>>>
>>>> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
>>>> index d3b9cae..12d50c2 100644
>>>> --- a/xen/arch/x86/hvm/io.c
>>>> +++ b/xen/arch/x86/hvm/io.c
>>>> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
>>>>          break;
>>>>      default:
>>>>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
>>>> -        domain_crash(curr->domain);
>>>> +        show_execution_state(&curr->arch.user_regs);
>>>> +        dump_execution_state();
>>>> +        domain_crash_synchronous();
>>>>          break;
>>>>      }
>>>>
>>>> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
>>>> guests:
>>>>
>>>> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
>>>> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
>>>> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
>>>> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
>>>> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
>>>> (XEN) d10v0 weird emulation state 1
>>>> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
>>>> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
>>>> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
>>>> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
>>>> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
>>>> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
>>>> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
>>>> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
>>>> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
>>>> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
>>>> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
>>>> ffff8300dfaf7ac8
>>>> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
>>>> ffff82d08012cfd3
>>>> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
>>>> 0000000000002001
>>>> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
>>>> fffffe00003829c0
>>>> (XEN)    000000000000b004 0000000000000000 0000000000000014
>>>> 0000000000000002
>>>> (XEN)    000000000000b004 0000000000002001 000000000000b005
>>>> 000000000000b004
>>>> (XEN)    0000000000002001 000000000000b004
>>>> 0000beef0000beef<G><0>d15v0 weird emulation state 1
>>>> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status
>>>> 1.
>>>> (XEN)
>>>> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
>>>> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
>>>> C ]----
>>>> (XEN)  0000000000000046CPU:    6
>>>> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef
>>>> show_registers+0x60/0x32f
>>>> (XEN)
>>>> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor
>>>> 0000000000000000 (d15v0) 0000000000000000
>>>> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
>>>> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
>>>> ffff83019d297bb8
>>>> (XEN)
>>>> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
>>>> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11:
>>>> 0000000000000001
>>>> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14:
>>>> ffff82c000299000
>>>> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
>>>> 00000000000026e0
>>>> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
>>>> (XEN)
>>>> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
>>>> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
>>>> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
>>>> ffff830198bf9000 0000000000000002 00007cfe62d68527
>>>> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
>>>> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf
>>>> 0000000000002001 ffff83019d297b70
>>>> (XEN)    0000000000000200 ffff8300cc7da000
>>>> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
>>>> 0000000000000000 ffff8300cc7da250 0000000000000001
>>>> (XEN)    0000000000002001 ffff8300dfaf7db8
>>>> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
>>>> fffff80002089e28 ffff8300cc7da000 0000000000000001
>>>> (XEN)    fffffe00003829c0 ffff8300dfaf0000
>>>> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
>>>> 0000000000000000 00000000000cc277 0000000000000014
>>>> (XEN)    0000000000000002 0000000000000000
>>>> (XEN)    0000000000000001 000000000000b004 00000000000feff0
>>>> 0000000000002001 ffff8300ccfec820 000000000000b005
>>>> (XEN)    000000000000b004 ffff8300dfaf7d08
>>>> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004
>>>> ffffffffffffffff 0000beef0000beef
>>>> (XEN)    ffffffff8036fa45 00000000000001f0
>>>> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
>>>> 0000000000000046 0000000000000000 fffffe00003829c0
>>>> (XEN)    000000000000beef ffff8300ccfec820
>>>> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
>>>> 0000000000000000 ffff8300cc7da000 0000000000000000
>>>> (XEN)    0000000000000000 ffff8300dfaf7da8
>>>> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
>>>> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
>>>> (XEN)  ffff82c000299000Xen call trace:
>>>> (XEN)
>>>> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
>>>> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
>>>> show_execution_state+0x11/0x20
>>>> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
>>>> (XEN)  0000000000000001   [<ffff82d0801c5934>]
>>>> hvm_do_resume+0x258/0x33e
>>>> (XEN)  0000000000000002   [<ffff82d0801e3166>]
>>>> vmx_do_resume+0x12b/0x142
>>> Ok, so this is a handle_pio() that's being called to pick up the results of an I/O that was sent to an external emulator. Did you manage to apply my patch too? I'd really like to know what part of the emulation handling is actually returning X86EMUL_UNHANDLEABLE.
>> Yes, I've applied both patches at the same time and this is the output I
>> got. You can see my diff above the trace. Do you want me to apply only
>> your patch?
> This patch is in my queue for 4.7, and should prevent the interleaving
> of the messages.

Ahem! take two, without a missing unlock path.

~Andrew

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 58ba4ea..5a808d4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -417,18 +417,37 @@ void show_stack_overflow(unsigned int cpu, const
struct cpu_user_regs *regs)
 
 void show_execution_state(const struct cpu_user_regs *regs)
 {
+    /*
+     * Avoid interleaving the concurrent results from two cpus, but fault
+     * handlers need to take priority.
+     */
+    static DEFINE_SPINLOCK(lock);
+
+    spin_lock_recursive(&lock);
+
     show_registers(regs);
     show_stack(regs);
+
+    spin_unlock_recursive(&lock);
 }
 
 void vcpu_show_execution_state(struct vcpu *v)
 {
+    /*
+     * Avoid interleaving the concurrent results from two cpus, but fault
+     * handlers need to take priority.
+     */
+    static DEFINE_SPINLOCK(lock);
+
+    spin_lock_recursive(&lock);
+
     printk("*** Dumping Dom%d vcpu#%d state: ***\n",
            v->domain->domain_id, v->vcpu_id);
 
     if ( v == current )
     {
         show_execution_state(guest_cpu_user_regs());
+        spin_unlock_recursive(&lock);
         return;
     }
 
@@ -439,6 +458,8 @@ void vcpu_show_execution_state(struct vcpu *v)
         show_guest_stack(v, &v->arch.user_regs);
 
     vcpu_unpause(v);
+
+    spin_unlock_recursive(&lock);
 }
 
 static const char *trapstr(unsigned int trapnr)


* Re: [BUG] Emulation issues
  2015-07-30 10:21             ` Paul Durrant
@ 2015-07-30 10:59               ` Paul Durrant
  2015-07-30 13:06                 ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-30 10:59 UTC (permalink / raw)
  To: Paul Durrant, Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
[big snip]
> Sorry, missed that in the noise. So, the problem is that there is no in-flight I/O
> even though pio completion is being attempted. Something has got out of
> sync.
> 

I think I understand what may be happening... The code in hvmemul_do_io() basically expects to be called either to issue an I/O or to extract info from a completed one. However it is being called unconditionally (in the PIO case) out of hvm_do_resume, rather than only if the in-flight I/O state has been updated to STATE_IORESP_READY.

Can you try this patch (also containing my previous debug patch)?

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 30acb78..1bc3cc9 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -145,6 +145,8 @@ static int hvmemul_do_io(
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
     default:
+        gprintk(XENLOG_ERR, "weird emulation state %u\n",
+                vio->io_req.state);
         return X86EMUL_UNHANDLEABLE;
     }

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec1d797..a476271 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -472,7 +472,6 @@ void hvm_do_resume(struct vcpu *v)
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s;
-    enum hvm_io_completion io_completion;

     check_wakeup_from_wait();

@@ -499,33 +498,38 @@ void hvm_do_resume(struct vcpu *v)
         }
     }

-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
-
-    switch ( io_completion )
-    {
-    case HVMIO_no_completion:
-        break;
-    case HVMIO_mmio_completion:
-        handle_mmio();
-        break;
-    case HVMIO_pio_completion:
-        (void)handle_pio(vio->io_req.addr, vio->io_req.size,
-                         vio->io_req.dir);
-        break;
-    case HVMIO_realmode_completion:
+    if ( vio->io_req.state == STATE_IORESP_READY )
     {
-        struct hvm_emulate_ctxt ctxt;
+        enum hvm_io_completion io_completion;

-        hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
+        io_completion = vio->io_completion;
+        vio->io_completion = HVMIO_no_completion;

-        break;
-    }
-    default:
-        ASSERT_UNREACHABLE();
-        break;
+        switch ( io_completion )
+        {
+        case HVMIO_no_completion:
+            break;
+        case HVMIO_mmio_completion:
+            handle_mmio();
+            break;
+        case HVMIO_pio_completion:
+            (void)handle_pio(vio->io_req.addr, vio->io_req.size,
+                             vio->io_req.dir);
+            break;
+        case HVMIO_realmode_completion:
+        {
+            struct hvm_emulate_ctxt ctxt;
+
+            hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+            vmx_realmode_emulate_one(&ctxt);
+            hvm_emulate_writeback(&ctxt);
+
+            break;
+        }
+        default:
+            ASSERT_UNREACHABLE();
+            break;
+        }
     }

     if ( unlikely(d->arch.event_write_data) )
@@ -2747,6 +2751,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
         }
     }

+    gprintk(XENLOG_ERR, "unable to contact device model\n");
     return X86EMUL_UNHANDLEABLE;
 }

  Paul

* Re: [BUG] Emulation issues
  2015-07-30 10:59               ` Paul Durrant
@ 2015-07-30 13:06                 ` Roger Pau Monné
  2015-07-30 13:12                   ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-30 13:06 UTC (permalink / raw)
  To: Paul Durrant, xen-devel, Andrew Cooper

On 30/07/15 at 12:59, Paul Durrant wrote:
>> -----Original Message-----
> [big snip]
>> Sorry, missed that in the noise. So, the problem is that there is no in-flight I/O
>> even though pio completion is being attempted. Something has got out of
>> sync.
>>
> 
> I think I understand what may be happening... The code in hvmemul_do_io() basically expects to be called either to issue an I/O or to extract info from a completed one. However it is being called unconditionally (in the PIO case) out of hvm_do_resume, rather than only if the in-flight I/O state has been updated to STATE_IORESP_READY.
> 
> Can you try this patch (also containing my previous debug patch)?
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 30acb78..1bc3cc9 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>              return X86EMUL_UNHANDLEABLE;
>          goto finish_access;
>      default:
> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> +                vio->io_req.state);
>          return X86EMUL_UNHANDLEABLE;
>      }
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index ec1d797..a476271 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -472,7 +472,6 @@ void hvm_do_resume(struct vcpu *v)
>      struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
>      struct domain *d = v->domain;
>      struct hvm_ioreq_server *s;
> -    enum hvm_io_completion io_completion;
> 
>      check_wakeup_from_wait();
> 
> @@ -499,33 +498,38 @@ void hvm_do_resume(struct vcpu *v)
>          }
>      }
> 
> -    io_completion = vio->io_completion;
> -    vio->io_completion = HVMIO_no_completion;
> -
> -    switch ( io_completion )
> -    {
> -    case HVMIO_no_completion:
> -        break;
> -    case HVMIO_mmio_completion:
> -        handle_mmio();
> -        break;
> -    case HVMIO_pio_completion:
> -        (void)handle_pio(vio->io_req.addr, vio->io_req.size,
> -                         vio->io_req.dir);
> -        break;
> -    case HVMIO_realmode_completion:
> +    if ( vio->io_req.state == STATE_IORESP_READY )
>      {
> -        struct hvm_emulate_ctxt ctxt;
> +        enum hvm_io_completion io_completion;
> 
> -        hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
> -        vmx_realmode_emulate_one(&ctxt);
> -        hvm_emulate_writeback(&ctxt);
> +        io_completion = vio->io_completion;
> +        vio->io_completion = HVMIO_no_completion;
> 
> -        break;
> -    }
> -    default:
> -        ASSERT_UNREACHABLE();
> -        break;
> +        switch ( io_completion )
> +        {
> +        case HVMIO_no_completion:
> +            break;
> +        case HVMIO_mmio_completion:
> +            handle_mmio();
> +            break;
> +        case HVMIO_pio_completion:
> +            (void)handle_pio(vio->io_req.addr, vio->io_req.size,
> +                             vio->io_req.dir);
> +            break;
> > +        case HVMIO_realmode_completion:
> +        {
> +            struct hvm_emulate_ctxt ctxt;
> +
> +            hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
> +            vmx_realmode_emulate_one(&ctxt);
> +            hvm_emulate_writeback(&ctxt);
> +
> +            break;
> +        }
> +        default:
> +            ASSERT_UNREACHABLE();
> +            break;
> +        }
>      }
> 
>      if ( unlikely(d->arch.event_write_data) )
> @@ -2747,6 +2751,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
>          }
>      }
> 
> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>      return X86EMUL_UNHANDLEABLE;
>  }

With this (and only this) patch applied I get the following output:

(XEN) irq.c:276: Dom18 PCI link 2 changed 11 -> 0
(XEN) irq.c:276: Dom18 PCI link 3 changed 5 -> 0
(XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
(XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
(XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
(XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166
(XEN) d19v0 weird emulation state 1
(XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:166

Roger.

* Re: [BUG] Emulation issues
  2015-07-30 13:06                 ` Roger Pau Monné
@ 2015-07-30 13:12                   ` Paul Durrant
  2015-07-30 13:19                     ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-30 13:12 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 30 July 2015 14:06
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> On 30/07/15 at 12:59, Paul Durrant wrote:
> >> -----Original Message-----
> > [big snip]
> >> Sorry, missed that in the noise. So, the problem is that there is no in-flight
> I/O
> >> even though pio completion is being attempted. Something has got out of
> >> sync.
> >>
> >
> > I think I understand what may be happening... The code in
> hvmemul_do_io() basically expects to be called either to issue an I/O or to
> extract info from a completed one. However it is being called unconditionally
> (in the PIO case) out of hvm_do_resume, rather than only if the in-flight I/O
> state has been updated to STATE_IORESP_READY.
> >
> > Can you try this patch (also containing my previous debug patch)?
> >
> > diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> > index 30acb78..1bc3cc9 100644
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -145,6 +145,8 @@ static int hvmemul_do_io(
> >              return X86EMUL_UNHANDLEABLE;
> >          goto finish_access;
> >      default:
> > +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> > +                vio->io_req.state);
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index ec1d797..a476271 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -472,7 +472,6 @@ void hvm_do_resume(struct vcpu *v)
> >      struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
> >      struct domain *d = v->domain;
> >      struct hvm_ioreq_server *s;
> > -    enum hvm_io_completion io_completion;
> >
> >      check_wakeup_from_wait();
> >
> > @@ -499,33 +498,38 @@ void hvm_do_resume(struct vcpu *v)
> >          }
> >      }
> >
> > -    io_completion = vio->io_completion;
> > -    vio->io_completion = HVMIO_no_completion;
> > -
> > -    switch ( io_completion )
> > -    {
> > -    case HVMIO_no_completion:
> > -        break;
> > -    case HVMIO_mmio_completion:
> > -        handle_mmio();
> > -        break;
> > -    case HVMIO_pio_completion:
> > -        (void)handle_pio(vio->io_req.addr, vio->io_req.size,
> > -                         vio->io_req.dir);
> > -        break;
> > -    case HVMIO_realmode_completion:
> > +    if ( vio->io_req.state == STATE_IORESP_READY )
> >      {
> > -        struct hvm_emulate_ctxt ctxt;
> > +        enum hvm_io_completion io_completion;
> >
> > -        hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
> > -        vmx_realmode_emulate_one(&ctxt);
> > -        hvm_emulate_writeback(&ctxt);
> > +        io_completion = vio->io_completion;
> > +        vio->io_completion = HVMIO_no_completion;
> >
> > -        break;
> > -    }
> > -    default:
> > -        ASSERT_UNREACHABLE();
> > -        break;
> > +        switch ( io_completion )
> > +        {
> > +        case HVMIO_no_completion:
> > +            break;
> > +        case HVMIO_mmio_completion:
> > +            handle_mmio();
> > +            break;
> > +        case HVMIO_pio_completion:
> > +            (void)handle_pio(vio->io_req.addr, vio->io_req.size,
> > +                             vio->io_req.dir);
> > +            break;
> > +        case HVMIO_realmode_completion:
> > +        {
> > +            struct hvm_emulate_ctxt ctxt;
> > +
> > +            hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
> > +            vmx_realmode_emulate_one(&ctxt);
> > +            hvm_emulate_writeback(&ctxt);
> > +
> > +            break;
> > +        }
> > +        default:
> > +            ASSERT_UNREACHABLE();
> > +            break;
> > +        }
> >      }
> >
> >      if ( unlikely(d->arch.event_write_data) )
> > @@ -2747,6 +2751,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
> ioreq_t *proto_p,
> >          }
> >      }
> >
> > +    gprintk(XENLOG_ERR, "unable to contact device model\n");
> >      return X86EMUL_UNHANDLEABLE;
> >  }
> 
> With this (and only this) patch applied I get the following output:
> 
> (XEN) irq.c:276: Dom18 PCI link 2 changed 11 -> 0
> (XEN) irq.c:276: Dom18 PCI link 3 changed 5 -> 0
> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> (XEN) d19v0 weird emulation state 1
> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:166
> 

Hmm. Can't understand how that's happening... handle_pio() shouldn't be called unless the state is STATE_IORESP_READY and yet the inner function is hitting the default case in the switch.

  Paul

> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [BUG] Emulation issues
  2015-07-30 13:12                   ` Paul Durrant
@ 2015-07-30 13:19                     ` Andrew Cooper
  2015-07-30 13:20                       ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2015-07-30 13:19 UTC (permalink / raw)
  To: Paul Durrant, Roger Pau Monne, xen-devel

On 30/07/15 14:12, Paul Durrant wrote:
>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> (XEN) domain_crash called from io.c:166
>> (XEN) d19v0 weird emulation state 1
>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> (XEN) domain_crash called from io.c:166
>> (XEN) d19v0 weird emulation state 1
>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> (XEN) domain_crash called from io.c:166
>>
> Hmm. Can't understand how that's happening... handle_pio() shouldn't be called unless the state is STATE_IORESP_READY and yet the inner function is hitting the default case in the switch.

Sounds like something is changing the state between the two checks.  Is
this shared memory writeable by qemu?

~Andrew

* Re: [BUG] Emulation issues
  2015-07-30 13:19                     ` Andrew Cooper
@ 2015-07-30 13:20                       ` Paul Durrant
  2015-07-31 10:22                         ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-30 13:20 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne, xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 30 July 2015 14:19
> To: Paul Durrant; Roger Pau Monne; xen-devel
> Subject: Re: [BUG] Emulation issues
> 
> On 30/07/15 14:12, Paul Durrant wrote:
> >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) d19v0 weird emulation state 1
> >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) d19v0 weird emulation state 1
> >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >>
> > Hmm. Can't understand how that's happening... handle_pio() shouldn't be
> called unless the state is STATE_IORESP_READY and yet the inner function is
> hitting the default case in the switch.
> 
> Sounds like something is changing the state between the two checks.  Is
> this shared memory writeable by qemu?
> 

No, this is the internal state. I really can't see how it's being changed.

  Paul

> ~Andrew

* Re: [BUG] Emulation issues
  2015-07-30 13:20                       ` Paul Durrant
@ 2015-07-31 10:22                         ` Paul Durrant
  2015-07-31 11:11                           ` Sander Eikelenboom
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-31 10:22 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Roger Pau Monne, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 30 July 2015 14:20
> To: Andrew Cooper; Roger Pau Monne; xen-devel
> Subject: Re: [Xen-devel] [BUG] Emulation issues
> 
> > -----Original Message-----
> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > Sent: 30 July 2015 14:19
> > To: Paul Durrant; Roger Pau Monne; xen-devel
> > Subject: Re: [BUG] Emulation issues
> >
> > On 30/07/15 14:12, Paul Durrant wrote:
> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >> (XEN) domain_crash called from io.c:166
> > >> (XEN) d19v0 weird emulation state 1
> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >> (XEN) domain_crash called from io.c:166
> > >> (XEN) d19v0 weird emulation state 1
> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >> (XEN) domain_crash called from io.c:166
> > >>
> > > Hmm. Can't understand how that's happening... handle_pio() shouldn't
> be
> > called unless the state is STATE_IORESP_READY and yet the inner function
> is
> > hitting the default case in the switch.
> >
> > Sounds like something is changing the state between the two checks.  Is
> > this shared memory writeable by qemu?
> >
> 
> No, this is the internal state. I really can't see how it's being changed.
> 

I've tried to replicate your test on my rig (which is an old AMD box but quite a big one). Even so I only seem to get about half the VMs to start. The shutdown works fine, and I don't see any problems on the Xen console. I'm using an older build of Xen but still one with my series in. I'll try pulling up to the same commit as you and try again.

  Paul

* Re: [BUG] Emulation issues
  2015-07-31 10:22                         ` Paul Durrant
@ 2015-07-31 11:11                           ` Sander Eikelenboom
  2015-07-31 11:39                             ` Roger Pau Monné
  2015-07-31 11:39                             ` Paul Durrant
  0 siblings, 2 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2015-07-31 11:11 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, xen-devel, Roger Pau Monne


Friday, July 31, 2015, 12:22:16 PM, you wrote:

>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Paul Durrant
>> Sent: 30 July 2015 14:20
>> To: Andrew Cooper; Roger Pau Monne; xen-devel
>> Subject: Re: [Xen-devel] [BUG] Emulation issues
>> 
>> > -----Original Message-----
>> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> > Sent: 30 July 2015 14:19
>> > To: Paul Durrant; Roger Pau Monne; xen-devel
>> > Subject: Re: [BUG] Emulation issues
>> >
>> > On 30/07/15 14:12, Paul Durrant wrote:
>> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> > >> (XEN) domain_crash called from io.c:166
>> > >> (XEN) d19v0 weird emulation state 1
>> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> > >> (XEN) domain_crash called from io.c:166
>> > >> (XEN) d19v0 weird emulation state 1
>> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>> > >> (XEN) domain_crash called from io.c:166
>> > >>
>> > > Hmm. Can't understand how that's happening... handle_pio() shouldn't
>> be
>> > called unless the state is STATE_IORESP_READY and yet the inner function
>> is
>> > hitting the default case in the switch.
>> >
>> > Sounds like something is changing the state between the two checks.  Is
>> > this shared memory writeable by qemu?
>> >
>> 
>> No, this is the internal state. I really can't see how it's being changed.
>> 

> I've tried to replicate your test on my rig (which is an old AMD box but quite a big one). Even so I only seem to get about half the VMs to start. The shutdown works fine, and I don't see any problems on the Xen console. I'm using an older build of Xen but still one with my series in. I'll try pulling up to the same commit as you and try again.

>   Paul

Hi Paul,

From what I recall it started around when Tiejun Chen's series went in.

--

Sander

* Re: [BUG] Emulation issues
  2015-07-31 11:11                           ` Sander Eikelenboom
@ 2015-07-31 11:39                             ` Roger Pau Monné
  2015-07-31 11:39                             ` Paul Durrant
  1 sibling, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-31 11:39 UTC (permalink / raw)
  To: Sander Eikelenboom, Paul Durrant; +Cc: Andrew Cooper, xen-devel

El 31/07/15 a les 13.11, Sander Eikelenboom ha escrit:
> 
> Friday, July 31, 2015, 12:22:16 PM, you wrote:
> 
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>>> bounces@lists.xen.org] On Behalf Of Paul Durrant
>>> Sent: 30 July 2015 14:20
>>> To: Andrew Cooper; Roger Pau Monne; xen-devel
>>> Subject: Re: [Xen-devel] [BUG] Emulation issues
>>>
>>>> -----Original Message-----
>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>>>> Sent: 30 July 2015 14:19
>>>> To: Paul Durrant; Roger Pau Monne; xen-devel
>>>> Subject: Re: [BUG] Emulation issues
>>>>
>>>> On 30/07/15 14:12, Paul Durrant wrote:
>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>> (XEN) domain_crash called from io.c:166
>>>>>> (XEN) d19v0 weird emulation state 1
>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>> (XEN) domain_crash called from io.c:166
>>>>>> (XEN) d19v0 weird emulation state 1
>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>> (XEN) domain_crash called from io.c:166
>>>>>>
>>>>> Hmm. Can't understand how that's happening... handle_pio() shouldn't
>>> be
>>>> called unless the state is STATE_IORESP_READY and yet the inner function
>>> is
>>>> hitting the default case in the switch.
>>>>
>>>> Sounds like something is changing the state between the two checks.  Is
>>>> this shared memory writeable by qemu?
>>>>
>>>
>>> No, this is the internal state. I really can't see how it's being changed.
>>>
> 
>> I've tried to replicate your test on my rig (which is an old AMD box but quite a big one). Even so I only seem to get about half the VMs to start. The shutdown works fine, and I don't see any problems on the Xen console. I'm using an older build of Xen but still one with my series in. I'll try pulling up to the same commit as you and try again.
> 
>>   Paul
> 
> Hi Paul,
> 
>> From what I recall it started around when Tiejun Chen's series went in.

FWIW I've seen this on an Intel box with a Xeon W3550. I have not tried
to reproduce it on any of my other boxes, but I could give it a try if
needed.

Roger.

* Re: [BUG] Emulation issues
  2015-07-31 11:11                           ` Sander Eikelenboom
  2015-07-31 11:39                             ` Roger Pau Monné
@ 2015-07-31 11:39                             ` Paul Durrant
  2015-07-31 11:41                               ` Roger Pau Monné
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-31 11:39 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel, Roger Pau Monne

> -----Original Message-----
> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> Sent: 31 July 2015 12:12
> To: Paul Durrant
> Cc: Andrew Cooper; Roger Pau Monne; xen-devel
> Subject: Re: [Xen-devel] [BUG] Emulation issues
> 
> 
> Friday, July 31, 2015, 12:22:16 PM, you wrote:
> 
> >> -----Original Message-----
> >> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> >> bounces@lists.xen.org] On Behalf Of Paul Durrant
> >> Sent: 30 July 2015 14:20
> >> To: Andrew Cooper; Roger Pau Monne; xen-devel
> >> Subject: Re: [Xen-devel] [BUG] Emulation issues
> >>
> >> > -----Original Message-----
> >> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> > Sent: 30 July 2015 14:19
> >> > To: Paul Durrant; Roger Pau Monne; xen-devel
> >> > Subject: Re: [BUG] Emulation issues
> >> >
> >> > On 30/07/15 14:12, Paul Durrant wrote:
> >> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> > >> (XEN) domain_crash called from io.c:166
> >> > >> (XEN) d19v0 weird emulation state 1
> >> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> > >> (XEN) domain_crash called from io.c:166
> >> > >> (XEN) d19v0 weird emulation state 1
> >> > >> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >> > >> (XEN) domain_crash called from io.c:166
> >> > >>
> >> > > Hmm. Can't understand how that's happening... handle_pio()
> shouldn't
> >> be
> >> > called unless the state is STATE_IORESP_READY and yet the inner
> function
> >> is
> >> > hitting the default case in the switch.
> >> >
> >> > Sounds like something is changing the state between the two checks.  Is
> >> > this shared memory writeable by qemu?
> >> >
> >>
> >> No, this is the internal state. I really can't see how it's being changed.
> >>
> 
> > I've tried to replicate your test on my rig (which is an old AMD box but quite
> a big one). Even so I only seem to get about half the VMs to start. The
> shutdown works fine, and I don't see any problems on the Xen console. I'm
> using an older build of Xen but still one with my series in. I'll try pulling up to
> the same commit as you and try again.
> 
> >   Paul
> 
> Hi Paul,
> 
> From what i recall it started around when Tiejun Chen's series went in.
> 

Interesting.

  Paul

* Re: [BUG] Emulation issues
  2015-07-31 11:39                             ` Paul Durrant
@ 2015-07-31 11:41                               ` Roger Pau Monné
  2015-07-31 11:42                                 ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-31 11:41 UTC (permalink / raw)
  To: Paul Durrant, Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel

El 31/07/15 a les 13.39, Paul Durrant ha escrit:
>> -----Original Message-----
>> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
>> Sent: 31 July 2015 12:12
>> To: Paul Durrant
>> Cc: Andrew Cooper; Roger Pau Monne; xen-devel
>> Subject: Re: [Xen-devel] [BUG] Emulation issues
>>
>>
>> Friday, July 31, 2015, 12:22:16 PM, you wrote:
>>
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>>>> bounces@lists.xen.org] On Behalf Of Paul Durrant
>>>> Sent: 30 July 2015 14:20
>>>> To: Andrew Cooper; Roger Pau Monne; xen-devel
>>>> Subject: Re: [Xen-devel] [BUG] Emulation issues
>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>>>>> Sent: 30 July 2015 14:19
>>>>> To: Paul Durrant; Roger Pau Monne; xen-devel
>>>>> Subject: Re: [BUG] Emulation issues
>>>>>
>>>>> On 30/07/15 14:12, Paul Durrant wrote:
>>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>>> (XEN) domain_crash called from io.c:166
>>>>>>> (XEN) d19v0 weird emulation state 1
>>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>>> (XEN) domain_crash called from io.c:166
>>>>>>> (XEN) d19v0 weird emulation state 1
>>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
>>>>>>> (XEN) domain_crash called from io.c:166
>>>>>>>
>>>>>> Hmm. Can't understand how that's happening... handle_pio()
>> shouldn't
>>>> be
>>>>> called unless the state is STATE_IORESP_READY and yet the inner
>> function
>>>> is
>>>>> hitting the default case in the switch.
>>>>>
>>>>> Sounds like something is changing the state between the two checks.  Is
>>>>> this shared memory writeable by qemu?
>>>>>
>>>>
>>>> No, this is the internal state. I really can't see how it's being changed.
>>>>
>>
>>> I've tried to replicate your test on my rig (which is an old AMD box but quite
>> a big one). Even so I only seem to get about half the VMs to start. The
>> shutdown works fine, and I don't see any problems on the Xen console. I'm
>> using an older build of Xen but still one with my series in. I'll try pulling up to
>> the same commit as you and try again.
>>
>>>   Paul
>>
>> Hi Paul,
>>
>> From what i recall it started around when Tiejun Chen's series went in.
>>

Since I can reproduce this at will I will attempt to perform a
bisection. Maybe this can help narrow down the issue.

Roger.

* Re: [BUG] Emulation issues
  2015-07-31 11:41                               ` Roger Pau Monné
@ 2015-07-31 11:42                                 ` Paul Durrant
  2015-07-31 12:21                                   ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-31 11:42 UTC (permalink / raw)
  To: Roger Pau Monne, Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 31 July 2015 12:42
> To: Paul Durrant; Sander Eikelenboom
> Cc: Andrew Cooper; xen-devel
> Subject: Re: [Xen-devel] [BUG] Emulation issues
> 
> El 31/07/15 a les 13.39, Paul Durrant ha escrit:
> >> -----Original Message-----
> >> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> >> Sent: 31 July 2015 12:12
> >> To: Paul Durrant
> >> Cc: Andrew Cooper; Roger Pau Monne; xen-devel
> >> Subject: Re: [Xen-devel] [BUG] Emulation issues
> >>
> >>
> >> Friday, July 31, 2015, 12:22:16 PM, you wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> >>>> bounces@lists.xen.org] On Behalf Of Paul Durrant
> >>>> Sent: 30 July 2015 14:20
> >>>> To: Andrew Cooper; Roger Pau Monne; xen-devel
> >>>> Subject: Re: [Xen-devel] [BUG] Emulation issues
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >>>>> Sent: 30 July 2015 14:19
> >>>>> To: Paul Durrant; Roger Pau Monne; xen-devel
> >>>>> Subject: Re: [BUG] Emulation issues
> >>>>>
> >>>>> On 30/07/15 14:12, Paul Durrant wrote:
> >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >>>>>>> (XEN) domain_crash called from io.c:166
> >>>>>>> (XEN) d19v0 weird emulation state 1
> >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >>>>>>> (XEN) domain_crash called from io.c:166
> >>>>>>> (XEN) d19v0 weird emulation state 1
> >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> >>>>>>> (XEN) domain_crash called from io.c:166
> >>>>>>>
> >>>>>> Hmm. Can't understand how that's happening... handle_pio()
> >> shouldn't
> >>>> be
> >>>>> called unless the state is STATE_IORESP_READY and yet the inner
> >> function
> >>>> is
> >>>>> hitting the default case in the switch.
> >>>>>
> >>>>> Sounds like something is changing the state between the two checks.
> Is
> >>>>> this shared memory writeable by qemu?
> >>>>>
> >>>>
> >>>> No, this is the internal state. I really can't see how it's being changed.
> >>>>
> >>
> >>> I've tried to replicate your test on my rig (which is an old AMD box but
> quite
> >> a big one). Even so I only seem to get about half the VMs to start. The
> >> shutdown works fine, and I don't see any problems on the Xen console.
> I'm
> >> using an older build of Xen but still one with my series in. I'll try pulling up
> to
> >> the same commit as you and try again.
> >>
> >>>   Paul
> >>
> >> Hi Paul,
> >>
> >> From what i recall it started around when Tiejun Chen's series went in.
> >>
> 
> Since I can reproduce this at will I will attempt to perform a
> bisection. Maybe this can help narrow down the issue.
> 

Thanks. That would be very helpful. I will continue to try to repro.

  Paul

> Roger.

* Re: [BUG] Emulation issues
  2015-07-31 11:42                                 ` Paul Durrant
@ 2015-07-31 12:21                                   ` Paul Durrant
  2015-07-31 14:19                                     ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-31 12:21 UTC (permalink / raw)
  To: Paul Durrant, Roger Pau Monne, Sander Eikelenboom
  Cc: Andrew Cooper, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 31 July 2015 12:43
> To: Roger Pau Monne; Sander Eikelenboom
> Cc: Andrew Cooper; xen-devel
> Subject: Re: [Xen-devel] [BUG] Emulation issues
> 
> > -----Original Message-----
> > From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> > Sent: 31 July 2015 12:42
> > To: Paul Durrant; Sander Eikelenboom
> > Cc: Andrew Cooper; xen-devel
> > Subject: Re: [Xen-devel] [BUG] Emulation issues
> >
> > El 31/07/15 a les 13.39, Paul Durrant ha escrit:
> > >> -----Original Message-----
> > >> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> > >> Sent: 31 July 2015 12:12
> > >> To: Paul Durrant
> > >> Cc: Andrew Cooper; Roger Pau Monne; xen-devel
> > >> Subject: Re: [Xen-devel] [BUG] Emulation issues
> > >>
> > >>
> > >> Friday, July 31, 2015, 12:22:16 PM, you wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> > >>>> bounces@lists.xen.org] On Behalf Of Paul Durrant
> > >>>> Sent: 30 July 2015 14:20
> > >>>> To: Andrew Cooper; Roger Pau Monne; xen-devel
> > >>>> Subject: Re: [Xen-devel] [BUG] Emulation issues
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > >>>>> Sent: 30 July 2015 14:19
> > >>>>> To: Paul Durrant; Roger Pau Monne; xen-devel
> > >>>>> Subject: Re: [BUG] Emulation issues
> > >>>>>
> > >>>>> On 30/07/15 14:12, Paul Durrant wrote:
> > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >>>>>>> (XEN) domain_crash called from io.c:166
> > >>>>>>> (XEN) d19v0 weird emulation state 1
> > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >>>>>>> (XEN) domain_crash called from io.c:166
> > >>>>>>> (XEN) d19v0 weird emulation state 1
> > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > >>>>>>> (XEN) domain_crash called from io.c:166
> > >>>>>>>
> > >>>>>> Hmm. Can't understand how that's happening... handle_pio()
> > >> shouldn't
> > >>>> be
> > >>>>> called unless the state is STATE_IORESP_READY and yet the inner
> > >> function
> > >>>> is
> > >>>>> hitting the default case in the switch.
> > >>>>>
> > >>>>> Sounds like something is changing the state between the two
> checks.
> > Is
> > >>>>> this shared memory writeable by qemu?
> > >>>>>
> > >>>>
> > >>>> No, this is the internal state. I really can't see how it's being changed.
> > >>>>
> > >>
> > >>> I've tried to replicate your test on my rig (which is an old AMD box but
> > quite
> > >> a big one). Even so I only seem to get about half the VMs to start. The
> > >> shutdown works fine, and I don't see any problems on the Xen console.
> > I'm
> > >> using an older build of Xen but still one with my series in. I'll try pulling
> up
> > to
> > >> the same commit as you and try again.
> > >>
> > >>>   Paul
> > >>
> > >> Hi Paul,
> > >>
> > >> From what i recall it started around when Tiejun Chen's series went in.
> > >>
> >
> > Since I can reproduce this at will I will attempt to perform a
> > bisection. Maybe this can help narrow down the issue.
> >
> 
> Thanks. That would be very helpful. I will continue to try to repro.
> 

Still no luck with the repro, but I think my thought experiments might have got it...

If a vcpu has a request in-flight then its internal ioreq state will be IOREQ_READY and it will be waiting for wake-up. When it is woken up, hvm_do_resume() will be called and it will call hvm_wait_for_io(). If the shared (with QEMU) ioreq state is still IOREQ_READY or IOREQ_INPROCESS then the vcpu will block again. If the shared state is IORESP_READY then the emulation is done, and the internal state will be updated to IORESP_READY or IOREQ_NONE by hvm_io_assist(), depending upon whether any completion is needed or not.
*However*, if the emulator (or Xen) happens to zero out the shared ioreq state before hvm_wait_for_io() is called, then it will see a shared state of IOREQ_NONE and terminate without calling hvm_io_assist(), leaving the internal ioreq state as IOREQ_READY. That in turn causes the domain_crash() you're seeing when re-emulation is attempted by a completion handler.

So, there is an underlying problem in that a dying emulator can leave an I/O uncompleted, but the code in Xen needs to cope more gracefully with that (since the vcpu will be going away anyway) and not call domain_crash().

  Paul


>   Paul
> 
> > Roger.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: [BUG] Emulation issues
  2015-07-31 12:21                                   ` Paul Durrant
@ 2015-07-31 14:19                                     ` Paul Durrant
  2015-07-31 15:15                                       ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2015-07-31 14:19 UTC (permalink / raw)
  To: Roger Pau Monne, Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel

> -----Original Message-----
> From: Paul Durrant
> Sent: 31 July 2015 13:21
> To: Paul Durrant; Roger Pau Monne; Sander Eikelenboom
> Cc: Andrew Cooper; xen-devel
> Subject: RE: [Xen-devel] [BUG] Emulation issues
> 
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> > bounces@lists.xen.org] On Behalf Of Paul Durrant
> > Sent: 31 July 2015 12:43
> > To: Roger Pau Monne; Sander Eikelenboom
> > Cc: Andrew Cooper; xen-devel
> > Subject: Re: [Xen-devel] [BUG] Emulation issues
> >
> > > -----Original Message-----
> > > From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> > > Sent: 31 July 2015 12:42
> > > To: Paul Durrant; Sander Eikelenboom
> > > Cc: Andrew Cooper; xen-devel
> > > Subject: Re: [Xen-devel] [BUG] Emulation issues
> > >
> > > El 31/07/15 a les 13.39, Paul Durrant ha escrit:
> > > >> -----Original Message-----
> > > >> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> > > >> Sent: 31 July 2015 12:12
> > > >> To: Paul Durrant
> > > >> Cc: Andrew Cooper; Roger Pau Monne; xen-devel
> > > >> Subject: Re: [Xen-devel] [BUG] Emulation issues
> > > >>
> > > >>
> > > >> Friday, July 31, 2015, 12:22:16 PM, you wrote:
> > > >>
> > > >>>> -----Original Message-----
> > > >>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> > > >>>> bounces@lists.xen.org] On Behalf Of Paul Durrant
> > > >>>> Sent: 30 July 2015 14:20
> > > >>>> To: Andrew Cooper; Roger Pau Monne; xen-devel
> > > >>>> Subject: Re: [Xen-devel] [BUG] Emulation issues
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > > >>>>> Sent: 30 July 2015 14:19
> > > >>>>> To: Paul Durrant; Roger Pau Monne; xen-devel
> > > >>>>> Subject: Re: [BUG] Emulation issues
> > > >>>>>
> > > >>>>> On 30/07/15 14:12, Paul Durrant wrote:
> > > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > > >>>>>>> (XEN) domain_crash called from io.c:166
> > > >>>>>>> (XEN) d19v0 weird emulation state 1
> > > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > > >>>>>>> (XEN) domain_crash called from io.c:166
> > > >>>>>>> (XEN) d19v0 weird emulation state 1
> > > >>>>>>> (XEN) io.c:165:d19v0 Weird HVM ioemulation status 1.
> > > >>>>>>> (XEN) domain_crash called from io.c:166
> > > >>>>>>>
> > > >>>>>> Hmm. Can't understand how that's happening... handle_pio()
> > > >>>>>> shouldn't be called unless the state is STATE_IORESP_READY and
> > > >>>>>> yet the inner function is hitting the default case in the switch.
> > > >>>>>
> > > >>>>> Sounds like something is changing the state between the two
> > > >>>>> checks. Is this shared memory writeable by qemu?
> > > >>>>>
> > > >>>>
> > > >>>> No, this is the internal state. I really can't see how it's being
> > > >>>> changed.
> > > >>>>
> > > >>
> > > >>> I've tried to replicate your test on my rig (which is an old AMD
> > > >>> box, but quite a big one). Even so, I only seem to get about half
> > > >>> the VMs to start. The shutdown works fine, and I don't see any
> > > >>> problems on the Xen console. I'm using an older build of Xen, but
> > > >>> still one with my series in. I'll try pulling up to the same commit
> > > >>> as you and try again.
> > > >>
> > > >>>   Paul
> > > >>
> > > >> Hi Paul,
> > > >>
> > > >> From what I recall it started around when Tiejun Chen's series went in.
> > > >>
> > >
> > > Since I can reproduce this at will I will attempt to perform a
> > > bisection. Maybe this can help narrow down the issue.
> > >
> >
> > Thanks. That would be very helpful. I will continue to try to repro.
> >
> 
> Still no luck with the repro, but I think my thought experiments might
> have got it...
> 
> If a vcpu has a request in-flight then its internal ioreq state will be
> IOREQ_READY and it will be waiting for wake-up. When it is woken up
> hvm_do_resume() will be called and it will call hvm_wait_for_io(). If the
> shared (with QEMU) ioreq state is still IOREQ_READY or IOREQ_INPROCESS
> then the vcpu will block again. If the shared state is IORESP_READY then the
> emulation is done and the internal state will be updated to IORESP_READY or
> IOREQ_NONE by hvm_io_assist() depending upon whether any completion
> is needed or not.
> *However* if the emulator (or Xen) happens to zero out the shared ioreq
> state before hvm_wait_for_io() is called then it will see a shared state of
> IOREQ_NONE so it will terminate without calling hvm_io_assist() leaving the
> internal ioreq state as IOREQ_READY which will then cause the
> domain_crash() you're seeing when re-emulation is attempted by a
> completion handler.
> 
> So, there is an underlying problem in that a dying emulator can leave an I/O
> uncompleted but the code in Xen needs to cope more gracefully with that
> (since the vcpu will be going away anyway) and not call domain_crash().
> 

Can you please try this patch:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec1d797..197a8c4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -412,44 +412,52 @@ bool_t hvm_io_pending(struct vcpu *v)
                           &d->arch.hvm_domain.ioreq_server.list,
                           list_entry )
     {
-        ioreq_t *p = get_ioreq(s, v);
+        struct hvm_ioreq_vcpu *sv;

-        if ( p->state != STATE_IOREQ_NONE )
-            return 1;
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+                return 1;
+        }
     }

     return 0;
 }

-static void hvm_io_assist(ioreq_t *p)
+static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
 {
-    struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-
-    p->state = STATE_IOREQ_NONE;
+    struct vcpu *v = sv->vcpu;
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;

     if ( hvm_vcpu_io_need_completion(vio) )
     {
         vio->io_req.state = STATE_IORESP_READY;
-        vio->io_req.data = p->data;
+        vio->io_req.data = data;
     }
     else
         vio->io_req.state = STATE_IOREQ_NONE;

-    msix_write_completion(curr);
-    vcpu_end_shutdown_deferral(curr);
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
+    sv->pending = 0;
 }

 static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
 {
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    while ( p->state != STATE_IOREQ_NONE )
+    while ( sv->pending )
     {
         switch ( p->state )
         {
+        case STATE_IOREQ_NONE:
+            hvm_io_assist(sv, ~0ul);
+            break;
         case STATE_IORESP_READY: /* IORESP_READY -> NONE */
             rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-            hvm_io_assist(p);
+            p->state = STATE_IOREQ_NONE;
+            hvm_io_assist(sv, p->data);
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
@@ -459,6 +467,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ior
             break;
         default:
             gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
+            sv->pending = 0;
             domain_crash(sv->vcpu->domain);
             return 0; /* bail */
         }
@@ -489,7 +498,7 @@ void hvm_do_resume(struct vcpu *v)
                               &s->ioreq_vcpu_list,
                               list_entry )
         {
-            if ( sv->vcpu == v )
+            if ( sv->vcpu == v && sv->pending )
             {
                 if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )
                     return;
@@ -2743,6 +2752,8 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *pr
              */
             p->state = STATE_IOREQ_READY;
             notify_via_xen_event_channel(d, port);
+
+            sv->pending = 1;
             return X86EMUL_RETRY;
         }
     }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b612755..12b5e0c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -46,6 +46,7 @@ struct hvm_ioreq_vcpu {
     struct list_head list_entry;
     struct vcpu      *vcpu;
     evtchn_port_t    ioreq_evtchn;
+    bool_t           pending;
 };

 #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
--

  Paul
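
[Editorial note: the race described above, and the way the patch's new
`pending` flag closes it, can be modelled in a few lines of self-contained
C. This is an illustrative, heavily simplified sketch; the names only echo
the Xen patch, and none of it is actual Xen code.]

```c
#include <assert.h>
#include <stdint.h>

enum ioreq_state { STATE_IOREQ_NONE, STATE_IOREQ_READY,
                   STATE_IOREQ_INPROCESS, STATE_IORESP_READY };

struct ioreq_vcpu {
    enum ioreq_state internal; /* Xen's private per-vcpu ioreq state */
    int pending;               /* the flag the patch introduces */
    uint64_t data;             /* result of the emulated access */
};

/* Pre-patch logic: completion is keyed off the shared (emulator-visible)
 * state alone.  If the emulator zeroes the shared state early, the
 * hvm_io_assist() step never runs and the internal state stays at
 * STATE_IOREQ_READY, tripping domain_crash() on the next re-emulation. */
void wait_for_io_old(struct ioreq_vcpu *sv, enum ioreq_state shared)
{
    while (shared != STATE_IOREQ_NONE) {
        if (shared == STATE_IORESP_READY) {
            sv->internal = STATE_IOREQ_NONE; /* models hvm_io_assist() */
            shared = STATE_IOREQ_NONE;
        } else {
            return; /* would block waiting for the emulator */
        }
    }
    /* shared was already IOREQ_NONE: internal state never updated */
}

/* Patched logic: completion is keyed off sv->pending, so a shared state of
 * IOREQ_NONE is treated as an emulator that went away, and the request is
 * completed with all-ones data instead of being silently dropped. */
void wait_for_io_new(struct ioreq_vcpu *sv, enum ioreq_state shared)
{
    while (sv->pending) {
        switch (shared) {
        case STATE_IOREQ_NONE:      /* dead emulator: complete with ~0 */
            sv->data = ~0ul;
            sv->internal = STATE_IOREQ_NONE;
            sv->pending = 0;
            break;
        case STATE_IORESP_READY:    /* normal completion path */
            sv->internal = STATE_IOREQ_NONE;
            sv->pending = 0;
            break;
        default:
            return; /* would block waiting for the emulator */
        }
    }
}
```

In the model, `wait_for_io_old()` silently exits when the shared state is
already IOREQ_NONE, leaving the internal state stuck at STATE_IOREQ_READY,
which is exactly the condition that later produces the "Weird HVM
ioemulation status" crash seen in this thread.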

>   Paul
> 
> 
> >   Paul
> >
> > > Roger.
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG] Emulation issues
  2015-07-31 14:19                                     ` Paul Durrant
@ 2015-07-31 15:15                                       ` Roger Pau Monné
  0 siblings, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-07-31 15:15 UTC (permalink / raw)
  To: Paul Durrant, Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel

El 31/07/15 a les 16.19, Paul Durrant ha escrit:
>> [earlier quoted discussion and analysis snipped]
> 
> Can you please try this patch:
> 
> [patch snipped; quoted in full in the message above]

Thanks, this solves the issue for me: I've been able to shut down 40+ HVM
guests without problems. You can add my Tested-by when you formally post
the patch.

Roger.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-07-31 15:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29 10:17 [BUG] Emulation issues Roger Pau Monné
2015-07-29 10:27 ` Paul Durrant
2015-07-29 10:36   ` Roger Pau Monné
2015-07-29 10:37     ` Paul Durrant
2015-07-29 12:08     ` Andrew Cooper
2015-07-29 12:41     ` Paul Durrant
2015-07-29 13:54       ` Roger Pau Monné
2015-07-30 10:12         ` Paul Durrant
2015-07-30 10:16           ` Roger Pau Monné
2015-07-30 10:21             ` Paul Durrant
2015-07-30 10:59               ` Paul Durrant
2015-07-30 13:06                 ` Roger Pau Monné
2015-07-30 13:12                   ` Paul Durrant
2015-07-30 13:19                     ` Andrew Cooper
2015-07-30 13:20                       ` Paul Durrant
2015-07-31 10:22                         ` Paul Durrant
2015-07-31 11:11                           ` Sander Eikelenboom
2015-07-31 11:39                             ` Roger Pau Monné
2015-07-31 11:39                             ` Paul Durrant
2015-07-31 11:41                               ` Roger Pau Monné
2015-07-31 11:42                                 ` Paul Durrant
2015-07-31 12:21                                   ` Paul Durrant
2015-07-31 14:19                                     ` Paul Durrant
2015-07-31 15:15                                       ` Roger Pau Monné
2015-07-30 10:24             ` Andrew Cooper
2015-07-30 10:27               ` Andrew Cooper
