* handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
@ 2020-06-04  1:46 Marek Marczykowski-Górecki
  2020-06-04  7:08 ` Jan Beulich
  2020-06-05  9:38 ` Jan Beulich
  0 siblings, 2 replies; 40+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-06-04  1:46 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3066 bytes --]

Hi,

(continuation of a thread from #xendevel)

During system shutdown I quite often hit an infinite stream of errors
like this:

    (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
    (XEN) domain_crash called from io.c:178

This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
too), nested within KVM. The KVM part means everything is very slow, so
various race conditions are much more likely to happen.

It started happening not long ago, and I'm pretty sure it's related to
updating to qemu 4.2.0 (in the linux stubdom); the previous version was 3.0.0.

Thanks to Andrew and Roger, I've managed to collect more info.

Context:
    dom0: pv
    dom1: hvm
    dom2: stubdom for dom1
    dom3: hvm
    dom4: stubdom for dom3
    dom5: pvh
    dom6: pvh

It starts out OK, I think:

    (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
    (XEN) d3v0 handle_pio port 0xb004 read 0x0000
    (XEN) d3v0 handle_pio port 0xb004 read 0x0000
    (XEN) d3v0 handle_pio port 0xb004 write 0x0001
    (XEN) d3v0 handle_pio port 0xb004 write 0x2001
    (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
    (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
    (XEN) d1v0 handle_pio port 0xb004 read 0x0000
    (XEN) d1v0 handle_pio port 0xb004 read 0x0000
    (XEN) d1v0 handle_pio port 0xb004 write 0x0001
    (XEN) d1v0 handle_pio port 0xb004 write 0x2001
    (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
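
As an aside, port 0xb004 is the ACPI PM1a control register as QEMU's
PIIX4 PM device typically exposes it (I/O base 0xb000, PM1a_CNT at
offset 4; the base and the S5 meaning of SLP_TYP 0 are assumptions
about this particular setup). Decoding the 0x2001 write per the ACPI
spec's PM1 control layout shows it is the guest committing a sleep
request, i.e. the "power off":

```c
/* Bit layout of the PM1 control register per the ACPI spec; the
 * 0xb000 base and the S5 interpretation of SLP_TYP 0 are assumptions
 * about QEMU's PIIX4 PM configuration. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PM1_SCI_EN      (1u << 0)            /* ACPI (SCI) mode on    */
#define PM1_SLP_TYP(v)  (((v) >> 10) & 0x7)  /* requested sleep state */
#define PM1_SLP_EN      (1u << 13)           /* commit the request    */

/* True when a write to PM1a_CNT asks the chipset to enter a sleep
 * state (for SLP_TYP 0 on this platform, assumed S5, i.e. power off). */
static bool is_sleep_request(uint16_t val)
{
    return (val & PM1_SLP_EN) != 0;
}
```

So the 0x0001 write merely keeps SCI_EN set, while 0x2001 adds SLP_EN
and is the last thing the guest does before the device model reports
the shutdown.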

But then (after a second or so) when the toolstack tries to clean it up,
things go sideways:

    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
    (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
    (... long stream of domain destroy that can't really finish ...)
    
And then, similar also for dom1:

    (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1
    (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 got domain_lock
    (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 ret -85
    (... now a stream of this for dom1 and dom6 interleaved ...)

At some point, domain 2 (stubdom for domain 1) and domain 5 join too. 

Then, we get the main issue:

    (XEN) d3v0 handle_pio port 0xb004 read 0x0000
    (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
    (XEN) domain_crash called from io.c:178

Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.

Full log of the shutdown:
https://gist.github.com/marmarek/fbfe1b5d8f4c7b47df5a5e28bd95ea66

And the patch adding those extra messages:
https://gist.github.com/marmarek/dc739a820928e641a1ed6b4759cdf6f3

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04  1:46 handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom Marek Marczykowski-Górecki
@ 2020-06-04  7:08 ` Jan Beulich
  2020-06-04  7:17   ` Jan Beulich
  2020-06-04 11:13   ` Andrew Cooper
  2020-06-05  9:38 ` Jan Beulich
  1 sibling, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2020-06-04  7:08 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> During system shutdown I quite often hit an infinite stream of errors
> like this:
> 
>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>     (XEN) domain_crash called from io.c:178
> 
> This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
> too), nested within KVM. The KVM part means everything is very slow, so
> various race conditions are much more likely to happen.
> 
> It started happening not long ago, and I'm pretty sure it's related to
> updating to qemu 4.2.0 (in the linux stubdom); the previous version was 3.0.0.
> 
> Thanks to Andrew and Roger, I've managed to collect more info.
> 
> Context:
>     dom0: pv
>     dom1: hvm
>     dom2: stubdom for dom1
>     dom3: hvm
>     dom4: stubdom for dom3
>     dom5: pvh
>     dom6: pvh
> 
> It starts out OK, I think:
> 
>     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>     (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>     (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>     (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>     (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> 
> But then (after a second or so) when the toolstack tries to clean it up,
> things go sideways:
> 
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
>     (... long stream of domain destroy that can't really finish ...)
>     
> And then, similar also for dom1:
> 
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 got domain_lock
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 ret -85
>     (... now a stream of this for dom1 and dom6 interleaved ...)
> 
> At some point, domain 2 (stubdom for domain 1) and domain 5 join too. 

What makes you think this is an indication of things going sideways?
-85 is -ERESTART, which is quite normal to see for a period of time
while cleaning up a domain.
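
The continuation semantics can be sketched as follows; this is a
simplified model rather than the real libxc/Xen code, and
do_destroy_step() with its work_left counter is a hypothetical
stand-in for the preemptible hypercall and its remaining teardown
work:

```c
/* Hedged sketch: -85 is Xen's -ERESTART.  XEN_DOMCTL_destroydomain
 * tears a domain down in preemptible chunks, and the toolstack simply
 * re-issues the hypercall until it stops returning -ERESTART. */
#include <assert.h>

#define ERESTART 85  /* Xen-internal value; the log shows it as -85 */

static int work_left = 3;  /* pretend teardown work still pending */

/* Stand-in for one preemptible XEN_DOMCTL_destroydomain invocation. */
static int do_destroy_step(void)
{
    if (work_left > 0) {
        work_left--;
        return -ERESTART;  /* not an error: "call me again" */
    }
    return 0;              /* domain fully destroyed */
}

/* What the toolstack effectively does: retry until done. */
static int destroy_domain(void)
{
    int rc;
    do {
        rc = do_destroy_step();
    } while (rc == -ERESTART);
    return rc;
}
```

In other words, a stream of "ret -85" lines is the loop above making
forward progress, not the destroy failing.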

> Then, we get the main issue:
> 
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>     (XEN) domain_crash called from io.c:178
> 
> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.

I'd guess an issue with the shutdown deferral logic. Did you / can
you check whether XEN_DMOP_remote_shutdown managed to pause all
CPUs (I assume it didn't, since once they're paused there shouldn't
be any I/O there anymore, and hence no I/O emulation)?

Another question though: In 4.13 the log message next to the
domain_crash() I assume you're hitting is "Weird HVM ioemulation
status", not "Weird PIO status", and the debugging patch you
referenced doesn't have any change there. Andrew's recent
change to master, otoh, doesn't use the word "weird" anymore. I
can therefore only guess that the value logged is still
hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
Please confirm.
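
If the numbering in the tree in question matches my reading of Xen's
x86_emulate.h (an assumption worth verifying against that exact tree),
the logged status decodes as below, and status 1 would indeed be
X86EMUL_UNHANDLEABLE:

```c
/* Hedged helper: mapping the logged numeric status back onto the
 * X86EMUL_* return codes.  The numbering is an assumption based on
 * one reading of xen/arch/x86/x86_emulate/x86_emulate.h. */
#include <assert.h>

enum x86emul_status {
    X86EMUL_OKAY = 0,         /* emulation completed                  */
    X86EMUL_UNHANDLEABLE = 1, /* no handler could deal with the op    */
    X86EMUL_EXCEPTION = 2,    /* emulation raised a guest exception   */
    X86EMUL_RETRY = 3,        /* retry, e.g. after an IOREQ round trip */
};

static const char *x86emul_status_name(int rc)
{
    switch (rc) {
    case X86EMUL_OKAY:         return "X86EMUL_OKAY";
    case X86EMUL_UNHANDLEABLE: return "X86EMUL_UNHANDLEABLE";
    case X86EMUL_EXCEPTION:    return "X86EMUL_EXCEPTION";
    case X86EMUL_RETRY:        return "X86EMUL_RETRY";
    default:                   return "<unknown>";
    }
}
```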

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04  7:08 ` Jan Beulich
@ 2020-06-04  7:17   ` Jan Beulich
  2020-06-04 11:13   ` Andrew Cooper
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2020-06-04  7:17 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 04.06.2020 09:08, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>> Then, we get the main issue:
>>
>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>     (XEN) domain_crash called from io.c:178
>>
>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> 
> I'd guess an issue with the shutdown deferral logic. Did you / can
> you check whether XEN_DMOP_remote_shutdown managed to pause all
> CPUs (I assume it didn't, since once they're paused there shouldn't
> be any I/O there anymore, and hence no I/O emulation)?
> 
> Another question though: In 4.13 the log message next to the
> domain_crash() I assume you're hitting is "Weird HVM ioemulation
> status", not "Weird PIO status", and the debugging patch you
> referenced doesn't have any change there. Andrew's recent
> change to master, otoh, doesn't use the word "weird" anymore. I
> can therefore only guess that the value logged is still
> hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
> Please confirm.

If so, the prime candidate source of the X86EMUL_UNHANDLEABLE would
seem to be hvmemul_do_io()'s first switch(). Could you instrument
that one as well, to see what vio->io_req.state has caused it?

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04  7:08 ` Jan Beulich
  2020-06-04  7:17   ` Jan Beulich
@ 2020-06-04 11:13   ` Andrew Cooper
  2020-06-04 12:36     ` Jan Beulich
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Cooper @ 2020-06-04 11:13 UTC (permalink / raw)
  To: Jan Beulich, Marek Marczykowski-Górecki; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2419 bytes --]

On 04/06/2020 08:08, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>> Then, we get the main issue:
>>
>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>     (XEN) domain_crash called from io.c:178
>>
>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> I'd guess an issue with the shutdown deferral logic. Did you / can
> you check whether XEN_DMOP_remote_shutdown managed to pause all
> CPUs (I assume it didn't, since once they're paused there shouldn't
> be any I/O there anymore, and hence no I/O emulation)?

The vcpu in question is talking to Qemu, so will have v->defer_shutdown
intermittently set, and skip the pause in domain_shutdown()

I presume this lack of pause is to allow the vcpu in question to still
be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
0 clarifying details).

What *should* happen is that, after consuming the reply, the vcpu should
notice and pause itself, at which point it would yield to the
scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().

Evidently, this is not happening.
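
The intended protocol can be sketched with simplified stand-ins for
the real struct domain / struct vcpu fields; the bodies below encode
the *expected* behaviour, not Xen's actual code:

```c
/* A model of the shutdown-deferral protocol, using hypothetical
 * simplified types in place of Xen's real ones. */
#include <assert.h>
#include <stdbool.h>

struct domain { bool is_shutting_down; };

struct vcpu {
    struct domain *d;
    bool defer_shutdown;       /* IOREQ in flight: skip forced pause */
    bool paused_for_shutdown;  /* vCPU has yielded to the scheduler  */
};

/* domain_shutdown(), single-vCPU simplification: pause the vCPU
 * unless it is still waiting on an IOREQ reply. */
static void domain_shutdown(struct domain *d, struct vcpu *v)
{
    d->is_shutting_down = true;
    if (!v->defer_shutdown)
        v->paused_for_shutdown = true;
}

/* Runs once the IOREQ reply has been consumed: the vCPU must notice
 * the shutdown it deferred and pause itself -- the step that appears
 * to be skipped in the reported trace. */
static void vcpu_end_shutdown_deferral(struct vcpu *v)
{
    v->defer_shutdown = false;
    if (v->d->is_shutting_down)
        v->paused_for_shutdown = true;
}
```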

Marek: can you add a BUG() after the weird PIO printing?  That should
confirm whether we're getting into handle_pio() via the
handle_hvm_io_completion() path, or via the vmexit path (in which case,
we're fully re-entering the guest).

I suspect you can drop the debugging of XEN_DOMCTL_destroydomain - I
think it's just noise at the moment.

However, it would be very helpful to see the vcpus which fall into
domain_shutdown()'s "else if ( v->defer_shutdown ) continue;" path.

> Another question though: In 4.13 the log message next to the
> domain_crash() I assume you're hitting is "Weird HVM ioemulation
> status", not "Weird PIO status", and the debugging patch you
> referenced doesn't have any change there. Andrew's recent
> change to master, otoh, doesn't use the word "weird" anymore. I
> can therefore only guess that the value logged is still
> hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
> Please confirm.

It's the first draft of the patch which I did, before submitting to
xen-devel.  We do have X86EMUL_UNHANDLEABLE at this point, but it's not
terribly helpful - there are loads of paths which fail silently with
this error.

~Andrew

[-- Attachment #2: Type: text/html, Size: 3268 bytes --]


* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04 11:13   ` Andrew Cooper
@ 2020-06-04 12:36     ` Jan Beulich
  2020-06-04 14:25       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-04 12:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Marek Marczykowski-Górecki, Paul Durrant

On 04.06.2020 13:13, Andrew Cooper wrote:
> On 04/06/2020 08:08, Jan Beulich wrote:
>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>> Then, we get the main issue:
>>>
>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>     (XEN) domain_crash called from io.c:178
>>>
>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>> I'd guess an issue with the shutdown deferral logic. Did you / can
>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>> CPUs (I assume it didn't, since once they're paused there shouldn't
>> be any I/O there anymore, and hence no I/O emulation)?
> 
> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> intermittently set, and skip the pause in domain_shutdown()
> 
> I presume this lack of pause is to allow the vcpu in question to still
> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> 0 clarifying details).
> 
> What *should* happen is that, after consuming the reply, the vcpu should
> notice and pause itself, at which point it would yield to the
> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> 
> Evidently, this is not happening.

We can't tell yet, until ...

> Marek: can you add a BUG() after the weird PIO printing?  That should
> confirm whether we're getting into handle_pio() via the
> handle_hvm_io_completion() path, or via the vmexit path (in which case,
> we're fully re-entering the guest).

... we know this. handle_pio() gets called from handle_hvm_io_completion()
after having called hvm_wait_for_io() -> hvm_io_assist() ->
vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
handle_pio() (etc) at all anymore in this state. IOW perhaps
hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
instead of plain "true"?
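
A minimal model of that suggestion, with hypothetical simplified types
(the real hvm_wait_for_io() naturally does far more than this):

```c
/* Only the return-value logic is modelled; the struct names are
 * simplified stand-ins for the real ioreq-handling types. */
#include <assert.h>
#include <stdbool.h>

struct domain { bool is_shutting_down; };
struct vcpu { struct domain *domain; };
struct hvm_ioreq_vcpu { struct vcpu *vcpu; };

/* Current behaviour: once the reply is consumed, always let
 * handle_hvm_io_completion() continue (and re-run handle_pio()). */
static bool hvm_wait_for_io_current(struct hvm_ioreq_vcpu *sv)
{
    (void)sv;
    return true;
}

/* Suggested behaviour: suppress further emulation for a domain that
 * is already shutting down. */
static bool hvm_wait_for_io_suggested(struct hvm_ioreq_vcpu *sv)
{
    return !sv->vcpu->domain->is_shutting_down;
}
```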

Adding Paul to Cc, as being the maintainer here.

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04 12:36     ` Jan Beulich
@ 2020-06-04 14:25       ` Marek Marczykowski-Górecki
  2020-06-05  9:09         ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-06-04 14:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Paul Durrant, xen-devel

[-- Attachment #1: Type: text/plain, Size: 6177 bytes --]

On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> On 04.06.2020 13:13, Andrew Cooper wrote:
> > On 04/06/2020 08:08, Jan Beulich wrote:
> >> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>> Then, we get the main issue:
> >>>
> >>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>     (XEN) domain_crash called from io.c:178
> >>>
> >>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >> I'd guess an issue with the shutdown deferral logic. Did you / can
> >> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >> CPUs (I assume it didn't, since once they're paused there shouldn't
> >> be any I/O there anymore, and hence no I/O emulation)?
> > 
> > The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > intermittently set, and skip the pause in domain_shutdown()
> > 
> > I presume this lack of pause is to allow the vcpu in question to still
> > be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> > 0 clarifying details).
> > 
> > What *should* happen is that, after consuming the reply, the vcpu should
> > notice and pause itself, at which point it would yield to the
> > scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > 
> > Evidently, this is not happening.
> 
> We can't tell yet, until ...
> 
> > Marek: can you add a BUG() after the weird PIO printing?  That should
> > confirm whether we're getting into handle_pio() via the
> > handle_hvm_io_completion() path, or via the vmexit path (in which case,
> > we're fully re-entering the guest).
> 
> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> handle_pio() (etc) at all anymore in this state. IOW perhaps
> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> instead of plain "true"?
> 
> Adding Paul to Cc, as being the maintainer here.

Got it, by sticking BUG() just before that domain_crash() in
handle_pio(). Also, vcpu 0 of both HVM domains has
v->defer_shutdown set.

(XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 handle_pio port 0xb004 write 0x0001
(XEN) d3v0 handle_pio port 0xb004 write 0x2001
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
(XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
(XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
(XEN) d1v0 handle_pio port 0xb004 read 0x0000
(XEN) d1v0 handle_pio port 0xb004 read 0x0000
(XEN) d1v0 handle_pio port 0xb004 write 0x0001
(XEN) d1v0 handle_pio port 0xb004 write 0x2001
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
(XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
(XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
(XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
(XEN) Xen BUG at io.c:178
(XEN) ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0802fcb0f>] handle_pio+0x1e4/0x1e6
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d3v0)
(XEN) rax: ffff8301ba6fffff   rbx: 0000000000000002   rcx: 0000000000000000
(XEN) rdx: 0000000000000001   rsi: 000000000000000a   rdi: ffff82d080438698
(XEN) rbp: ffff8301ba6ffe90   rsp: ffff8301ba6ffe58   r8:  0000000000000001
(XEN) r9:  ffff8301ba6ffdc0   r10: 0000000000000001   r11: 000000000000000f
(XEN) r12: 000000000000b004   r13: ffff8300bfcf1000   r14: 0000000000000001
(XEN) r15: ffff8300bfcf4000   cr0: 000000008005003b   cr4: 00000000000006e0
(XEN) cr3: 00000000bebb8000   cr2: 00007d081d9b82a0
(XEN) fsb: 00007d081cafcb80   gsb: ffff9ae510c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0802fcb0f> (handle_pio+0x1e4/0x1e6):
(XEN)  03 09 00 e8 5b 83 f4 ff <0f> 0b 55 48 89 e5 e8 b2 f5 ff ff 48 85 c0 74 0f
(XEN) Xen stack trace from rsp=ffff8301ba6ffe58:
(XEN)    000000000000ffff ffff8300bfcfffff 000000000000007b ffff8301ba6ffef8
(XEN)    ffff8300bfcf1000 ffff8300bfcf4000 0000000000000000 ffff8301ba6ffee8
(XEN)    ffff82d0803128f1 00ff8301ba6ffec0 ffff82d08030c257 ffff8301ba6ffef8
(XEN)    ffff8300bfcf1000 ffff8300bfcf4000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cfe459000e7 ffff82d08031470b
(XEN)    0000000000000010 0000000000000010 0000000000000010 ffffa92ec001fd0c
(XEN)    000000000000b004 0000000000000010 0000000000000001 0000000000000000
(XEN)    0000000000000002 000000000000b004 ffffa92ec001fca4 0000000000000002
(XEN)    000000000000b004 ffffa92ec001fd0c 000000000000b004 0000beef0000beef
(XEN)    ffffffffaa5d43bf 000000bf0000beef 0000000000000046 ffffa92ec001fca0
(XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN)    000000000000beef 0000e01000000001 ffff8300bfcf4000 000000313a1d6000
(XEN)    00000000000006e0 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0802fcb0f>] R handle_pio+0x1e4/0x1e6
(XEN)    [<ffff82d0803128f1>] F svm_vmexit_handler+0x97a/0x165b
(XEN)    [<ffff82d08031470b>] F svm_stgi_label+0x8/0x18
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at io.c:178
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04 14:25       ` Marek Marczykowski-Górecki
@ 2020-06-05  9:09         ` Jan Beulich
  2020-06-05  9:22           ` Jan Beulich
  2020-06-05 11:05           ` Paul Durrant
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2020-06-05  9:09 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Andrew Cooper, Paul Durrant, xen-devel

On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>> Then, we get the main issue:
>>>>>
>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>     (XEN) domain_crash called from io.c:178
>>>>>
>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>
>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>> intermittently set, and skip the pause in domain_shutdown()
>>>
>>> I presume this lack of pause is to allow the vcpu in question to still
>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>> 0 clarifying details).
>>>
>>> What *should* happen is that, after consuming the reply, the vcpu should
>>> notice and pause itself, at which point it would yield to the
>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>
>>> Evidently, this is not happening.
>>
>> We can't tell yet, until ...
>>
>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>> confirm whether we're getting into handle_pio() via the
>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>> we're fully re-entering the guest).
>>
>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>> instead of plain "true"?
>>
>> Adding Paul to Cc, as being the maintainer here.
> 
> Got it, by sticking BUG() just before that domain_crash() in
> handle_pio(). Also, vcpu 0 of both HVM domains has
> v->defer_shutdown set.

As per the log they did get it set. I'd be curious about the flag's
value (as well as v->paused_for_shutdown's) at the point of the
problematic handle_pio() invocation (see below). It may be
worthwhile to instrument vcpu_check_shutdown() (inside its if())
- before exiting to guest context (in order to then come back
and call handle_pio()) the vCPU ought to be getting through
there. No indication of it doing so would be a sign that there's
a code path bypassing the call to vcpu_end_shutdown_deferral().

> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000

Perhaps in this message you could also log
v->domain->is_shutting_down, v->defer_shutdown, and
v->paused_for_shutdown? (Would be nice if, after having made
changes to your debugging patch, you could point again at the
precise version you've used for the log provided.)

> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> (XEN) Xen BUG at io.c:178

Btw, instead of BUG(), WARN() or dump_execution_state() would
likely also do, keeping Xen alive.

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05  9:09         ` Jan Beulich
@ 2020-06-05  9:22           ` Jan Beulich
  2020-06-05 12:01             ` Marek Marczykowski-Górecki
  2020-06-05 11:05           ` Paul Durrant
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05  9:22 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Andrew Cooper, xen-devel, Paul Durrant

On 05.06.2020 11:09, Jan Beulich wrote:
> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> 
> Perhaps in this message you could also log
> v->domain->is_shutting_down, v->defer_shutdown, and
> v->paused_for_shutdown?

And v->domain->is_shut_down please.

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-04  1:46 handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom Marek Marczykowski-Górecki
  2020-06-04  7:08 ` Jan Beulich
@ 2020-06-05  9:38 ` Jan Beulich
  2020-06-05 11:18   ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05  9:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> (continuation of a thread from #xendevel)
> 
> During system shutdown I quite often hit an infinite stream of errors
> like this:
> 
>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>     (XEN) domain_crash called from io.c:178
> 
> This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
> too), nested within KVM. The KVM part means everything is very slow, so
> various race conditions are much more likely to happen.
> 
> It started happening not long ago, and I'm pretty sure it's related to
> updating to qemu 4.2.0 (in the linux stubdom); the previous version was 3.0.0.
> 
> Thanks to Andrew and Roger, I've managed to collect more info.
> 
> Context:
>     dom0: pv
>     dom1: hvm
>     dom2: stubdom for dom1
>     dom3: hvm
>     dom4: stubdom for dom3
>     dom5: pvh
>     dom6: pvh
> 
> It starts out OK, I think:
> 
>     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0

I can't seem to spot the call site of this, in any of
qemu, libxl, or libxc. I'm in particular curious as to the further
actions taken on the domain after this was invoked: Do any ioreq
servers get unregistered immediately (which I think would be a
problem)?

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05  9:09         ` Jan Beulich
  2020-06-05  9:22           ` Jan Beulich
@ 2020-06-05 11:05           ` Paul Durrant
  2020-06-05 11:25             ` Paul Durrant
  2020-06-05 13:32             ` Jan Beulich
  1 sibling, 2 replies; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 11:05 UTC (permalink / raw)
  To: 'Jan Beulich', 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'xen-devel'

Sorry, only just catching up with this...

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 10:09
> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> Durrant <paul@xen.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>> Then, we get the main issue:
> >>>>>
> >>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>     (XEN) domain_crash called from io.c:178
> >>>>>
> >>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>
> >>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>> intermittently set, and skip the pause in domain_shutdown()
> >>>
> >>> I presume this lack of pause is to allow the vcpu in question to still
> >>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> >>> 0 clarifying details).
> >>>
> >>> What *should* happen is that, after consuming the reply, the vcpu should
> >>> notice and pause itself, at which point it would yield to the
> >>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>
> >>> Evidently, this is not happening.
> >>
> >> We can't tell yet, until ...
> >>
> >>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>> confirm whether we're getting into handle_pio() via the
> >>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> >>> we're fully re-entering the guest).
> >>
> >> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >> instead of plain "true"?
> >>
> >> Adding Paul to Cc, as being the maintainer here.
> >
> > Got it, by sticking BUG() just before that domain_crash() in
> > handle_pio(). And also vcpu 0 of both HVM domains do have
> > v->defer_shutdown.
> 
> As per the log they did get it set. I'd be curious of the flag's
> value (as well as v->paused_for_shutdown's) at the point of the
> problematic handle_pio() invocation (see below). It may be
> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> - before exiting to guest context (in order to then come back
> and call handle_pio()) the vCPU ought to be getting through
> there. No indication of it doing so would be a sign that there's
> a code path bypassing the call to vcpu_end_shutdown_deferral().
> 
> > (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> 
> Perhaps in this message could you also log
> v->domain->is_shutting_down, v->defer_shutdown, and
> v->paused_for_shutdown? (Would be nice if, after having made
> changes to your debugging patch, you could point again at the
> precise version you've used for the log provided.)
> 
> > (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > (XEN) Xen BUG at io.c:178
> 
> Btw, instead of BUG(), WARN() or dump_execution_state() would
> likely also do, keeping Xen alive.
> 

A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it? That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
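
To illustrate, here is a minimal toy model (not the real Xen code; all names and the exact dispatch flow are illustrative assumptions) of that hypothesised race: the port still resolves to an ioreq server, but the vcpu's entry on the server's list is already gone, so the send fails with X86EMUL_UNHANDLEABLE instead of the usual X86EMUL_RETRY:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the suspected teardown race. Not Xen source. */

enum { X86EMUL_UNHANDLEABLE = 1, X86EMUL_RETRY = 2 };

struct toy_server {
    int port;          /* port this server claims */
    int vcpu_on_list;  /* is the issuing vcpu still on ioreq_vcpu_list? */
};

static struct toy_server *toy_select_server(struct toy_server *s, int port)
{
    /* Server selection still succeeds for the claimed port. */
    return (s != NULL && s->port == port) ? s : NULL;
}

int toy_handle_pio(struct toy_server *s, int port)
{
    struct toy_server *sel = toy_select_server(s, port);

    /* Server found, but the vcpu entry was torn down: hard failure
     * rather than a retry, matching the "Unexpected PIO status 1". */
    if (sel == NULL || !sel->vcpu_on_list)
        return X86EMUL_UNHANDLEABLE;

    return X86EMUL_RETRY;  /* request queued; wait for the emulator */
}
```

With the vcpu still registered the same call would return X86EMUL_RETRY, which is why the UNHANDLEABLE result points at teardown ordering rather than shutdown deferral.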

  Paul

> Jan




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05  9:38 ` Jan Beulich
@ 2020-06-05 11:18   ` Marek Marczykowski-Górecki
  2020-06-05 13:59     ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-06-05 11:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On Fri, Jun 05, 2020 at 11:38:17AM +0200, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > (continuation of a thread from #xendevel)
> > 
> > During system shutdown quite often I hit infinite stream of errors like
> > this:
> > 
> >     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >     (XEN) domain_crash called from io.c:178
> > 
> > This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
> > too), nested within KVM. The KVM part means everything is very slow, so
> > various race conditions are much more likely to happen.
> > 
> > It started happening not long ago, and I'm pretty sure it's about
> > updating to qemu 4.2.0 (in linux stubdom), previous one was 3.0.0.
> > 
> > Thanks to Andrew and Roger, I've managed to collect more info.
> > 
> > Context:
> >     dom0: pv
> >     dom1: hvm
> >     dom2: stubdom for dom1
> >     dom3: hvm
> >     dom4: stubdom for dom3
> >     dom5: pvh
> >     dom6: pvh
> > 
> > It starts I think ok:
> > 
> >     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> 
> I can't seem to spot the call site of this in any of qemu, libxl,
> or libxc. I'm particularly curious about the further actions taken
> on the domain after this was invoked: do any ioreq servers get
> unregistered immediately (which I think would be a problem)?

It is here:
https://github.com/qemu/qemu/blob/master/hw/i386/xen/xen-hvm.c#L1539

I think it's called from cpu_handle_ioreq(), and I think the request
state is set to STATE_IORESP_READY before exiting (unless there is some
exit() hidden in another function used there).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 11:05           ` Paul Durrant
@ 2020-06-05 11:25             ` Paul Durrant
  2020-06-05 13:36               ` Jan Beulich
  2020-06-05 13:32             ` Jan Beulich
  1 sibling, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 11:25 UTC (permalink / raw)
  To: paul, 'Jan Beulich', 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'xen-devel'

> -----Original Message-----
> From: Paul Durrant <xadimgnik@gmail.com>
> Sent: 05 June 2020 12:06
> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> Sorry, only just catching up with this...
> 
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@suse.com>
> > Sent: 05 June 2020 10:09
> > To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> > Durrant <paul@xen.org>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >
> > On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > > On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>> Then, we get the main issue:
> > >>>>>
> > >>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>     (XEN) domain_crash called from io.c:178
> > >>>>>
> > >>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> > >>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> > >>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> > >>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>
> > >>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > >>> intermittently set, and skip the pause in domain_shutdown()
> > >>>
> > >>> I presume this lack of pause is to allow the vcpu in question to still
> > >>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> > >>> 0 clarifying details).
> > >>>
> > >>> What *should* happen is that, after consuming the reply, the vcpu should
> > >>> notice and pause itself, at which point it would yield to the
> > >>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > >>>
> > >>> Evidently, this is not happening.
> > >>
> > >> We can't tell yet, until ...
> > >>
> > >>> Marek: can you add a BUG() after the weird PIO printing?  That should
> > >>> confirm whether we're getting into handle_pio() via the
> > >>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> > >>> we're fully re-entering the guest).
> > >>
> > >> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> > >> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> > >> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> > >> instead of plain "true"?
> > >>
> > >> Adding Paul to Cc, as being the maintainer here.
> > >
> > > Got it, by sticking BUG() just before that domain_crash() in
> > > handle_pio(). And also vcpu 0 of both HVM domains do have
> > > v->defer_shutdown.
> >
> > As per the log they did get it set. I'd be curious of the flag's
> > value (as well as v->paused_for_shutdown's) at the point of the
> > problematic handle_pio() invocation (see below). It may be
> > worthwhile to instrument vcpu_check_shutdown() (inside its if())
> > - before exiting to guest context (in order to then come back
> > and call handle_pio()) the vCPU ought to be getting through
> > there. No indication of it doing so would be a sign that there's
> > a code path bypassing the call to vcpu_end_shutdown_deferral().
> >
> > > (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > > (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > > (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > > (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > > (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > > (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > > (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > > (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > > (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > > (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >
> > Perhaps in this message could you also log
> > v->domain->is_shutting_down, v->defer_shutdown, and
> > v->paused_for_shutdown? (Would be nice if, after having made
> > changes to your debugging patch, you could point again at the
> > precise version you've used for the log provided.)
> >
> > > (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > > (XEN) Xen BUG at io.c:178
> >
> > Btw, instead of BUG(), WARN() or dump_execution_state() would
> > likely also do, keeping Xen alive.
> >
> 
> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it? That would mean we wouldn't
> be seeing the "Unexpected PIO" message. From that message this is clearly X86EMUL_UNHANDLEABLE,
> which suggests a race with ioreq server teardown, possibly due to selecting a server but then not
> finding a vcpu match in ioreq_vcpu_list.

Actually, looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) followed by clearing defer_shutdown looks a bit odd. Just because the domain's shutdown code has been set that way doesn't mean that no vcpu is deferred in emulation; SCHEDOP_shutdown_code could easily be called from one vcpu whilst another has emulation pending.
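
A toy model of that concern (purely illustrative; the struct, function, and reason-code names are assumptions, not the real Xen code): if the crash path unconditionally clears defer_shutdown for every vcpu, a vcpu that deferred because it has an ioreq in flight gets paused anyway, whereas a non-crash shutdown would leave it deferred:

```c
#include <assert.h>

/* Toy model of domain_shutdown()'s deferral handling. Not Xen source. */

#define SHUTDOWN_poweroff 0
#define SHUTDOWN_crash    3

struct toy_vcpu {
    int defer_shutdown;       /* set while an ioreq is outstanding */
    int paused_for_shutdown;
};

static void toy_domain_shutdown(struct toy_vcpu *v, int nr, int reason)
{
    for (int i = 0; i < nr; i++) {
        if (reason == SHUTDOWN_crash)
            v[i].defer_shutdown = 0;  /* the clearing in question */
        if (!v[i].defer_shutdown)
            v[i].paused_for_shutdown = 1;
    }
}

/* Returns whether the vcpu that was mid-emulation ends up paused. */
int toy_paused_after_shutdown(int reason)
{
    struct toy_vcpu v[2] = { { 0, 0 }, { 1, 0 } };  /* v[1] mid-emulation */

    toy_domain_shutdown(v, 2, reason);
    return v[1].paused_for_shutdown;
}
```

The point being: one vcpu invoking a crash shutdown is enough to yank the deferral out from under a sibling vcpu that is still waiting on its device model.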

  Paul

> 
>   Paul
> 
> > Jan





* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05  9:22           ` Jan Beulich
@ 2020-06-05 12:01             ` Marek Marczykowski-Górecki
  2020-06-05 12:39               ` Paul Durrant
                                 ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-06-05 12:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Paul Durrant


On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
> On 05.06.2020 11:09, Jan Beulich wrote:
> > On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > 
> > Perhaps in this message could you also log
> > v->domain->is_shutting_down, v->defer_shutdown, and
> > v->paused_for_shutdown?
> 
> And v->domain->is_shut_down please.

Here it is:

(XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
(XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
(XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
(XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
(XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
(XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
(XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
(XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
(XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) d1v0 Unexpected PIO status 1, port 0xb004 read 0xffff

(and then the stacktrace saying it's from vmexit handler)

Regarding BUG/WARN - do you think I could get any more info then? I
really don't mind crashing that system, it's a virtual machine
currently used only for debugging this issue.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 12:01             ` Marek Marczykowski-Górecki
@ 2020-06-05 12:39               ` Paul Durrant
  2020-06-05 13:04                 ` 'Marek Marczykowski-Górecki'
  2020-06-05 12:44               ` Andrew Cooper
  2020-06-05 14:13               ` Jan Beulich
  2 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 12:39 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki', 'Jan Beulich'
  Cc: 'Andrew Cooper', 'xen-devel'

> -----Original Message-----
> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Sent: 05 June 2020 13:02
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Paul Durrant <paul@xen.org>; xen-devel <xen-
> devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
> > On 05.06.2020 11:09, Jan Beulich wrote:
> > > On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >
> > > Perhaps in this message could you also log
> > > v->domain->is_shutting_down, v->defer_shutdown, and
> > > v->paused_for_shutdown?
> >
> > And v->domain->is_shut_down please.
> 
> Here it is:
> 
> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d1v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> 
> (and then the stacktrace saying it's from vmexit handler)
> 
> Regarding BUG/WARN - do you think I could get any more info then? I
> really don't mind crashing that system, it's a virtual machine
> currently used only for debugging this issue.

In your logging, is that handle_pio with is_shutting_down == 1 the very last one, or is the 'Unexpected PIO' coming from another one issued afterwards?

The reason I ask is that hvmemul_do_io() can call hvm_send_ioreq() to start an I/O when is_shutting_down is set, but will write the local io_req.state back to NONE even when X86EMUL_RETRY is returned. Thus another call to handle_pio() will try to start a new I/O but will fail with X86EMUL_UNHANDLEABLE in hvm_send_ioreq() because the ioreq state in the shared page will not be NONE.
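
That sequence can be modelled with a small toy (not the real hvmemul_do_io(); the two-field state and reset behaviour are illustrative assumptions): "local" stands for the emulator-side io_req.state, "shared" for the ioreq state in the page shared with the device model. The first call hands off the request and returns RETRY but resets its local state to NONE; the second call then tries to start a fresh I/O and trips over the still-busy shared slot:

```c
#include <assert.h>

/* Toy model of the local-vs-shared ioreq state mismatch. Not Xen source. */

enum toy_state { STATE_IOREQ_NONE, STATE_IOREQ_READY };
enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

struct toy_io {
    enum toy_state local;   /* io_req.state as seen by the emulator */
    enum toy_state shared;  /* state in the shared ioreq page */
};

int toy_do_pio(struct toy_io *c)
{
    if (c->local == STATE_IOREQ_NONE) {
        /* Starting a new I/O requires the shared slot to be free. */
        if (c->shared != STATE_IOREQ_NONE)
            return X86EMUL_UNHANDLEABLE;   /* shared slot still busy */

        c->shared = STATE_IOREQ_READY;     /* hand request to emulator */
        c->local = STATE_IOREQ_NONE;       /* reset even though we retry */
        return X86EMUL_RETRY;
    }
    return X86EMUL_OKAY;
}
```

Because the device model never gets to complete the request once the domain is shutting down, the shared state never returns to NONE and the second handle_pio() fails exactly as in the log.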

  Paul

> 
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 12:01             ` Marek Marczykowski-Górecki
  2020-06-05 12:39               ` Paul Durrant
@ 2020-06-05 12:44               ` Andrew Cooper
  2020-06-05 14:13               ` Jan Beulich
  2 siblings, 0 replies; 40+ messages in thread
From: Andrew Cooper @ 2020-06-05 12:44 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Jan Beulich; +Cc: xen-devel, Paul Durrant



On 05/06/2020 13:01, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
>> On 05.06.2020 11:09, Jan Beulich wrote:
>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>> Perhaps in this message could you also log
>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>> v->paused_for_shutdown?
>> And v->domain->is_shut_down please.
> Here it is:
>
> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>
> (and then the stacktrace saying it's from vmexit handler)
>
> Regarding BUG/WARN - do you think I could get any more info then? I
> really don't mind crashing that system, it's a virtual machine
> currently used only for debugging this issue.

I specifically recommended BUG() so it crashed cleanly and didn't loop
infinitely spewing into the logs.

From the analysis so far, whatever is going wrong is going wrong between
XEN_DMOP_remote_shutdown and the first bad PIO, so I don't think letting
the system loop further is going to be helpful.

~Andrew




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 12:39               ` Paul Durrant
@ 2020-06-05 13:04                 ` 'Marek Marczykowski-Górecki'
  2020-06-05 13:24                   ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: 'Marek Marczykowski-Górecki' @ 2020-06-05 13:04 UTC (permalink / raw)
  To: paul; +Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'


On Fri, Jun 05, 2020 at 01:39:31PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Sent: 05 June 2020 13:02
> > To: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Paul Durrant <paul@xen.org>; xen-devel <xen-
> > devel@lists.xenproject.org>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > 
> > On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
> > > On 05.06.2020 11:09, Jan Beulich wrote:
> > > > On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > > >> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > >> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > > >> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > > >> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > > >> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > >> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > > >> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > > >> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > >
> > > > Perhaps in this message could you also log
> > > > v->domain->is_shutting_down, v->defer_shutdown, and
> > > > v->paused_for_shutdown?
> > >
> > > And v->domain->is_shut_down please.
> > 
> > Here it is:
> > 
> > (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> > (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> > 0 is_shut_down 0
> > (XEN) d1v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > 
> > (and then the stacktrace saying it's from vmexit handler)
> > 
> > Regarding BUG/WARN - do you think I could get any more info then? I
> > really don't mind crashing that system, it's a virtual machine
> > currently used only for debugging this issue.
> 
> In your logging, is that handle_pio with is_shutting_down == 1 the very last one, or is the 'Unexpected PIO' coming from another one issued afterwards?

That's the same function call - handle_pio message is before hvmemul_do_pio_buffer() and Unexpected PIO is after.

Here is the current debugging patch: https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06

> The reason I ask is that hvmemul_do_io() can call hvm_send_ioreq() to start an I/O when is_shutting_down is set, but will write the local io_req.state back to NONE even when X86EMUL_RETRY is returned. Thus another call to handle_pio() will try to start a new I/O but will fail with X86EMUL_UNHANDLEABLE in hvm_send_ioreq() because the ioreq state in the shared page will not be NONE.

Isn't it a problem that hvm_send_ioreq() can be called if is_shutting_down is set?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:04                 ` 'Marek Marczykowski-Górecki'
@ 2020-06-05 13:24                   ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 13:24 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'

> -----Original Message-----
> From: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> Sent: 05 June 2020 14:04
> To: paul@xen.org
> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-
> devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On Fri, Jun 05, 2020 at 01:39:31PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > Sent: 05 June 2020 13:02
> > > To: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Paul Durrant <paul@xen.org>; xen-devel <xen-
> > > devel@lists.xenproject.org>
> > > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >
> > > On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
> > > > On 05.06.2020 11:09, Jan Beulich wrote:
> > > > > On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > > > >> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > > >> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > > > >> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > > > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > > > >> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > > >> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > > > >> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > > > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > > >> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > > > >> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > > > >> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > > > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > > > >> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > > >> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > > > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > > > >> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > > > >> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > > > >
> > > > > Perhaps in this message could you also log
> > > > > v->domain->is_shutting_down, v->defer_shutdown, and
> > > > > v->paused_for_shutdown?
> > > >
> > > > And v->domain->is_shut_down please.
> > >
> > > Here it is:
> > >
> > > (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > > (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > > (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > > (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > > (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > > (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > > (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> > > (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> > > (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1
> paused_for_shutdown
> > > 0 is_shut_down 0
> > > (XEN) d1v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >
> > > (and then the stacktrace saying it's from vmexit handler)
> > >
> > > Regarding BUG/WARN - do you think I could get any more info then? I
> > > really don't mind crashing that system, it's a virtual machine
> > > currently used only for debugging this issue.
> >
> > In your logging, is that handle_pio with is_shutting_down == 1 the very last one, or is the
> 'Unexpected PIO' coming from another one issued afterwards?
> 
> That's the same function call - handle_pio message is before hvmemul_do_pio_buffer() and Unexpected
> PIO is after.
> 
> Here is the current debugging patch: https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06
> 

Ok.

> > The reason I ask is that hvmemul_do_io() can call hvm_send_ioreq() to start an I/O when
> is_shutting_down is set, but will write the local io_req.state back to NONE even when X86EMUL_RETRY is
> returned. Thus another call to handle_pio() will try to start a new I/O but will fail with
> X86EMUL_UNHANDLEABLE in hvm_send_ioreq() because the ioreq state in the shared page will not be NONE.
> 
> Isn't it a problem that hvm_send_ioreq() can be called if is_shutting_down is set?
> 

I don't think so... as long as it is not called a second time.

  Paul

> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 11:05           ` Paul Durrant
  2020-06-05 11:25             ` Paul Durrant
@ 2020-06-05 13:32             ` Jan Beulich
  2020-06-05 13:37               ` Paul Durrant
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 13:32 UTC (permalink / raw)
  To: paul
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

On 05.06.2020 13:05, Paul Durrant wrote:
> Sorry, only just catching up with this...
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 05 June 2020 10:09
>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
>> Durrant <paul@xen.org>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>
>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>> Then, we get the main issue:
>>>>>>>
>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>
>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>
>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>> intermittently set, and skip the pause in domain_shutdown()
>>>>>
>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>>>> 0 clarifying details).
>>>>>
>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>> notice and pause itself, at which point it would yield to the
>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>>>
>>>>> Evidently, this is not happening.
>>>>
>>>> We can't tell yet, until ...
>>>>
>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>> confirm whether we're getting into handle_pio() via the
>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>>>> we're fully re-entering the guest).
>>>>
>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>> instead of plain "true"?
>>>>
>>>> Adding Paul to Cc, as being the maintainer here.
>>>
>>> Got it, by sticking BUG() just before that domain_crash() in
>>> handle_pio(). And also vcpu 0 of both HVM domains do have
>>> v->defer_shutdown.
>>
>> As per the log they did get it set. I'd be curious of the flag's
>> value (as well as v->paused_for_shutdown's) at the point of the
>> problematic handle_pio() invocation (see below). It may be
>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
>> - before exiting to guest context (in order to then come back
>> and call handle_pio()) the vCPU ought to be getting through
>> there. No indication of it doing so would be a sign that there's
>> a code path bypassing the call to vcpu_end_shutdown_deferral().
>>
>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>
>> Perhaps in this message could you also log
>> v->domain->is_shutting_down, v->defer_shutdown, and
>> v->paused_for_shutdown? (Would be nice if, after having made
>> changes to your debugging patch, you could point again at the
>> precise version you've used for the log provided.)
>>
>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>> (XEN) Xen BUG at io.c:178
>>
>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>> likely also do, keeping Xen alive.
>>
> 
> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?

Where would this originate?

> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to selecting a server but then not finding a vcpu match in ioreq_vcpu_list.

I was suspecting such, but at least the tearing down of all servers
happens only from relinquish-resources, which gets started only
after ->is_shut_down got set (unless the tool stack invoked
XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
set for the domain).

For individually unregistered servers - yes, if qemu did so, this
would be a problem. They need to remain registered until all vCPU-s
in the domain got paused.

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 11:25             ` Paul Durrant
@ 2020-06-05 13:36               ` Jan Beulich
  2020-06-05 13:43                 ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 13:36 UTC (permalink / raw)
  To: paul
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

On 05.06.2020 13:25, Paul Durrant wrote:
>> -----Original Message-----
>> From: Paul Durrant <xadimgnik@gmail.com>
>> Sent: 05 June 2020 12:06
>> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
>> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
>> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>
>> Sorry, only just catching up with this...
>>
>>> -----Original Message-----
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 05 June 2020 10:09
>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
>>> Durrant <paul@xen.org>
>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>>
>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>>> Then, we get the main issue:
>>>>>>>>
>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>>
>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>>
>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>>> intermittently set, and skip the pause in domain_shutdown()
>>>>>>
>>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>>>>> 0 clarifying details).
>>>>>>
>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>>> notice and pause itself, at which point it would yield to the
>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>>>>
>>>>>> Evidently, this is not happening.
>>>>>
>>>>> We can't tell yet, until ...
>>>>>
>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>>> confirm whether we're getting into handle_pio() via the
>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>>>>> we're fully re-entering the guest).
>>>>>
>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>>> instead of plain "true"?
>>>>>
>>>>> Adding Paul to Cc, as being the maintainer here.
>>>>
>>>> Got it, by sticking BUG() just before that domain_crash() in
>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
>>>> v->defer_shutdown.
>>>
>>> As per the log they did get it set. I'd be curious of the flag's
>>> value (as well as v->paused_for_shutdown's) at the point of the
>>> problematic handle_pio() invocation (see below). It may be
>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
>>> - before exiting to guest context (in order to then come back
>>> and call handle_pio()) the vCPU ought to be getting through
>>> there. No indication of it doing so would be a sign that there's
>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
>>>
>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>
>>> Perhaps in this message could you also log
>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>> v->paused_for_shutdown? (Would be nice if, after having made
>>> changes to your debugging patch, you could point again at the
>>> precise version you've used for the log provided.)
>>>
>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>>> (XEN) Xen BUG at io.c:178
>>>
>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>>> likely also do, keeping Xen alive.
>>>
>>
>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it? That would mean we wouldn't be
>> seeing the "Unexpected PIO" message. From that message this is clearly X86EMUL_UNHANDLEABLE, which
>> suggests a race with ioreq server teardown, possibly due to selecting a server but then not finding a
>> vcpu match in ioreq_vcpu_list.
> 
> Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) and then clearing defer_shutdown looks a bit odd... Just because the domain shutdown code has been set that way doesn't mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code could easily be called from one vcpu whilst another has emulation pending.

I'm confused: The deferral is of shutting down the domain, not of
a specific instance of emulation. When a guest crashes, I understand
this code is intended to make sure shutting down because of the
crash won't get deferred because of in-progress emulation anywhere.

Marek hasn't provided any hints so far that the guest may be crashing,
so I think if there is an issue here, it's likely only a tangential
one anyway.

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:32             ` Jan Beulich
@ 2020-06-05 13:37               ` Paul Durrant
  2020-06-05 13:57                 ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 13:37 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 14:32
> To: paul@xen.org
> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 13:05, Paul Durrant wrote:
> > Sorry, only just catching up with this...
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 05 June 2020 10:09
> >> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> >> Durrant <paul@xen.org>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>>>> Then, we get the main issue:
> >>>>>>>
> >>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>>>     (XEN) domain_crash called from io.c:178
> >>>>>>>
> >>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>>>
> >>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>>>> intermittently set, and skip the pause in domain_shutdown()
> >>>>>
> >>>>> I presume this lack of pause is to allow the vcpu in question to still
> >>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> >>>>> 0 clarifying details).
> >>>>>
> >>>>> What *should* happen is that, after consuming the reply, the vcpu should
> >>>>> notice and pause itself, at which point it would yield to the
> >>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>>>
> >>>>> Evidently, this is not happening.
> >>>>
> >>>> We can't tell yet, until ...
> >>>>
> >>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>>>> confirm whether we're getting into handle_pio() via the
> >>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
> >>>>> we're fully re-entering the guest).
> >>>>
> >>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >>>> instead of plain "true"?
> >>>>
> >>>> Adding Paul to Cc, as being the maintainer here.
> >>>
> >>> Got it, by sticking BUG() just before that domain_crash() in
> >>> handle_pio(). And also vcpu 0 of both HVM domains do have
> >>> v->defer_shutdown.
> >>
> >> As per the log they did get it set. I'd be curious of the flag's
> >> value (as well as v->paused_for_shutdown's) at the point of the
> >> problematic handle_pio() invocation (see below). It may be
> >> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> >> - before exiting to guest context (in order to then come back
> >> and call handle_pio()) the vCPU ought to be getting through
> >> there. No indication of it doing so would be a sign that there's
> >> a code path bypassing the call to vcpu_end_shutdown_deferral().
> >>
> >>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>
> >> Perhaps in this message could you also log
> >> v->domain->is_shutting_down, v->defer_shutdown, and
> >> v->paused_for_shutdown? (Would be nice if, after having made
> >> changes to your debugging patch, you could point again at the
> >> precise version you've used for the log provided.)
> >>
> >>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> >>> (XEN) Xen BUG at io.c:178
> >>
> >> Btw, instead of BUG(), WARN() or dump_execution_state() would
> >> likely also do, keeping Xen alive.
> >>
> >
> > A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?
> 
> Where would this originate?

I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of hvm_send_ioreq().

> 
> > That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly
> X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to selecting a
> server but then not finding a vcpu match in ioreq_vcpu_list.
> 
> I was suspecting such, but at least the tearing down of all servers
> happens only from relinquish-resources, which gets started only
> after ->is_shut_down got set (unless the tool stack invoked
> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> set for the domain).
> 
> For individually unregistered servers - yes, if qemu did so, this
> would be a problem. They need to remain registered until all vCPU-s
> in the domain got paused.

It shouldn't be a problem, should it? Destroying an individual server is only done with the domain paused, so no vcpus can be running at the time.

  Paul

> 
> Jan




* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:36               ` Jan Beulich
@ 2020-06-05 13:43                 ` Paul Durrant
  2020-06-05 13:46                   ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 13:43 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 14:37
> To: paul@xen.org
> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 13:25, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Paul Durrant <xadimgnik@gmail.com>
> >> Sent: 05 June 2020 12:06
> >> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki'
> <marmarek@invisiblethingslab.com>
> >> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> >> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> Sorry, only just catching up with this...
> >>
> >>> -----Original Message-----
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>> Sent: 05 June 2020 10:09
> >>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> >>> Durrant <paul@xen.org>
> >>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>>
> >>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Then, we get the main issue:
> >>>>>>>>
> >>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>>>>     (XEN) domain_crash called from io.c:178
> >>>>>>>>
> >>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>>>>
> >>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>>>>> intermittently set, and skip the pause in domain_shutdown()
> >>>>>>
> >>>>>> I presume this lack of pause is to allow the vcpu in question to still
> >>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> >>>>>> 0 clarifying details).
> >>>>>>
> >>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> >>>>>> notice and pause itself, at which point it would yield to the
> >>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>>>>
> >>>>>> Evidently, this is not happening.
> >>>>>
> >>>>> We can't tell yet, until ...
> >>>>>
> >>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>>>>> confirm whether we're getting into handle_pio() via the
> >>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> >>>>>> we're fully re-entering the guest).
> >>>>>
> >>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >>>>> instead of plain "true"?
> >>>>>
> >>>>> Adding Paul to Cc, as being the maintainer here.
> >>>>
> >>>> Got it, by sticking BUG() just before that domain_crash() in
> >>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> >>>> v->defer_shutdown.
> >>>
> >>> As per the log they did get it set. I'd be curious of the flag's
> >>> value (as well as v->paused_for_shutdown's) at the point of the
> >>> problematic handle_pio() invocation (see below). It may be
> >>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> >>> - before exiting to guest context (in order to then come back
> >>> and call handle_pio()) the vCPU ought to be getting through
> >>> there. No indication of it doing so would be a sign that there's
> >>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> >>>
> >>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>
> >>> Perhaps in this message could you also log
> >>> v->domain->is_shutting_down, v->defer_shutdown, and
> >>> v->paused_for_shutdown? (Would be nice if, after having made
> >>> changes to your debugging patch, you could point again at the
> >>> precise version you've used for the log provided.)
> >>>
> >>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> >>>> (XEN) Xen BUG at io.c:178
> >>>
> >>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> >>> likely also do, keeping Xen alive.
> >>>
> >>
> >> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it? That would mean we
> >> wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly
> >> X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to
> >> selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
> >
> > Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) and then
> > clearing defer_shutdown looks a bit odd... Just because the domain shutdown code has been
> > set that way doesn't mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code
> > could easily be called from one vcpu whilst another has emulation pending.
> 
> I'm confused: The deferral is of shutting down the domain, not of
> a specific instance of emulation.

Now I'm confused... defer_shutdown is per-vcpu.

> When a guest crashed I understand
> this code is intended to make sure shutting down because of the
> crash won't get deferred because of in-progress emulation anywhere.
> 

OK, I neglected to notice the vcpu_pause_nosync() just below, so yes, there should be no way we can erroneously try to complete emulation after this point.

  Paul

> Marek didn't provide any hints so far that the guest may be crashing,
> so I think if there is an issue here, it's likely only a tangential
> one anyway.
> 
> Jan



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:43                 ` Paul Durrant
@ 2020-06-05 13:46                   ` Jan Beulich
  2020-06-05 13:49                     ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 13:46 UTC (permalink / raw)
  To: paul
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

On 05.06.2020 15:43, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 05 June 2020 14:37
>> To: paul@xen.org
>> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
>> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>
>> On 05.06.2020 13:25, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Paul Durrant <xadimgnik@gmail.com>
>>>> Sent: 05 June 2020 12:06
>>>> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki'
>> <marmarek@invisiblethingslab.com>
>>>> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
>>>> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>>>
>>>> Sorry, only just catching up with this...
>>>>
>>>>> -----Original Message-----
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 05 June 2020 10:09
>>>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
>>>>> Durrant <paul@xen.org>
>>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>>>>
>>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>>>>> Then, we get the main issue:
>>>>>>>>>>
>>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>>>>
>>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>>>>
>>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>>>>> intermittently set, and skip the pause in domain_shutdown()
>>>>>>>>
>>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>>>>>>> 0 clarifying details.)
>>>>>>>>
>>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>>>>> notice and pause itself, at which point it would yield to the
>>>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>>>>>>
>>>>>>>> Evidently, this is not happening.
>>>>>>>
>>>>>>> We can't tell yet, until ...
>>>>>>>
>>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>>>>> confirm whether we're getting into handle_pio() via the
>>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>>>>>>> we're fully re-entering the guest).
>>>>>>>
>>>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>>>>> instead of plain "true"?
>>>>>>>
>>>>>>> Adding Paul to Cc, as being the maintainer here.
>>>>>>
>>>>>> Got it, by sticking BUG() just before that domain_crash() in
>>>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
>>>>>> v->defer_shutdown.
>>>>>
>>>>> As per the log they did get it set. I'd be curious of the flag's
>>>>> value (as well as v->paused_for_shutdown's) at the point of the
>>>>> problematic handle_pio() invocation (see below). It may be
>>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
>>>>> - before exiting to guest context (in order to then come back
>>>>> and call handle_pio()) the vCPU ought to be getting through
>>>>> there. No indication of it doing so would be a sign that there's
>>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
>>>>>
>>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>
>>>>> Perhaps in this message could you also log
>>>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>>>> v->paused_for_shutdown? (Would be nice if, after having made
>>>>> changes to your debugging patch, you could point again at the
>>>>> precise version you've used for the log provided.)
>>>>>
>>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>>>>> (XEN) Xen BUG at io.c:178
>>>>>
>>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>>>>> likely also do, keeping Xen alive.
>>>>>
>>>>
> >>>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it? That would mean
> >>>> we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly
> >>>> X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to
> >>>> selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
>>>
> >>> Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) and then
> >>> clearing defer_shutdown looks a bit odd... Just because the domain shutdown code has been
> >>> set that way doesn't mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code
> >>> could easily be called from one vcpu whilst another has emulation pending.
>>
>> I'm confused: The deferral is of shutting down the domain, not of
>> a specific instance of emulation.
> 
> Now I'm confused... defer_shutdown is per-vcpu.

Right - each vCPU can individually defer shutting down of the domain
as a whole.

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:46                   ` Jan Beulich
@ 2020-06-05 13:49                     ` Paul Durrant
  2020-06-05 15:48                       ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 13:49 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 14:47
> To: paul@xen.org
> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 15:43, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 05 June 2020 14:37
> >> To: paul@xen.org
> >> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> >> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 05.06.2020 13:25, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Paul Durrant <xadimgnik@gmail.com>
> >>>> Sent: 05 June 2020 12:06
> >>>> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki'
> >> <marmarek@invisiblethingslab.com>
> >>>> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> >>>> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>>>
> >>>> Sorry, only just catching up with this...
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>> Sent: 05 June 2020 10:09
> >>>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> >>>>> Durrant <paul@xen.org>
> >>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>>>>
> >>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>>>>>>> Then, we get the main issue:
> >>>>>>>>>>
> >>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>>>>>>     (XEN) domain_crash called from io.c:178
> >>>>>>>>>>
> >>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>>>>>>
> >>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>>>>>>> intermittently set, and skip the pause in domain_shutdown()
> >>>>>>>>
> >>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
> >>>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> >>>>>>>> 0 clarifying details.)
> >>>>>>>>
> >>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> >>>>>>>> notice and pause itself, at which point it would yield to the
> >>>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>>>>>>
> >>>>>>>> Evidently, this is not happening.
> >>>>>>>
> >>>>>>> We can't tell yet, until ...
> >>>>>>>
> >>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>>>>>>> confirm whether we're getting into handle_pio() via the
> >>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
> >>>>>>>> we're fully re-entering the guest).
> >>>>>>>
> >>>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >>>>>>> instead of plain "true"?
> >>>>>>>
> >>>>>>> Adding Paul to Cc, as being the maintainer here.
> >>>>>>
> >>>>>> Got it, by sticking BUG() just before that domain_crash() in
> >>>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> >>>>>> v->defer_shutdown.
> >>>>>
> >>>>> As per the log they did get it set. I'd be curious of the flag's
> >>>>> value (as well as v->paused_for_shutdown's) at the point of the
> >>>>> problematic handle_pio() invocation (see below). It may be
> >>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> >>>>> - before exiting to guest context (in order to then come back
> >>>>> and call handle_pio()) the vCPU ought to be getting through
> >>>>> there. No indication of it doing so would be a sign that there's
> >>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> >>>>>
> >>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>
> >>>>> Perhaps in this message could you also log
> >>>>> v->domain->is_shutting_down, v->defer_shutdown, and
> >>>>> v->paused_for_shutdown? (Would be nice if, after having made
> >>>>> changes to your debugging patch, you could point again at the
> >>>>> precise version you've used for the log provided.)
> >>>>>
> >>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> >>>>>> (XEN) Xen BUG at io.c:178
> >>>>>
> >>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> >>>>> likely also do, keeping Xen alive.
> >>>>>
> >>>>
> >>>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it? That would mean
> >>>> we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly
> >>>> X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to
> >>>> selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
> >>>
> >>> Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) and then
> >>> clearing defer_shutdown looks a bit odd... Just because the domain shutdown code has been
> >>> set that way doesn't mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code
> >>> could easily be called from one vcpu whilst another has emulation pending.
> >>
> >> I'm confused: The deferral is of shutting down the domain, not of
> >> a specific instance of emulation.
> >
> > Now I'm confused... defer_shutdown is per-vcpu.
> 
> Right - each vCPU can individually defer shutting down of the domain
> as a whole.
> 

Ok, I think we're only going to make more progress if we know exactly where the X86EMUL_UNHANDLEABLE is coming from.

  Paul





* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:37               ` Paul Durrant
@ 2020-06-05 13:57                 ` Jan Beulich
  2020-06-05 15:39                   ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 13:57 UTC (permalink / raw)
  To: paul, 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'xen-devel'

On 05.06.2020 15:37, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 05 June 2020 14:32
>> To: paul@xen.org
>> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
>> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>
>> On 05.06.2020 13:05, Paul Durrant wrote:
>>> Sorry, only just catching up with this...
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 05 June 2020 10:09
>>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
>>>> Durrant <paul@xen.org>
>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>>>
>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>>>> Then, we get the main issue:
>>>>>>>>>
>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>>>
>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>>>
>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>>>> intermittently set, and skip the pause in domain_shutdown()
>>>>>>>
>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>>>>>> 0 clarifying details.)
>>>>>>>
>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>>>> notice and pause itself, at which point it would yield to the
>>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>>>>>
>>>>>>> Evidently, this is not happening.
>>>>>>
>>>>>> We can't tell yet, until ...
>>>>>>
>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>>>> confirm whether we're getting into handle_pio() via the
>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>>>>>> we're fully re-entering the guest).
>>>>>>
>>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>>>> instead of plain "true"?
>>>>>>
>>>>>> Adding Paul to Cc, as being the maintainer here.
>>>>>
>>>>> Got it, by sticking BUG() just before that domain_crash() in
>>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
>>>>> v->defer_shutdown.
>>>>
>>>> As per the log they did get it set. I'd be curious of the flag's
>>>> value (as well as v->paused_for_shutdown's) at the point of the
>>>> problematic handle_pio() invocation (see below). It may be
>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
>>>> - before exiting to guest context (in order to then come back
>>>> and call handle_pio()) the vCPU ought to be getting through
>>>> there. No indication of it doing so would be a sign that there's
>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
>>>>
>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>
>>>> Perhaps in this message could you also log
>>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>>> v->paused_for_shutdown? (Would be nice if, after having made
>>>> changes to your debugging patch, you could point again at the
>>>> precise version you've used for the log provided.)
>>>>
>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>>>> (XEN) Xen BUG at io.c:178
>>>>
>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>>>> likely also do, keeping Xen alive.
>>>>
>>>
>>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it?
>>
>> Where would this originate?
> 
> I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of hvm_send_ioreq().

Ah yes. But this is just one way of things possibly going wrong. Plus
the function will return true when ->defer_shutdown is already or
(wrongly) still set.

>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this
>>> is clearly X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly
>>> due to selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
>>
>> I was suspecting such, but at least the tearing down of all servers
>> happens only from relinquish-resources, which gets started only
>> after ->is_shut_down got set (unless the tool stack invoked
>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
>> set for the domain).
>>
>> For individually unregistered servers - yes, if qemu did so, this
>> would be a problem. They need to remain registered until all vCPU-s
>> in the domain got paused.
> 
> > It shouldn't be a problem, should it? Destroying an individual server is only done with the domain paused, so no vcpus can be running at the time.

Consider the case of one getting destroyed after it has already
returned data, but the originating vCPU didn't consume that data
yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
won't find the matching server anymore, and hence the chain
hvm_wait_for_io() -> hvm_io_assist() ->
vcpu_end_shutdown_deferral() would be skipped. handle_pio()
would then still correctly consume the result.

Marek - to verify this doesn't happen (sorry, my qemu knowledge
is rather limited, and hence I don't know whether this can
happen at all), could you also log hvm_destroy_ioreq_server()
invocations?

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 11:18   ` Marek Marczykowski-Górecki
@ 2020-06-05 13:59     ` Jan Beulich
  2020-06-05 15:10       ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 13:59 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 05.06.2020 13:18, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 05, 2020 at 11:38:17AM +0200, Jan Beulich wrote:
>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> (continuation of a thread from #xendevel)
>>>
>>> During system shutdown quite often I hit infinite stream of errors like
>>> this:
>>>
>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>     (XEN) domain_crash called from io.c:178
>>>
>>> This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
>>> too), nested within KVM. The KVM part means everything is very slow, so
>>> various race conditions are much more likely to happen.
>>>
>>> It started happening not long ago, and I'm pretty sure it's about
>>> updating to qemu 4.2.0 (in linux stubdom), previous one was 3.0.0.
>>>
>>> Thanks to Andrew and Roger, I've managed to collect more info.
>>>
>>> Context:
>>>     dom0: pv
>>>     dom1: hvm
>>>     dom2: stubdom for dom1
>>>     dom3: hvm
>>>     dom4: stubdom for dom3
>>>     dom5: pvh
>>>     dom6: pvh
>>>
>>> It starts I think ok:
>>>
>>>     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>
>> I can't seem to spot the call site of this, in any of
>> qemu, libxl, or libxc. I'm in particular curious as to the further
>> actions taken on the domain after this was invoked: Do any ioreq
>> servers get unregistered immediately (which I think would be a
>> problem)?
> 
> It is here:
> https://github.com/qemu/qemu/blob/master/hw/i386/xen/xen-hvm.c#L1539
> 
> I think it's called from cpu_handle_ioreq(), and I think the request
> state is set to STATE_IORESP_READY before exiting (unless there is some
> exit() hidden in another function used there).

Thanks. There's nothing in surrounding code there that would unregister
an ioreq server. But as said elsewhere, I don't know qemu very well,
and hence I may easily overlook how else one may get unregistered
prematurely.

Jan



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 12:01             ` Marek Marczykowski-Górecki
  2020-06-05 12:39               ` Paul Durrant
  2020-06-05 12:44               ` Andrew Cooper
@ 2020-06-05 14:13               ` Jan Beulich
  2 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2020-06-05 14:13 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Andrew Cooper, xen-devel, Paul Durrant

On 05.06.2020 14:01, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
>> On 05.06.2020 11:09, Jan Beulich wrote:
>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>
>>> Perhaps in this message could you also log
>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>> v->paused_for_shutdown?
>>
>> And v->domain->is_shut_down please.
> 
> Here it is:
> 
> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0

To me this is a clear indication that we did exit to guest context
with ->defer_shutdown set.

What I'm missing from your debugging patch is logging when the
default case of the first switch() in hvmemul_do_io() gets hit. I
think I said yesterday that I consider this a fair candidate for
where the X86EMUL_UNHANDLEABLE is coming from.

On top of that, with what we've sort of learned today, could you
log (or worse) any instances of handle_pio() getting called with
->defer_shutdown set? Afaict this should never happen, but you
may hit this case earlier than for the call out of the VMEXIT
handler, which would then move us closer to the root of the issue.

With "(or worse)" I mean it could also be as heavy as BUG(), ...

> Regarding BUG/WARN - do you think I could get any more info then? I
> really don't mind crashing that system, it's a virtual machine
> currently used only for debugging this issue.

... and the selection here really depends on what overall impact
you expect. I.e. I'm with Andrew that BUG() may be the construct
of choice if you'd otherwise get too much output. In other cases
WARN() may allow you to hit the same case again, with perhaps
slightly changed state, giving further hints on where the
issue starts.

One thing that's not clear to me here: In the title you say
handle_pio() loops, but with the domain getting crashed I can't
seem to see that happening. Of course it may be a wrong
understanding/interpretation of mine that it is the guest doing
repeated I/O from/to port 0xb004.

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:59     ` Jan Beulich
@ 2020-06-05 15:10       ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 15:10 UTC (permalink / raw)
  To: 'Jan Beulich', 'Marek Marczykowski-Górecki'
  Cc: 'xen-devel'

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 05 June 2020 15:00
> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 13:18, Marek Marczykowski-Górecki wrote:
> > On Fri, Jun 05, 2020 at 11:38:17AM +0200, Jan Beulich wrote:
> >> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>> Hi,
> >>>
> >>> (continuation of a thread from #xendevel)
> >>>
> >>> During system shutdown quite often I hit infinite stream of errors like
> >>> this:
> >>>
> >>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>     (XEN) domain_crash called from io.c:178
> >>>
> >>> This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
> >>> too), nested within KVM. The KVM part means everything is very slow, so
> >>> various race conditions are much more likely to happen.
> >>>
> >>> It started happening not long ago, and I'm pretty sure it's about
> >>> updating to qemu 4.2.0 (in linux stubdom), previous one was 3.0.0.
> >>>
> >>> Thanks to Andrew and Roger, I've managed to collect more info.
> >>>
> >>> Context:
> >>>     dom0: pv
> >>>     dom1: hvm
> >>>     dom2: stubdom for dom1
> >>>     dom3: hvm
> >>>     dom4: stubdom for dom3
> >>>     dom5: pvh
> >>>     dom6: pvh
> >>>
> >>> It starts I think ok:
> >>>
> >>>     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>>     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>>     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>
> >> I can't seem to be able to spot the call site of this, in any of
> >> qemu, libxl, or libxc. I'm in particular curious as to the further
> >> actions taken on the domain after this was invoked: Do any ioreq
> >> servers get unregistered immediately (which I think would be a
> >> problem)?
> >
> > It is here:
> > https://github.com/qemu/qemu/blob/master/hw/i386/xen/xen-hvm.c#L1539
> >
> > I think it's called from cpu_handle_ioreq(), and I think the request
> > state is set to STATE_IORESP_READY before exiting (unless there is some
> > exit() hidden in another function used there).
> 
> Thanks. There's nothing in surrounding code there that would unregister
> an ioreq server. But as said elsewhere, I don't know qemu very well,
> and hence I may easily overlook how else one may get unregistered
> prematurely.
> 

See https://git.qemu.org/?p=qemu.git;a=commit;h=ba7fdd64b6714af7e42dfbe5969caf62c0823f75

This makes sure the server is destroyed in the exit notifier (called when the QEMU process is killed).

  Paul

> Jan





* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:57                 ` Jan Beulich
@ 2020-06-05 15:39                   ` Paul Durrant
  2020-06-05 16:18                     ` 'Marek Marczykowski-Górecki'
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 15:39 UTC (permalink / raw)
  To: 'Jan Beulich', 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 14:57
> To: paul@xen.org; 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 15:37, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 05 June 2020 14:32
> >> To: paul@xen.org
> >> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> >> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 05.06.2020 13:05, Paul Durrant wrote:
> >>> Sorry, only just catching up with this...
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 05 June 2020 10:09
> >>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> >>>> Durrant <paul@xen.org>
> >>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>>>
> >>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>>>>>> Then, we get the main issue:
> >>>>>>>>>
> >>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>>>>>     (XEN) domain_crash called from io.c:178
> >>>>>>>>>
> >>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>>>>>
> >>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>>>>>> intermittently set, and skip the pause in domain_shutdown()
> >>>>>>>
> >>>>>>> I presume this lack of pause is to allow the vcpu in question to still
> >>>>>>> be scheduled to consume the IOREQ reply?  (Its fairly opaque logic with
> >>>>>>> 0 clarifying details).
> >>>>>>>
> >>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> >>>>>>> notice and pause itself, at which point it would yield to the
> >>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>>>>>
> >>>>>>> Evidentially, this is not happening.
> >>>>>>
> >>>>>> We can't tell yet, until ...
> >>>>>>
> >>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>>>>>> confirm whether we're getting into handle_pio() via the
> >>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> >>>>>>> we're fully re-entering the guest).
> >>>>>>
> >>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >>>>>> instead of plain "true"?
> >>>>>>
> >>>>>> Adding Paul to Cc, as being the maintainer here.
> >>>>>
> >>>>> Got it, by sticking BUG() just before that domain_crash() in
> >>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> >>>>> v->defer_shutdown.
> >>>>
> >>>> As per the log they did get it set. I'd be curious of the flag's
> >>>> value (as well as v->paused_for_shutdown's) at the point of the
> >>>> problematic handle_pio() invocation (see below). It may be
> >>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> >>>> - before exiting to guest context (in order to then come back
> >>>> and call handle_pio()) the vCPU ought to be getting through
> >>>> there. No indication of it doing so would be a sign that there's
> >>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> >>>>
> >>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>
> >>>> Perhaps in this message could you also log
> >>>> v->domain->is_shutting_down, v->defer_shutdown, and
> >>>> v->paused_for_shutdown? (Would be nice if, after having made
> >>>> changes to your debugging patch, you could point again at the
> >>>> precise version you've used for the log provided.)
> >>>>
> >>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> >>>>> (XEN) Xen BUG at io.c:178
> >>>>
> >>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> >>>> likely also do, keeping Xen alive.
> >>>>
> >>>
> >>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?
> >>
> >> Where would this originate?
> >
> > I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of
> hvm_send_ioreq().
> 
> Ah yes. But this is just one way of things possibly going wrong. Plus
> the function will return true when ->defer_shutdown is already or
> (wrongly) still set.
> 
> >>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this clearly
> >> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to selecting a
> >> server but then not finding a vcpu match in ioreq_vcpu_list.
> >>
> >> I was suspecting such, but at least the tearing down of all servers
> >> happens only from relinquish-resources, which gets started only
> >> after ->is_shut_down got set (unless the tool stack invoked
> >> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> >> set for the domain).
> >>
> >> For individually unregistered servers - yes, if qemu did so, this
> >> would be a problem. They need to remain registered until all vCPU-s
> >> in the domain got paused.
> >
> > It shouldn't be a problem should it? Destroying an individual server is only done with the domain
> paused, so no vcpus can be running at the time.
> 
> Consider the case of one getting destroyed after it has already
> returned data, but the originating vCPU didn't consume that data
> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> won't find the matching server anymore, and hence the chain
> hvm_wait_for_io() -> hvm_io_assist() ->
> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> would then still correctly consume the result.

True, and skipping hvm_io_assist() means the vcpu's internal ioreq state will be left set to STATE_IOREQ_READY, and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch).

> 
> Marek - to verify this doesn't happen (sorry, my qemu knowledge
> is rather limited, and hence I don't know whether this can
> happen at all), could you also log hvm_destroy_ioreq_server()
> invocations?
> 
> Jan




* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 13:49                     ` Paul Durrant
@ 2020-06-05 15:48                       ` Paul Durrant
  2020-06-05 17:13                         ` 'Marek Marczykowski-Górecki'
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 15:48 UTC (permalink / raw)
  To: paul, 'Jan Beulich'
  Cc: 'Andrew Cooper',
	'Marek Marczykowski-Górecki', 'xen-devel'

> -----Original Message-----
> From: Paul Durrant <xadimgnik@gmail.com>
> Sent: 05 June 2020 14:49
> To: 'Jan Beulich' <jbeulich@suse.com>
> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@suse.com>
> > Sent: 05 June 2020 14:47
> > To: paul@xen.org
> > Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> > <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >
> > On 05.06.2020 15:43, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich <jbeulich@suse.com>
> > >> Sent: 05 June 2020 14:37
> > >> To: paul@xen.org
> > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> > >> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>
> > >> On 05.06.2020 13:25, Paul Durrant wrote:
> > >>>> -----Original Message-----
> > >>>> From: Paul Durrant <xadimgnik@gmail.com>
> > >>>> Sent: 05 June 2020 12:06
> > >>>> To: 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki'
> > >> <marmarek@invisiblethingslab.com>
> > >>>> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> > >>>> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>>>
> > >>>> Sorry, only just catching up with this...
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Jan Beulich <jbeulich@suse.com>
> > >>>>> Sent: 05 June 2020 10:09
> > >>>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > >>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>;
> Paul
> > >>>>> Durrant <paul@xen.org>
> > >>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>>>>
> > >>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>>>>>>> Then, we get the main issue:
> > >>>>>>>>>>
> > >>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>>>>>>     (XEN) domain_crash called from io.c:178
> > >>>>>>>>>>
> > >>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> > >>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> > >>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> > >>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>>>>>>
> > >>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > >>>>>>>> intermittently set, and skip the pause in domain_shutdown()
> > >>>>>>>>
> > >>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
> > >>>>>>>> be scheduled to consume the IOREQ reply?  (Its fairly opaque logic with
> > >>>>>>>> 0 clarifying details).
> > >>>>>>>>
> > >>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> > >>>>>>>> notice and pause itself, at which point it would yield to the
> > >>>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > >>>>>>>>
> > >>>>>>>> Evidentially, this is not happening.
> > >>>>>>>
> > >>>>>>> We can't tell yet, until ...
> > >>>>>>>
> > >>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> > >>>>>>>> confirm whether we're getting into handle_pio() via the
> > >>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> > >>>>>>>> we're fully re-entering the guest).
> > >>>>>>>
> > >>>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> > >>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> > >>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> > >>>>>>> instead of plain "true"?
> > >>>>>>>
> > >>>>>>> Adding Paul to Cc, as being the maintainer here.
> > >>>>>>
> > >>>>>> Got it, by sticking BUG() just before that domain_crash() in
> > >>>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> > >>>>>> v->defer_shutdown.
> > >>>>>
> > >>>>> As per the log they did get it set. I'd be curious of the flag's
> > >>>>> value (as well as v->paused_for_shutdown's) at the point of the
> > >>>>> problematic handle_pio() invocation (see below). It may be
> > >>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> > >>>>> - before exiting to guest context (in order to then come back
> > >>>>> and call handle_pio()) the vCPU ought to be getting through
> > >>>>> there. No indication of it doing so would be a sign that there's
> > >>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> > >>>>>
> > >>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>
> > >>>>> Perhaps in this message could you also log
> > >>>>> v->domain->is_shutting_down, v->defer_shutdown, and
> > >>>>> v->paused_for_shutdown? (Would be nice if, after having made
> > >>>>> changes to your debugging patch, you could point again at the
> > >>>>> precise version you've used for the log provided.)
> > >>>>>
> > >>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >>>>>> (XEN) Xen BUG at io.c:178
> > >>>>>
> > >>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> > >>>>> likely also do, keeping Xen alive.
> > >>>>>
> > >>>>
> > >>>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it? That would mean we
> > wouldn't
> > >> be
> > >>>> seeing the "Unexpected PIO" message. From that message this clearly X86EMUL_UNHANDLEABLE which
> > >>>> suggests a race with ioreq server teardown, possibly due to selecting a server but then not
> > finding
> > >> a
> > >>>> vcpu match in ioreq_vcpu_list.
> > >>>
> > >>> Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash ) and then
> clearing
> > >> defer_shutdown looks a bit odd... Just because the domain shutdown code has been set that way
> > doesn't
> > >> mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code could easily be called from
> > one
> > >> vcpu whilst another has emulation pending.
> > >>
> > >> I'm confused: The deferral is of shutting down the domain, not of
> > >> a specific instance of emulation.
> > >
> > > Now I'm confused... defer_shutdown is per-vcpu.
> >
> > Right - each vCPU can individually defer shutting down of the domain
> > as a whole.
> >
> 
> Ok, I think we're only going to make more progress if we know exactly where the X86EMUL_UNHANDLEABLE
> is coming from.
> 

This (untested) patch might help:

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index c55c4bc4bc..8aa8779ba2 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
     ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;

     if ( hvm_ioreq_needs_completion(ioreq) )
-    {
-        ioreq->state = STATE_IORESP_READY;
         ioreq->data = data;
-    }
-    else
-        ioreq->state = STATE_IOREQ_NONE;

     msix_write_completion(v);
     vcpu_end_shutdown_deferral(v);
@@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
         }
     }

+    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;





* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 15:39                   ` Paul Durrant
@ 2020-06-05 16:18                     ` 'Marek Marczykowski-Górecki'
  2020-06-08  8:13                       ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: 'Marek Marczykowski-Górecki' @ 2020-06-05 16:18 UTC (permalink / raw)
  To: paul; +Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'


On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@suse.com>
> > Sent: 05 June 2020 14:57
> > To: paul@xen.org; 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> > Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > 
> > On 05.06.2020 15:37, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich <jbeulich@suse.com>
> > >> Sent: 05 June 2020 14:32
> > >> To: paul@xen.org
> > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> > >> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>
> > >> On 05.06.2020 13:05, Paul Durrant wrote:
> > >>> Sorry, only just catching up with this...
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Jan Beulich <jbeulich@suse.com>
> > >>>> Sent: 05 June 2020 10:09
> > >>>> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > >>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> > >>>> Durrant <paul@xen.org>
> > >>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>>>
> > >>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>>>>>> Then, we get the main issue:
> > >>>>>>>>>
> > >>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>>>>>     (XEN) domain_crash called from io.c:178
> > >>>>>>>>>
> > >>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> > >>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> > >>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> > >>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>>>>>
> > >>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > >>>>>>> intermittently set, and skip the pause in domain_shutdown()
> > >>>>>>>
> > >>>>>>> I presume this lack of pause is to allow the vcpu in question to still
> > >>>>>>> be scheduled to consume the IOREQ reply?  (Its fairly opaque logic with
> > >>>>>>> 0 clarifying details).
> > >>>>>>>
> > >>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> > >>>>>>> notice and pause itself, at which point it would yield to the
> > >>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > >>>>>>>
> > >>>>>>> Evidentially, this is not happening.
> > >>>>>>
> > >>>>>> We can't tell yet, until ...
> > >>>>>>
> > >>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> > >>>>>>> confirm whether we're getting into handle_pio() via the
> > >>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> > >>>>>>> we're fully re-entering the guest).
> > >>>>>>
> > >>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> > >>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> > >>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> > >>>>>> instead of plain "true"?
> > >>>>>>
> > >>>>>> Adding Paul to Cc, as being the maintainer here.
> > >>>>>
> > >>>>> Got it, by sticking BUG() just before that domain_crash() in
> > >>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> > >>>>> v->defer_shutdown.
> > >>>>
> > >>>> As per the log they did get it set. I'd be curious of the flag's
> > >>>> value (as well as v->paused_for_shutdown's) at the point of the
> > >>>> problematic handle_pio() invocation (see below). It may be
> > >>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> > >>>> - before exiting to guest context (in order to then come back
> > >>>> and call handle_pio()) the vCPU ought to be getting through
> > >>>> there. No indication of it doing so would be a sign that there's
> > >>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> > >>>>
> > >>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>
> > >>>> Perhaps in this message you could also log
> > >>>> v->domain->is_shutting_down, v->defer_shutdown, and
> > >>>> v->paused_for_shutdown? (Would be nice if, after having made
> > >>>> changes to your debugging patch, you could point again at the
> > >>>> precise version you've used for the log provided.)
> > >>>>
> > >>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >>>>> (XEN) Xen BUG at io.c:178
> > >>>>
> > >>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> > >>>> likely also do, keeping Xen alive.
> > >>>>
> > >>>
> > >>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it?
> > >>
> > >> Where would this originate?
> > >
> > > I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of
> > hvm_send_ioreq().
> > 
> > Ah yes. But this is just one way of things possibly going wrong. Plus
> > the function will return true when ->defer_shutdown is already or
> > (wrongly) still set.
> > 
> > >>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is clearly
> > >> X86EMUL_UNHANDLEABLE, which suggests a race with ioreq server teardown, possibly due to selecting a
> > >> server but then not finding a vcpu match in ioreq_vcpu_list.
> > >>
> > >> I was suspecting such, but at least the tearing down of all servers
> > >> happens only from relinquish-resources, which gets started only
> > >> after ->is_shut_down got set (unless the tool stack invoked
> > >> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> > >> set for the domain).
> > >>
> > >> For individually unregistered servers - yes, if qemu did so, this
> > >> would be a problem. They need to remain registered until all vCPU-s
> > >> in the domain got paused.
> > >
> > > It shouldn't be a problem should it? Destroying an individual server is only done with the domain
> > paused, so no vcpus can be running at the time.
> > 
> > Consider the case of one getting destroyed after it has already
> > returned data, but the originating vCPU didn't consume that data
> > yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> > won't find the matching server anymore, and hence the chain
> > hvm_wait_for_io() -> hvm_io_assist() ->
> > vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> > would then still correctly consume the result.
> 
> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch).

I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
hvmemul_do_io(). And it happens shortly after the ioreq server is destroyed:

(XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
(XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
(XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
(XEN) d11v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:180:d11v0 hvmemul_do_io X86EMUL_UNHANDLEABLE: io_req.state 1
(XEN) d11v0 Unexpected PIO status 1, port 0xb004 read 0xffff

I've also made the handle_pio message print on v->defer_shutdown=1
regardless of the port, but didn't hit any case other than a read from
0xb004.
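
Roughly, that first switch amounts to the following (simplified
standalone model for reasoning about it, not the actual hvmemul_do_io()
code; the state values match public/hvm/ioreq.h, and "io_req.state 1"
in the log above is STATE_IOREQ_READY):

```c
#include <assert.h>

/* IOREQ state values as in Xen's public/hvm/ioreq.h. */
enum { STATE_IOREQ_NONE = 0, STATE_IOREQ_READY = 1,
       STATE_IOREQ_INPROCESS = 2, STATE_IORESP_READY = 3 };
enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };

/*
 * Toy model of the first state switch in hvmemul_do_io(): a request
 * may only be issued from IOREQ_NONE, or have its reply consumed from
 * IORESP_READY.  If hvm_io_assist() was skipped (e.g. the ioreq
 * server was destroyed before the vCPU consumed the reply), the
 * internal state is left at IOREQ_READY and we fall into the default
 * case -- the X86EMUL_UNHANDLEABLE seen in the log.
 */
static int first_switch(int io_req_state)
{
    switch ( io_req_state )
    {
    case STATE_IOREQ_NONE:
        return X86EMUL_OKAY;          /* fresh request: send it */
    case STATE_IORESP_READY:
        return X86EMUL_OKAY;          /* reply present: consume it */
    default:
        return X86EMUL_UNHANDLEABLE;  /* stale IOREQ_READY/INPROCESS */
    }
}
```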

Now, I'm going to try your patch.

> > Marek - to verify this doesn't happen (sorry, my qemu knowledge
> > is rather limited, and hence I don't know whether this can
> > happen at all), could you also log hvm_destroy_ioreq_server()
> > invocations?
> > 
> > Jan
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 15:48                       ` Paul Durrant
@ 2020-06-05 17:13                         ` 'Marek Marczykowski-Górecki'
  2020-06-05 17:24                           ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: 'Marek Marczykowski-Górecki' @ 2020-06-05 17:13 UTC (permalink / raw)
  To: paul; +Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'

[-- Attachment #1: Type: text/plain, Size: 4986 bytes --]

On Fri, Jun 05, 2020 at 04:48:39PM +0100, Paul Durrant wrote:
> This (untested) patch might help:

It is different now. I don't get domain_crash because of
X86EMUL_UNHANDLEABLE anymore, but I still see handle_pio looping for
some time. It eventually ends, though I'm not really sure why.

I've tried the patch with a modification to make it build:

> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index c55c4bc4bc..8aa8779ba2 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> 
>      if ( hvm_ioreq_needs_completion(ioreq) )
> -    {
> -        ioreq->state = STATE_IORESP_READY;
>          ioreq->data = data;
> -    }
> -    else
> -        ioreq->state = STATE_IOREQ_NONE;
> 
>      msix_write_completion(v);
>      vcpu_end_shutdown_deferral(v);
> @@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
>          }
>      }
> 
> +    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
       vio->io_req.state ... &vio->io_req

> +        STATE_IORESP_READY : STATE_IOREQ_NONE;
> +
>      io_completion = vio->io_completion;
>      vio->io_completion = HVMIO_no_completion;
> 

The full patch (together with my debug prints):
https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06

Note some of those X86EMUL_UNHANDLEABLE logged below are about an
intermediate state, not really the hvmemul_do_io() return value.

And the log:
(XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
(XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
(XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
(XEN) d4v0 hvm_destroy_ioreq_server called for 3, id 0
(XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
(XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
(XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
(XEN) grant_table.c:3702:d0v0 Grant release 0x24 ref 0x199 flags 0x2 d5
(XEN) grant_table.c:3702:d0v0 Grant release 0x25 ref 0x19a flags 0x2 d5
(XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
(XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
(XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d3v0 handle_pio port 0xb004 write 0xe3f8 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
(XEN) d3v0 handle_pio port 0xb000 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) d3v0 handle_pio port 0xb000 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0

The last one repeats for some time, 30s or more (18425 times).
Note the port is different than before. Is this the guest waiting to be
destroyed after requesting it?
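
To spell out what those flags amount to, my understanding of the
deferral logic as a toy model (not the real common/domain.c code;
fields collapsed onto one struct for the single-vCPU case):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the shutdown-deferral flags shown in the log above. */
struct vcpu_model {
    bool is_shutting_down;     /* d->is_shutting_down */
    bool defer_shutdown;       /* v->defer_shutdown */
    bool paused_for_shutdown;  /* v->paused_for_shutdown */
};

/* Taken before an ioreq is sent; refused once shutdown has started. */
static bool vcpu_start_shutdown_deferral(struct vcpu_model *v)
{
    if ( v->is_shutting_down )
        return false;
    v->defer_shutdown = true;
    return true;
}

/* Dropped once the reply is consumed; only now may the vCPU be
 * paused and the domain's shutdown complete. */
static void vcpu_end_shutdown_deferral(struct vcpu_model *v)
{
    v->defer_shutdown = false;
    if ( v->is_shutting_down )
        v->paused_for_shutdown = true;
}

/* While shutdown arrived with an I/O in flight, the destroy is held
 * off -- which would explain handle_pio() being re-entered. */
static bool shutdown_held_off(const struct vcpu_model *v)
{
    return v->is_shutting_down && v->defer_shutdown &&
           !v->paused_for_shutdown;
}

static bool deferral_release_unblocks_shutdown(void)
{
    struct vcpu_model v = { .is_shutting_down = true,
                            .defer_shutdown = true };
    struct vcpu_model w = { .is_shutting_down = true };

    if ( vcpu_start_shutdown_deferral(&w) )
        return false;          /* new I/O must be refused now */
    if ( !shutdown_held_off(&v) )
        return false;          /* in-flight I/O holds shutdown off */
    vcpu_end_shutdown_deferral(&v);
    return !shutdown_held_off(&v) && v.paused_for_shutdown;
}
```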

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 17:13                         ` 'Marek Marczykowski-Górecki'
@ 2020-06-05 17:24                           ` Paul Durrant
  2020-06-05 20:43                             ` 'Marek Marczykowski-Górecki'
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-05 17:24 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki'
  Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'

> -----Original Message-----
> From: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> Sent: 05 June 2020 18:14
> To: paul@xen.org
> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-
> devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On Fri, Jun 05, 2020 at 04:48:39PM +0100, Paul Durrant wrote:
> > This (untested) patch might help:
> 
> It is different now. I don't get domain_crash because of
> X86EMUL_UNHANDLEABLE anymore, but I still see handle_pio looping for
> some time. But it eventually ends, not really sure why.

That'll be the shutdown deferral, which I realised later I had forgotten about...

> 
> I've tried the patch with a modification to make it build:
> 
> > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> > index c55c4bc4bc..8aa8779ba2 100644
> > --- a/xen/arch/x86/hvm/ioreq.c
> > +++ b/xen/arch/x86/hvm/ioreq.c
> > @@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
> >      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> >
> >      if ( hvm_ioreq_needs_completion(ioreq) )
> > -    {
> > -        ioreq->state = STATE_IORESP_READY;
> >          ioreq->data = data;
> > -    }
> > -    else
> > -        ioreq->state = STATE_IOREQ_NONE;
> >
> >      msix_write_completion(v);
> >      vcpu_end_shutdown_deferral(v);

In fact, move both of these lines...

> > @@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
> >          }
> >      }
> >
> > +    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
>        vio->io_req->state ... &vio->io_req
> 
> > +        STATE_IORESP_READY : STATE_IOREQ_NONE;
> > +

... to here too.

> >      io_completion = vio->io_completion;
> >      vio->io_completion = HVMIO_no_completion;
> >
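In other words, with both tweaks the completion path should advance the
ioreq state and release the deferral even when the server is already
gone. As a standalone toy model of the intended behaviour (untested
sketch, not the real ioreq.c; fields and locking simplified away):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { STATE_IOREQ_NONE = 0, STATE_IOREQ_READY = 1,
       STATE_IOREQ_INPROCESS = 2, STATE_IORESP_READY = 3 };

struct ioreq { int state; uint64_t data; bool data_is_ptr; };

/* Simplified: a READY request with in-band data needs its reply
 * consumed by the emulation path. */
static bool hvm_ioreq_needs_completion(const struct ioreq *p)
{
    return p->state == STATE_IOREQ_READY && !p->data_is_ptr;
}

/* After the tweak, hvm_io_assist() only latches the returned data;
 * it no longer touches the state. Runs only if a server was found. */
static void hvm_io_assist(struct ioreq *p, uint64_t data)
{
    if ( hvm_ioreq_needs_completion(p) )
        p->data = data;
}

/* The state update and deferral release now happen here, whether or
 * not a matching (still-registered) server supplied the data. */
static int handle_hvm_io_completion(struct ioreq *p, bool server_alive,
                                    uint64_t data, bool *deferral_ended)
{
    if ( server_alive )
        hvm_io_assist(p, data);

    p->state = hvm_ioreq_needs_completion(p) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    *deferral_ended = true;    /* vcpu_end_shutdown_deferral(v) */
    return p->state;
}

/* Scenario from the log: server destroyed before the vCPU consumed
 * the reply -- state still moves forward, so hvmemul_do_io()'s first
 * switch no longer sees a stale IOREQ_READY. */
static int destroyed_server_final_state(void)
{
    struct ioreq p = { .state = STATE_IOREQ_READY };
    bool ended = false;
    return handle_hvm_io_completion(&p, false, 0xffff, &ended);
}

static int live_server_final_state(void)
{
    struct ioreq p = { .state = STATE_IOREQ_READY };
    bool ended = false;
    handle_hvm_io_completion(&p, true, 0xffff, &ended);
    return p.data == 0xffff ? p.state : -1;
}
```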
> 
> The full patch (together with my debug prints):
> https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06
> 
> Note some of those X86EMUL_UNHANDLEABLE logged below are about an
> intermediate state, not really hvmemul_do_io return value.
> 
> And the log:
> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) d4v0 hvm_destroy_ioreq_server called for 3, id 0
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0 defer_shutdown 0 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d1v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v0 Grant release 0x24 ref 0x199 flags 0x2 d5
> (XEN) grant_table.c:3702:d0v0 Grant release 0x25 ref 0x19a flags 0x2 d5
> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d3v0 handle_pio port 0xb004 write 0xe3f8 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) emulate.c:263:d3v0 hvmemul_do_io got X86EMUL_UNHANDLEABLE from hvm_io_intercept req state 1
> (XEN) d3v0 handle_pio port 0xb000 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb000 read 0x0000 is_shutting_down 1 defer_shutdown 1 paused_for_shutdown
> 0 is_shut_down 0
> 
> The last one repeats for some time, like 30s or some more (18425 times).
> Note the port is different than before. Is it a guest waiting for being
> destroyed after requesting so?
> 

I guess it is the destroy being held off by the shutdown deferral? Hopefully the above tweaks will sort that out.

  Paul

> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 17:24                           ` Paul Durrant
@ 2020-06-05 20:43                             ` 'Marek Marczykowski-Górecki'
  2020-06-07 13:32                               ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: 'Marek Marczykowski-Górecki' @ 2020-06-05 20:43 UTC (permalink / raw)
  To: paul; +Cc: 'Andrew Cooper', 'Jan Beulich', 'xen-devel'


On Fri, Jun 05, 2020 at 06:24:20PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> > Sent: 05 June 2020 18:14
> > To: paul@xen.org
> > Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-
> > devel@lists.xenproject.org>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > 
> > On Fri, Jun 05, 2020 at 04:48:39PM +0100, Paul Durrant wrote:
> > > This (untested) patch might help:
> > 
> > It is different now. I don't get domain_crash because of
> > X86EMUL_UNHANDLEABLE anymore, but I still see handle_pio looping for
> > some time. But it eventually ends, not really sure why.
> 
> That'll be the shutdown deferral, which I realised later that I forgot about...
> 
> > 
> > I've tried the patch with a modification to make it build:
> > 
> > > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> > > index c55c4bc4bc..8aa8779ba2 100644
> > > --- a/xen/arch/x86/hvm/ioreq.c
> > > +++ b/xen/arch/x86/hvm/ioreq.c
> > > @@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
> > >      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> > >
> > >      if ( hvm_ioreq_needs_completion(ioreq) )
> > > -    {
> > > -        ioreq->state = STATE_IORESP_READY;
> > >          ioreq->data = data;
> > > -    }
> > > -    else
> > > -        ioreq->state = STATE_IOREQ_NONE;
> > >
> > >      msix_write_completion(v);
> > >      vcpu_end_shutdown_deferral(v);
> 
> In fact, move both of these lines...
> 
> > > @@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
> > >          }
> > >      }
> > >
> > > +    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
> >        vio->io_req->state ... &vio->io_req
> > 
> > > +        STATE_IORESP_READY : STATE_IOREQ_NONE;
> > > +
> 
> ... to here too.
> 
> > >      io_completion = vio->io_completion;
> > >      vio->io_completion = HVMIO_no_completion;
> > >
> > 
> > The full patch (together with my debug prints):
> > https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06

The current patch:
https://gist.github.com/marmarek/5ae795129c1be2dae13bfc517547c0f1

> I guess it is the destroy being held off by the shutdown deferral? Hopefully the above tweaks should sort that out.

Yes, now it works (tried 5 times in a row, previously it crashed on
the first or the second one). Thanks!

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 20:43                             ` 'Marek Marczykowski-Górecki'
@ 2020-06-07 13:32                               ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2020-06-07 13:32 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, xen-devel, Jan Beulich, Paul Durrant


On Fri, 5 Jun 2020, 21:43 Marek Marczykowski-Górecki, <
marmarek@invisiblethingslab.com> wrote:

> On Fri, Jun 05, 2020 at 06:24:20PM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>
> > > Sent: 05 June 2020 18:14
> > > To: paul@xen.org
> > > Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Andrew Cooper' <
> andrew.cooper3@citrix.com>; 'xen-devel' <xen-
> > > devel@lists.xenproject.org>
> > > Subject: Re: handle_pio looping during domain shutdown, with qemu
> 4.2.0 in stubdom
> > >
> > > On Fri, Jun 05, 2020 at 04:48:39PM +0100, Paul Durrant wrote:
> > > > This (untested) patch might help:
> > >
> > > It is different now. I don't get domain_crash because of
> > > X86EMUL_UNHANDLEABLE anymore, but I still see handle_pio looping for
> > > some time. But it eventually ends, not really sure why.
> >
> > That'll be the shutdown deferral, which I realised later that I forgot
> about...
> >
> > >
> > > I've tried the patch with a modification to make it build:
> > >
> > > > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> > > > index c55c4bc4bc..8aa8779ba2 100644
> > > > --- a/xen/arch/x86/hvm/ioreq.c
> > > > +++ b/xen/arch/x86/hvm/ioreq.c
> > > > @@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu
> *sv, uint64_t data)
> > > >      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> > > >
> > > >      if ( hvm_ioreq_needs_completion(ioreq) )
> > > > -    {
> > > > -        ioreq->state = STATE_IORESP_READY;
> > > >          ioreq->data = data;
> > > > -    }
> > > > -    else
> > > > -        ioreq->state = STATE_IOREQ_NONE;
> > > >
> > > >      msix_write_completion(v);
> > > >      vcpu_end_shutdown_deferral(v);
> >
> > In fact, move both of these lines...
> >
> > > > @@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
> > > >          }
> > > >      }
> > > >
> > > > +    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
> > >        vio->io_req->state ... &vio->io_req
> > >
> > > > +        STATE_IORESP_READY : STATE_IOREQ_NONE;
> > > > +
> >
> > ... to here too.
> >
> > > >      io_completion = vio->io_completion;
> > > >      vio->io_completion = HVMIO_no_completion;
> > > >
> > >
> > > The full patch (together with my debug prints):
> > > https://gist.github.com/marmarek/da37da3722179057a6e7add4fb361e06
>
> The current patch:
> https://gist.github.com/marmarek/5ae795129c1be2dae13bfc517547c0f1
>
> > I guess it is the destroy being held off by the shutdown deferral?
> Hopefully the above tweaks should sort that out.
>
> Yes, now it works (tried 5 times in a row, previously it crashed on
> the first or the second one). Thanks!
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>

Cool. I will send a proper patch tomorrow.

  Paul



* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-05 16:18                     ` 'Marek Marczykowski-Górecki'
@ 2020-06-08  8:13                       ` Jan Beulich
  2020-06-08  9:15                         ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-08  8:13 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki', paul
  Cc: 'Andrew Cooper', 'xen-devel'

On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
> On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 05 June 2020 14:57
>>>
>>> On 05.06.2020 15:37, Paul Durrant wrote:
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 05 June 2020 14:32
>>>>>
>>>>> On 05.06.2020 13:05, Paul Durrant wrote:
>>>>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this clearly
>>>>> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to selecting a
>>>>> server but then not finding a vcpu match in ioreq_vcpu_list.
>>>>>
>>>>> I was suspecting such, but at least the tearing down of all servers
>>>>> happens only from relinquish-resources, which gets started only
>>>>> after ->is_shut_down got set (unless the tool stack invoked
>>>>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
>>>>> set for the domain).
>>>>>
>>>>> For individually unregistered servers - yes, if qemu did so, this
>>>>> would be a problem. They need to remain registered until all vCPU-s
>>>>> in the domain got paused.
>>>>
>>>> It shouldn't be a problem should it? Destroying an individual server is only done with the domain
>>> paused, so no vcpus can be running at the time.
>>>
>>> Consider the case of one getting destroyed after it has already
>>> returned data, but the originating vCPU didn't consume that data
>>> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
>>> won't find the matching server anymore, and hence the chain
>>> hvm_wait_for_io() -> hvm_io_assist() ->
>>> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
>>> would then still correctly consume the result.
>>
>> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch).
> 
> I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
> hvmemul_do_io(). And it happens shortly after ioreq server is destroyed:
> 
> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0

Can either of you tell why this is? As said before, qemu shouldn't
start tearing down ioreq servers until the domain has made it out
of all shutdown deferrals, and all its vCPU-s have been paused.
For the moment I think the proposed changes, while necessary, will
mask another issue elsewhere. The @releaseDomain xenstore watch,
being the trigger I would consider relevant here, will trigger
only once XEN_DOMINF_shutdown is reported set for a domain, which
gets derived from d->is_shut_down (i.e. not mistakenly
d->is_shutting_down).

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-08  8:13                       ` Jan Beulich
@ 2020-06-08  9:15                         ` Paul Durrant
  2020-06-08  9:24                           ` Jürgen Groß
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-08  9:15 UTC (permalink / raw)
  To: 'Jan Beulich', 'Marek Marczykowski-Górecki',
	'Ian Jackson'
  Cc: 'Andrew Cooper', 'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 08 June 2020 09:14
> To: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; paul@xen.org
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
> > On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>> Sent: 05 June 2020 14:57
> >>>
> >>> On 05.06.2020 15:37, Paul Durrant wrote:
> >>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>> Sent: 05 June 2020 14:32
> >>>>>
> >>>>> On 05.06.2020 13:05, Paul Durrant wrote:
> >>>>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this
> clearly
> >>>>> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to selecting
> a
> >>>>> server but then not finding a vcpu match in ioreq_vcpu_list.
> >>>>>
> >>>>> I was suspecting such, but at least the tearing down of all servers
> >>>>> happens only from relinquish-resources, which gets started only
> >>>>> after ->is_shut_down got set (unless the tool stack invoked
> >>>>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> >>>>> set for the domain).
> >>>>>
> >>>>> For individually unregistered servers - yes, if qemu did so, this
> >>>>> would be a problem. They need to remain registered until all vCPU-s
> >>>>> in the domain got paused.
> >>>>
> >>>> It shouldn't be a problem should it? Destroying an individual server is only done with the domain
> >>> paused, so no vcpus can be running at the time.
> >>>
> >>> Consider the case of one getting destroyed after it has already
> >>> returned data, but the originating vCPU didn't consume that data
> >>> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> >>> won't find the matching server anymore, and hence the chain
> >>> hvm_wait_for_io() -> hvm_io_assist() ->
> >>> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> >>> would then still correctly consume the result.
> >>
> >> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to
> IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from
> the first switch).
> >
> > I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
> > hvmemul_do_io(). And it happens shortly after ioreq server is destroyed:
> >
> > (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
> > (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
> > (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
> > (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
> 
> Can either of you tell why this is? As said before, qemu shouldn't
> start tearing down ioreq servers until the domain has made it out
> of all shutdown deferrals, and all its vCPU-s have been paused.
> For the moment I think the proposed changes, while necessary, will
> mask another issue elsewhere. The @releaseDomain xenstore watch,
> being the trigger I would consider relevant here, will trigger
> only once XEN_DOMINF_shutdown is reported set for a domain, which
> gets derived from d->is_shut_down (i.e. not mistakenly
> d->is_shutting_down).

I can't find anything that actually calls xendevicemodel_shutdown(). It was added by:

commit 1462f9ea8f4219d520a530787b80c986e050aa98
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date:   Fri Sep 15 17:21:14 2017 +0100

    tools: libxendevicemodel: Provide xendevicemodel_shutdown

    Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>

Perhaps Ian can shed more light on it?

  Paul

> 
> Jan




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-08  9:15                         ` Paul Durrant
@ 2020-06-08  9:24                           ` Jürgen Groß
  2020-06-08 12:38                             ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jürgen Groß @ 2020-06-08  9:24 UTC (permalink / raw)
  To: paul, 'Jan Beulich',
	'Marek Marczykowski-Górecki', 'Ian Jackson'
  Cc: 'Andrew Cooper', 'xen-devel'

On 08.06.20 11:15, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 08 June 2020 09:14
>> To: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; paul@xen.org
>> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>>
>> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
>>> On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 05 June 2020 14:57
>>>>>
>>>>> On 05.06.2020 15:37, Paul Durrant wrote:
>>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>>> Sent: 05 June 2020 14:32
>>>>>>>
>>>>>>> On 05.06.2020 13:05, Paul Durrant wrote:
> >>>>>>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is
> >>>>>>>> clearly X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to
> >>>>>>>> selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
>>>>>>>
>>>>>>> I was suspecting such, but at least the tearing down of all servers
>>>>>>> happens only from relinquish-resources, which gets started only
>>>>>>> after ->is_shut_down got set (unless the tool stack invoked
>>>>>>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
>>>>>>> set for the domain).
>>>>>>>
>>>>>>> For individually unregistered servers - yes, if qemu did so, this
>>>>>>> would be a problem. They need to remain registered until all vCPU-s
>>>>>>> in the domain got paused.
>>>>>>
>>>>>> It shouldn't be a problem, should it? Destroying an individual server is only done with the
>>>>>> domain paused, so no vcpus can be running at the time.
>>>>>
>>>>> Consider the case of one getting destroyed after it has already
>>>>> returned data, but the originating vCPU didn't consume that data
>>>>> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
>>>>> won't find the matching server anymore, and hence the chain
>>>>> hvm_wait_for_io() -> hvm_io_assist() ->
>>>>> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
>>>>> would then still correctly consume the result.
>>>>
>>>> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to
>>>> IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE
>>>> (from the first switch).
>>>
>>> I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
>>> hvmemul_do_io(). And it happens shortly after ioreq server is destroyed:
>>>
>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
>>> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
>>> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
>>
>> Can either of you tell why this is? As said before, qemu shouldn't
>> start tearing down ioreq servers until the domain has made it out
>> of all shutdown deferrals, and all its vCPU-s have been paused.
>> For the moment I think the proposed changes, while necessary, will
>> mask another issue elsewhere. The @releaseDomain xenstore watch,
>> being the trigger I would consider relevant here, will trigger
>> only once XEN_DOMINF_shutdown is reported set for a domain, which
>> gets derived from d->is_shut_down (i.e. not mistakenly
>> d->is_shutting_down).
> 
> I can't find anything that actually calls xendevicemodel_shutdown(). It was added by:

destroy_hvm_domain() in qemu does.
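
For reference, the shape of that call in QEMU (hw/i386/xen/xen-hvm.c), paraphrased from memory
rather than quoted verbatim; error handling is elided and the exact code should be checked
against the 4.2 tree:

```
/* Paraphrased sketch of QEMU's destroy_hvm_domain(); not a verbatim copy. */
static void destroy_hvm_domain(bool reboot)
{
    ...
    rc = xendevicemodel_shutdown(xen_dmod, xen_domid,
                                 reboot ? SHUTDOWN_reboot
                                        : SHUTDOWN_poweroff);
    ...
}
```

So the shutdown DMOP is issued on the guest's behalf once the device model decides the
domain should go down.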


Juergen



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-08  9:24                           ` Jürgen Groß
@ 2020-06-08 12:38                             ` Paul Durrant
  2020-06-08 12:46                               ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Durrant @ 2020-06-08 12:38 UTC (permalink / raw)
  To: 'Jürgen Groß', 'Jan Beulich',
	'Marek Marczykowski-Górecki', 'Ian Jackson'
  Cc: 'Andrew Cooper', 'xen-devel'

> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 08 June 2020 10:25
> To: paul@xen.org; 'Jan Beulich' <jbeulich@suse.com>; 'Marek Marczykowski-Górecki'
> <marmarek@invisiblethingslab.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 08.06.20 11:15, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 08 June 2020 09:14
> >> To: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; paul@xen.org
> >> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
> >>> On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
> >>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>> Sent: 05 June 2020 14:57
> >>>>>
> >>>>> On 05.06.2020 15:37, Paul Durrant wrote:
> >>>>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>>>> Sent: 05 June 2020 14:32
> >>>>>>>
> >>>>>>> On 05.06.2020 13:05, Paul Durrant wrote:
> >>>>>>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this is
> >>>>>>>> clearly X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to
> >>>>>>>> selecting a server but then not finding a vcpu match in ioreq_vcpu_list.
> >>>>>>>
> >>>>>>> I was suspecting such, but at least the tearing down of all servers
> >>>>>>> happens only from relinquish-resources, which gets started only
> >>>>>>> after ->is_shut_down got set (unless the tool stack invoked
> >>>>>>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> >>>>>>> set for the domain).
> >>>>>>>
> >>>>>>> For individually unregistered servers - yes, if qemu did so, this
> >>>>>>> would be a problem. They need to remain registered until all vCPU-s
> >>>>>>> in the domain got paused.
> >>>>>>
> >>>>>> It shouldn't be a problem, should it? Destroying an individual server is only done with the
> >>>>>> domain paused, so no vcpus can be running at the time.
> >>>>>
> >>>>> Consider the case of one getting destroyed after it has already
> >>>>> returned data, but the originating vCPU didn't consume that data
> >>>>> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> >>>>> won't find the matching server anymore, and hence the chain
> >>>>> hvm_wait_for_io() -> hvm_io_assist() ->
> >>>>> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> >>>>> would then still correctly consume the result.
> >>>>
> >>>> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to
> >>>> IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE
> >>>> (from the first switch).
> >>>
> >>> I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
> >>> hvmemul_do_io(). And it happens shortly after ioreq server is destroyed:
> >>>
> >>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
> >>> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
> >>> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
> >>
> >> Can either of you tell why this is? As said before, qemu shouldn't
> >> start tearing down ioreq servers until the domain has made it out
> >> of all shutdown deferrals, and all its vCPU-s have been paused.
> >> For the moment I think the proposed changes, while necessary, will
> >> mask another issue elsewhere. The @releaseDomain xenstore watch,
> >> being the trigger I would consider relevant here, will trigger
> >> only once XEN_DOMINF_shutdown is reported set for a domain, which
> >> gets derived from d->is_shut_down (i.e. not mistakenly
> >> d->is_shutting_down).
> >
> > I can't find anything that actually calls xendevicemodel_shutdown(). It was added by:
> 
> destroy_hvm_domain() in qemu does.
> 

Ah ok, thanks. So it looks like this should only normally be called when the guest has written to the PIIX to request shutdown. Presumably the hvm_destroy_ioreq_server call we see afterwards is QEMU then exiting.
There is one other circumstance in which destroy_hvm_domain() would be called, and that is if the ioreq state is not STATE_IOREQ_INPROCESS... in which case there should be an accompanying error message in the qemu log.

  Paul

> 
> Juergen




* Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-08 12:38                             ` Paul Durrant
@ 2020-06-08 12:46                               ` Jan Beulich
  2020-06-08 12:54                                 ` Paul Durrant
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2020-06-08 12:46 UTC (permalink / raw)
  To: paul
  Cc: 'Jürgen Groß', 'Andrew Cooper',
	'Ian Jackson', 'Marek Marczykowski-Górecki',
	'xen-devel'

On 08.06.2020 14:38, Paul Durrant wrote:
>> From: Jürgen Groß <jgross@suse.com>
>> Sent: 08 June 2020 10:25
>>
>> On 08.06.20 11:15, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 08 June 2020 09:14
>>>>
>>>> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
>>>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
>>>>> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
>>>>> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
>>>>
>>>> Can either of you tell why this is? As said before, qemu shouldn't
>>>> start tearing down ioreq servers until the domain has made it out
>>>> of all shutdown deferrals, and all its vCPU-s have been paused.
>>>> For the moment I think the proposed changes, while necessary, will
>>>> mask another issue elsewhere. The @releaseDomain xenstore watch,
>>>> being the trigger I would consider relevant here, will trigger
>>>> only once XEN_DOMINF_shutdown is reported set for a domain, which
>>>> gets derived from d->is_shut_down (i.e. not mistakenly
>>>> d->is_shutting_down).
>>>
>>> I can't find anything that actually calls xendevicemodel_shutdown(). It was added by:
>>
>> destroy_hvm_domain() in qemu does.
>>
> 
> Ah ok, thanks. So it looks like this should only normally be called when the guest has written to the PIIX to request shutdown. Presumably the hvm_destroy_ioreq_server call we see afterwards is QEMU then exiting.

If a shutdown request was received, qemu should still not exit as long
as there's in-flight I/O, or as long as there are CPUs running in the
guest. It would seem legitimate to "cancel" in-flight I/O (and perhaps
"reject" further requests arriving), but the device model needs to
remain there as long as there's at least one running CPU. That's no
different on "real" hardware - the machine doesn't "disappear" just
because shutdown was requested.

Jan



* RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
  2020-06-08 12:46                               ` Jan Beulich
@ 2020-06-08 12:54                                 ` Paul Durrant
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Durrant @ 2020-06-08 12:54 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Jürgen Groß', 'Andrew Cooper',
	'Ian Jackson', 'Marek Marczykowski-Górecki',
	'xen-devel'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 08 June 2020 13:47
> To: paul@xen.org
> Cc: 'Jürgen Groß' <jgross@suse.com>; 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>;
> 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'xen-devel'
> <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 08.06.2020 14:38, Paul Durrant wrote:
> >> From: Jürgen Groß <jgross@suse.com>
> >> Sent: 08 June 2020 10:25
> >>
> >> On 08.06.20 11:15, Paul Durrant wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 08 June 2020 09:14
> >>>>
> >>>> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
> >>>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
> >>>>> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>>>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
> >>>>> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
> >>>>
> >>>> Can either of you tell why this is? As said before, qemu shouldn't
> >>>> start tearing down ioreq servers until the domain has made it out
> >>>> of all shutdown deferrals, and all its vCPU-s have been paused.
> >>>> For the moment I think the proposed changes, while necessary, will
> >>>> mask another issue elsewhere. The @releaseDomain xenstore watch,
> >>>> being the trigger I would consider relevant here, will trigger
> >>>> only once XEN_DOMINF_shutdown is reported set for a domain, which
> >>>> gets derived from d->is_shut_down (i.e. not mistakenly
> >>>> d->is_shutting_down).
> >>>
> >>> I can't find anything that actually calls xendevicemodel_shutdown(). It was added by:
> >>
> >> destroy_hvm_domain() in qemu does.
> >>
> >
> > Ah ok, thanks. So it looks like this should only normally be called when the guest has written to
> > the PIIX to request shutdown. Presumably the hvm_destroy_ioreq_server call we see afterwards is
> > QEMU then exiting.
> 
> If a shutdown request was received, qemu should still not exit as long
> as there's in-flight I/O, or as long as there are CPUs running in the
> guest. It would seem legitimate to "cancel" in-flight I/O (and perhaps
> "reject" further requests arriving), but the device model needs to
> remain there as long as there's at least one running CPU. That's no
> different on "real" hardware - the machine doesn't "disappear" just
> because shutdown was requested.

Yes, I would not expect QEMU to exit straight away... or at all, until killed by the toolstack. So that's the next thing to investigate.

  Paul

> 
> Jan




end of thread, other threads:[~2020-06-08 12:54 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04  1:46 handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom Marek Marczykowski-Górecki
2020-06-04  7:08 ` Jan Beulich
2020-06-04  7:17   ` Jan Beulich
2020-06-04 11:13   ` Andrew Cooper
2020-06-04 12:36     ` Jan Beulich
2020-06-04 14:25       ` Marek Marczykowski-Górecki
2020-06-05  9:09         ` Jan Beulich
2020-06-05  9:22           ` Jan Beulich
2020-06-05 12:01             ` Marek Marczykowski-Górecki
2020-06-05 12:39               ` Paul Durrant
2020-06-05 13:04                 ` 'Marek Marczykowski-Górecki'
2020-06-05 13:24                   ` Paul Durrant
2020-06-05 12:44               ` Andrew Cooper
2020-06-05 14:13               ` Jan Beulich
2020-06-05 11:05           ` Paul Durrant
2020-06-05 11:25             ` Paul Durrant
2020-06-05 13:36               ` Jan Beulich
2020-06-05 13:43                 ` Paul Durrant
2020-06-05 13:46                   ` Jan Beulich
2020-06-05 13:49                     ` Paul Durrant
2020-06-05 15:48                       ` Paul Durrant
2020-06-05 17:13                         ` 'Marek Marczykowski-Górecki'
2020-06-05 17:24                           ` Paul Durrant
2020-06-05 20:43                             ` 'Marek Marczykowski-Górecki'
2020-06-07 13:32                               ` Paul Durrant
2020-06-05 13:32             ` Jan Beulich
2020-06-05 13:37               ` Paul Durrant
2020-06-05 13:57                 ` Jan Beulich
2020-06-05 15:39                   ` Paul Durrant
2020-06-05 16:18                     ` 'Marek Marczykowski-Górecki'
2020-06-08  8:13                       ` Jan Beulich
2020-06-08  9:15                         ` Paul Durrant
2020-06-08  9:24                           ` Jürgen Groß
2020-06-08 12:38                             ` Paul Durrant
2020-06-08 12:46                               ` Jan Beulich
2020-06-08 12:54                                 ` Paul Durrant
2020-06-05  9:38 ` Jan Beulich
2020-06-05 11:18   ` Marek Marczykowski-Górecki
2020-06-05 13:59     ` Jan Beulich
2020-06-05 15:10       ` Paul Durrant
