All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/3] vm_event: fix race-condition when disabling monitor events
@ 2020-05-16  0:53 Michał Leszczyński
  0 siblings, 0 replies; 2+ messages in thread
From: Michał Leszczyński @ 2020-05-16  0:53 UTC (permalink / raw)
  To: xen-devel

> For the last couple years we have received numerous reports from users of
> monitor vm_events of spurious guest crashes when using events. In particular,
> it has observed that the problem occurs when vm_events are being disabled. The
> nature of the guest crash varied widely and has only occured occasionally. This
> made debugging the issue particularly hard. We had discussions about this issue
> even here on the xen-devel mailinglist with no luck figuring it out.
> 
> The bug has now been identified as a race-condition between register event
> handling and disabling the vm_event interface.
> 
> Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register writes
> if refused by  vm_event reply" is the patch that introduced the error. In this
> patch emulation of register write events can be postponed until the
> corresponding vm_event handler decides whether to allow such write to take
> place. Unfortunately this can only be implemented by performing the deny/allow
> step when the vCPU gets scheduled. Due to that postponed emulation of the event
> if the user decides to pause the VM in the vm_event handler and then disable
> events, the entire emulation step is skipped the next time the vCPU is resumed.
> Even if the user doesn't pause during the vm_event handling but exits
> immediately and disables vm_event, the situation becomes racey as disabling
> vm_event may succeed before the guest's vCPUs get scheduled with the pending
> emulation task. This has been particularly the case with VMs that have several
> vCPUs as after the VM is unpaused it may actually take a long time before all
?vCPUs get scheduled.
> 
> The only solution currently is to poll each vCPU before vm_events are disabled
> to verify they had been scheduled. The following patches resolve this issue in
> a much nicer way.
> 
> Patch 1 adds an option to the monitor_op domctl that needs to be specified if
>     the user wants to actually use the postponed register-write handling
>     mechanism. If that option is not specified then handling is performed the
>     same way as before patch 96760e2fba100d694300a81baddb5740e0f8c0ee.
>     
> Patch 2 performs sanity checking when disabling vm_events to determine whether
>     its safe to free all vm_event structures. The vCPUs still get unpaused to
>     allow them to get scheduled and perform any of their pending operations,
>     but otherwise an -EAGAIN error is returned signaling to the user that they
>     need to wait and try again disabling the interface.
>     
> Patch 3 adds a vm_event specifically to signal to the user when it is safe to
>     continue disabling the interface.
>     
> Shout out to our friends at CERT.pl for stumbling upon a crucial piece of
> information that lead to finally squashing this nasty bug.
> 
> Tamas K Lengyel (3):
>   xen/monitor: Control register values
>   xen/vm_event: add vm_event_check_pending_op
>   xen/vm_event: Add safe to disable vm_event
> 
>  xen/arch/x86/hvm/hvm.c            | 69 +++++++++++++++++++++++--------
>  xen/arch/x86/hvm/monitor.c        | 14 +++++++
>  xen/arch/x86/monitor.c            | 23 ++++++++++-
>  xen/arch/x86/vm_event.c           | 23 +++++++++++
>  xen/common/vm_event.c             | 17 ++++++--
>  xen/include/asm-arm/vm_event.h    |  7 ++++
>  xen/include/asm-x86/domain.h      |  2 +
>  xen/include/asm-x86/hvm/monitor.h |  1 +
>  xen/include/asm-x86/vm_event.h    |  2 +
>  xen/include/public/domctl.h       |  3 ++
>  xen/include/public/vm_event.h     |  8 ++++
>  11 files changed, 147 insertions(+), 22 deletions(-)
> 
> -- 
> 2.26.1

Hi,

I have reproduced the mentioned race-condition between register event handling and disabling the vm_event interface. With Xen 4.13.0 and Windows 7 x64 DomU (2 VCPUs), my test program causes random BSODs on DomU once vm_event interface is disabled. I can confirm that after applying Patch 1 to Xen 4.13.0 the test program doesn't crash the DomU anymore, so it would actually resolve the mentioned bug.

Best regards,
Michał Leszczyński
CERT Poland


^ permalink raw reply	[flat|nested] 2+ messages in thread

* [PATCH 0/3] vm_event: fix race-condition when disabling monitor events
@ 2020-05-15 16:52 Tamas K Lengyel
  0 siblings, 0 replies; 2+ messages in thread
From: Tamas K Lengyel @ 2020-05-15 16:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, Alexandru Isaila, Volodymyr Babchuk,
	Roger Pau Monné

For the last couple years we have received numerous reports from users of
monitor vm_events of spurious guest crashes when using events. In particular,
it has observed that the problem occurs when vm_events are being disabled. The
nature of the guest crash varied widely and has only occured occasionally. This
made debugging the issue particularly hard. We had discussions about this issue
even here on the xen-devel mailinglist with no luck figuring it out.

The bug has now been identified as a race-condition between register event
handling and disabling the vm_event interface.

Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register writes
if refused by  vm_event reply" is the patch that introduced the error. In this
patch emulation of register write events can be postponed until the
corresponding vm_event handler decides whether to allow such write to take
place. Unfortunately this can only be implemented by performing the deny/allow
step when the vCPU gets scheduled. Due to that postponed emulation of the event
if the user decides to pause the VM in the vm_event handler and then disable
events, the entire emulation step is skipped the next time the vCPU is resumed.
Even if the user doesn't pause during the vm_event handling but exits
immediately and disables vm_event, the situation becomes racey as disabling
vm_event may succeed before the guest's vCPUs get scheduled with the pending
emulation task. This has been particularly the case with VMs that have several
vCPUs as after the VM is unpaused it may actually take a long time before all
vCPUs get scheduled.

The only solution currently is to poll each vCPU before vm_events are disabled
to verify they had been scheduled. The following patches resolve this issue in
a much nicer way.

Patch 1 adds an option to the monitor_op domctl that needs to be specified if
    the user wants to actually use the postponed register-write handling
    mechanism. If that option is not specified then handling is performed the
    same way as before patch 96760e2fba100d694300a81baddb5740e0f8c0ee.
    
Patch 2 performs sanity checking when disabling vm_events to determine whether
    its safe to free all vm_event structures. The vCPUs still get unpaused to
    allow them to get scheduled and perform any of their pending operations,
    but otherwise an -EAGAIN error is returned signaling to the user that they
    need to wait and try again disabling the interface.
    
Patch 3 adds a vm_event specifically to signal to the user when it is safe to
    continue disabling the interface.
    
Shout out to our friends at CERT.pl for stumbling upon a crucial piece of
information that lead to finally squashing this nasty bug.

Tamas K Lengyel (3):
  xen/monitor: Control register values
  xen/vm_event: add vm_event_check_pending_op
  xen/vm_event: Add safe to disable vm_event

 xen/arch/x86/hvm/hvm.c            | 69 +++++++++++++++++++++++--------
 xen/arch/x86/hvm/monitor.c        | 14 +++++++
 xen/arch/x86/monitor.c            | 23 ++++++++++-
 xen/arch/x86/vm_event.c           | 23 +++++++++++
 xen/common/vm_event.c             | 17 ++++++--
 xen/include/asm-arm/vm_event.h    |  7 ++++
 xen/include/asm-x86/domain.h      |  2 +
 xen/include/asm-x86/hvm/monitor.h |  1 +
 xen/include/asm-x86/vm_event.h    |  2 +
 xen/include/public/domctl.h       |  3 ++
 xen/include/public/vm_event.h     |  8 ++++
 11 files changed, 147 insertions(+), 22 deletions(-)

-- 
2.26.1



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-05-16  0:54 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-16  0:53 [PATCH 0/3] vm_event: fix race-condition when disabling monitor events Michał Leszczyński
  -- strict thread matches above, loose matches on Subject: below --
2020-05-15 16:52 Tamas K Lengyel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.