xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Tamas K Lengyel <tamas@tklengyel.com>
To: Xen-devel <xen-devel@lists.xenproject.org>
Cc: "Petre Pircalabu" <ppircalabu@bitdefender.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Julien Grall" <julien@xen.org>, "Wei Liu" <wl@xen.org>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Ian Jackson" <ian.jackson@eu.citrix.com>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Jan Beulich" <jbeulich@suse.com>,
	"Alexandru Isaila" <aisaila@bitdefender.com>,
	"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events
Date: Mon, 1 Jun 2020 12:58:44 -0600	[thread overview]
Message-ID: <CABfawh=YNVqYgnTwAaqTdWxNW_m2z7KD7ku0mWZGDnDMcha1+A@mail.gmail.com> (raw)
In-Reply-To: <cover.1590028160.git.tamas@tklengyel.com>

On Wed, May 20, 2020 at 8:31 PM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> For the last couple years we have received numerous reports from users of
> monitor vm_events of spurious guest crashes when using events. In particular,
> it has observed that the problem occurs when vm_events are being disabled. The
> nature of the guest crash varied widely and has only occured occasionally. This
> made debugging the issue particularly hard. We had discussions about this issue
> even here on the xen-devel mailinglist with no luck figuring it out.
>
> The bug has now been identified as a race-condition between register event
> handling and disabling the vm_event interface.
>
> Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register writes
> if refused by  vm_event reply" is the patch that introduced the error. In this
> patch emulation of register write events can be postponed until the
> corresponding vm_event handler decides whether to allow such write to take
> place. Unfortunately this can only be implemented by performing the deny/allow
> step when the vCPU gets scheduled. Due to that postponed emulation of the event
> if the user decides to pause the VM in the vm_event handler and then disable
> events, the entire emulation step is skipped the next time the vCPU is resumed.
> Even if the user doesn't pause during the vm_event handling but exits
> immediately and disables vm_event, the situation becomes racey as disabling
> vm_event may succeed before the guest's vCPUs get scheduled with the pending
> emulation task. This has been particularly the case with VMS  that have several
> vCPUs as after the VM is unpaused it may actually take a long time before all
> vCPUs get scheduled.
>
> The only solution currently is to poll each vCPU before vm_events are disabled
> to verify they had been scheduled before it is safe to disable vm_events. The
> following patches resolve this issue in a much nicer way.
>
> Patch 1 adds an option to the monitor_op domctl that needs to be specified if
>     the user wants to actually use the postponed register-write handling
>     mechanism. If that option is not specified then handling is performed the
>     same way as before patch 96760e2fba100d694300a81baddb5740e0f8c0ee.
>
> Patch 2 performs sanity checking when disabling vm_events to determine whether
>     its safe to free all vm_event structures. The vCPUs still get unpaused to
>     allow them to get scheduled and perform any of their pending operations,
>     but otherwise an -EAGAIN error is returned signaling to the user that they
>     need to wait and try again disabling the interface.
>
> Patch 3 adds a vm_event specifically to signal to the user when it is safe to
>     continue disabling the interface.
>
> Shout out to our friends at CERT.pl for stumbling upon a crucial piece of
> information that lead to finally squashing this nasty bug.
>
> v2: minor adjustments based on Jan's comments

Patch ping.

Tamas


      parent reply	other threads:[~2020-06-01 19:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
2020-06-02 11:08   ` Roger Pau Monné
2020-06-02 12:40     ` Tamas K Lengyel
2020-06-02 12:47       ` Jan Beulich
2020-06-02 12:51         ` Tamas K Lengyel
2020-06-02 13:00           ` Jan Beulich
2020-06-02 13:10             ` Tamas K Lengyel
2020-06-03  8:04               ` Roger Pau Monné
2020-06-02 13:01       ` Roger Pau Monné
2020-06-02 13:04         ` Jan Beulich
2020-06-02 13:07           ` Tamas K Lengyel
2020-06-02 13:09         ` Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op Tamas K Lengyel
2020-06-02 11:47   ` Roger Pau Monné
2020-06-02 11:50     ` Jan Beulich
2020-06-02 12:43     ` Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event Tamas K Lengyel
2020-06-02 12:54   ` Roger Pau Monné
2020-06-02 13:06     ` Tamas K Lengyel
2020-06-01 18:58 ` Tamas K Lengyel [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABfawh=YNVqYgnTwAaqTdWxNW_m2z7KD7ku0mWZGDnDMcha1+A@mail.gmail.com' \
    --to=tamas@tklengyel.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=aisaila@bitdefender.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=ppircalabu@bitdefender.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).