All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Fam Zheng <famz@redhat.com>, Juan Quintela <quintela@redhat.com>,
	mdroth@linux.vnet.ibm.com, Eric Blake <eblake@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Wed, 6 Sep 2017 11:57:05 +0100	[thread overview]
Message-ID: <20170906105704.GC2215@work-vm> (raw)
In-Reply-To: <20170906105414.GL15510@redhat.com>

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Wed, Sep 06, 2017 at 11:48:51AM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Wed, Sep 06, 2017 at 10:48:46AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > > On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
> > > > > > v2:
> > > > > > - fixed "make check" error that patchew reported
> > > > > > - moved the thread_join upper in monitor_data_destroy(), before
> > > > > >   resources are released
> > > > > > - added one new patch (current patch 3) that fixes a nasty risk
> > > > > >   condition with IOWatchPoll.  Please see commit message for more
> > > > > >   information.
> > > > > > - added a g_main_context_wakeup() to make sure the separate loop
> > > > > >   thread can be kicked always when we want to destroy the per-monitor
> > > > > >   threads.
> > > > > > - added one new patch (current patch 8) to introduce migration mgmt
> > > > > >   lock for migrate_incoming.
> > > > > > 
> > > > > > This is an extended work for migration postcopy recovery. This series
> > > > > > is tested with the following series to make sure it solves the monitor
> > > > > > hang problem that we have encountered for postcopy recovery:
> > > > > > 
> > > > > >   [RFC 00/29] Migration: postcopy failure recovery
> > > > > >   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
> > > > > > 
> > > > > > The root problem is that, monitor commands are all handled in main
> > > > > > loop thread now, no matter how many monitors we specify. And, if main
> > > > > > loop thread hangs due to some reason, all monitors will be stuck.
> > > > > > This can be done in reversed order as well: if any of the monitor
> > > > > > hangs, it will hang the main loop, and the rest of the monitors (if
> > > > > > there is any).
> > > > > > 
> > > > > > That affects postcopy recovery, since the recovery requires user input
> > > > > > on destination side.  If monitors hang, the destination VM dies and
> > > > > > lose hope for even a final recovery.
> > > > > > 
> > > > > > So, sometimes we need to make sure the monitor be alive, at least one
> > > > > > of them.
> > > > > > 
> > > > > > The whole idea of this series is that instead if handling monitor
> > > > > > commands all in main loop thread, we do it separately in per-monitor
> > > > > > threads.  Then, even if main loop thread hangs at any point by any
> > > > > > reason, per-monitor thread can still survive.  Further, we add hint in
> > > > > > QMP/HMP to show whether a command can be executed without QMP, if so,
> > > > > > we avoid taking BQL when running that command.  It greatly reduced
> > > > > > contention of BQL.  Now the only user of that new parameter (currently
> > > > > > I call it "without-bql") is "migrate-incoming" command, which is the
> > > > > > only command to rescue a paused postcopy migration.
> > > > > > 
> > > > > > However, even with the series, it does not mean that per-monitor
> > > > > > threads will never hang.  One example is that we can still run "info
> > > > > > vcpus" in per-monitor threads during a paused postcopy (in that state,
> > > > > > page faults are never handled, and "info cpus" will never return since
> > > > > > it tries to sync every vcpus).  So to make sure it does not hang, we
> > > > > > not only need the per-monitor thread, the user should be careful as
> > > > > > well on how to use it.
> > > > > > 
> > > > > > For postcopy recovery, we may need dedicated monitor channel for
> > > > > > recovery.  In other words, a destination VM that supports postcopy
> > > > > > recovery would possibly need:
> > > > > > 
> > > > > >   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL
> > > > > 
> > > > > I think this is a really horrible thing to expose to management applications.
> > > > > They should not need to be aware of fact that QEMU is buggy and thus requires
> > > > > that certain commands be run on different monitors to work around the bug.
> > > > 
> > > > It's unfortunately baked in way too deep to fix in the near term; the
> > > > BQL is just too cantagious and we have a fundamental design of running
> > > > all the main IO emulation in one thread.
> > > > 
> > > > > I'd much prefer to see the problem described handled transparently inside
> > > > > QEMU. One approach is have a dedicated thread in QEMU responsible for all
> > > > > monitor I/O. This thread should never actually execute monitor commands
> > > > > though, it would simply parse the command request and put data onto a queue
> > > > > of pending commands, thus it could never hang. The command queue could be
> > > > > processed by the main thread, or by another thread that is interested.
> > > > > eg the migration thread could process any queued commands related to
> > > > > migration directly.
> > > > 
> > > > That requires a change in the current API to allow async command
> > > > completion (OK that is something Marc-Andre's world has) so that
> > > > from the one connection you can have multiple outstanding commands.
> > > > Hmm unless....
> > > > 
> > > > We've also got problems that some commands don't like being run outside
> > > > of the main thread (see Fam's reply on the 21st pointing out that a lot
> > > > of block commands would assert).
> > > > 
> > > > I think the way to move to what you describe would be:
> > > >   a) A separate thread for monitor IO
> > > >       This seems a separate problem
> > > >       How hard is that?  Will all the current IO mechanisms used
> > > >       for monitors just work if we run them in a separate thread?
> > > >       What about mux?
> > > > 
> > > >   b) Initially all commands get dispatched to the main thread
> > > >      so nothing changes about the API.
> > > > 
> > > >   c) We create a new thread for the lock-free commands, and route
> > > >       lock-free commands down it.
> > > > 
> > > >   d) We start with a rule that on any one monitor connection we
> > > >   don't allow you to start a command until the previous one has
> > > >   finished
> > > > 
> > > > (d) allows us to avoid any API changes, but allows us to do lock-free
> > > > stuff on a separate connection like Peter's world.
> > > > We can drop (d) once we have a way of doing async commands.
> > > > We can add dispatching to more threads once someone describes
> > > > what they want from those threads.
> > > > 
> > > > Does that work for you Dan?
> > > 
> > > It would *provided* that we do (c) for the commands Peter wants for
> > > this migration series.  IOW, I don't want to have to have logic in
> > > libvirt that either needs to add a 2nd monitor server, or open a 2nd
> > > monitor connection, to deal with migration post-copy recovery in some
> > > versions of QEMU.  So whatever is needed to make post-copy recovery
> > > work has to be done for (c).
> > 
> > But then doesn't that mean you're requiring us to break (d) and change
> > the QMP interface to libvirt so it can do async stuff?
> 
> Depends on your definition of break - I'm assuming there's either a way
> to opt-in to use of a async mode for existing commands in (c), or that
> async commands would be added in parallel with existing sync commands.
> IOW, its not a API breakage - its an opt-in extension of existing
> functionality.

But you'd need to do async commands for all commands you issued to avoid
blocking the io thread so that you could then issue the recovery
commands.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-09-06 10:57 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23  6:51 [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 1/8] monitor: move skip_flush into monitor_data_init Peter Xu
2017-08-23 16:31   ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll Peter Xu
2017-08-23 17:35   ` Dr. David Alan Gilbert
2017-08-25  4:25     ` Peter Xu
2017-08-25  9:30       ` Dr. David Alan Gilbert
2017-08-28  5:53         ` Peter Xu
2017-09-08 17:29           ` Dr. David Alan Gilbert
2017-08-25 15:27   ` Marc-André Lureau
2017-08-25 15:33     ` Dr. David Alan Gilbert
2017-08-25 16:07       ` Marc-André Lureau
2017-08-25 16:12         ` Dr. David Alan Gilbert
2017-08-25 16:21           ` Marc-André Lureau
2017-08-25 16:29             ` Dr. David Alan Gilbert
2017-08-26  8:33               ` Marc-André Lureau
2017-08-28  3:05         ` Peter Xu
2017-08-28 10:11           ` Marc-André Lureau
2017-08-28 12:48             ` Peter Xu
2017-09-05 18:58               ` Dr. David Alan Gilbert
2017-08-28 11:08         ` Markus Armbruster
2017-08-28 12:28           ` Marc-André Lureau
2017-08-28 16:24             ` Markus Armbruster
2017-08-28 17:24               ` Marc-André Lureau
2017-08-29  6:27                 ` Markus Armbruster
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 3/8] char-io: fix possible risk on IOWatchPoll Peter Xu
2017-08-25 14:44   ` Marc-André Lureau
2017-08-26  7:19   ` Fam Zheng
2017-08-28  5:56     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 4/8] QAPI: new QMP command option "without-bql" Peter Xu
2017-08-23 17:44   ` Dr. David Alan Gilbert
2017-08-23 23:37     ` Fam Zheng
2017-08-25  5:37       ` Peter Xu
2017-08-25  9:14         ` Dr. David Alan Gilbert
2017-08-28  8:08           ` Peter Xu
2017-09-08 17:38             ` Dr. David Alan Gilbert
2017-08-25  5:35     ` Peter Xu
2017-08-25  9:06       ` Dr. David Alan Gilbert
2017-08-28  8:26         ` Peter Xu
2017-09-08 17:52           ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 5/8] hmp: support "without_bql" Peter Xu
2017-08-23 17:46   ` Dr. David Alan Gilbert
2017-08-25  5:44     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 6/8] migration: qmp: migrate_incoming don't need BQL Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 7/8] migration: hmp: " Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 8/8] migration: add incoming mgmt lock Peter Xu
2017-08-23 18:01   ` Dr. David Alan Gilbert
2017-08-25  5:49     ` Peter Xu
2017-08-25  9:34       ` Dr. David Alan Gilbert
2017-08-28  8:39         ` Peter Xu
2017-08-29 11:03 ` [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Daniel P. Berrange
2017-08-30  7:06   ` Markus Armbruster
2017-08-30 10:13     ` Daniel P. Berrange
2017-08-31  3:31       ` Peter Xu
2017-08-31  9:14         ` Daniel P. Berrange
2017-09-06  9:48   ` Dr. David Alan Gilbert
2017-09-06 10:46     ` Daniel P. Berrange
2017-09-06 10:48       ` Dr. David Alan Gilbert
2017-09-06 10:54         ` Daniel P. Berrange
2017-09-06 10:57           ` Dr. David Alan Gilbert [this message]
2017-09-06 11:06             ` Daniel P. Berrange
2017-09-06 11:31               ` Dr. David Alan Gilbert
2017-09-06 11:54                 ` Daniel P. Berrange
2017-09-07  8:13                   ` Peter Xu
2017-09-07  8:49                     ` Stefan Hajnoczi
2017-09-07  9:18                       ` Dr. David Alan Gilbert
2017-09-07 10:19                         ` Stefan Hajnoczi
2017-09-07 10:24                         ` Peter Xu
2017-09-07  8:55                     ` Daniel P. Berrange
2017-09-07  9:19                       ` Dr. David Alan Gilbert
2017-09-07  9:22                         ` Daniel P. Berrange
2017-09-07  9:27                           ` Dr. David Alan Gilbert
2017-09-07 11:19                         ` Markus Armbruster
2017-09-07 11:31                           ` Dr. David Alan Gilbert
2017-09-07  9:15                     ` Dr. David Alan Gilbert
2017-09-07  9:25                       ` Daniel P. Berrange
2017-09-07 12:59                     ` Markus Armbruster
2017-09-07 13:22                       ` Daniel P. Berrange
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:09                           ` Dr. David Alan Gilbert
2017-09-08  8:41                             ` Markus Armbruster
2017-09-08  9:32                               ` Dr. David Alan Gilbert
2017-09-08 11:49                                 ` Markus Armbruster
2017-09-08 13:19                                   ` Stefan Hajnoczi
2017-09-11 10:32                                   ` Peter Xu
2017-09-11 10:36                                     ` Peter Xu
2017-09-11 10:43                                   ` Daniel P. Berrange
2017-09-08  9:27                           ` Daniel P. Berrange
2017-09-07 14:20                       ` Dr. David Alan Gilbert
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:04                           ` Dr. David Alan Gilbert
2017-09-07 10:04                   ` Dr. David Alan Gilbert
2017-09-07 10:08                     ` Daniel P. Berrange
2017-09-07 13:59                 ` Eric Blake
2017-09-06 14:50 ` Stefan Hajnoczi
2017-09-06 15:14   ` Dr. David Alan Gilbert
2017-09-07  7:38     ` Peter Xu
2017-09-07  8:58     ` Stefan Hajnoczi
2017-09-07  9:35       ` Dr. David Alan Gilbert
2017-09-07 10:09         ` Stefan Hajnoczi
2017-09-07 12:02           ` Peter Xu
2017-09-07 16:53             ` Stefan Hajnoczi
2017-09-07 17:14               ` Dr. David Alan Gilbert
2017-09-07 17:35                 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170906105704.GC2215@work-vm \
    --to=dgilbert@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.