All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Peter Xu" <peterx@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Fam Zheng" <famz@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"John Snow" <jsnow@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Wed, 30 Aug 2017 11:13:11 +0100	[thread overview]
Message-ID: <20170830101311.GB18526@redhat.com> (raw)
In-Reply-To: <877exllekz.fsf@dusky.pond.sub.org>

On Wed, Aug 30, 2017 at 09:06:20AM +0200, Markus Armbruster wrote:
> "Daniel P. Berrange" <berrange@redhat.com> writes:
> 
> > On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
> 
> >> However, even with the series, it does not mean that per-monitor
> >> threads will never hang.  One example is that we can still run "info
> >> vcpus" in per-monitor threads during a paused postcopy (in that state,
> >> page faults are never handled, and "info cpus" will never return since
> >> it tries to sync every vcpus).  So to make sure it does not hang, we
> >> not only need the per-monitor thread, the user should be careful as
> >> well on how to use it.
> >> 
> >> For postcopy recovery, we may need dedicated monitor channel for
> >> recovery.  In other words, a destination VM that supports postcopy
> >> recovery would possibly need:
> >> 
> >>   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL
> 
> Where RECOVERY_CHANNEL isn't necessarily just for postcopy, but for any
> "emergency" QMP access.  If you use it only for commands that cannot
> hang (i.e. terminate in bounded time), then you'll always be able to get
> commands accepted there in bounded time.
> 
> > I think this is a really horrible thing to expose to management applications.
> > They should not need to be aware of fact that QEMU is buggy and thus requires
> > that certain commands be run on different monitors to work around the bug.
> 
> These are (serious) design limitations, not bugs in the narrow sense of
> the word.
> 
> However, I quite agree that the need for clients to know whether a
> monitor command can hang is impractical for the general case.  What
> might be practical is a QMP monitor mode that accepts only known
> hang-free commands.  Hang-free could be introspectable.
> 
> In case you consider that ugly: it's best to explore the design space
> first, and recoil from "ugly" second.

Actually you slightly mis-interpreted me there. I think it is ok for
applications to have knowledge about whether a particular command
may hang or not. Given that knowledge it should *not*, however, require
that the application issue such commands on separate monitor channels.
It is entirely possible to handle hang-free commands on the existing
channel.

> > I'd much prefer to see the problem described handled transparently inside
> > QEMU. One approach is have a dedicated thread in QEMU responsible for all
> > monitor I/O. This thread should never actually execute monitor commands
> > though, it would simply parse the command request and put data onto a queue
> > of pending commands, thus it could never hang. The command queue could be
> > processed by the main thread, or by another thread that is interested.
> > eg the migration thread could process any queued commands related to
> > migration directly.
> 
> The monitor itself can't hang then, but the thread(s) dequeuing parsed
> commands can.

If certain commands are hang-free then you can have a dedicated thread
that only de-queues & processes the hang-free commands. The approach I
outlined is exactly how libvirt deals with its own RPC dispatch. We have
certain commands that are guaranteed to not hang, which are processed by
a dedicated pool of threads. So even if all normal RPC commands have
hung, you can still run a subset of hang-free RPC commands.

> 
> To maintain commands' synchronous semantics, their replies need to be
> sent in order, which of course reintroduces the hangs.

The requirement for such ordering is just an arbitrary restriction that
QEMU currently imposes. It is reasonable to allow arbitrary ordering of
responses (which is what libvirt does in its RPC layer). Admittedly at
this stage though, we would likely require some "opt in" handshake when
initializing QMP for the app to tell QEMU it can cope with out of order
replies. It would require that each command request has a unique serial
number, which is included in the associated reply, so apps can match
them up. We used to have that but iirc it was then removed.

There's other ways to deal with this, such as the job starting idea you
mention below.

The key point though is that I don't think creating multiple monitor
servers is a desirable approach - it is just a hack to avoid dealing
with the root cause problems. 

> Let's take a step back from the implementation, and talk about
> *behavior* instead.
> 
> You prefer to have "the problem described handled transparently inside
> QEMU".  I read that as "QEMU must ensure the QMP monitor is available at
> all times".  "Available" means it accepts commands in bounded time.
> Some commands will always finish in bounded time once accepted, others
> may not, and whether they do may depend on the commands currently in
> flight.
> 
> Commands that can always start and always terminate in bounded time are
> no problem.
> 
> All the other commands have to become "job-starting": the QMP command
> kicks off a "job", which runs concurrently with the QMP monitor for some
> (possibly unbounded) time, then finishes.  Jobs can be examined (say to
> monitor progress, if the job supports that) and controlled (say to
> cancel, if the job supports that).
> 
> A few commands are already job-starting: migrate, the block job family,
> dump-guest-memory with detach=true.  Whether they're already hang-free I
> can't say; they could do risky work in their synchronous part.
> 
> Many commands that can hang are not job-starting.
> 
> Changing a command from "do the job" to "merely start the job" is a
> compatibility break.
> 
> We could make the change opt-in to preserve compatibility.  But is
> preserving a compatible QMP monitor that is prone to hang wortwhile?
> 
> If no, we may choose to use the resulting compatibility break to also
> switch the packaging of jobs from the current "synchronous command +
> broadcast message when done" to some variation of asynchronous command.
> But that should be discussed in a separate thread, and only after we
> know how we plan to ensure monitor availability.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

  reply	other threads:[~2017-08-30 10:13 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23  6:51 [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 1/8] monitor: move skip_flush into monitor_data_init Peter Xu
2017-08-23 16:31   ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll Peter Xu
2017-08-23 17:35   ` Dr. David Alan Gilbert
2017-08-25  4:25     ` Peter Xu
2017-08-25  9:30       ` Dr. David Alan Gilbert
2017-08-28  5:53         ` Peter Xu
2017-09-08 17:29           ` Dr. David Alan Gilbert
2017-08-25 15:27   ` Marc-André Lureau
2017-08-25 15:33     ` Dr. David Alan Gilbert
2017-08-25 16:07       ` Marc-André Lureau
2017-08-25 16:12         ` Dr. David Alan Gilbert
2017-08-25 16:21           ` Marc-André Lureau
2017-08-25 16:29             ` Dr. David Alan Gilbert
2017-08-26  8:33               ` Marc-André Lureau
2017-08-28  3:05         ` Peter Xu
2017-08-28 10:11           ` Marc-André Lureau
2017-08-28 12:48             ` Peter Xu
2017-09-05 18:58               ` Dr. David Alan Gilbert
2017-08-28 11:08         ` Markus Armbruster
2017-08-28 12:28           ` Marc-André Lureau
2017-08-28 16:24             ` Markus Armbruster
2017-08-28 17:24               ` Marc-André Lureau
2017-08-29  6:27                 ` Markus Armbruster
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 3/8] char-io: fix possible risk on IOWatchPoll Peter Xu
2017-08-25 14:44   ` Marc-André Lureau
2017-08-26  7:19   ` Fam Zheng
2017-08-28  5:56     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 4/8] QAPI: new QMP command option "without-bql" Peter Xu
2017-08-23 17:44   ` Dr. David Alan Gilbert
2017-08-23 23:37     ` Fam Zheng
2017-08-25  5:37       ` Peter Xu
2017-08-25  9:14         ` Dr. David Alan Gilbert
2017-08-28  8:08           ` Peter Xu
2017-09-08 17:38             ` Dr. David Alan Gilbert
2017-08-25  5:35     ` Peter Xu
2017-08-25  9:06       ` Dr. David Alan Gilbert
2017-08-28  8:26         ` Peter Xu
2017-09-08 17:52           ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 5/8] hmp: support "without_bql" Peter Xu
2017-08-23 17:46   ` Dr. David Alan Gilbert
2017-08-25  5:44     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 6/8] migration: qmp: migrate_incoming don't need BQL Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 7/8] migration: hmp: " Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 8/8] migration: add incoming mgmt lock Peter Xu
2017-08-23 18:01   ` Dr. David Alan Gilbert
2017-08-25  5:49     ` Peter Xu
2017-08-25  9:34       ` Dr. David Alan Gilbert
2017-08-28  8:39         ` Peter Xu
2017-08-29 11:03 ` [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Daniel P. Berrange
2017-08-30  7:06   ` Markus Armbruster
2017-08-30 10:13     ` Daniel P. Berrange [this message]
2017-08-31  3:31       ` Peter Xu
2017-08-31  9:14         ` Daniel P. Berrange
2017-09-06  9:48   ` Dr. David Alan Gilbert
2017-09-06 10:46     ` Daniel P. Berrange
2017-09-06 10:48       ` Dr. David Alan Gilbert
2017-09-06 10:54         ` Daniel P. Berrange
2017-09-06 10:57           ` Dr. David Alan Gilbert
2017-09-06 11:06             ` Daniel P. Berrange
2017-09-06 11:31               ` Dr. David Alan Gilbert
2017-09-06 11:54                 ` Daniel P. Berrange
2017-09-07  8:13                   ` Peter Xu
2017-09-07  8:49                     ` Stefan Hajnoczi
2017-09-07  9:18                       ` Dr. David Alan Gilbert
2017-09-07 10:19                         ` Stefan Hajnoczi
2017-09-07 10:24                         ` Peter Xu
2017-09-07  8:55                     ` Daniel P. Berrange
2017-09-07  9:19                       ` Dr. David Alan Gilbert
2017-09-07  9:22                         ` Daniel P. Berrange
2017-09-07  9:27                           ` Dr. David Alan Gilbert
2017-09-07 11:19                         ` Markus Armbruster
2017-09-07 11:31                           ` Dr. David Alan Gilbert
2017-09-07  9:15                     ` Dr. David Alan Gilbert
2017-09-07  9:25                       ` Daniel P. Berrange
2017-09-07 12:59                     ` Markus Armbruster
2017-09-07 13:22                       ` Daniel P. Berrange
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:09                           ` Dr. David Alan Gilbert
2017-09-08  8:41                             ` Markus Armbruster
2017-09-08  9:32                               ` Dr. David Alan Gilbert
2017-09-08 11:49                                 ` Markus Armbruster
2017-09-08 13:19                                   ` Stefan Hajnoczi
2017-09-11 10:32                                   ` Peter Xu
2017-09-11 10:36                                     ` Peter Xu
2017-09-11 10:43                                   ` Daniel P. Berrange
2017-09-08  9:27                           ` Daniel P. Berrange
2017-09-07 14:20                       ` Dr. David Alan Gilbert
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:04                           ` Dr. David Alan Gilbert
2017-09-07 10:04                   ` Dr. David Alan Gilbert
2017-09-07 10:08                     ` Daniel P. Berrange
2017-09-07 13:59                 ` Eric Blake
2017-09-06 14:50 ` Stefan Hajnoczi
2017-09-06 15:14   ` Dr. David Alan Gilbert
2017-09-07  7:38     ` Peter Xu
2017-09-07  8:58     ` Stefan Hajnoczi
2017-09-07  9:35       ` Dr. David Alan Gilbert
2017-09-07 10:09         ` Stefan Hajnoczi
2017-09-07 12:02           ` Peter Xu
2017-09-07 16:53             ` Stefan Hajnoczi
2017-09-07 17:14               ` Dr. David Alan Gilbert
2017-09-07 17:35                 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170830101311.GB18526@redhat.com \
    --to=berrange@redhat.com \
    --cc=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=famz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.