All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: "Peter Xu" <peterx@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Fam Zheng" <famz@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"John Snow" <jsnow@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Wed, 30 Aug 2017 09:06:20 +0200	[thread overview]
Message-ID: <877exllekz.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <20170829110357.GG3783@redhat.com> (Daniel P. Berrange's message of "Tue, 29 Aug 2017 12:03:57 +0100")

"Daniel P. Berrange" <berrange@redhat.com> writes:

> On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
>> v2:
>> - fixed "make check" error that patchew reported
>> - moved the thread_join upper in monitor_data_destroy(), before
>>   resources are released
>> - added one new patch (current patch 3) that fixes a nasty risk
>>   condition with IOWatchPoll.  Please see commit message for more
>>   information.
>> - added a g_main_context_wakeup() to make sure the separate loop
>>   thread can be kicked always when we want to destroy the per-monitor
>>   threads.
>> - added one new patch (current patch 8) to introduce migration mgmt
>>   lock for migrate_incoming.
>> 
>> This is an extended work for migration postcopy recovery. This series
>> is tested with the following series to make sure it solves the monitor
>> hang problem that we have encountered for postcopy recovery:
>> 
>>   [RFC 00/29] Migration: postcopy failure recovery
>>   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
>> 
>> The root problem is that, monitor commands are all handled in main
>> loop thread now, no matter how many monitors we specify. And, if main
>> loop thread hangs due to some reason, all monitors will be stuck.
>> This can be done in reversed order as well: if any of the monitor
>> hangs, it will hang the main loop, and the rest of the monitors (if
>> there is any).

Yes.

>> That affects postcopy recovery, since the recovery requires user input
>> on destination side.  If monitors hang, the destination VM dies and
>> lose hope for even a final recovery.
>> 
>> So, sometimes we need to make sure the monitor be alive, at least one
>> of them.
>> 
>> The whole idea of this series is that instead if handling monitor
>> commands all in main loop thread, we do it separately in per-monitor
>> threads.  Then, even if main loop thread hangs at any point by any
>> reason, per-monitor thread can still survive.

This takes care of "monitor hangs because other parts of the main loop
(including other monitors) hang".  It doesn't take care of "monitor
hangs because the current monitor command hangs".

>>                                                Further, we add hint in
>> QMP/HMP to show whether a command can be executed without QMP, if so,
>> we avoid taking BQL when running that command.  It greatly reduced
>> contention of BQL.  Now the only user of that new parameter (currently
>> I call it "without-bql") is "migrate-incoming" command, which is the
>> only command to rescue a paused postcopy migration.

This takes care of one way commands can hang.  There are other ways;
NFS server going AWOL is a classic.  I don't know whether any other way
applies to migrate-incoming.

>> However, even with the series, it does not mean that per-monitor
>> threads will never hang.  One example is that we can still run "info
>> vcpus" in per-monitor threads during a paused postcopy (in that state,
>> page faults are never handled, and "info cpus" will never return since
>> it tries to sync every vcpus).  So to make sure it does not hang, we
>> not only need the per-monitor thread, the user should be careful as
>> well on how to use it.
>> 
>> For postcopy recovery, we may need dedicated monitor channel for
>> recovery.  In other words, a destination VM that supports postcopy
>> recovery would possibly need:
>> 
>>   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL

Where RECOVERY_CHANNEL isn't necessarily just for postcopy, but for any
"emergency" QMP access.  If you use it only for commands that cannot
hang (i.e. terminate in bounded time), then you'll always be able to get
commands accepted there in bounded time.

> I think this is a really horrible thing to expose to management applications.
> They should not need to be aware of fact that QEMU is buggy and thus requires
> that certain commands be run on different monitors to work around the bug.

These are (serious) design limitations, not bugs in the narrow sense of
the word.

However, I quite agree that the need for clients to know whether a
monitor command can hang is impractical for the general case.  What
might be practical is a QMP monitor mode that accepts only known
hang-free commands.  Hang-free could be introspectable.

In case you consider that ugly: it's best to explore the design space
first, and recoil from "ugly" second.

> I'd much prefer to see the problem described handled transparently inside
> QEMU. One approach is have a dedicated thread in QEMU responsible for all
> monitor I/O. This thread should never actually execute monitor commands
> though, it would simply parse the command request and put data onto a queue
> of pending commands, thus it could never hang. The command queue could be
> processed by the main thread, or by another thread that is interested.
> eg the migration thread could process any queued commands related to
> migration directly.

The monitor itself can't hang then, but the thread(s) dequeuing parsed
commands can.

To maintain commands' synchronous semantics, their replies need to be
sent in order, which of course reintroduces the hangs.

Let's take a step back from the implementation, and talk about
*behavior* instead.

You prefer to have "the problem described handled transparently inside
QEMU".  I read that as "QEMU must ensure the QMP monitor is available at
all times".  "Available" means it accepts commands in bounded time.
Some commands will always finish in bounded time once accepted, others
may not, and whether they do may depend on the commands currently in
flight.

Commands that can always start and always terminate in bounded time are
no problem.

All the other commands have to become "job-starting": the QMP command
kicks off a "job", which runs concurrently with the QMP monitor for some
(possibly unbounded) time, then finishes.  Jobs can be examined (say to
monitor progress, if the job supports that) and controlled (say to
cancel, if the job supports that).

A few commands are already job-starting: migrate, the block job family,
dump-guest-memory with detach=true.  Whether they're already hang-free I
can't say; they could do risky work in their synchronous part.

Many commands that can hang are not job-starting.

Changing a command from "do the job" to "merely start the job" is a
compatibility break.

We could make the change opt-in to preserve compatibility.  But is
preserving a compatible QMP monitor that is prone to hang wortwhile?

If no, we may choose to use the resulting compatibility break to also
switch the packaging of jobs from the current "synchronous command +
broadcast message when done" to some variation of asynchronous command.
But that should be discussed in a separate thread, and only after we
know how we plan to ensure monitor availability.

  reply	other threads:[~2017-08-30  7:06 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23  6:51 [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 1/8] monitor: move skip_flush into monitor_data_init Peter Xu
2017-08-23 16:31   ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll Peter Xu
2017-08-23 17:35   ` Dr. David Alan Gilbert
2017-08-25  4:25     ` Peter Xu
2017-08-25  9:30       ` Dr. David Alan Gilbert
2017-08-28  5:53         ` Peter Xu
2017-09-08 17:29           ` Dr. David Alan Gilbert
2017-08-25 15:27   ` Marc-André Lureau
2017-08-25 15:33     ` Dr. David Alan Gilbert
2017-08-25 16:07       ` Marc-André Lureau
2017-08-25 16:12         ` Dr. David Alan Gilbert
2017-08-25 16:21           ` Marc-André Lureau
2017-08-25 16:29             ` Dr. David Alan Gilbert
2017-08-26  8:33               ` Marc-André Lureau
2017-08-28  3:05         ` Peter Xu
2017-08-28 10:11           ` Marc-André Lureau
2017-08-28 12:48             ` Peter Xu
2017-09-05 18:58               ` Dr. David Alan Gilbert
2017-08-28 11:08         ` Markus Armbruster
2017-08-28 12:28           ` Marc-André Lureau
2017-08-28 16:24             ` Markus Armbruster
2017-08-28 17:24               ` Marc-André Lureau
2017-08-29  6:27                 ` Markus Armbruster
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 3/8] char-io: fix possible risk on IOWatchPoll Peter Xu
2017-08-25 14:44   ` Marc-André Lureau
2017-08-26  7:19   ` Fam Zheng
2017-08-28  5:56     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 4/8] QAPI: new QMP command option "without-bql" Peter Xu
2017-08-23 17:44   ` Dr. David Alan Gilbert
2017-08-23 23:37     ` Fam Zheng
2017-08-25  5:37       ` Peter Xu
2017-08-25  9:14         ` Dr. David Alan Gilbert
2017-08-28  8:08           ` Peter Xu
2017-09-08 17:38             ` Dr. David Alan Gilbert
2017-08-25  5:35     ` Peter Xu
2017-08-25  9:06       ` Dr. David Alan Gilbert
2017-08-28  8:26         ` Peter Xu
2017-09-08 17:52           ` Dr. David Alan Gilbert
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 5/8] hmp: support "without_bql" Peter Xu
2017-08-23 17:46   ` Dr. David Alan Gilbert
2017-08-25  5:44     ` Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 6/8] migration: qmp: migrate_incoming don't need BQL Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 7/8] migration: hmp: " Peter Xu
2017-08-23  6:51 ` [Qemu-devel] [RFC v2 8/8] migration: add incoming mgmt lock Peter Xu
2017-08-23 18:01   ` Dr. David Alan Gilbert
2017-08-25  5:49     ` Peter Xu
2017-08-25  9:34       ` Dr. David Alan Gilbert
2017-08-28  8:39         ` Peter Xu
2017-08-29 11:03 ` [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread Daniel P. Berrange
2017-08-30  7:06   ` Markus Armbruster [this message]
2017-08-30 10:13     ` Daniel P. Berrange
2017-08-31  3:31       ` Peter Xu
2017-08-31  9:14         ` Daniel P. Berrange
2017-09-06  9:48   ` Dr. David Alan Gilbert
2017-09-06 10:46     ` Daniel P. Berrange
2017-09-06 10:48       ` Dr. David Alan Gilbert
2017-09-06 10:54         ` Daniel P. Berrange
2017-09-06 10:57           ` Dr. David Alan Gilbert
2017-09-06 11:06             ` Daniel P. Berrange
2017-09-06 11:31               ` Dr. David Alan Gilbert
2017-09-06 11:54                 ` Daniel P. Berrange
2017-09-07  8:13                   ` Peter Xu
2017-09-07  8:49                     ` Stefan Hajnoczi
2017-09-07  9:18                       ` Dr. David Alan Gilbert
2017-09-07 10:19                         ` Stefan Hajnoczi
2017-09-07 10:24                         ` Peter Xu
2017-09-07  8:55                     ` Daniel P. Berrange
2017-09-07  9:19                       ` Dr. David Alan Gilbert
2017-09-07  9:22                         ` Daniel P. Berrange
2017-09-07  9:27                           ` Dr. David Alan Gilbert
2017-09-07 11:19                         ` Markus Armbruster
2017-09-07 11:31                           ` Dr. David Alan Gilbert
2017-09-07  9:15                     ` Dr. David Alan Gilbert
2017-09-07  9:25                       ` Daniel P. Berrange
2017-09-07 12:59                     ` Markus Armbruster
2017-09-07 13:22                       ` Daniel P. Berrange
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:09                           ` Dr. David Alan Gilbert
2017-09-08  8:41                             ` Markus Armbruster
2017-09-08  9:32                               ` Dr. David Alan Gilbert
2017-09-08 11:49                                 ` Markus Armbruster
2017-09-08 13:19                                   ` Stefan Hajnoczi
2017-09-11 10:32                                   ` Peter Xu
2017-09-11 10:36                                     ` Peter Xu
2017-09-11 10:43                                   ` Daniel P. Berrange
2017-09-08  9:27                           ` Daniel P. Berrange
2017-09-07 14:20                       ` Dr. David Alan Gilbert
2017-09-07 17:41                         ` Markus Armbruster
2017-09-07 18:04                           ` Dr. David Alan Gilbert
2017-09-07 10:04                   ` Dr. David Alan Gilbert
2017-09-07 10:08                     ` Daniel P. Berrange
2017-09-07 13:59                 ` Eric Blake
2017-09-06 14:50 ` Stefan Hajnoczi
2017-09-06 15:14   ` Dr. David Alan Gilbert
2017-09-07  7:38     ` Peter Xu
2017-09-07  8:58     ` Stefan Hajnoczi
2017-09-07  9:35       ` Dr. David Alan Gilbert
2017-09-07 10:09         ` Stefan Hajnoczi
2017-09-07 12:02           ` Peter Xu
2017-09-07 16:53             ` Stefan Hajnoczi
2017-09-07 17:14               ` Dr. David Alan Gilbert
2017-09-07 17:35                 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877exllekz.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=famz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.