All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed
Date: Fri, 19 May 2017 13:51:43 +0100	[thread overview]
Message-ID: <20170519125139.GM2081@work-vm> (raw)
In-Reply-To: <20170519082538.GD4912@redhat.com>

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > We don't really have a return path for the other types yet. Let's check
> > this when .get_return_path() is called.
> > 
> > For this, we introduce a new feature bit, and set it up only for socket
> > typed IO channels.
> > 
> > This will help detect earlier failure for postcopy, e.g., logically
> > speaking postcopy cannot work with "exec:". Before this patch, when we
> > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > With this patch, we'll get:
> > 
> > (qemu) migrate -d exec:cat>out
> > Unable to open return-path for postcopy
> 
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

The thing is it didn't used to be able to; prior to your conversion to
channel, postcopy would reject being started with exec: because it
couldn't open a return path, so it was safe.

> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

Hmm, we shouldn't get a 'hang' with a postcopy on a link without a
return path; but it does depend on how the exec: behaves on the
destination.
If the destination discards data written to it, then I think the
behaviour would be:
   a) Page requests would just get dropped, they'd eventually get
fulfilled by the background page transmissions, but that could mean that
a page request would wait for minutes for the page.
   b) The qemu main thread on the destination can be blocked by that, so
the monitor might not respond until the page request is fulfilled.
   c) I'm not quite sure what would happen to the source return-path
thread

The behaviour seems to have changed between 2.9.0 (f26 package) and my
reasonably recent head build.

2.9.0 gives me:
(qemu) migrate_set_speed 1B
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d "exec:cat > out"
RP: Received invalid message 0x0000 length 0x0000
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: failed
total time: 0 milliseconds

So that's the return path thread trying to read from the exec: not
getting anything and failing.

On head-ish it doesn't fail, the source qemu doesn't hang, however
the migration never completes - possibly because it's waiting for
the MIG_RP_MSG_SHUT from the destination.
A migration_cancel also doesn't work for 'exec' because it doesn't
support shutdown() - it just sticks in 'cancelling'.
On a socket that was broken like this the cancel would work because
it issues a shutdown() which causes the socket to cleanup.

Personally I'd rather fix this by still not supporting exec:,
making shutdown() work on exec (by kill'ing the child process)
means at least cancel would work, but it still wouldn't be pretty
for a postcopy, and still doesn't help Peter solve this problem
which is a nasty problem QEMU has had for ages.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  parent reply	other threads:[~2017-05-19 12:52 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-19  6:43 [Qemu-devel] [PATCH RFC 0/6] migration: enable return-path for precopy Peter Xu
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed Peter Xu
2017-05-19  8:25   ` Daniel P. Berrange
2017-05-19  8:30     ` Daniel P. Berrange
2017-05-19  9:55       ` Peter Xu
2017-05-19 12:54       ` Dr. David Alan Gilbert
2017-05-19  9:51     ` Peter Xu
2017-05-19 10:03       ` Daniel P. Berrange
2017-05-19 12:51     ` Dr. David Alan Gilbert [this message]
2017-05-19 12:56       ` Daniel P. Berrange
2017-05-19 13:02         ` Dr. David Alan Gilbert
2017-05-19 13:13           ` Daniel P. Berrange
2017-05-19 14:33             ` Dr. David Alan Gilbert
2017-05-19 14:51               ` Daniel P. Berrange
2017-05-19 18:41                 ` Dr. David Alan Gilbert
2017-05-22  8:26                   ` Daniel P. Berrange
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 2/6] migration: isolate return path on src Peter Xu
2017-05-30 13:31   ` Juan Quintela
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 3/6] migration: fix leak of src file on dst Peter Xu
2017-05-30 15:49   ` Juan Quintela
2017-05-31  9:51   ` Juan Quintela
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 4/6] migration: shut src return path unconditionally Peter Xu
2017-05-19 19:28   ` Dr. David Alan Gilbert
2017-05-30 15:50   ` Juan Quintela
2017-05-31  7:31     ` Peter Xu
2017-05-31  7:36       ` Juan Quintela
2017-05-30 15:59   ` Juan Quintela
2017-06-05 20:22   ` Eric Blake
2017-06-06  2:00     ` Peter Xu
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 5/6] migration: let MigrationState be an QObject Peter Xu
2017-05-30 15:57   ` Juan Quintela
2017-05-31  7:33     ` Peter Xu
2017-05-19  6:43 ` [Qemu-devel] [PATCH RFC 6/6] migration: enable return path for precopy Peter Xu
2017-05-30 15:59   ` Juan Quintela
2017-05-31  7:38     ` Peter Xu
2017-05-31  7:43       ` Juan Quintela
2017-05-31  8:04         ` Peter Xu
2017-05-31  8:12           ` Juan Quintela
2017-05-19  6:48 ` [Qemu-devel] [PATCH RFC 0/6] migration: enable return-path " Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170519125139.GM2081@work-vm \
    --to=dgilbert@redhat.com \
    --cc=berrange@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.