All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, a.perevalov@samsung.com,
	marcandre.lureau@redhat.com, maxime.coquelin@redhat.com,
	quintela@redhat.com, peterx@redhat.com, lvivier@redhat.com,
	aarcange@redhat.com
Subject: Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Date: Fri, 7 Jul 2017 13:01:56 +0100	[thread overview]
Message-ID: <20170707120155.GE2451@work-vm> (raw)
In-Reply-To: <20170703205127-mutt-send-email-mst@kernel.org>

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHUST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > QEMU then reads the clients UFD in it's fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and can carry on.
> > 
> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> > 
> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> 
> Not urgent to fix.
> 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> 
> And it can happen under a lock quite easily.
> What exactly is proposed here?
> Maybe we want to reuse the new channel that the IOMMU uses.

There's no fundamental reason to get deadlocks as long as you
get it right; the qemu thread that processes the user-fault's
is a separate independent thread, so once it's going the client
can do whatever it likes and it will get woken up without
intervention.
Some care is needed around the postcopy-end; reception of the
message that tells you to drop the userfault enables (which
frees anything that hasn't been woken) must be allowed to happen
for the postcopy complete;  we take care that QEMUs fault
thread lives on until that message is acknowledged.

I'm more worried about how this will work in a full packet switch
when one vhost-user client for an incoming migration stalls
the whole switch unless care is taken about the design.
How do we figure out whether this is going to fly on a full stack?
That's my main reason for getting this WIP set out here to
get comments.

> >    There's a nasty hack of a lock around the set_mem_table message.
> 
> Yes.
> 
> >    I've not looked at the recent IOMMU code.
> > 
> >    Some cleanup and a lot of corner cases need thinking about.
> > 
> >    There are probably plenty of unknown issues as well.
> 
> At the protocol level, I'd like to rename the feature to
> USER_PAGEFAULT. Client does not really know anything about
> copies, it's all internal to qemu.
> Spec can document that it's used by qemu for postcopy.

OK, tbh I suspect that using it for anything else would be tricky
without adding more protocol features for that other use case.

Dave

> > Test setup:
> >   I'm running on one host at the moment, with the guest
> >   scping a large file from the host as it migrates.
> >   The setup is based on one I found in the vhost-user setups.
> >   You'll need a recent kernel for the shared memory support
> >   in userfaultfd, and userfault isn't that happy if a process
> >   using shared memory core's - so make sure you have the
> >   latest fixes.
> > 
> > SESS=vhost
> > ulimit -c unlimited
> > tmux -L $SESS new-session -d
> > tmux -L $SESS set-option -g history-limit 30000
> > # Start a router using the system qemu
> > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca
> > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > tmux -L $SESS set-option -g set-remain-on-exit on
> > # Start source vhost bridge
> > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe
> > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/
> > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
> > # Start dest vhost bridge
> > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.
> > 1:5556 2>dst-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend
> > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm
> > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
> > tmux -L $SESS send-keys -t source "migrate_set_speed 20M
> > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on
> > 
> > then once booted:
> > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> > 
> > 
> > Dave
> > 
> > Dr. David Alan Gilbert (29):
> >   RAMBlock/migration: Add migration flags
> >   migrate: Update ram_block_discard_range for shared
> >   qemu_ram_block_host_offset
> >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> >   postcopy: use UFFDIO_ZEROPAGE only when available
> >   postcopy: Add notifier chain
> >   postcopy: Add vhost-user flag for postcopy and check it
> >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> >   vhub: Support sending fds back to qemu
> >   vhub: Open userfaultfd
> >   postcopy: Allow registering of fd handler
> >   vhost+postcopy: Register shared ufd with postcopy
> >   vhost+postcopy: Transmit 'listen' to client
> >   vhost+postcopy: Register new regions with the ufd
> >   vhost+postcopy: Send address back to qemu
> >   vhost+postcopy: Stash RAMBlock and offset
> >   vhost+postcopy: Send requests to source for shared pages
> >   vhost+postcopy: Resolve client address
> >   postcopy: wake shared
> >   postcopy: postcopy_notify_shared_wake
> >   vhost+postcopy: Add vhost waker
> >   vhost+postcopy: Call wakeups
> >   vub+postcopy: madvises
> >   vhost+postcopy: Lock around set_mem_table
> >   vhu: enable = false on get_vring_base
> >   vhost: Add VHOST_USER_POSTCOPY_END message
> >   vhost+postcopy: Wire up POSTCOPY_END notify
> >   postcopy: Allow shared memory
> >   vhost-user: Claim support for postcopy
> > 
> >  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> >  contrib/libvhost-user/libvhost-user.h |   8 +
> >  exec.c                                |  44 +++--
> >  hw/virtio/trace-events                |  13 ++
> >  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
> >  include/exec/cpu-common.h             |   3 +
> >  include/exec/ram_addr.h               |   2 +
> >  migration/migration.c                 |   3 +
> >  migration/migration.h                 |   8 +
> >  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
> >  migration/postcopy-ram.h              |  69 +++++++
> >  migration/ram.c                       |   5 +
> >  migration/ram.h                       |   1 +
> >  migration/savevm.c                    |  13 ++
> >  migration/trace-events                |   6 +
> >  trace-events                          |   3 +
> >  vl.c                                  |   4 +-
> >  17 files changed, 926 insertions(+), 84 deletions(-)
> > 
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-07-07 12:05 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
2017-07-10  9:28   ` Peter Xu
2017-07-12 16:48     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared Dr. David Alan Gilbert (git)
2017-07-10 10:03   ` Peter Xu
2017-08-24 16:59     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset Dr. David Alan Gilbert (git)
2017-07-03 17:44   ` Michael S. Tsirkin
2017-08-14 17:27     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 04/29] migration/ram: ramblock_recv_bitmap_test_byte_offset Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available Dr. David Alan Gilbert (git)
2017-07-10 10:19   ` Peter Xu
2017-07-12 16:54     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain Dr. David Alan Gilbert (git)
2017-07-10 10:31   ` Peter Xu
2017-07-12 17:14     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 07/29] postcopy: Add vhost-user flag for postcopy and check it Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 08/29] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 09/29] vhub: Support sending fds back to qemu Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd Dr. David Alan Gilbert (git)
2017-07-24 12:10   ` Maxime Coquelin
2017-07-26 17:12     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 11/29] postcopy: Allow registering of fd handler Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 12/29] vhost+postcopy: Register shared ufd with postcopy Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client Dr. David Alan Gilbert (git)
2017-07-24 14:36   ` Maxime Coquelin
2017-07-26 17:42     ` Dr. David Alan Gilbert
2017-07-26 18:03       ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd Dr. David Alan Gilbert (git)
2017-07-24 15:22   ` Maxime Coquelin
2017-07-24 17:50     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu Dr. David Alan Gilbert (git)
2017-07-24 17:31   ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset Dr. David Alan Gilbert (git)
2017-07-11  3:31   ` Peter Xu
2017-07-14 17:15     ` Dr. David Alan Gilbert
2017-07-17  2:59       ` Peter Xu
2017-08-17 17:29         ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 17/29] vhost+postcopy: Send requests to source for shared pages Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 18/29] vhost+postcopy: Resolve client address Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 19/29] postcopy: wake shared Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 20/29] postcopy: postcopy_notify_shared_wake Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 21/29] vhost+postcopy: Add vhost waker Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups Dr. David Alan Gilbert (git)
2017-07-11  4:22   ` Peter Xu
2017-07-12 15:00     ` Andrea Arcangeli
2017-07-14  2:45       ` Peter Xu
2017-07-14 14:18       ` Michael S. Tsirkin
2017-06-28 19:00 ` [Qemu-devel] [RFC 23/29] vub+postcopy: madvises Dr. David Alan Gilbert (git)
2017-08-07  4:49   ` Alexey Perevalov
2017-08-08 17:06     ` Dr. David Alan Gilbert
2017-08-09 11:02       ` Alexey Perevalov
2017-08-10  8:55         ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table Dr. David Alan Gilbert (git)
2017-07-04 19:34   ` Maxime Coquelin
2017-07-07 11:53     ` Dr. David Alan Gilbert
2017-07-07 12:52       ` Maxime Coquelin
2017-10-03 13:23       ` Dr. David Alan Gilbert
2017-10-06 12:22         ` Maxime Coquelin
2017-10-09 12:12           ` Dr. David Alan Gilbert
2017-10-12  7:22             ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
2017-07-04 19:38   ` Maxime Coquelin
2017-07-04 21:59   ` Michael S. Tsirkin
2017-07-05 17:16     ` Dr. David Alan Gilbert
2017-07-05 23:28       ` Michael S. Tsirkin
2017-08-18 19:19     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message Dr. David Alan Gilbert (git)
2017-07-27 11:35   ` Maxime Coquelin
2017-08-24 14:53     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 27/29] vhost+postcopy: Wire up POSTCOPY_END notify Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 28/29] postcopy: Allow shared memory Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy Dr. David Alan Gilbert (git)
2017-07-04 14:09   ` Maxime Coquelin
2017-07-07 11:39     ` Dr. David Alan Gilbert
2017-06-29 18:55 ` [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert
2017-07-03 11:03   ` Marc-André Lureau
2017-07-03 11:48     ` Dr. David Alan Gilbert
2017-07-07 10:51     ` Dr. David Alan Gilbert
     [not found] ` <CGME20170703135859eucas1p1edc55e3318a3079b026bed81e0ae0388@eucas1p1.samsung.com>
2017-07-03 13:58   ` Alexey
2017-07-03 16:49     ` Dr. David Alan Gilbert
2017-07-03 17:42       ` Alexey
2017-07-03 17:55 ` Michael S. Tsirkin
2017-07-07 12:01   ` Dr. David Alan Gilbert [this message]
2017-07-07 15:35     ` Michael S. Tsirkin
2017-07-07 17:26       ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170707120155.GE2451@work-vm \
    --to=dgilbert@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=aarcange@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.