From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alexey <a.perevalov@samsung.com>
Cc: qemu-devel@nongnu.org, marcandre.lureau@redhat.com,
	maxime.coquelin@redhat.com, mst@redhat.com, quintela@redhat.com,
	peterx@redhat.com, lvivier@redhat.com, aarcange@redhat.com
Subject: Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Date: Mon, 3 Jul 2017 17:49:26 +0100
Message-ID: <20170703164925.GC2206@work-vm>
In-Reply-To: <20170703135135.GA4557@aperevalov-ubuntu>

* Alexey (a.perevalov@samsung.com) wrote:
> 
> Hello, David!
> 
> Thank you for your patch set.
> 
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> 
> There should be only one userfault fd for all affected processes. But
> why are you opening the userfaultfd on the client side, rather than passing
> a userfault fd which was opened on the QEMU side?

I just checked with Andrea on the semantics, and ufds don't work like that.
Any given userfaultfd only works on the address space of the process
that opened it; so if you want a process to block on its memory space
it's the one that has to open the ufd.
(I don't think I knew that when I wrote the code!)
The nice thing about that is that you never get too confused about
address spaces - any one ufd always has one address space in its ioctls
associated with one process.
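
For illustration, the hand-off described in the cover letter (the client
opens its own userfaultfd and sends it back to QEMU over the vhost-user
socket) relies on SCM_RIGHTS ancillary data on an AF_UNIX socket. Below is
a minimal sketch of that mechanism only; send_fd() and recv_fd() are
hypothetical helper names, not functions from this series or from
libvhost-user, and any fd can stand in for the ufd.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for exactly one int-sized SCM_RIGHTS payload. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;               /* forces correct alignment */
};

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';                   /* one byte of ordinary payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);               /* buffer was sized for one header */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd in this process, or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received fd is a new descriptor in the receiver, but a ufd passed this
way still refers to the address space of the process that opened it.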

> I guess there could
> be several virtual switches with different ports (it's an exotic
> configuration, but a configuration with one QEMU, one vswitchd,
> and several vhost-user ports is typical), and, as an example,
> QEMU could be connected to these vswitches through these ports.
> In this case you will obtain 2 different userfault fds in QEMU.
> In the case of one QEMU, one vswitchd and several vhost-user ports,
> you are keeping the userfaultfd in the VuDev structure on the client side;
> it looks like the sibling of virtio_net from DPDK, and that structure
> is per vhost-user connection (per port).

Multiple switches make sense to me actually; running two switches
and having redundant routes in each VM lets you live update the switch
processes one at a time.

> So from my point of view it's better to open the fd on the QEMU side, and pass it
> the same way as the shared mem fd in SET_MEM_TABLE, but in POSTCOPY_ADVISE.

Yes, I see where you're coming from; but it's one address space per ufd.
If you had one ufd then you'd have to change the messages to be
  'pid ... is waiting on address ....'
and all the ioctls for doing wakes etc would have to gain a PID.

> > 
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and can carry on.
> It's not so clear to me why QEMU has to inform the vhost client;
> with a single userfault fd, the kernel should wake up the other faulted
> threads/processes.
> In my approach I just send information about copied/received pages
> to the vhost client, so that it can enable a previously disabled VRING.

The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl
on the ufd that tells the kernel it can unblock a process that's
trying to access the page.
(There is potential to remove some of that - if we can get the
kernel to wake all the waiters for a physical page when a UFFDIO_COPY
is done it would remove a lot of those).
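
As a rough sketch of what such a wake amounts to: UFFDIO_WAKE takes a
struct uffdio_range (both from <linux/userfaultfd.h>) naming the range
whose waiters should be released. wake_range() is a hypothetical name for
illustration, not the helper used in this series; start and len are
assumed to be page-aligned addresses in the address space that owns the
ufd.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: ask the kernel to unblock any thread that is
 * waiting on a fault inside [start, start + len) on this ufd.
 * Returns 0 on success or a negative errno value. */
static int wake_range(int ufd, uint64_t start, uint64_t len)
{
    struct uffdio_range range = { .start = start, .len = len };

    assert(len > 0);                    /* an empty range is a caller bug */
    if (ioctl(ufd, UFFDIO_WAKE, &range) < 0) {
        return -errno;
    }
    return 0;
}
```

The ioctl only talks to the kernel; nothing is written to the client
process itself. On a descriptor that is not valid the call just fails,
for example with -EBADF.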

> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> I saw you didn't change the order of the SET_MEM_TABLE call on the QEMU side;
> some of the pages have already arrived and been copied, so I'm making
> holes here according to the received map.

right, so I'm assuming they'll hit ufd faults and be immediately
WAKEd when I find the bit is set in the received-bitmap.
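
A minimal sketch of that decision, assuming a simple one-bit-per-page
received bitmap: the names below (page_received, mark_received,
classify_fault, enum fault_action) are hypothetical and are not the
helpers from the bitmap series. A fault on a page whose bit is already
set only needs a wake; otherwise the page has to be requested from the
source.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

enum fault_action {
    FAULT_WAKE_NOW,         /* page already landed: just wake the waiter */
    FAULT_REQUEST_PAGE      /* page missing: ask the source for it */
};

/* Test the bit for one page in a one-bit-per-page bitmap. */
static bool page_received(const unsigned long *bitmap, size_t page)
{
    return (bitmap[page / BITS_PER_ULONG] >> (page % BITS_PER_ULONG)) & 1UL;
}

/* Record that a page has been copied into place. */
static void mark_received(unsigned long *bitmap, size_t page)
{
    bitmap[page / BITS_PER_ULONG] |= 1UL << (page % BITS_PER_ULONG);
}

/* Decide what to do with a fault reported on the given page index. */
static enum fault_action classify_fault(const unsigned long *bitmap,
                                        size_t npages, size_t page)
{
    assert(page < npages);
    return page_received(bitmap, page) ? FAULT_WAKE_NOW
                                       : FAULT_REQUEST_PAGE;
}
```

In a real fault thread the test and the wake would have to be ordered
against the code that sets the bit; the sketch leaves that locking out.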

> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> > 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> That's why I decided to disable VRINGs, but not the way you did
> in GET_VRING_BASE; I send the received bitmap right after SET_MEM_TABLE.
> There could be a synchronization problem here, maybe similar to the one you described in
> "vhost+postcopy: Lock around set_mem_table".
> 
> Unfortunately, my patches aren't ready yet.

That's OK; these patches just-about work; only enough for
me to post them and ask for opinions.

Dave

> > 
> >    There's a nasty hack of a lock around the set_mem_table message.
> > 
> >    I've not looked at the recent IOMMU code.
> > 
> >    Some cleanup and a lot of corner cases need thinking about.
> > 
> >    There are probably plenty of unknown issues as well.
> > 
> > Test setup:
> >   I'm running on one host at the moment, with the guest
> >   scping a large file from the host as it migrates.
> >   The setup is based on one I found in the vhost-user setups.
> >   You'll need a recent kernel for the shared memory support
> >   in userfaultfd, and userfault isn't that happy if a process
> >   using shared memory dumps core - so make sure you have the
> >   latest fixes.
> > 
> > SESS=vhost
> > ulimit -c unlimited
> > tmux -L $SESS new-session -d
> > tmux -L $SESS set-option -g history-limit 30000
> > # Start a router using the system qemu
> > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > tmux -L $SESS set-option -g set-remain-on-exit on
> > # Start source vhost bridge
> > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log"
> > # Start dest vhost bridge
> > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on^M"
> > tmux -L $SESS send-keys -t source "migrate_set_speed 20M^M"
> > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on^M"
> > 
> > then once booted:
> > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> > 
> > 
> > Dave
> > 
> > Dr. David Alan Gilbert (29):
> >   RAMBlock/migration: Add migration flags
> >   migrate: Update ram_block_discard_range for shared
> >   qemu_ram_block_host_offset
> >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> >   postcopy: use UFFDIO_ZEROPAGE only when available
> >   postcopy: Add notifier chain
> >   postcopy: Add vhost-user flag for postcopy and check it
> >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> >   vhub: Support sending fds back to qemu
> >   vhub: Open userfaultfd
> >   postcopy: Allow registering of fd handler
> >   vhost+postcopy: Register shared ufd with postcopy
> >   vhost+postcopy: Transmit 'listen' to client
> >   vhost+postcopy: Register new regions with the ufd
> >   vhost+postcopy: Send address back to qemu
> >   vhost+postcopy: Stash RAMBlock and offset
> >   vhost+postcopy: Send requests to source for shared pages
> >   vhost+postcopy: Resolve client address
> >   postcopy: wake shared
> >   postcopy: postcopy_notify_shared_wake
> >   vhost+postcopy: Add vhost waker
> >   vhost+postcopy: Call wakeups
> >   vub+postcopy: madvises
> >   vhost+postcopy: Lock around set_mem_table
> >   vhu: enable = false on get_vring_base
> >   vhost: Add VHOST_USER_POSTCOPY_END message
> >   vhost+postcopy: Wire up POSTCOPY_END notify
> >   postcopy: Allow shared memory
> >   vhost-user: Claim support for postcopy
> > 
> >  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> >  contrib/libvhost-user/libvhost-user.h |   8 +
> >  exec.c                                |  44 +++--
> >  hw/virtio/trace-events                |  13 ++
> >  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
> >  include/exec/cpu-common.h             |   3 +
> >  include/exec/ram_addr.h               |   2 +
> >  migration/migration.c                 |   3 +
> >  migration/migration.h                 |   8 +
> >  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
> >  migration/postcopy-ram.h              |  69 +++++++
> >  migration/ram.c                       |   5 +
> >  migration/ram.h                       |   1 +
> >  migration/savevm.c                    |  13 ++
> >  migration/trace-events                |   6 +
> >  trace-events                          |   3 +
> >  vl.c                                  |   4 +-
> >  17 files changed, 926 insertions(+), 84 deletions(-)
> > 
> > -- 
> > 2.13.0
> > 
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
