* [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
@ 2017-06-28 19:00 Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
                   ` (31 more replies)
  0 siblings, 32 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is an RFC/WIP series that enables postcopy migration
with shared memory to a vhost-user process.
It's based on current head + Juan's load_cleanup series and
Alexey's bitmap series (v4).  It's very lightly tested and seems
to work, but it's quite rough.

I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
use the new feature, since this is about the simplest
client around.

Structure:

The basic idea is that near the start of postcopy, the client
opens its own userfaultfd and sends it back to QEMU over
the socket it's already using for VHOST_USER_* commands.
Then, when VHOST_USER_SET_MEM_TABLE arrives, it registers the
areas with userfaultfd and sends the mapped addresses back to QEMU.

QEMU then reads the client's UFD in its fault thread and issues
requests back to the source as needed.
QEMU also issues 'WAKE' ioctls on the UFD to let the client know
that the page has arrived so it can carry on.

A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
QEMU knows the client can talk postcopy.
Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
added to guide the process along.
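
For reference, the 'WAKE' mentioned above is the standard userfaultfd
wake ioctl issued on the client's UFD; a minimal sketch (the names here
are illustrative, not from the series):

/* Sketch: wake any client threads blocked on a now-resident page. */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <stdint.h>

static int wake_client_page(int client_ufd, uint64_t client_addr,
                            size_t pagesize)
{
    struct uffdio_range range;

    range.start = client_addr & ~(uint64_t)(pagesize - 1);
    range.len = pagesize;
    /* The kernel resumes any threads faulted on this range */
    return ioctl(client_ufd, UFFDIO_WAKE, &range);
}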

Current known issues:
   I've not tested it with hugepages yet, and I suspect the madvises
   will need tweaking for it.

   QEMU gets to see the base addresses at which the client has its
   regions mapped; that's not great for security.

   Care is needed around deadlocks; any thread in the client that
   accesses a userfault-protected page can stall.

   There's a nasty hack of a lock around the set_mem_table message.

   I've not looked at the recent IOMMU code.

   Some cleanup and a lot of corner cases need thinking about.

   There are probably plenty of unknown issues as well.

Test setup:
  I'm running on one host at the moment, with the guest
  scping a large file from the host as it migrates.
  The setup is based on one I found in the vhost-user setups.
  You'll need a recent kernel for the shared memory support
  in userfaultfd, and userfault isn't that happy if a process
  using shared memory dumps core - so make sure you have the
  latest fixes.

SESS=vhost
ulimit -c unlimited
tmux -L $SESS new-session -d
tmux -L $SESS set-option -g history-limit 30000
# Start a router using the system qemu
tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
tmux -L $SESS set-option -g set-remain-on-exit on
# Start source vhost bridge
tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
sleep 0.5
tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log"
# Start dest vhost bridge
tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
sleep 0.5
tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on^M"
tmux -L $SESS send-keys -t source "migrate_set_speed 20M^M"
tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on^M"

then once booted:
tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
(Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)


Dave

Dr. David Alan Gilbert (29):
  RAMBlock/migration: Add migration flags
  migrate: Update ram_block_discard_range for shared
  qemu_ram_block_host_offset
  migration/ram: ramblock_recv_bitmap_test_byte_offset
  postcopy: use UFFDIO_ZEROPAGE only when available
  postcopy: Add notifier chain
  postcopy: Add vhost-user flag for postcopy and check it
  vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
  vhub: Support sending fds back to qemu
  vhub: Open userfaultfd
  postcopy: Allow registering of fd handler
  vhost+postcopy: Register shared ufd with postcopy
  vhost+postcopy: Transmit 'listen' to client
  vhost+postcopy: Register new regions with the ufd
  vhost+postcopy: Send address back to qemu
  vhost+postcopy: Stash RAMBlock and offset
  vhost+postcopy: Send requests to source for shared pages
  vhost+postcopy: Resolve client address
  postcopy: wake shared
  postcopy: postcopy_notify_shared_wake
  vhost+postcopy: Add vhost waker
  vhost+postcopy: Call wakeups
  vub+postcopy: madvises
  vhost+postcopy: Lock around set_mem_table
  vhu: enable = false on get_vring_base
  vhost: Add VHOST_USER_POSTCOPY_END message
  vhost+postcopy: Wire up POSTCOPY_END notify
  postcopy: Allow shared memory
  vhost-user: Claim support for postcopy

 contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
 contrib/libvhost-user/libvhost-user.h |   8 +
 exec.c                                |  44 +++--
 hw/virtio/trace-events                |  13 ++
 hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
 include/exec/cpu-common.h             |   3 +
 include/exec/ram_addr.h               |   2 +
 migration/migration.c                 |   3 +
 migration/migration.h                 |   8 +
 migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
 migration/postcopy-ram.h              |  69 +++++++
 migration/ram.c                       |   5 +
 migration/ram.h                       |   1 +
 migration/savevm.c                    |  13 ++
 migration/trace-events                |   6 +
 trace-events                          |   3 +
 vl.c                                  |   4 +-
 17 files changed, 926 insertions(+), 84 deletions(-)

-- 
2.13.0


* [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-10  9:28   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a migration flags field to each RAMBlock so that the migration
code can hold a set of private flags on the RAMBlock.
Add accessors.
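
For illustration, a caller would OR a private flag into the field via
the accessors; a sketch (QEMU_MIGFLAG_EXAMPLE is hypothetical - patch 05
adds a real flag):

/* Sketch: marking a block with a private migration flag. */
static void mark_block(RAMBlock *rb)
{
    qemu_ram_set_migration_flags(rb, qemu_ram_get_migration_flags(rb) |
                                     QEMU_MIGFLAG_EXAMPLE);
}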

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 10 ++++++++++
 include/exec/cpu-common.h |  2 ++
 include/exec/ram_addr.h   |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/exec.c b/exec.c
index 42ad1eaedd..69fc5c9b07 100644
--- a/exec.c
+++ b/exec.c
@@ -1741,6 +1741,16 @@ size_t qemu_ram_pagesize_largest(void)
     return largest;
 }
 
+uint32_t qemu_ram_get_migration_flags(const RAMBlock *rb)
+{
+    return rb->migration_flags;
+}
+
+void qemu_ram_set_migration_flags(RAMBlock *rb, uint32_t flags)
+{
+    rb->migration_flags = flags;
+}
+
 static int memory_try_enable_merging(void *addr, size_t len)
 {
     if (!machine_mem_merge(current_machine)) {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 4d45a72ea9..4af179b543 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -72,6 +72,8 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
+void qemu_ram_set_migration_flags(RAMBlock *rb, uint32_t flags);
+uint32_t qemu_ram_get_migration_flags(const RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index af5bf26080..0cb6c5cb73 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -32,6 +32,8 @@ struct RAMBlock {
     ram_addr_t max_length;
     void (*resized)(const char*, uint64_t length, void *host);
     uint32_t flags;
+    /* These flags are owned by migration, initialised to 0 */
+    uint32_t migration_flags;
     /* Protected by iothread lock.  */
     char idstr[256];
     /* RCU-enabled, writes protected by the ramlist lock */
-- 
2.13.0


* [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-10 10:03   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The choice of call to discard a block is getting more complicated
as more cases are handled.  We use fallocate PUNCH_HOLE in any
file-backed case; it works for both hugepages and tmpfs.
We use madvise DONTNEED in the non-hugepage cases where the memory
is either anonymous or private.

Care should be taken when trying other backing files.
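
For a standalone picture of the two primitives the patch chooses
between (a sketch, not the patch itself):

/* Sketch of the two discard primitives ram_block_discard_range() selects. */
#define _GNU_SOURCE
#include <fcntl.h>     /* fallocate, FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/mman.h>  /* madvise, MADV_DONTNEED */

static int discard_file_backed(int fd, off_t start, off_t length)
{
    /* Punch a hole in the backing file; works for hugetlbfs and tmpfs */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     start, length);
}

static int discard_anon_private(void *host_startaddr, size_t length)
{
    /* Definitely free anonymous/private pages; later reads see zeroes */
    return madvise(host_startaddr, length, MADV_DONTNEED);
}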

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c       | 28 ++++++++++++++++------------
 trace-events |  3 +++
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 69fc5c9b07..4e61226a16 100644
--- a/exec.c
+++ b/exec.c
@@ -3557,6 +3557,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     }
 
     if ((start + length) <= rb->used_length) {
+        bool need_madvise, need_fallocate;
         uint8_t *host_endaddr = host_startaddr + length;
         if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
             error_report("ram_block_discard_range: Unaligned end address: %p",
@@ -3566,23 +3567,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 
         errno = ENOTSUP; /* If we are missing MADVISE etc */
 
-        if (rb->page_size == qemu_host_page_size) {
-#if defined(CONFIG_MADVISE)
-            /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
-             * freeing the page.
-             */
-            ret = madvise(host_startaddr, length, MADV_DONTNEED);
-#endif
-        } else {
-            /* Huge page case  - unfortunately it can't do DONTNEED, but
-             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
-             * huge page file.
-             */
+        /* The logic here is messy;
+         *    madvise DONTNEED fails for hugepages
+         *    fallocate works on hugepages and shmem
+         */
+        need_madvise = (rb->page_size == qemu_host_page_size) &&
+                       (rb->fd == -1 || !(rb->flags & RAM_SHARED));
+        need_fallocate = rb->fd != -1;
+        if (ret == -1 && need_fallocate) {
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
             ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                             start, length);
 #endif
         }
+        if (need_madvise && (!need_fallocate || (ret == 0))) {
+#if defined(CONFIG_MADVISE)
+            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
+#endif
+        }
+        trace_ram_block_discard_range(rb->idstr, host_startaddr,
+                                      need_madvise, need_fallocate, ret);
         if (ret) {
             ret = -errno;
             error_report("ram_block_discard_range: Failed to discard range "
diff --git a/trace-events b/trace-events
index bae63fdb1d..2c6e8d2160 100644
--- a/trace-events
+++ b/trace-events
@@ -55,6 +55,9 @@ dma_complete(void *dbs, int ret, void *cb) "dbs=%p ret=%d cb=%p"
 dma_blk_cb(void *dbs, int ret) "dbs=%p ret=%d"
 dma_map_wait(void *dbs) "dbs=%p"
 
+# exec.c
+ram_block_discard_range(const char *rbname, void *hva, bool need_madvise, bool need_fallocate, int ret) "%s@%p: madvise: %d fallocate: %d ret: %d"
+
 # memory.c
 memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr %#"PRIx64" value %#"PRIx64" size %u"
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr %#"PRIx64" value %#"PRIx64" size %u"
-- 
2.13.0


* [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-03 17:44   ` Michael S. Tsirkin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 04/29] migration/ram: ramblock_recv_bitmap_test_byte_offset Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Utility to give the offset of a host pointer within a RAMBlock
(assuming we already know it's in that RAMBlock).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 6 ++++++
 include/exec/cpu-common.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/exec.c b/exec.c
index 4e61226a16..a1499b9bee 100644
--- a/exec.c
+++ b/exec.c
@@ -2218,6 +2218,12 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
     return ramblock_ptr(block, addr);
 }
 
+/* Return the offset of a hostpointer within a ramblock */
+ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
+{
+    return (uint8_t *)host - (uint8_t *)rb->host;
+}
+
 /*
  * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
  * in that RAMBlock.
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 4af179b543..fa1ec22d66 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -66,6 +66,7 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr);
 RAMBlock *qemu_ram_block_by_name(const char *name);
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
                                    ram_addr_t *offset);
+ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host);
 void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(RAMBlock *block);
 const char *qemu_ram_get_idstr(RAMBlock *rb);
-- 
2.13.0


* [Qemu-devel] [RFC 04/29] migration/ram: ramblock_recv_bitmap_test_byte_offset
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Utility for testing the map when you already know the offset
in the RAMBlock.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 5 +++++
 migration/ram.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 3daa69b2c3..6dcb9e8b43 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -165,6 +165,11 @@ int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb)
                     rb->receivedmap);
 }
 
+bool ramblock_recv_bitmap_test_byte_offset(uint64_t byte_offset, RAMBlock *rb)
+{
+    return test_bit(byte_offset >> TARGET_PAGE_BITS, rb->receivedmap);
+}
+
 void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb)
 {
     set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
diff --git a/migration/ram.h b/migration/ram.h
index 98d68df03d..0fcf85d904 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -55,6 +55,7 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
 void ramblock_recv_map_init(void);
 int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb);
+bool ramblock_recv_bitmap_test_byte_offset(uint64_t byte_offset, RAMBlock *rb);
 void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
 void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
 
-- 
2.13.0


* [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 04/29] migration/ram: ramblock_recv_bitmap_test_byte_offset Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-10 10:19   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Use the recently added migration flag to record whether
each RAMBlock has the UFFDIO_ZEROPAGE capability, and use it
when it's available.

This allows the use of postcopy on tmpfs as well as hugepage
backed files.
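
For reference, atomically placing a zero page through userfaultfd uses
the UFFDIO_ZEROPAGE ioctl; a sketch of the kernel API (not code from
this patch):

/* Sketch: atomically map a zero page at 'host' and wake any waiters. */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <stdint.h>

static int place_zero_page(int ufd, void *host, size_t pagesize)
{
    struct uffdio_zeropage zp;

    zp.range.start = (uintptr_t)host;
    zp.range.len = pagesize;
    zp.mode = 0;
    /* Only valid if UFFDIO_REGISTER reported _UFFDIO_ZEROPAGE in 'ioctls' */
    return ioctl(ufd, UFFDIO_ZEROPAGE, &zp);
}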

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.h    |  4 ++++
 migration/postcopy-ram.c | 12 +++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index d9a268a3af..d109635d08 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -20,6 +20,10 @@
 #include "exec/cpu-common.h"
 #include "qemu/coroutine_int.h"
 
+/* Migration flags to be set using qemu_ram_set_migration_flags */
+/* Postcopy can atomically zero pages in this RAMBlock */
+#define QEMU_MIGFLAG_POSTCOPY_ZERO   0x00000001
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index be2a8f8e02..96338a8070 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -408,6 +408,12 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
         error_report("%s userfault: Region doesn't support COPY", __func__);
         return -1;
     }
+    if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
+        RAMBlock *rb = qemu_ram_block_by_name(block_name);
+        qemu_ram_set_migration_flags(rb, qemu_ram_get_migration_flags(rb) |
+                                         QEMU_MIGFLAG_POSTCOPY_ZERO);
+    }
+
 
     return 0;
 }
@@ -620,11 +626,11 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
                              RAMBlock *rb)
 {
+    size_t pagesize = qemu_ram_pagesize(rb);
     trace_postcopy_place_page_zero(host);
 
-    if (qemu_ram_pagesize(rb) == getpagesize()) {
-        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
-                                rb)) {
+    if (qemu_ram_get_migration_flags(rb) & QEMU_MIGFLAG_POSTCOPY_ZERO) {
+        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
             int e = errno;
             error_report("%s: %s zero host: %p",
                          __func__, strerror(e), host);
-- 
2.13.0


* [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-10 10:31   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 07/29] postcopy: Add vhost-user flag for postcopy and check it Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a notifier chain for postcopy with a 'reason' flag
and an opportunity for a notifier member to return an error.

Call it when enabling postcopy.

This will initially be used to let devices declare that they're unable
to support postcopy, and later to notify devices of stages within postcopy.
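
A device hooks the chain roughly like this (a sketch against the API
added below; my_device_can_postcopy() and my_notifier are illustrative -
patch 07 adds the real vhost-user user):

static int my_postcopy_notifier(NotifierWithReturn *notifier, void *opaque)
{
    struct PostcopyNotifyData *pnd = opaque;

    if (pnd->reason == POSTCOPY_NOTIFY_PROBE && !my_device_can_postcopy()) {
        /* Veto postcopy; the error is reported to the user */
        error_setg(pnd->errp, "device cannot support postcopy");
        return -ENOENT;
    }
    return 0;
}

    /* At device init: */
    my_notifier.notify = my_postcopy_notifier;
    postcopy_add_notifier(&my_notifier);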

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 41 +++++++++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.h | 26 ++++++++++++++++++++++++++
 vl.c                     |  4 +++-
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 96338a8070..64f5a8b003 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,6 +23,8 @@
 #include "savevm.h"
 #include "postcopy-ram.h"
 #include "ram.h"
+#include "qapi/error.h"
+#include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
 #include "qemu/error-report.h"
@@ -45,6 +47,38 @@ struct PostcopyDiscardState {
     unsigned int nsentcmds;
 };
 
+/* A notifier chain for postcopy
+ * The notifier should return 0 if it's OK, or a
+ * -errno on error.
+ * The notifier should expect an Error ** as its data
+ */
+static NotifierWithReturnList postcopy_notifier_list;
+
+void postcopy_infrastructure_init(void)
+{
+    notifier_with_return_list_init(&postcopy_notifier_list);
+}
+
+void postcopy_add_notifier(NotifierWithReturn *nn)
+{
+    notifier_with_return_list_add(&postcopy_notifier_list, nn);
+}
+
+void postcopy_remove_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
+{
+    struct PostcopyNotifyData pnd;
+    pnd.reason = reason;
+    pnd.errp = errp;
+
+    return notifier_with_return_list_notify(&postcopy_notifier_list,
+                                            &pnd);
+}
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -133,6 +167,7 @@ bool postcopy_ram_supported_by_host(void)
     struct uffdio_register reg_struct;
     struct uffdio_range range_struct;
     uint64_t feature_mask;
+    Error *local_err = NULL;
 
     if (qemu_target_page_size() > pagesize) {
         error_report("Target page size bigger than host page size");
@@ -146,6 +181,12 @@ bool postcopy_ram_supported_by_host(void)
         goto out;
     }
 
+    /* Give devices a chance to object */
+    if (postcopy_notify(POSTCOPY_NOTIFY_PROBE, &local_err)) {
+        error_report_err(local_err);
+        goto out;
+    }
+
     /* Version and features check */
     if (!ufd_version_check(ufd)) {
         goto out;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 78a3591322..d688411674 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -114,4 +114,30 @@ PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
 
+/*
+ * To be called once at the start before any device initialisation
+ */
+void postcopy_infrastructure_init(void);
+
+/* Add a notifier to a list to be called when checking whether the devices
+ * can support postcopy.
+ * Its data is a *PostcopyNotifyData
+ * It should return 0 if OK, or a negative value on failure.
+ * On failure it must set the data->errp to an error.
+ *
+ */
+enum PostcopyNotifyReason {
+    POSTCOPY_NOTIFY_PROBE = 0,
+};
+
+struct PostcopyNotifyData {
+    enum PostcopyNotifyReason reason;
+    Error **errp;
+};
+
+void postcopy_add_notifier(NotifierWithReturn *nn);
+void postcopy_remove_notifier(NotifierWithReturn *n);
+/* Call the notifier list set by postcopy_add_notifier */
+int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
+
 #endif
diff --git a/vl.c b/vl.c
index a2bd69f4e0..b6c660a703 100644
--- a/vl.c
+++ b/vl.c
@@ -93,8 +93,9 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "hw/audio/soundhw.h"
 #include "audio/audio.h"
-#include "sysemu/cpus.h"
 #include "migration/colo.h"
+#include "migration/postcopy-ram.h"
+#include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hax.h"
 #include "qapi/qobject-input-visitor.h"
@@ -3060,6 +3061,7 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_OPTS);
 
     runstate_init();
+    postcopy_infrastructure_init();
 
     if (qcrypto_init(&err) < 0) {
         error_reportf_err(err, "cannot initialize crypto: ");
-- 
2.13.0


* [Qemu-devel] [RFC 07/29] postcopy: Add vhost-user flag for postcopy and check it
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 08/29] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a vhost feature flag for postcopy support, and
use the postcopy notifier to check it before allowing postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h |  1 +
 hw/virtio/vhost-user.c                | 39 ++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 53ef222c0b..82a6bf4549 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -34,6 +34,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_MQ = 0,
     VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
     VHOST_USER_PROTOCOL_F_RARP = 2,
+    VHOST_USER_PROTOCOL_F_POSTCOPY = 6,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 958ee09bcb..78eaf9b022 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -17,6 +17,8 @@
 #include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 
 #include <sys/ioctl.h>
 #include <sys/socket.h>
@@ -33,7 +35,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
     VHOST_USER_PROTOCOL_F_NET_MTU = 4,
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
-
+    VHOST_USER_PROTOCOL_F_POSTCOPY = 6,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -121,8 +123,10 @@ static VhostUserMsg m __attribute__ ((unused));
 #define VHOST_USER_VERSION    (0x1)
 
 struct vhost_user {
+    struct vhost_dev *dev;
     CharBackend *chr;
     int slave_fd;
+    NotifierWithReturn postcopy_notifier;
 };
 
 static bool ioeventfd_enabled(void)
@@ -701,6 +705,33 @@ out:
     return ret;
 }
 
+static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
+                                        void *opaque)
+{
+    struct PostcopyNotifyData *pnd = opaque;
+    struct vhost_user *u = container_of(notifier, struct vhost_user,
+                                         postcopy_notifier);
+    struct vhost_dev *dev = u->dev;
+
+    switch (pnd->reason) {
+    case POSTCOPY_NOTIFY_PROBE:
+        if (!virtio_has_feature(dev->protocol_features,
+                                VHOST_USER_PROTOCOL_F_POSTCOPY)) {
+            /* TODO: Get the device name into this error somehow */
+            error_setg(pnd->errp,
+                       "vhost-user backend not capable of postcopy");
+            return -ENOENT;
+        }
+        break;
+
+    default:
+        /* We ignore notifications we don't know */
+        break;
+    }
+
+    return 0;
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features, protocol_features;
@@ -712,6 +743,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     u = g_new0(struct vhost_user, 1);
     u->chr = opaque;
     u->slave_fd = -1;
+    u->dev = dev;
     dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
@@ -763,11 +795,15 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
                    "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
     }
 
+    u->postcopy_notifier.notify = vhost_user_postcopy_notifier;
+
     err = vhost_setup_slave_channel(dev);
     if (err < 0) {
         return err;
     }
 
+    postcopy_add_notifier(&u->postcopy_notifier);
+
     return 0;
 }
 
@@ -778,6 +814,7 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = dev->opaque;
+    postcopy_remove_notifier(&u->postcopy_notifier);
     if (u->slave_fd >= 0) {
         close(u->slave_fd);
         u->slave_fd = -1;
-- 
2.13.0


* [Qemu-devel] [RFC 08/29] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 07/29] postcopy: Add vhost-user flag for postcopy and check it Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 09/29] vhub: Support sending fds back to qemu Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up a notifier to send a VHOST_USER_POSTCOPY_ADVISE
message on an incoming advise.

Later patches will fill in the behaviour/contents of the
message.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 14 ++++++++++
 contrib/libvhost-user/libvhost-user.h |  1 +
 hw/virtio/vhost-user.c                | 48 +++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.h              |  1 +
 migration/savevm.c                    |  6 +++++
 5 files changed, 70 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 9efb9dac0e..ca7af5453a 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -63,6 +63,7 @@ vu_request_to_string(int req)
         REQ(VHOST_USER_SET_VRING_ENABLE),
         REQ(VHOST_USER_SEND_RARP),
         REQ(VHOST_USER_INPUT_GET_CONFIG),
+        REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -744,6 +745,17 @@ vu_set_vring_enable_exec(VuDev *dev, VhostUserMsg *vmsg)
 }
 
 static bool
+vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
+{
+    /* TODO: Open ufd, pass it back in the request */
+    /* TODO: Add addresses */
+    vmsg->payload.u64 = 0xcafe;
+    vmsg->size = sizeof(vmsg->payload.u64);
+    return true; /* = send a reply */
+}
+
+static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
     int do_reply = 0;
@@ -806,6 +818,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_get_queue_num_exec(dev, vmsg);
     case VHOST_USER_SET_VRING_ENABLE:
         return vu_set_vring_enable_exec(dev, vmsg);
+    case VHOST_USER_POSTCOPY_ADVISE:
+        return vu_set_postcopy_advise(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 82a6bf4549..8bb35582ea 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -63,6 +63,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_SEND_RARP = 19,
     VHOST_USER_INPUT_GET_CONFIG = 20,
+    VHOST_USER_POSTCOPY_ADVISE  = 23,
     VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 78eaf9b022..ee9a1ac8a3 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_NET_SET_MTU = 20,
     VHOST_USER_SET_SLAVE_REQ_FD = 21,
     VHOST_USER_IOTLB_MSG = 22,
+    VHOST_USER_POSTCOPY_ADVISE  = 23,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -705,6 +706,50 @@ out:
     return ret;
 }
 
+/*
+ * Called at the start of an inbound postcopy on reception of the
+ * 'advise' command.
+ */
+static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
+{
+    struct vhost_user *u = dev->opaque;
+    CharBackend *chr = u->chr;
+    int ufd;
+    VhostUserMsg msg = {
+        .request = VHOST_USER_POSTCOPY_ADVISE,
+        .flags = VHOST_USER_VERSION,
+    };
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_advise to vhost");
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        error_setg(errp, "Failed to get postcopy_advise reply from vhost");
+        return -1;
+    }
+
+    if (msg.request != VHOST_USER_POSTCOPY_ADVISE) {
+        error_setg(errp, "Unexpected msg type. Expected %d received %d",
+                     VHOST_USER_POSTCOPY_ADVISE, msg.request);
+        return -1;
+    }
+
+    if (msg.size != sizeof(msg.payload.u64)) {
+        error_setg(errp, "Received bad msg size.");
+        return -1;
+    }
+    ufd = qemu_chr_fe_get_msgfd(chr);
+    if (ufd < 0) {
+        error_setg(errp, "%s: Failed to get ufd", __func__);
+        return -1;
+    }
+
+    /* TODO: register ufd with userfault thread */
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -724,6 +769,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
         }
         break;
 
+    case POSTCOPY_NOTIFY_INBOUND_ADVISE:
+        return vhost_user_postcopy_advise(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index d688411674..70d4b09659 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -128,6 +128,7 @@ void postcopy_infrastructure_init(void);
  */
 enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
+    POSTCOPY_NOTIFY_INBOUND_ADVISE,
 };
 
 struct PostcopyNotifyData {
diff --git a/migration/savevm.c b/migration/savevm.c
index a12637ef76..4977baf9b7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1365,6 +1365,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
     uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
+    Error *local_err = NULL;
 
     trace_loadvm_postcopy_handle_advise();
     if (ps != POSTCOPY_INCOMING_NONE) {
@@ -1412,6 +1413,11 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_ADVISE, &local_err)) {
+        error_report_err(local_err);
+        return -1;
+    }
+
     if (ram_postcopy_incoming_init(mis)) {
         return -1;
     }
-- 
2.13.0


* [Qemu-devel] [RFC 09/29] vhub: Support sending fds back to qemu
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 08/29] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow replies with fds (for postcopy)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index ca7af5453a..e3a32755cf 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -214,6 +214,30 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     int rc;
     uint8_t *p = (uint8_t *)vmsg;
+    char control[CMSG_SPACE(VHOST_MEMORY_MAX_NREGIONS * sizeof(int))] = { };
+    struct iovec iov = {
+        .iov_base = (char *)vmsg,
+        .iov_len = VHOST_USER_HDR_SIZE,
+    };
+    struct msghdr msg = {
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+        .msg_control = control,
+    };
+    struct cmsghdr *cmsg;
+
+    memset(control, 0, sizeof(control));
+    if (vmsg->fds) {
+        size_t fdsize = vmsg->fd_num * sizeof(int);
+        msg.msg_controllen = CMSG_SPACE(fdsize);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        cmsg->cmsg_len = CMSG_LEN(fdsize);
+        cmsg->cmsg_level = SOL_SOCKET;
+        cmsg->cmsg_type = SCM_RIGHTS;
+        memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
+    } else {
+        msg.msg_controllen = 0;
+    }
 
     /* Set the version in the flags when sending the reply */
     vmsg->flags &= ~VHOST_USER_VERSION_MASK;
@@ -221,7 +245,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
     vmsg->flags |= VHOST_USER_REPLY_MASK;
 
     do {
-        rc = write(conn_fd, p, VHOST_USER_HDR_SIZE);
+        rc = sendmsg(conn_fd, &msg, 0);
     } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
     do {
@@ -314,6 +338,7 @@ vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
     }
 
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     DPRINT("Sending back to guest u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 
@@ -455,6 +480,7 @@ vu_set_log_base_exec(VuDev *dev, VhostUserMsg *vmsg)
     dev->log_size = log_mmap_size;
 
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     return true;
 }
@@ -699,6 +725,7 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 
     vmsg->payload.u64 = features;
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     return true;
 }
-- 
2.13.0


* [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 09/29] vhub: Support sending fds back to qemu Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-24 12:10   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 11/29] postcopy: Allow registering of fd handler Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a userfaultfd (on a postcopy_advise) and send it back in
the reply to QEMU for it to monitor.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 24 +++++++++++++++++++++---
 contrib/libvhost-user/libvhost-user.h |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index e3a32755cf..62e97f6b84 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -15,6 +15,7 @@
 
 #include <qemu/osdep.h>
 #include <sys/eventfd.h>
+#include <sys/syscall.h>
 #include <linux/vhost.h>
 
 #include "qemu/atomic.h"
@@ -774,11 +775,28 @@ vu_set_vring_enable_exec(VuDev *dev, VhostUserMsg *vmsg)
 static bool
 vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
 {
-    /* TODO: Open ufd, pass it back in the request */
-    /* TODO: Add addresses */
+    struct uffdio_api api_struct;
+
+    dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    /* TODO: Add addresses */
     vmsg->payload.u64 = 0xcafe;
     vmsg->size = sizeof(vmsg->payload.u64);
+
+    if (dev->postcopy_ufd == -1) {
+        vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
+        return false;
+    }
+    api_struct.api = UFFD_API;
+    api_struct.features = 0;
+    if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
+        vu_panic(dev, "Failed UFFDIO_API: %s", strerror(errno));
+        close(dev->postcopy_ufd);
+        return false;
+    }
+    /* TODO: Stash feature flags somewhere */
+    /* Return a ufd to the QEMU */
+    vmsg->fd_num = 1;
+    vmsg->fds[0] = dev->postcopy_ufd;
     return true; /* = send a reply */
 }
 
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 8bb35582ea..3e65a962da 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -231,6 +231,9 @@ struct VuDev {
      * re-initialize */
     vu_panic_cb panic;
     const VuDevIface *iface;
+
+    /* Postcopy data */
+    int postcopy_ufd;
 };
 
 typedef struct VuVirtqElement {
-- 
2.13.0


* [Qemu-devel] [RFC 11/29] postcopy: Allow registering of fd handler
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 12/29] vhost+postcopy: Register shared ufd with postcopy Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow other userfaultfds to be registered with the fault thread
so that faults on shared memory can be routed to their handlers.
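
A device registers its ufd roughly like this (a sketch against the
PostCopyFD API added below; the handler body, client_ufd and dev are
illustrative - patch 12 wires up the real vhost-user one):

static int my_fault_handler(struct PostCopyFD *pcfd, void *ufd)
{
    struct uffd_msg *msg = ufd;

    /* Translate msg->arg.pagefault.address back to a RAMBlock/offset
     * and request that page from the source; unused in this sketch. */
    (void)msg;
    return 0;
}

    /* At postcopy start: */
    struct PostCopyFD pcfd = {
        .fd      = client_ufd,      /* the ufd received from the client */
        .data    = dev,
        .handler = my_fault_handler,
        .idstr   = "my-device",
    };
    postcopy_register_shared_ufd(&pcfd);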

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c    |   3 +
 migration/migration.h    |   2 +
 migration/postcopy-ram.c | 212 +++++++++++++++++++++++++++++++++++------------
 migration/postcopy-ram.h |  21 +++++
 migration/trace-events   |   2 +
 5 files changed, 186 insertions(+), 54 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 89df832625..a88b007a8b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -134,6 +134,8 @@ MigrationIncomingState *migration_incoming_get_current(void)
     if (!once) {
         mis_current.state = MIGRATION_STATUS_NONE;
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
+        mis_current.postcopy_remote_fds = g_array_new(FALSE, TRUE,
+                                                   sizeof(struct PostCopyFD));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         once = true;
@@ -157,6 +159,7 @@ void migration_incoming_state_destroy(void)
         qemu_fclose(mis->from_src_file);
         mis->from_src_file = NULL;
     }
+    g_array_free(mis->postcopy_remote_fds, TRUE);
 
     qemu_event_destroy(&mis->main_thread_load_event);
 }
diff --git a/migration/migration.h b/migration/migration.h
index d109635d08..1210b217a1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -51,6 +51,8 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
     void     *postcopy_tmp_zero_page;
+    /* PostCopyFD's for external userfaultfds & handlers of shared memory */
+    GArray   *postcopy_remote_fds;
 
     QEMUBH *bh;
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 64f5a8b003..d43a73c108 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -467,29 +467,43 @@ static void *postcopy_ram_fault_thread(void *opaque)
     MigrationIncomingState *mis = opaque;
     struct uffd_msg msg;
     int ret;
+    size_t index;
     RAMBlock *rb = NULL;
     RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
     trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&mis->fault_thread_sem);
 
+    struct pollfd *pfd;
+    size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
+
+    pfd = g_new0(struct pollfd, pfd_len);
+
+    pfd[0].fd = mis->userfault_fd;
+    pfd[0].events = POLLIN;
+    pfd[1].fd = mis->userfault_quit_fd;
+    pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+    trace_postcopy_ram_fault_thread_fds_core(pfd[0].fd, pfd[1].fd);
+    for (index = 0; index < mis->postcopy_remote_fds->len; index++) {
+        struct PostCopyFD *pcfd = &g_array_index(mis->postcopy_remote_fds,
+                                                 struct PostCopyFD, index);
+        pfd[2 + index].fd = pcfd->fd;
+        pfd[2 + index].events = POLLIN;
+        trace_postcopy_ram_fault_thread_fds_extra(2 + index, pcfd->idstr,
+                                                  pcfd->fd);
+    }
+
     while (true) {
         ram_addr_t rb_offset;
-        struct pollfd pfd[2];
+        int poll_result;
 
         /*
          * We're mainly waiting for the kernel to give us a faulting HVA,
          * however we can be told to quit via userfault_quit_fd which is
          * an eventfd
          */
-        pfd[0].fd = mis->userfault_fd;
-        pfd[0].events = POLLIN;
-        pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
-        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
-        pfd[1].revents = 0;
-
-        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+        poll_result = poll(pfd, pfd_len, -1 /* Wait forever */);
+        if (poll_result == -1) {
             error_report("%s: userfault poll: %s", __func__, strerror(errno));
             break;
         }
@@ -499,57 +513,118 @@ static void *postcopy_ram_fault_thread(void *opaque)
             break;
         }
 
-        ret = read(mis->userfault_fd, &msg, sizeof(msg));
-        if (ret != sizeof(msg)) {
-            if (errno == EAGAIN) {
-                /*
-                 * if a wake up happens on the other thread just after
-                 * the poll, there is nothing to read.
-                 */
-                continue;
+        if (pfd[0].revents) {
+            poll_result--;
+            ret = read(mis->userfault_fd, &msg, sizeof(msg));
+            if (ret != sizeof(msg)) {
+                if (errno == EAGAIN) {
+                    /*
+                     * if a wake up happens on the other thread just after
+                     * the poll, there is nothing to read.
+                     */
+                    continue;
+                }
+                if (ret < 0) {
+                    error_report("%s: Failed to read full userfault "
+                                 "message: %s",
+                                 __func__, strerror(errno));
+                    break;
+                } else {
+                    error_report("%s: Read %d bytes from userfaultfd "
+                                 "expected %zd",
+                                 __func__, ret, sizeof(msg));
+                    break; /* Lost alignment, don't know what we'd read next */
+                }
+            }
+            if (msg.event != UFFD_EVENT_PAGEFAULT) {
+                error_report("%s: Read unexpected event %ud from userfaultfd",
+                             __func__, msg.event);
+                continue; /* It's not a page fault, shouldn't happen */
             }
-            if (ret < 0) {
-                error_report("%s: Failed to read full userfault message: %s",
-                             __func__, strerror(errno));
+
+            rb = qemu_ram_block_from_host(
+                     (void *)(uintptr_t)msg.arg.pagefault.address,
+                     true, &rb_offset);
+            if (!rb) {
+                error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+                             PRIx64, (uint64_t)msg.arg.pagefault.address);
                 break;
-            } else {
-                error_report("%s: Read %d bytes from userfaultfd expected %zd",
-                             __func__, ret, sizeof(msg));
-                break; /* Lost alignment, don't know what we'd read next */
             }
-        }
-        if (msg.event != UFFD_EVENT_PAGEFAULT) {
-            error_report("%s: Read unexpected event %ud from userfaultfd",
-                         __func__, msg.event);
-            continue; /* It's not a page fault, shouldn't happen */
-        }
 
-        rb = qemu_ram_block_from_host(
-                 (void *)(uintptr_t)msg.arg.pagefault.address,
-                 true, &rb_offset);
-        if (!rb) {
-            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
-                         PRIx64, (uint64_t)msg.arg.pagefault.address);
-            break;
-        }
+            rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
+            trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
+                                                    qemu_ram_get_idstr(rb),
+                                                    rb_offset);
 
-        rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
-        trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
-                                                qemu_ram_get_idstr(rb),
-                                                rb_offset);
+            /*
+             * Send the request to the source - we want to request one
+             * of our host page sizes (which is >= TPS)
+             */
+            if (rb != last_rb) {
+                last_rb = rb;
+                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                         rb_offset, qemu_ram_pagesize(rb));
+            } else {
+                /* Save some space */
+                migrate_send_rp_req_pages(mis, NULL,
+                                         rb_offset, qemu_ram_pagesize(rb));
+            }
+        }
 
-        /*
-         * Send the request to the source - we want to request one
-         * of our host page sizes (which is >= TPS)
-         */
-        if (rb != last_rb) {
-            last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, qemu_ram_pagesize(rb));
-        } else {
-            /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, qemu_ram_pagesize(rb));
+        /* Now handle any requests from external processes on shared memory */
+        /* TODO: May need to handle devices deregistering during postcopy */
+        for (index = 2; index < pfd_len && poll_result; index++) {
+            if (pfd[index].revents) {
+                struct PostCopyFD *pcfd =
+                    &g_array_index(mis->postcopy_remote_fds,
+                                   struct PostCopyFD, index - 2);
+
+                poll_result--;
+                if (pfd[index].revents & POLLERR) {
+                    error_report("%s: POLLERR on poll %zd fd=%d",
+                                 __func__, index, pcfd->fd);
+                    pfd[index].events = 0;
+                    continue;
+                }
+
+                ret = read(pcfd->fd, &msg, sizeof(msg));
+                if (ret != sizeof(msg)) {
+                    if (errno == EAGAIN) {
+                        /*
+                         * if a wake up happens on the other thread just after
+                         * the poll, there is nothing to read.
+                         */
+                        continue;
+                    }
+                    if (ret < 0) {
+                        error_report("%s: Failed to read full userfault "
+                                     "message: %s (shared) revents=%d",
+                                     __func__, strerror(errno),
+                                     pfd[index].revents);
+                        /*TODO: Could just disable this sharer */
+                        break;
+                    } else {
+                        error_report("%s: Read %d bytes from userfaultfd "
+                                     "expected %zd (shared)",
+                                     __func__, ret, sizeof(msg));
+                        /*TODO: Could just disable this sharer */
+                        break; /*Lost alignment,don't know what we'd read next*/
+                    }
+                }
+                if (msg.event != UFFD_EVENT_PAGEFAULT) {
+                    error_report("%s: Read unexpected event %ud "
+                                 "from userfaultfd (shared)",
+                                 __func__, msg.event);
+                    continue; /* It's not a page fault, shouldn't happen */
+                }
+                /* Call the device handler registered with us */
+                ret = pcfd->handler(pcfd, &msg);
+                if (ret) {
+                    error_report("%s: Failed to resolve shared fault on %zd/%s",
+                                 __func__, index, pcfd->idstr);
+                    /* TODO: Fail? Disable this sharer? */
+                }
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
@@ -879,3 +954,32 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
 {
     return atomic_xchg(&incoming_postcopy_state, new_state);
 }
+
+/* Register a handler for external shared memory postcopy
+ * called on the destination.
+ */
+void postcopy_register_shared_ufd(struct PostCopyFD *pcfd)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    mis->postcopy_remote_fds = g_array_append_val(mis->postcopy_remote_fds,
+                                                  *pcfd);
+}
+
+/* Unregister a handler for external shared memory postcopy
+ */
+void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
+{
+    guint i;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    GArray *pcrfds = mis->postcopy_remote_fds;
+
+    for (i = 0; i < pcrfds->len; i++) {
+        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
+        if (cur->fd == pcfd->fd) {
+            mis->postcopy_remote_fds = g_array_remove_index(pcrfds, i);
+            return;
+        }
+    }
+}
+
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 70d4b09659..ba8a8ffec5 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -141,4 +141,25 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
 /* Call the notifier list set by postcopy_add_start_notifier */
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
 
+struct PostCopyFD;
+
+/* ufd is a pointer to the struct uffd_msg. TODO: make this more portable! */
+typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
+
+struct PostCopyFD {
+    int fd;
+    /* Data to pass to handler */
+    void *data;
+    /* Handler to be called whenever we get a poll event */
+    pcfdhandler handler;
+    /* A string to use in error messages */
+    char *idstr;
+};
+
+/* Register a userfaultfd owned by an external process for
+ * shared memory.
+ */
+void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
+void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index cb2c4b5b40..522cfa170d 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -189,6 +189,8 @@ postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
+postcopy_ram_fault_thread_fds_extra(size_t index, const char *name, int fd) "%zd/%s: %d"
 postcopy_ram_fault_thread_quit(void) ""
 postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 12/29] vhost+postcopy: Register shared ufd with postcopy
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 11/29] postcopy: Allow registering of fd handler Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Register the UFD that comes in as the response to the 'advise' method
with the postcopy code.
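
For reference, the ufd handed back in the 'advise' reply is created on
the client side with the usual userfaultfd(2) handshake; a minimal
sketch (illustrative only, not this series' client code, error
handling trimmed):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    static int make_ufd(void)
    {
        int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API, .features = 0 };

        /* the UFFDIO_API handshake must happen before any other ioctl */
        if (ufd < 0 || ioctl(ufd, UFFDIO_API, &api)) {
            return -1;
        }
        return ufd; /* passed back to QEMU as an SCM_RIGHTS fd */
    }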

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost-user.c   | 23 ++++++++++++++++++++++-
 migration/postcopy-ram.h |  2 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ee9a1ac8a3..b98fbe4834 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -24,6 +24,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <linux/vhost.h>
+#include <linux/userfaultfd.h>
 
 #define VHOST_MEMORY_MAX_NREGIONS    8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
@@ -128,6 +129,7 @@ struct vhost_user {
     CharBackend *chr;
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
+    struct PostCopyFD  postcopy_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -707,6 +709,19 @@ out:
 }
 
 /*
+ * Called back from the postcopy fault thread when a fault is received on our
+ * ufd.
+ * TODO: This is Linux specific
+ */
+static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
+                                             void *ufd)
+{
+    struct uffd_msg *msg = ufd;
+
+    return 0;
+}
+
+/*
  * Called at the start of an inbound postcopy on reception of the
  * 'advise' command.
  */
@@ -745,8 +760,14 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
         error_setg(errp, "%s: Failed to get ufd", __func__);
         return -1;
     }
+    fcntl(ufd, F_SETFL, O_NONBLOCK);
 
-    /* TODO: register ufd with userfault thread */
+    /* register ufd with userfault thread */
+    u->postcopy_fd.fd = ufd;
+    u->postcopy_fd.data = dev;
+    u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
+    u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
+    postcopy_register_shared_ufd(&u->postcopy_fd);
     return 0;
 }
 
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ba8a8ffec5..28c216cc7a 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -153,7 +153,7 @@ struct PostCopyFD {
     /* Handler to be called whenever we get a poll event */
     pcfdhandler handler;
     /* A string to use in error messages */
-    char *idstr;
+    const char *idstr;
 };
 
 /* Register a userfaultfd owned by an external process for
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 12/29] vhost+postcopy: Register shared ufd with postcopy Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-24 14:36   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Notify the vhost-user client on reception of the 'postcopy-listen'
event from the source.
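
On the wire this is just the fixed vhost-user header (u32 request = 24,
u32 flags carrying the version bits, u32 payload size = 0) with no
payload, as the construction in vhost_user_postcopy_listen() below
shows.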

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 16 ++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |  2 ++
 hw/virtio/trace-events                |  3 +++
 hw/virtio/vhost-user.c                | 23 +++++++++++++++++++++++
 migration/postcopy-ram.h              |  1 +
 migration/savevm.c                    |  7 +++++++
 6 files changed, 52 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 62e97f6b84..6de339fb7a 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -15,7 +15,9 @@
 
 #include <qemu/osdep.h>
 #include <sys/eventfd.h>
+#include <sys/ioctl.h>
 #include <sys/syscall.h>
+#include <linux/userfaultfd.h>
 #include <linux/vhost.h>
 
 #include "qemu/atomic.h"
@@ -65,6 +67,7 @@ vu_request_to_string(int req)
         REQ(VHOST_USER_SEND_RARP),
         REQ(VHOST_USER_INPUT_GET_CONFIG),
         REQ(VHOST_USER_POSTCOPY_ADVISE),
+        REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -801,6 +804,17 @@ vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
 }
 
 static bool
+vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
+{
+    if (dev->nregions) {
+        vu_panic(dev, "Regions already registered at postcopy-listen");
+        return false;
+    }
+    dev->postcopy_listening = true;
+
+    return false;
+}
+static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
     int do_reply = 0;
@@ -865,6 +879,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_vring_enable_exec(dev, vmsg);
     case VHOST_USER_POSTCOPY_ADVISE:
         return vu_set_postcopy_advise(dev, vmsg);
+    case VHOST_USER_POSTCOPY_LISTEN:
+        return vu_set_postcopy_listen(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 3e65a962da..86e1934ddb 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -64,6 +64,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SEND_RARP = 19,
     VHOST_USER_INPUT_GET_CONFIG = 20,
     VHOST_USER_POSTCOPY_ADVISE  = 23,
+    VHOST_USER_POSTCOPY_LISTEN  = 24,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -234,6 +235,7 @@ struct VuDev {
 
     /* Postcopy data */
     int postcopy_ufd;
+    bool postcopy_listening;
 };
 
 typedef struct VuVirtqElement {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e24d8fa997..1076dbbb1d 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -1,5 +1,8 @@
 # See docs/tracing.txt for syntax documentation.
 
+# hw/virtio/vhost-user.c
+vhost_user_postcopy_listen(void) ""
+
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
 virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b98fbe4834..1f70f5760f 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -67,6 +67,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_SLAVE_REQ_FD = 21,
     VHOST_USER_IOTLB_MSG = 22,
     VHOST_USER_POSTCOPY_ADVISE  = 23,
+    VHOST_USER_POSTCOPY_LISTEN  = 24,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -771,6 +772,25 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
     return 0;
 }
 
+/*
+ * Called at the switch to postcopy on reception of the 'listen' command.
+ */
+static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_POSTCOPY_LISTEN,
+        .flags = VHOST_USER_VERSION,
+    };
+
+    trace_vhost_user_postcopy_listen();
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_listen to vhost");
+        return -1;
+    }
+
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -793,6 +813,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     case POSTCOPY_NOTIFY_INBOUND_ADVISE:
         return vhost_user_postcopy_advise(dev, pnd->errp);
 
+    case POSTCOPY_NOTIFY_INBOUND_LISTEN:
+        return vhost_user_postcopy_listen(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 28c216cc7a..873c147b68 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -129,6 +129,7 @@ void postcopy_infrastructure_init(void);
 enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
     POSTCOPY_NOTIFY_INBOUND_ADVISE,
+    POSTCOPY_NOTIFY_INBOUND_LISTEN,
 };
 
 struct PostcopyNotifyData {
diff --git a/migration/savevm.c b/migration/savevm.c
index 4977baf9b7..4b6294b7d4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1579,6 +1579,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
     trace_loadvm_postcopy_handle_listen();
+    Error *local_err = NULL;
+
     if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
         error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
         return -1;
@@ -1600,6 +1602,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
+        error_report_err(local_err);
+        return -1;
+    }
+
     if (mis->have_listen_thread) {
         error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
         return -1;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-24 15:22   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When new regions are sent to the client using SET_MEM_TABLE, register
them with the userfaultfd.
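
Taken in isolation this is the standard MISSING-mode registration; a
self-contained sketch (ufd_register is an illustrative name, and 'ufd'
is assumed to have already passed the UFFDIO_API handshake):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    static int ufd_register(int ufd, void *addr, uint64_t len)
    {
        struct uffdio_register reg = {
            .range = { .start = (uintptr_t)addr, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_MISSING,
        };

        if (ioctl(ufd, UFFDIO_REGISTER, &reg)) {
            return -1;
        }
        /* the kernel reports which resolve ioctls work on this range;
         * postcopy needs at least UFFDIO_COPY */
        if (!(reg.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
            return -1;
        }
        return 0;
    }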

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 6de339fb7a..be7470e3a9 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -450,6 +450,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
                    dev_region->mmap_addr);
         }
 
+        if (dev->postcopy_listening) {
+            /* We should already have an open ufd; we need to mark each
+             * memory range with it.
+             * Note: Do we need any madvises? Well it's not been accessed
+             * yet, still probably need no THP to be safe, discard to be safe?
+             */
+            struct uffdio_register reg_struct;
+            /* Note: We might need to go back to using mmap_addr and
+             * len + mmap_offset for huge pages, but then we do hope not to
+             * see accesses in that area below the offset
+             */
+            reg_struct.range.start = (uintptr_t)(dev_region->mmap_addr +
+                                                 dev_region->mmap_offset);
+            reg_struct.range.len = dev_region->size;
+            reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+            if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
+                vu_panic(dev, "%s: Failed to userfault region %d: (ufd=%d)%s\n",
+                         __func__, i, dev->postcopy_ufd, strerror(errno));
+                continue;
+            }
+            if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+                vu_panic(dev, "%s Region (%d) doesn't support COPY",
+                         __func__, i);
+                continue;
+            }
+            DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
+                    __func__, i, reg_struct.range.start, reg_struct.range.len);
+            /* TODO: Stash 'zero' support flags somewhere */
+            /* TODO: Get address back to QEMU */
+
+        }
+
         close(vmsg->fds[i]);
     }
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-24 17:31   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We need a better way, but at the moment we need the address of the
mappings sent back to qemu so it can interpret the messages on the
userfaultfd it reads.
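
As a worked example (addresses made up): if QEMU sent a region with
guest_phys_addr 0x100000000 and its own userspace_addr 0x7f0000000000,
and the client mmaps that region at 0x7fb000000000, the client rewrites
userspace_addr to 0x7fb000000000 in its reply.  QEMU stashes that value
per region, so a later fault report for client address 0x7fb000012345
can be recognised as offset 0x12345 into that region.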

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 15 +++++++++-
 hw/virtio/trace-events                |  1 +
 hw/virtio/vhost-user.c                | 54 +++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index be7470e3a9..0658b6e847 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -479,13 +479,26 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
             DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
                     __func__, i, reg_struct.range.start, reg_struct.range.len);
             /* TODO: Stash 'zero' support flags somewhere */
-            /* TODO: Get address back to QEMU */
 
+            /* TODO: We need to find a way for the qemu not to see the virtual
+             * addresses of the clients, so as to keep better separation.
+             */
+            /* Return the address to QEMU so that it can translate the ufd
+             * fault addresses back.
+             */
+            msg_region->userspace_addr = (uintptr_t)(mmap_addr +
+                                                     dev_region->mmap_offset);
         }
 
         close(vmsg->fds[i]);
     }
 
+    if (dev->postcopy_listening) {
+        /* Need to return the addresses - send the updated message back */
+        vmsg->fd_num = 0;
+        return true;
+    }
+
     return false;
 }
 
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 1076dbbb1d..f7be340a45 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -2,6 +2,7 @@
 
 # hw/virtio/vhost-user.c
 vhost_user_postcopy_listen(void) ""
+vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1f70f5760f..6be3e7ff2d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -19,6 +19,7 @@
 #include "qemu/sockets.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "trace.h"
 
 #include <sys/ioctl.h>
 #include <sys/socket.h>
@@ -131,6 +132,7 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
+    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
 };
 
 static bool ioeventfd_enabled(void)
@@ -298,6 +300,7 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
+    struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
     int i, fd;
     size_t fd_num = 0;
@@ -348,6 +351,57 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         return -1;
     }
 
+    if (u->postcopy_fd.handler) {
+        VhostUserMsg msg_reply;
+        int region_i, reply_i;
+        if (vhost_user_read(dev, &msg_reply) < 0) {
+            return -1;
+        }
+
+        if (msg_reply.request != VHOST_USER_SET_MEM_TABLE) {
+            error_report("%s: Received unexpected msg type."
+                         "Expected %d received %d", __func__,
+                         VHOST_USER_SET_MEM_TABLE, msg_reply.request);
+            return -1;
+        }
+        /* We're using the same structure, just reusing one of the
+         * fields, so it should be the same size.
+         */
+        if (msg_reply.size != msg.size) {
+            error_report("%s: Unexpected size for postcopy reply "
+                         "%d vs %d", __func__, msg_reply.size, msg.size);
+            return -1;
+        }
+
+        memset(u->postcopy_client_bases, 0,
+               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+
+        /* They're in the same order as the regions that were sent
+         * but some of the regions were skipped (above) if they
+         * didn't have fds.
+         */
+        for (reply_i = 0, region_i = 0;
+             region_i < dev->mem->nregions;
+             region_i++) {
+            if (reply_i < fd_num &&
+                msg_reply.payload.memory.regions[reply_i].guest_phys_addr ==
+                dev->mem->regions[region_i].guest_phys_addr) {
+                u->postcopy_client_bases[region_i] =
+                    msg_reply.payload.memory.regions[reply_i].userspace_addr;
+                trace_vhost_user_set_mem_table_postcopy(
+                    msg_reply.payload.memory.regions[reply_i].userspace_addr,
+                    msg.payload.memory.regions[reply_i].userspace_addr,
+                    reply_i, region_i);
+                reply_i++;
+            }
+        }
+        if (reply_i != fd_num) {
+            error_report("%s: postcopy reply not fully consumed "
+                         "%d vs %zd",
+                         __func__, reply_i, fd_num);
+            return -1;
+        }
+    }
     if (reply_supported) {
         return process_message_reply(dev, &msg);
     }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-11  3:31   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 17/29] vhost+postcopy: Send requests to source for shared pages Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Stash the RAMBlock and offset for later use when looking up
addresses.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  1 +
 hw/virtio/vhost-user.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index f7be340a45..1fd194363a 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -3,6 +3,7 @@
 # hw/virtio/vhost-user.c
 vhost_user_postcopy_listen(void) ""
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
+vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6be3e7ff2d..3185af7a45 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -133,6 +133,11 @@ struct vhost_user {
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
     uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
+    RAMBlock          *region_rb[VHOST_MEMORY_MAX_NREGIONS];
+    /* The offset from the start of the RAMBlock to the start of the
+     * vhost region.
+     */
+    ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];
 };
 
 static bool ioeventfd_enabled(void)
@@ -324,8 +329,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
         mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
                                      &offset);
+        u->region_rb_offset[i] = offset;
+        u->region_rb[i] = mr->ram_block;
         fd = memory_region_get_fd(mr);
         if (fd > 0) {
+            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
+                                                  reg->memory_size,
+                                                  reg->guest_phys_addr,
+                                                  reg->userspace_addr, offset);
             msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
             msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
             msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 17/29] vhost+postcopy: Send requests to source for shared pages
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 18/29] vhost+postcopy: Resolve client address Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Send requests back to the source for shared page requests.
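
For example (made-up numbers): with a 4KiB host page size, a fault at
rb_offset 0x12345 is masked with ~(0x1000 - 1) to aligned_rbo 0x12000,
and that whole host page is what gets requested from the source.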

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.h    |  2 ++
 migration/postcopy-ram.c | 31 ++++++++++++++++++++++++++++---
 migration/postcopy-ram.h |  3 +++
 migration/trace-events   |  2 ++
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 1210b217a1..5fa0bb7b53 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -49,6 +49,8 @@ struct MigrationIncomingState {
     int       userfault_quit_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
+    /* RAMBlock of last request sent to source */
+    RAMBlock *last_rb;
     void     *postcopy_tmp_page;
     void     *postcopy_tmp_zero_page;
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d43a73c108..cbf10236f0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -460,6 +460,31 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 }
 
 /*
+ * Callback from shared fault handlers to ask for a page;
+ * the page must be specified by a RAMBlock and an offset in that rb.
+ */
+int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                 uint64_t client_addr, uint64_t rb_offset)
+{
+    size_t pagesize = qemu_ram_pagesize(rb);
+    uint64_t aligned_rbo = rb_offset & ~(pagesize - 1);
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    trace_postcopy_request_shared_page(pcfd->idstr, qemu_ram_get_idstr(rb),
+                                       rb_offset);
+    /* TODO: Check bitmap to see if we already have the page */
+    if (rb != mis->last_rb) {
+        mis->last_rb = rb;
+        migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                  aligned_rbo, pagesize);
+    } else {
+        /* Save some space */
+        migrate_send_rp_req_pages(mis, NULL, aligned_rbo, pagesize);
+    }
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
@@ -469,9 +494,9 @@ static void *postcopy_ram_fault_thread(void *opaque)
     int ret;
     size_t index;
     RAMBlock *rb = NULL;
-    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
     trace_postcopy_ram_fault_thread_entry();
+    mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
 
     struct pollfd *pfd;
@@ -560,8 +585,8 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            if (rb != last_rb) {
-                last_rb = rb;
+            if (rb != mis->last_rb) {
+                mis->last_rb = rb;
                 migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
                                          rb_offset, qemu_ram_pagesize(rb));
             } else {
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 873c147b68..69e88b0174 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -162,5 +162,8 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* Callback from shared fault handlers to ask for a page */
+int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                 uint64_t client_addr, uint64_t offset);
 
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index 522cfa170d..7ac8f9cf41 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -197,6 +197,8 @@ postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset %"PRIx64
+
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 18/29] vhost+postcopy: Resolve client address
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 17/29] vhost+postcopy: Send requests to source for shared pages Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 19/29] postcopy: wake shared Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Resolve fault addresses read off the client's UFD into RAMBlock
and offset, and call back to the postcopy code to ask for the page.
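
The translation itself is simple arithmetic; a sketch with made-up
numbers for one matching region (client base 0x7fb000000000, RAMBlock
offset 0):

    uint64_t faultaddr     = 0x7fb000012345ULL; /* from the uffd_msg */
    uint64_t client_base   = 0x7fb000000000ULL; /* postcopy_client_bases[i] */
    uint64_t region_offset = faultaddr - client_base;  /* 0x12345 */
    uint64_t rb_offset     = region_offset + 0;        /* + region_rb_offset[i] */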

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  3 +++
 hw/virtio/vhost-user.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 1fd194363a..3cec81bb1e 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -1,6 +1,9 @@
 # See docs/tracing.txt for syntax documentation.
 
 # hw/virtio/vhost-user.c
+vhost_user_postcopy_fault_handler(const char *name, uint64_t fault_address, int nregions) "%s: @%"PRIx64" nregions:%d"
+vhost_user_postcopy_fault_handler_loop(int i, uint64_t client_base, uint64_t size) "%d: client%"PRIx64" +%"PRIx64
+vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t rb_offset) "%d: region_offset: %"PRIx64" rb_offset:%"PRIx64
 vhost_user_postcopy_listen(void) ""
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
 vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3185af7a45..92620830e4 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -782,9 +782,35 @@ out:
 static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
                                              void *ufd)
 {
+    struct vhost_dev *dev = pcfd->data;
+    struct vhost_user *u = dev->opaque;
     struct uffd_msg *msg = ufd;
+    uint64_t faultaddr = msg->arg.pagefault.address;
+    RAMBlock *rb = NULL;
+    uint64_t rb_offset;
+    int i;
 
-    return 0;
+    trace_vhost_user_postcopy_fault_handler(pcfd->idstr, faultaddr,
+                                            dev->mem->nregions);
+    for (i = 0; i < dev->mem->nregions; i++) {
+        trace_vhost_user_postcopy_fault_handler_loop(i,
+                u->postcopy_client_bases[i], dev->mem->regions[i].memory_size);
+        if (faultaddr >= u->postcopy_client_bases[i]) {
+            /* Offset of the fault address in the vhost region */
+            uint64_t region_offset = faultaddr - u->postcopy_client_bases[i];
+            if (region_offset < dev->mem->regions[i].memory_size) {
+                rb_offset = region_offset + u->region_rb_offset[i];
+                trace_vhost_user_postcopy_fault_handler_found(i,
+                        region_offset, rb_offset);
+                rb = u->region_rb[i];
+                return postcopy_request_shared_page(pcfd, rb, faultaddr,
+                                                    rb_offset);
+            }
+        }
+    }
+    error_report("%s: Failed to find region for fault %" PRIx64,
+                 __func__, faultaddr);
+    return -1;
 }
 
 /*
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 19/29] postcopy: wake shared
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 18/29] vhost+postcopy: Resolve client address Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 20/29] postcopy: postcopy_notify_shared_wake Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Send a 'wake' request on a userfaultfd for a shared process.
The address in the client's address space is specified together
with the RAMBlock it was resolved to.
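
For example, with 4KiB pages a wake for client_addr 0x7fb000012345
(made-up) covers the whole page 0x7fb000012000..0x7fb000012fff, since
UFFDIO_WAKE takes a page-aligned uffdio_range.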

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 26 ++++++++++++++++++++++++++
 migration/postcopy-ram.h |  6 ++++++
 migration/trace-events   |  1 +
 3 files changed, 33 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cbf10236f0..072b355991 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -459,6 +459,25 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+int postcopy_wake_shared(struct PostCopyFD *pcfd,
+                         uint64_t client_addr,
+                         RAMBlock *rb)
+{
+    size_t pagesize = qemu_ram_pagesize(rb);
+    struct uffdio_range range;
+    int ret;
+    trace_postcopy_wake_shared(client_addr, qemu_ram_get_idstr(rb));
+    range.start = client_addr & ~(pagesize - 1);
+    range.len = pagesize;
+    ret = ioctl(pcfd->fd, UFFDIO_WAKE, &range);
+    if (ret) {
+        error_report("%s: Failed to wake: %zx in %s (%s)",
+                     __func__, client_addr, qemu_ram_get_idstr(rb),
+                     strerror(errno));
+    }
+    return ret;
+}
+
 /*
  * Callback from shared fault handlers to ask for a page,
  * the page must be specified by a RAMBlock and an offset in that rb
@@ -877,6 +896,13 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
     return NULL;
 }
 
+int postcopy_wake_shared(struct PostCopyFD *pcfd,
+                         uint64_t client_addr,
+                         RAMBlock *rb)
+{
+    assert(0);
+    return -1;
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 69e88b0174..d2b2f5f4aa 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -162,6 +162,12 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* Notify a client ufd that a page is available
+ * Note: The 'client_addr' is in the address space of the client
+ * program, not QEMU.
+ */
+int postcopy_wake_shared(struct PostCopyFD *pcfd, uint64_t client_addr,
+                         RAMBlock *rb);
 /* Callback from shared fault handlers to ask for a page */
 int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                  uint64_t client_addr, uint64_t offset);
diff --git a/migration/trace-events b/migration/trace-events
index 7ac8f9cf41..85a35be518 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,7 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset %"PRIx64
+postcopy_wake_shared(uint64_t client_addr, const char *rb) "at %"PRIx64" in %s"
 
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 20/29] postcopy: postcopy_notify_shared_wake
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 19/29] postcopy: wake shared Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 21/29] vhost+postcopy: Add vhost waker Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a hook to allow a client userfaultfd to be 'woken'
when a page arrives, and a walker that calls that
hook for relevant clients given a RAMBlock and offset.
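
A minimal sketch of such a hook, for a hypothetical device whose single
region maps RAMBlock 'rb0' one-to-one at client base 'base0' (both
names illustrative, not from this series):

    static int my_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
                        uint64_t offset)
    {
        if (rb != rb0) {
            return 0; /* not our block; let the walker keep going */
        }
        return postcopy_wake_shared(pcfd, base0 + offset, rb);
    }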

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 16 ++++++++++++++++
 migration/postcopy-ram.h | 10 ++++++++++
 2 files changed, 26 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 072b355991..e6b8160f09 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -753,6 +753,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     return ret;
 }
 
+int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
+{
+    int i;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    GArray *pcrfds = mis->postcopy_remote_fds;
+
+    for (i = 0; i < pcrfds->len; i++) {
+        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
+        int ret = cur->waker(cur, rb, offset);
+        if (ret) {
+            return ret;
+        }
+    }
+    return 0;
+}
+
 /*
  * Place a host page (from) at (host) atomically
  * returns 0 on success
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index d2b2f5f4aa..ecf731c689 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -146,6 +146,10 @@ struct PostCopyFD;
 
 /* ufd is a pointer to the struct uffd_msg. TODO: make this more portable! */
 typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
+/* Notification to wake, either on place or on reception of
+ * a fault on something that's already arrived (race)
+ */
+typedef int (*pcfdwake)(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset);
 
 struct PostCopyFD {
     int fd;
@@ -153,6 +157,8 @@ struct PostCopyFD {
     void *data;
     /* Handler to be called whenever we get a poll event */
     pcfdhandler handler;
+    /* Notification to wake shared client */
+    pcfdwake waker;
     /* A string to use in error messages */
     const char *idstr;
 };
@@ -162,6 +168,10 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* Call each of the registered shared 'waker's, telling them of
+ * availability of a block.
+ */
+int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset);
 /* Notify a client ufd that a page is available
  * Note: The 'client_addr' is in the address space of the client
  * program, not QEMU.
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 21/29] vhost+postcopy: Add vhost waker
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 20/29] postcopy: postcopy_notify_shared_wake Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Register a waker function in the vhost-user code to be notified when
pages arrive or when previously mapped pages are requested again.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  3 +++
 hw/virtio/vhost-user.c | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 3cec81bb1e..e320a90941 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -5,6 +5,9 @@ vhost_user_postcopy_fault_handler(const char *name, uint64_t fault_address, int
 vhost_user_postcopy_fault_handler_loop(int i, uint64_t client_base, uint64_t size) "%d: client%"PRIx64" +%"PRIx64
 vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t rb_offset) "%d: region_offset: %"PRIx64" rb_offset:%"PRIx64
 vhost_user_postcopy_listen(void) ""
+vhost_user_postcopy_waker(const char *rb, uint64_t rb_offset) "%s + %"PRIx64
+vhost_user_postcopy_waker_found(uint64_t client_addr) "%"PRIx64
+vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + %"PRIx64
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
 vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 92620830e4..0f2e05f817 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -813,6 +813,31 @@ static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
     return -1;
 }
 
+static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                     uint64_t offset)
+{
+    struct vhost_dev *dev = pcfd->data;
+    struct vhost_user *u = dev->opaque;
+    int i;
+
+    trace_vhost_user_postcopy_waker(qemu_ram_get_idstr(rb), offset);
+    /* Translate the offset into an address in the clients address space */
+    for (i = 0; i < dev->mem->nregions; i++) {
+        if (u->region_rb[i] == rb &&
+            offset >= u->region_rb_offset[i] &&
+            offset < (u->region_rb_offset[i] +
+                      dev->mem->regions[i].memory_size)) {
+            uint64_t client_addr = (offset - u->region_rb_offset[i]) +
+                                   u->postcopy_client_bases[i];
+            trace_vhost_user_postcopy_waker_found(client_addr);
+            return postcopy_wake_shared(pcfd, client_addr, rb);
+        }
+    }
+
+    trace_vhost_user_postcopy_waker_nomatch(qemu_ram_get_idstr(rb), offset);
+    return 0;
+}
+
 /*
  * Called at the start of an inbound postcopy on reception of the
  * 'advise' command.
@@ -858,6 +883,7 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
     u->postcopy_fd.fd = ufd;
     u->postcopy_fd.data = dev;
     u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
+    u->postcopy_fd.waker = vhost_user_postcopy_waker;
     u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
     postcopy_register_shared_ufd(&u->postcopy_fd);
     return 0;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 21/29] vhost+postcopy: Add vhost waker Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-11  4:22   ` Peter Xu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 23/29] vub+postcopy: madvises Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Cause the vhost-user client to be woken up whenever:
  a) We place a page in postcopy mode
  b) We get a fault and the page has already been received
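
Case (b) covers the race where the client faults on a page that the
fault thread placed just before the fault was read; without the wake
the client would block forever, since no further copy of that page is
coming from the source.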

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 14 ++++++++++----
 migration/trace-events   |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e6b8160f09..b97fc9398b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -491,7 +491,11 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
 
     trace_postcopy_request_shared_page(pcfd->idstr, qemu_ram_get_idstr(rb),
                                        rb_offset);
-    /* TODO: Check bitmap to see if we already have the page */
+    if (ramblock_recv_bitmap_test_byte_offset(aligned_rbo, rb)) {
+        trace_postcopy_request_shared_page_present(pcfd->idstr,
+                                        qemu_ram_get_idstr(rb), rb_offset);
+        return postcopy_wake_shared(pcfd, client_addr, rb);
+    }
     if (rb != mis->last_rb) {
         mis->last_rb = rb;
         migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
@@ -792,7 +796,8 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
     }
 
     trace_postcopy_place_page(host);
-    return 0;
+    return postcopy_notify_shared_wake(rb,
+                                       qemu_ram_block_host_offset(rb, host));
 }
 
 /*
@@ -813,6 +818,9 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
 
             return -e;
         }
+        return postcopy_notify_shared_wake(rb,
+                                           qemu_ram_block_host_offset(rb,
+                                                                      host));
     } else {
         /* The kernel can't use UFFDIO_ZEROPAGE for hugepages */
         if (!mis->postcopy_tmp_zero_page) {
@@ -832,8 +840,6 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
         return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
                                    rb);
     }
-
-    return 0;
 }
 
 /*
diff --git a/migration/trace-events b/migration/trace-events
index 85a35be518..b2f7d85704 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,7 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset %"PRIx64
+postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset %"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at %"PRIx64" in %s"
 
 save_xbzrle_page_skipping(void) ""
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-08-07  4:49   ` Alexey Perevalov
  2017-06-28 19:00 ` [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Clear the area and turn off THP.
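
The THP part matters because a transparent huge page spans many 4KiB
pages, so a fault or wake tracked at 4KiB granularity inside it can be
lost; the comment in the code below says as much.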

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 0658b6e847..ceddeac74f 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
         }
 
         if (dev->postcopy_listening) {
+            int ret;
             /* We should already have an open ufd; we need to mark each
              * memory range with it.
-             * Note: Do we need any madvises? Well it's not been accessed
-             * yet, still probably need no THP to be safe, discard to be safe?
              */
+
+            /* Discard any mapping we have here; note I can't use MADV_REMOVE
+             * or fallocate to make the hole since I don't want to lose
+             * data that's already arrived in the shared process.
+             * TODO: How to do this for hugepages?
+             */
+            ret = madvise((void *)dev_region->mmap_addr,
+                          dev_region->size + dev_region->mmap_offset,
+                          MADV_DONTNEED);
+            if (ret) {
+                fprintf(stderr,
+                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
+                        __func__, i, strerror(errno));
+            }
+            /* Turn off transparent hugepages so we don't lose wakeups
+             * in neighbouring pages.
+             * TODO: Turn this back on later.
+             */
+            ret = madvise((void *)dev_region->mmap_addr,
+                          dev_region->size + dev_region->mmap_offset,
+                          MADV_NOHUGEPAGE);
+            if (ret) {
+                /* Note: This can happen legally on kernels that are configured
+                 * without madvise'able hugepages
+                 */
+                fprintf(stderr,
+                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
+                        __func__, i, strerror(errno));
+            }
             struct uffdio_register reg_struct;
             /* Note: We might need to go back to using mmap_addr and
              * len + mmap_offset for huge pages, but then we do hope not to
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 23/29] vub+postcopy: madvises Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-04 19:34   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

**HACK - better solution needed **
We have the situation where:

     qemu                      bridge

     send set_mem_table
                              map memory
  a)                          mark area with UFD
                              send reply with map addresses
  b)                          start using
  c) receive reply

  As soon as (a) happens qemu might start seeing faults
from memory accesses (but doesn't until b); but it can't
process those faults until (c) when it's received the
mmap addresses.

Make the fault handler spin until it gets the reply in (c).

At the very least this needs some proper locks, but preferably
we need to split the message.
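
One shape the proper fix could take (a sketch only, assuming a
QemuMutex field 'postcopy_lock' added to struct vhost_user, which this
series does not have): hold the lock across the region-table update in
set_mem_table and take it in the waker, instead of spinning:

    qemu_mutex_lock(&u->postcopy_lock);
    /* ...read or update region_rb[] / postcopy_client_bases[]... */
    qemu_mutex_unlock(&u->postcopy_lock);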

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost-user.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 0f2e05f817..74e4313782 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -138,6 +138,7 @@ struct vhost_user {
      * vhost region.
      */
     ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];
+    uint64_t           in_set_mem_table; /* Hack! true while waiting for set_mem_table reply */
 };
 
 static bool ioeventfd_enabled(void)
@@ -321,6 +322,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
+    atomic_set(&u->in_set_mem_table, true);
     for (i = 0; i < dev->mem->nregions; ++i) {
         struct vhost_memory_region *reg = dev->mem->regions + i;
         ram_addr_t offset;
@@ -351,14 +353,15 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     if (!fd_num) {
         error_report("Failed initializing vhost-user memory map, "
                      "consider using -object memory-backend-file share=on");
+        atomic_set(&u->in_set_mem_table, false);
         return -1;
     }
 
     msg.size = sizeof(msg.payload.memory.nregions);
     msg.size += sizeof(msg.payload.memory.padding);
     msg.size += fd_num * sizeof(VhostUserMemoryRegion);
-
     if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        atomic_set(&u->in_set_mem_table, false);
         return -1;
     }
 
@@ -373,6 +376,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
             error_report("%s: Received unexpected msg type."
                          "Expected %d received %d", __func__,
                          VHOST_USER_SET_MEM_TABLE, msg_reply.request);
+            atomic_set(&u->in_set_mem_table, false);
             return -1;
         }
         /* We're using the same structure, just reusing one of the
@@ -381,6 +385,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         if (msg_reply.size != msg.size) {
             error_report("%s: Unexpected size for postcopy reply "
                          "%d vs %d", __func__, msg_reply.size, msg.size);
+            atomic_set(&u->in_set_mem_table, false);
             return -1;
         }
 
@@ -410,9 +415,11 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
             error_report("%s: postcopy reply not fully consumed "
                          "%d vs %zd",
                          __func__, reply_i, fd_num);
+            atomic_set(&u->in_set_mem_table, false);
             return -1;
         }
     }
+    atomic_set(&u->in_set_mem_table, false);
     if (reply_supported) {
         return process_message_reply(dev, &msg);
     }
@@ -821,6 +828,11 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
     int i;
 
     trace_vhost_user_postcopy_waker(qemu_ram_get_idstr(rb), offset);
+    while (atomic_mb_read(&u->in_set_mem_table)) {
+        fprintf(stderr, "%s: Spin waiting for memtable\n", __func__);
+        usleep(1000*100);
+    }
+
     /* Translate the offset into an address in the client's address space */
     for (i = 0; i < dev->mem->nregions; i++) {
         if (u->region_rb[i] == rb &&
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-04 19:38   ` Maxime Coquelin
  2017-07-04 21:59   ` Michael S. Tsirkin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  31 siblings, 2 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When we receive a GET_VRING_BASE message, set enable = false
to stop newly received packets from modifying the ring.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index ceddeac74f..d37052b7b0 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
     vmsg->size = sizeof(vmsg->payload.state);
 
     dev->vq[index].started = false;
+    dev->vq[index].enable = false;
     if (dev->iface->queue_set_started) {
         dev->iface->queue_set_started(dev, index, false);
     }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-27 11:35   ` Maxime Coquelin
  2017-06-28 19:00 ` [Qemu-devel] [RFC 27/29] vhost+postcopy: Wire up POSTCOPY_END notify Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

This message is sent just before the end of postcopy to get the
client to stop using userfault, since we won't respond to any more
requests.  It should close the userfaultfd so that any other pages
get mapped to the backing file automatically by the kernel, since
at this point we know we've received everything.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 23 +++++++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |  1 +
 hw/virtio/vhost-user.c                |  1 +
 3 files changed, 25 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index d37052b7b0..c1716d1a62 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -68,6 +68,7 @@ vu_request_to_string(int req)
         REQ(VHOST_USER_INPUT_GET_CONFIG),
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
+        REQ(VHOST_USER_POSTCOPY_END),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -889,6 +890,26 @@ vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
 
     return false;
 }
+
+static bool
+vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
+{
+    fprintf(stderr, "%s: Entry\n", __func__);
+    dev->postcopy_listening = false;
+    if (dev->postcopy_ufd >= 0) {
+        close(dev->postcopy_ufd);
+        dev->postcopy_ufd = -1;
+        fprintf(stderr, "%s: Done close\n", __func__);
+    }
+
+    vmsg->fd_num = 0;
+    vmsg->payload.u64 = 0;
+    vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK;
+    fprintf(stderr, "%s: exit\n", __func__);
+    return true;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -956,6 +977,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_advise(dev, vmsg);
     case VHOST_USER_POSTCOPY_LISTEN:
         return vu_set_postcopy_listen(dev, vmsg);
+    case VHOST_USER_POSTCOPY_END:
+        return vu_set_postcopy_end(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 86e1934ddb..1665c729f0 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_INPUT_GET_CONFIG = 20,
     VHOST_USER_POSTCOPY_ADVISE  = 23,
     VHOST_USER_POSTCOPY_LISTEN  = 24,
+    VHOST_USER_POSTCOPY_END     = 25,
     VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 74e4313782..b29a141703 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -69,6 +69,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_IOTLB_MSG = 22,
     VHOST_USER_POSTCOPY_ADVISE  = 23,
     VHOST_USER_POSTCOPY_LISTEN  = 24,
+    VHOST_USER_POSTCOPY_END     = 25,
     VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 27/29] vhost+postcopy: Wire up POSTCOPY_END notify
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 28/29] postcopy: Allow shared memory Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up a VHOST_USER_POSTCOPY_END message to the vhost clients
right before we ask the listener thread to shut down.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events   |  2 ++
 hw/virtio/vhost-user.c   | 30 ++++++++++++++++++++++++++++++
 migration/postcopy-ram.c |  5 +++++
 migration/postcopy-ram.h |  1 +
 4 files changed, 38 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e320a90941..f18b67a1f2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -1,6 +1,8 @@
 # See docs/tracing.txt for syntax documentation.
 
 # hw/virtio/vhost-user.c
+vhost_user_postcopy_end_entry(void) ""
+vhost_user_postcopy_end_exit(void) ""
 vhost_user_postcopy_fault_handler(const char *name, uint64_t fault_address, int nregions) "%s: @%"PRIx64" nregions:%d"
 vhost_user_postcopy_fault_handler_loop(int i, uint64_t client_base, uint64_t size) "%d: client%"PRIx64" +%"PRIx64
 vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t rb_offset) "%d: region_offset: %"PRIx64" rb_offset:%"PRIx64
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b29a141703..4388ce7c69 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -921,6 +921,33 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
     return 0;
 }
 
+/*
+ * Called at the end of postcopy
+ */
+static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_POSTCOPY_END,
+        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+    };
+    int ret;
+
+    trace_vhost_user_postcopy_end_entry();
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_end to vhost");
+        return -1;
+    }
+
+    ret = process_message_reply(dev, &msg);
+    if (ret) {
+        error_setg(errp, "Failed to receive reply to postcopy_end");
+        return ret;
+    }
+    trace_vhost_user_postcopy_end_exit();
+
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -946,6 +973,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     case POSTCOPY_NOTIFY_INBOUND_LISTEN:
         return vhost_user_postcopy_listen(dev, pnd->errp);
 
+    case POSTCOPY_NOTIFY_INBOUND_END:
+        return vhost_user_postcopy_end(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b97fc9398b..fdd53cdf1e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -337,7 +337,12 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 
     if (mis->have_fault_thread) {
         uint64_t tmp64;
+        Error *local_err = NULL;
 
+        if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_END, &local_err)) {
+            error_report_err(local_err);
+            return -1;
+        }
         if (qemu_ram_foreach_block(cleanup_range, mis)) {
             return -1;
         }
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ecf731c689..d0dc838001 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -130,6 +130,7 @@ enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
     POSTCOPY_NOTIFY_INBOUND_ADVISE,
     POSTCOPY_NOTIFY_INBOUND_LISTEN,
+    POSTCOPY_NOTIFY_INBOUND_END,
 };
 
 struct PostcopyNotifyData {
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 28/29] postcopy: Allow shared memory
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 27/29] vhost+postcopy: Wire up POSTCOPY_END notify Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Now that we have the mechanisms in place, allow shared memory in
postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index fdd53cdf1e..736d272965 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -138,12 +138,6 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
     RAMBlock *rb = qemu_ram_block_by_name(block_name);
     size_t pagesize = qemu_ram_pagesize(rb);
 
-    if (qemu_ram_is_shared(rb)) {
-        error_report("Postcopy on shared RAM (%s) is not yet supported",
-                     block_name);
-        return 1;
-    }
-
     if (length % pagesize) {
         error_report("Postcopy requires RAM blocks to be a page size multiple,"
                      " block %s is 0x" RAM_ADDR_FMT " bytes with a "
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 28/29] postcopy: Allow shared memory Dr. David Alan Gilbert (git)
@ 2017-06-28 19:00 ` Dr. David Alan Gilbert (git)
  2017-07-04 14:09   ` Maxime Coquelin
  2017-06-29 18:55 ` [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-06-28 19:00 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tell QEMU we understand the protocol features needed for postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index c1716d1a62..1c46aecfb3 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -797,7 +797,8 @@ vu_set_vring_err_exec(VuDev *dev, VhostUserMsg *vmsg)
 static bool
 vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
-    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+                        1ULL << VHOST_USER_PROTOCOL_F_POSTCOPY;
 
     if (dev->iface->get_protocol_features) {
         features |= dev->iface->get_protocol_features(dev);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2017-06-28 19:00 ` [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy Dr. David Alan Gilbert (git)
@ 2017-06-29 18:55 ` Dr. David Alan Gilbert
  2017-07-03 11:03   ` Marc-André Lureau
       [not found] ` <CGME20170703135859eucas1p1edc55e3318a3079b026bed81e0ae0388@eucas1p1.samsung.com>
  2017-07-03 17:55 ` Michael S. Tsirkin
  31 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-29 18:55 UTC (permalink / raw)
  To: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, peterx, lvivier, aarcange

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.

Marc-André asked if I had a git with it all applied; so here we are:
https://github.com/dagrh/qemu/commits/vhost
git@github.com:dagrh/qemu.git on the vhost branch

Dave

> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-06-29 18:55 ` [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert
@ 2017-07-03 11:03   ` Marc-André Lureau
  2017-07-03 11:48     ` Dr. David Alan Gilbert
  2017-07-07 10:51     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 87+ messages in thread
From: Marc-André Lureau @ 2017-07-03 11:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, qemu-devel, a.perevalov, maxime.coquelin,
	mst, quintela, peterx, lvivier, aarcange

Hi

On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
>
> Marc-André asked if I had a git with it all applied; so here we are:
> https://github.com/dagrh/qemu/commits/vhost
> git@github.com:dagrh/qemu.git on the vhost branch
>
>
I started looking at the series, but I am not familiar with ufd/postcopy.
Could you update vhost-user.txt to describe the new messages? Otherwise,
make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
regression?) Thanks
-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-03 11:03   ` Marc-André Lureau
@ 2017-07-03 11:48     ` Dr. David Alan Gilbert
  2017-07-07 10:51     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-03 11:48 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, a.perevalov, maxime.coquelin, mst, quintela, peterx,
	lvivier, aarcange

* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
> 
> On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
> 
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> >
> > Marc-André asked if I had a git with it all applied; so here we are:
> > https://github.com/dagrh/qemu/commits/vhost
> > git@github.com:dagrh/qemu.git on the vhost branch
> >
> >
> I started looking at the series, but I am not familiar with ufd/postcopy.

I was similarly unfamiliar with the vhost code when I started this (which
probably shows!).
The main thing about ufd is that a process opens a userfaultfd and
registers an area of memory with it; accesses to that memory then block
until the page is available: a message is sent down the ufd, and whoever
receives that message may then respond by atomically copying a page into
memory, or waking the process when it knows the page is there.
This is the first time we've tried to use userfaultfd with shared memory,
and it does need a very recent kernel (4.11.0 or the rhel 7.4 beta).
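
For reference, the client-side dance is roughly the following (a minimal
sketch only, error handling omitted; the ioctls and structs are from
linux/userfaultfd.h, while mmap_addr, region_len and
request_page_from_source are hypothetical stand-ins for the real region
data and QEMU's page-request path):

  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <sys/syscall.h>
  #include <linux/userfaultfd.h>

  /* The client opens its own ufd and handshakes with the kernel */
  int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
  struct uffdio_api api = { .api = UFFD_API };
  ioctl(ufd, UFFDIO_API, &api);

  /* Register a mapped region; accesses to missing pages now block
   * and queue a message on the ufd.
   */
  struct uffdio_register reg = {
      .range = { .start = (uintptr_t)mmap_addr, .len = region_len },
      .mode  = UFFDIO_REGISTER_MODE_MISSING,
  };
  ioctl(ufd, UFFDIO_REGISTER, &reg);

  /* The ufd is then sent to QEMU, whose fault thread reads it
   * (blocking read here for simplicity):
   */
  struct uffd_msg msg;
  read(ufd, &msg, sizeof(msg));
  if (msg.event == UFFD_EVENT_PAGEFAULT) {
      request_page_from_source(msg.arg.pagefault.address); /* hypothetical */
  }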

> Could you update vhost-user.txt to describe the new messages?

See below; I'll add that in.

> Otherwise,
> make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
> regression?) Thanks

Entirely possible I broke it; I'll have a look - at the moment I'm more
interested in comments on the structure of this set.

Dave

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 481ab56e35..fec4cd0ffe 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -273,6 +273,14 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.

+In postcopy migration the slave is started before all the memory has been
+received from the source host, and care must be taken to avoid accessing pages
+that have yet to be received.  The slave opens a 'userfault'-fd and registers
+the memory with it; this fd is then passed back over to the master.
+The master services requests on the userfaultfd for pages that are accessed
+and when the page is available it performs WAKE ioctls on the userfaultfd
+to wake the stalled slave.
+
 IOMMU support
 -------------

@@ -326,6 +334,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
 #define VHOST_USER_PROTOCOL_F_MTU            4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
+#define VHOST_USER_PROTOCOL_F_POSTCOPY       6

 Master message types
 --------------------
@@ -402,12 +411,17 @@ Master message types
       Id: 5
       Equivalent ioctl: VHOST_SET_MEM_TABLE
       Master payload: memory regions description
+      Slave payload: (postcopy only) memory regions description

       Sets the memory map regions on the slave so it can translate the vring
       addresses. In the ancillary data there is an array of file descriptors
       for each memory mapped region. The size and ordering of the fds matches
       the number and ordering of memory regions.

+      When postcopy-listening has been received, SET_MEM_TABLE replies with
+      the bases of the memory mapped regions to the master.  It must have mmap'd
+      the regions and enabled userfaultfd on them.
+
  * VHOST_USER_SET_LOG_BASE

       Id: 6
@@ -580,6 +594,29 @@ Master message types
       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
       has been successfully negotiated.

+ * VHOST_USER_POSTCOPY_ADVISE
+      Id: 23
+      Master payload: N/A
+      Slave payload: userfault fd + u64
+
+      Master advises slave that a migration with postcopy enabled is underway;
+      the slave must open a userfaultfd for later use.
+      Note that at this stage the migration is still in precopy mode.
+
+ * VHOST_USER_POSTCOPY_LISTEN
+      Id: 24
+      Master payload: N/A
+
+      Master advises slave that a transition to postcopy mode has happened.
+
+ * VHOST_USER_POSTCOPY_END
+      Id: 25
+      Slave payload: u64
+
+      Master advises that postcopy migration has now completed.  The
+      slave must disable the userfaultfd. The response is an acknowledgement
+      only.
+
 Slave message types
 -------------------


> -- 
> Marc-André Lureau
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
       [not found] ` <CGME20170703135859eucas1p1edc55e3318a3079b026bed81e0ae0388@eucas1p1.samsung.com>
@ 2017-07-03 13:58   ` Alexey
  2017-07-03 16:49     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Alexey @ 2017-07-03 13:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange


Hello, David!

Thank you for your patch set.

On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.
> 
> I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> use the new feature, since this is about the simplest
> client around.
> 
> Structure:
> 
> The basic idea is that near the start of postcopy, the client
> opens its own userfaultfd fd and sends that back to QEMU over
> the socket it's already using for VHUST_USER_* commands.
> Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> areas with userfaultfd and sends the mapped addresses back to QEMU.

There should be only one userfault fd for all affected processes. So
why are you opening the userfaultfd on the client side; why not pass
a userfault fd that was opened on the QEMU side? I guess there could
be several virtual switches with different ports (it's an exotic
configuration, but a configuration with one QEMU, one vswitchd,
and several vhost-user ports is typical), and, as an example,
QEMU could be connected to these vswitches through these ports.
In that case you will end up with 2 different userfault fds in QEMU.
In the case of one QEMU, one vswitchd and several vhost-user ports,
you are keeping the userfaultfd in the VuDev structure on the client
side, which looks like the virtio_net sibling from DPDK, and that
structure is per vhost-user connection (per port).

So from my point of view it's better to open the fd on the QEMU side,
and pass it the same way as the shared mem fd in SET_MEM_TABLE, but in
POSTCOPY_ADVISE.
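
(Passing an fd over the vhost-user unix socket, in either direction, is
the usual SCM_RIGHTS dance; a minimal sketch, not the actual vhost-user
framing, with error handling omitted:)

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* Send one fd as ancillary data alongside a message */
  static void send_with_fd(int sock, const void *buf, size_t len, int fd)
  {
      char cbuf[CMSG_SPACE(sizeof(int))];
      struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
      struct msghdr mh = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
      };
      struct cmsghdr *cm;

      memset(cbuf, 0, sizeof(cbuf));
      cm = CMSG_FIRSTHDR(&mh);
      cm->cmsg_level = SOL_SOCKET;
      cm->cmsg_type  = SCM_RIGHTS;
      cm->cmsg_len   = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cm), &fd, sizeof(int));
      sendmsg(sock, &mh, 0);
  }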


> 
> QEMU then reads the clients UFD in it's fault thread and issues
> requests back to the source as needed.
> QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> that the page has arrived and can carry on.
It's not clear to me why QEMU has to inform the vhost client; with a
single userfault fd, the kernel should wake up the other faulted
threads/processes.
In my approach I just send information about copied/received pages to
the vhost client, so it can re-enable the previously disabled VRING.

> 
> A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> the QEMU knows the client can talk postcopy.
> Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> added to guide the process along.
> 
> Current known issues:
>    I've not tested it with hugepages yet; and I suspect the madvises
>    will need tweaking for it.
I saw you didn't change the order of the SET_MEM_TABLE call on the QEMU
side; some of the pages have already arrived and been copied, so I punch
a hole here according to the received map.

> 
>    The qemu gets to see the base addresses that the client has its
>    regions mapped at; that's not great for security
> 
>    Take care of deadlocking; any thread in the client that
>    accesses a userfault protected page can stall.
That's why I decided to disable the VRINGs, but not the way you did in
GET_VRING_BASE; I send the received bitmap right after SET_MEM_TABLE.
There could be a synchronization problem here, maybe a similar problem
to the one you described in
"vhost+postcopy: Lock around set_mem_table"

Unfortunately, my patches aren't ready yet.

> [...]

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-03 13:58   ` Alexey
@ 2017-07-03 16:49     ` Dr. David Alan Gilbert
  2017-07-03 17:42       ` Alexey
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-03 16:49 UTC (permalink / raw)
  To: Alexey
  Cc: qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange

* Alexey (a.perevalov@samsung.com) wrote:
> 
> Hello, David!
> 
> Thank you for your patch set.
> 
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHUST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> 
> There should be only one userfault fd for all affected processes. So
> why are you opening the userfaultfd on the client side; why not pass
> a userfault fd that was opened on the QEMU side?

I just checked with Andrea on the semantics, and ufds don't work like that.
Any given userfaultfd only works on the address space of the process
that opened it; so if you want a process to block on its memory space,
it's the one that has to open the ufd.
(I don't think I knew that when I wrote the code!)
The nice thing about that is that you never get too confused about
address spaces - any one ufd always has one address space, and its
ioctls are associated with one process.

> I guess there could
> be several virtual switches with different ports (it's an exotic
> configuration, but a configuration with one QEMU, one vswitchd,
> and several vhost-user ports is typical), and, as an example,
> QEMU could be connected to these vswitches through these ports.
> In that case you will end up with 2 different userfault fds in QEMU.
> In the case of one QEMU, one vswitchd and several vhost-user ports,
> you are keeping the userfaultfd in the VuDev structure on the client
> side, which looks like the virtio_net sibling from DPDK, and that
> structure is per vhost-user connection (per port).

Multiple switches make sense to me actually; running two switches
and having redundant routes in each VM lets you live-update the switch
processes one at a time.

> So from my point of view it's better to open the fd on the QEMU side,
> and pass it the same way as the shared mem fd in SET_MEM_TABLE, but in
> POSTCOPY_ADVISE.

Yes, I see where you're coming from; but it's one address space per ufd.
If you had one ufd then you'd have to change the messages to be
  'pid ... is waiting on address ....'
and all the ioctls for doing wakes etc. would have to gain a PID.

> > 
> > QEMU then reads the clients UFD in it's fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and can carry on.
> It's not clear to me why QEMU has to inform the vhost client; with a
> single userfault fd, the kernel should wake up the other faulted
> threads/processes.
> In my approach I just send information about copied/received pages to
> the vhost client, so it can re-enable the previously disabled VRING.

The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl
on the ufd that tells the kernel it can unblock a process that's
trying to access the page.
(There is potential to remove some of that - if we can get the
kernel to wake all the waiters for a physical page when a UFFDIO_COPY
is done, it would remove a lot of those.)
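
Concretely, the wake side is just the following (a sketch; addr,
pagesize and ufd come from the surrounding context):

  /* Tell the kernel the page is now present; any thread blocked
   * faulting in this range of a registered region can continue.
   */
  struct uffdio_range wake = {
      .start = addr & ~(uint64_t)(pagesize - 1),  /* must be page aligned */
      .len   = pagesize,
  };
  ioctl(ufd, UFFDIO_WAKE, &wake);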

> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> I saw you didn't change the order of the SET_MEM_TABLE call on the QEMU
> side; some of the pages have already arrived and been copied, so I punch
> a hole here according to the received map.

Right, so I'm assuming they'll hit ufd faults and be immediately
WAKEd when I find the bit set in the received bitmap.

> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> > 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> That's why I decided to disable the VRINGs, but not the way you did in
> GET_VRING_BASE; I send the received bitmap right after SET_MEM_TABLE.
> There could be a synchronization problem here, maybe a similar problem
> to the one you described in
> "vhost+postcopy: Lock around set_mem_table"
> 
> Unfortunately, my patches aren't ready yet.

That's OK; these patches only just about work - just enough for
me to post them and ask for opinions.

Dave

> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-03 16:49     ` Dr. David Alan Gilbert
@ 2017-07-03 17:42       ` Alexey
  0 siblings, 0 replies; 87+ messages in thread
From: Alexey @ 2017-07-03 17:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lvivier, aarcange, quintela, mst, qemu-devel, peterx,
	maxime.coquelin, marcandre.lureau

On Mon, Jul 03, 2017 at 05:49:26PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > 
> > Hello, David!
> > 
> > Thank you for your patch set.
> > 
> > On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> > > 
> > > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > > use the new feature, since this is about the simplest
> > > client around.
> > > 
> > > Structure:
> > > 
> > > The basic idea is that near the start of postcopy, the client
> > > opens its own userfaultfd fd and sends that back to QEMU over
> > > the socket it's already using for VHUST_USER_* commands.
> > > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > There should be only one userfault fd for all affected processes. So
> > why are you opening the userfaultfd on the client side; why not pass
> > a userfault fd that was opened on the QEMU side?
> 
> I just checked with Andrea on the semantics, and ufds don't work like that.
> Any given userfaultfd only works on the address space of the process
> that opened it; so if you want a process to block on its memory space,
> it's the one that has to open the ufd.

yes, it is obtained from the vma in handle_userfault:
ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
so that's per vma,

and it is set into the vma:
vma->vm_userfaultfd_ctx.ctx = ctx;
in userfaultfd_register(struct userfaultfd_ctx *ctx,
where in turn it comes from
struct userfaultfd_ctx *ctx = file->private_data;
Because the file descriptor was transferred over the unix domain socket
(SOL_SOCKET), it's logical to assume the userfaultfd context will be the same.


> (I don't think I knew that when I wrote the code!)
> The nice thing about that is that you never get too confused about
> address spaces - any one ufd always has one address space in it's ioctls
> associated with one process.
> 
> > I guess, it could
> > be several virtual switches with different ports (it's exotic
> > configuration, but configuration when we have one QEMU, one vswitchd,
> > and serveral vhost-user ports is typical), and as example,
> > QEMU could be connected to these vswitches through these ports.
> > In this case you will obtain 2 different userfault fd in QEMU.
> > In case of one QEMU, one vswitchd and several vhost-user ports,
> > you are keeping userfaultfd in VuDev structure on client side,
> > looks like it's virtion_net sibling from DPDK, and that structure
> > is per vhost-user connection (per one port).
> 
> Multiple switches make sense to me actually; running two switches
> and having redundant routes in each VM let you live update the switch
> process one at a time.
> 
> > So from my point of view it's better to open fd on QEMU side, and pass it
> > the same way as shared mem fd in SET_MEM_TABLE, but in POSTCOPY_ADVISE.
> 
> Yes I see where you're coming from; but it's one address space per-ufd;
> If you had one ufd then you'd have to change the messages to be
>   'pid ... is waiting on address ....'
> and all the ioctls for doing wakes etc would have to gain a PID.
> 
> > > 
> > > QEMU then reads the clients UFD in it's fault thread and issues
> > > requests back to the source as needed.
> > > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > > that the page has arrived and can carry on.
> > Not so clear for me why QEMU have to inform vhost client,
> > due to single userfault fd, and kernel should wake up another faulted
> > thread/processes.
> > In my approach I just to send information about copied/received page
> > to vhot client, to be able to enable previously disabled VRING.
> 
> The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl
> on the ufd that tells the kernel it can unblock a process that's
> trying to access the page.
> (Their is potential to remove some of that - if we can get the
> kernel to wake all the waiters for a physical page when a UFFDIO_COPY
> is done it would remove a lot of those).
> 
> > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > > the QEMU knows the client can talk postcopy.
> > > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > > added to guide the process along.
> > > 
> > > Current known issues:
> > >    I've not tested it with hugepages yet; and I suspect the madvises
> > >    will need tweaking for it.
> > I saw you didn't change the order of the SET_MEM_TABLE call on the
> > QEMU side; some of the pages have already arrived and been copied, so
> > I'm punching holes here according to the received map.
> 
> Right, so I'm assuming they'll hit ufd faults and be immediately
> WAKEd when I find the bit set in the received-bitmap.
> 
> > >    The qemu gets to see the base addresses that the client has its
> > >    regions mapped at; that's not great for security
> > > 
> > >    Take care of deadlocking; any thread in the client that
> > >    accesses a userfault protected page can stall.
> > That's why I decided to disable VRINGs, but not the way you did it
> > in GET_VRING_BASE; I send the received bitmap right after
> > SET_MEM_TABLE. There could be a synchronization problem here, maybe
> > similar to the problem you described in
> > "vhost+postcopy: Lock around set_mem_table"
> > 
> > Unfortunately, my patches aren't ready yet.
> 
> That's OK; these patches just about work; just enough for
> me to post them and ask for opinions.
> 
> Dave
> 
> > > 
> > >    There's a nasty hack of a lock around the set_mem_table message.
> > > 
> > >    I've not looked at the recent IOMMU code.
> > > 
> > >    Some cleanup and a lot of corner cases need thinking about.
> > > 
> > >    There are probably plenty of unknown issues as well.
> > > 
> > > Test setup:
> > >   I'm running on one host at the moment, with the guest
> > >   scping a large file from the host as it migrates.
> > >   The setup is based on one I found in the vhost-user setups.
> > >   You'll need a recent kernel for the shared memory support
> > >   in userfaultfd, and userfault isn't that happy if a process
> > >   using shared memory core's - so make sure you have the
> > >   latest fixes.
> > > 
> > > SESS=vhost
> > > ulimit -c unlimited
> > > tmux -L $SESS new-session -d
> > > tmux -L $SESS set-option -g history-limit 30000
> > > # Start a router using the system qemu
> > > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca
> > > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > > tmux -L $SESS set-option -g set-remain-on-exit on
> > > # Start source vhost bridge
> > > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > > sleep 0.5
> > > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe
> > > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/
> > > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
> > > # Start dest vhost bridge
> > > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.
> > > 1:5556 2>dst-vub-log"
> > > sleep 0.5
> > > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend
> > > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm
> > > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
> > > tmux -L $SESS send-keys -t source "migrate_set_speed 20M
> > > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on
> > > 
> > > then once booted:
> > > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> > > 
> > > 
> > > Dave
> > > 
> > > Dr. David Alan Gilbert (29):
> > >   RAMBlock/migration: Add migration flags
> > >   migrate: Update ram_block_discard_range for shared
> > >   qemu_ram_block_host_offset
> > >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> > >   postcopy: use UFFDIO_ZEROPAGE only when available
> > >   postcopy: Add notifier chain
> > >   postcopy: Add vhost-user flag for postcopy and check it
> > >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> > >   vhub: Support sending fds back to qemu
> > >   vhub: Open userfaultfd
> > >   postcopy: Allow registering of fd handler
> > >   vhost+postcopy: Register shared ufd with postcopy
> > >   vhost+postcopy: Transmit 'listen' to client
> > >   vhost+postcopy: Register new regions with the ufd
> > >   vhost+postcopy: Send address back to qemu
> > >   vhost+postcopy: Stash RAMBlock and offset
> > >   vhost+postcopy: Send requests to source for shared pages
> > >   vhost+postcopy: Resolve client address
> > >   postcopy: wake shared
> > >   postcopy: postcopy_notify_shared_wake
> > >   vhost+postcopy: Add vhost waker
> > >   vhost+postcopy: Call wakeups
> > >   vub+postcopy: madvises
> > >   vhost+postcopy: Lock around set_mem_table
> > >   vhu: enable = false on get_vring_base
> > >   vhost: Add VHOST_USER_POSTCOPY_END message
> > >   vhost+postcopy: Wire up POSTCOPY_END notify
> > >   postcopy: Allow shared memory
> > >   vhost-user: Claim support for postcopy
> > > 
> > >  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> > >  contrib/libvhost-user/libvhost-user.h |   8 +
> > >  exec.c                                |  44 +++--
> > >  hw/virtio/trace-events                |  13 ++
> > >  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
> > >  include/exec/cpu-common.h             |   3 +
> > >  include/exec/ram_addr.h               |   2 +
> > >  migration/migration.c                 |   3 +
> > >  migration/migration.h                 |   8 +
> > >  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
> > >  migration/postcopy-ram.h              |  69 +++++++
> > >  migration/ram.c                       |   5 +
> > >  migration/ram.h                       |   1 +
> > >  migration/savevm.c                    |  13 ++
> > >  migration/trace-events                |   6 +
> > >  trace-events                          |   3 +
> > >  vl.c                                  |   4 +-
> > >  17 files changed, 926 insertions(+), 84 deletions(-)
> > > 
> > > -- 
> > > 2.13.0
> > > 
> > > 
> > 
> > -- 
> > 
> > BR
> > Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset
  2017-06-28 19:00 ` [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset Dr. David Alan Gilbert (git)
@ 2017-07-03 17:44   ` Michael S. Tsirkin
  2017-08-14 17:27     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 17:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:21PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Utility to give the offset of a host pointer within a RAMBlock
> (assuming we already know it's in that RAMBlock)
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c                    | 6 ++++++
>  include/exec/cpu-common.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index 4e61226a16..a1499b9bee 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2218,6 +2218,12 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
>      return ramblock_ptr(block, addr);
>  }
>  
> +/* Return the offset of a host pointer within a ramblock */
> +ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
> +{
> +    return (uint8_t *)host - (uint8_t *)rb->host;
> +}
> +

I'd also assert that it's within that block.
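Something along these lines, perhaps (a sketch; it assumes
rb->used_length bounds the mapping, as used elsewhere in exec.c):

    ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
    {
        ram_addr_t offset = (uint8_t *)host - (uint8_t *)rb->host;

        /* Catch callers handing us a pointer outside this RAMBlock;
         * a pointer below rb->host wraps to a huge offset and also
         * trips the assert. */
        assert(offset < rb->used_length);
        return offset;
    }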

>  /*
>   * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
>   * in that RAMBlock.
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 4af179b543..fa1ec22d66 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -66,6 +66,7 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr);
>  RAMBlock *qemu_ram_block_by_name(const char *name);
>  RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
>                                     ram_addr_t *offset);
> +ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host);
>  void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
>  void qemu_ram_unset_idstr(RAMBlock *block);
>  const char *qemu_ram_get_idstr(RAMBlock *rb);
> -- 
> 2.13.0


* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
       [not found] ` <CGME20170703135859eucas1p1edc55e3318a3079b026bed81e0ae0388@eucas1p1.samsung.com>
@ 2017-07-03 17:55 ` Michael S. Tsirkin
  2017-07-07 12:01   ` Dr. David Alan Gilbert
  31 siblings, 1 reply; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 17:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.
> 
> I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> use the new feature, since this is about the simplest
> client around.
> 
> Structure:
> 
> The basic idea is that near the start of postcopy, the client
> opens its own userfaultfd fd and sends that back to QEMU over
> the socket it's already using for VHUST_USER_* commands.
> Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> areas with userfaultfd and sends the mapped addresses back to QEMU.
> 
> QEMU then reads the clients UFD in it's fault thread and issues
> requests back to the source as needed.
> QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> that the page has arrived and can carry on.
> 
> A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> the QEMU knows the client can talk postcopy.
> Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> added to guide the process along.
> 
> Current known issues:
>    I've not tested it with hugepages yet; and I suspect the madvises
>    will need tweaking for it.
> 
>    The qemu gets to see the base addresses that the client has its
>    regions mapped at; that's not great for security

Not urgent to fix.

>    Take care of deadlocking; any thread in the client that
>    accesses a userfault protected page can stall.

And it can happen under a lock quite easily.
What exactly is proposed here?
Maybe we want to reuse the new channel that the IOMMU uses.


>    There's a nasty hack of a lock around the set_mem_table message.

Yes.

>    I've not looked at the recent IOMMU code.
> 
>    Some cleanup and a lot of corner cases need thinking about.
> 
>    There are probably plenty of unknown issues as well.

At the protocol level, I'd like to rename the feature to
USER_PAGEFAULT. Client does not really know anything about
copies, it's all internal to qemu.
Spec can document that it's used by qemu for postcopy.


> Test setup:
>   I'm running on one host at the moment, with the guest
>   scping a large file from the host as it migrates.
>   The setup is based on one I found in the vhost-user setups.
>   You'll need a recent kernel for the shared memory support
>   in userfaultfd, and userfault isn't that happy if a process
>   using shared memory core's - so make sure you have the
>   latest fixes.
> 
> SESS=vhost
> ulimit -c unlimited
> tmux -L $SESS new-session -d
> tmux -L $SESS set-option -g history-limit 30000
> # Start a router using the system qemu
> tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca
> lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> tmux -L $SESS set-option -g set-remain-on-exit on
> # Start source vhost bridge
> tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> sleep 0.5
> tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe
> nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/
> tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
> # Start dest vhost bridge
> tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.
> 1:5556 2>dst-vub-log"
> sleep 0.5
> tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend
> -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm
> p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
> tmux -L $SESS send-keys -t source "migrate_set_speed 20M
> tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on
> 
> then once booted:
> tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> 
> 
> Dave
> 
> Dr. David Alan Gilbert (29):
>   RAMBlock/migration: Add migration flags
>   migrate: Update ram_block_discard_range for shared
>   qemu_ram_block_host_offset
>   migration/ram: ramblock_recv_bitmap_test_byte_offset
>   postcopy: use UFFDIO_ZEROPAGE only when available
>   postcopy: Add notifier chain
>   postcopy: Add vhost-user flag for postcopy and check it
>   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
>   vhub: Support sending fds back to qemu
>   vhub: Open userfaultfd
>   postcopy: Allow registering of fd handler
>   vhost+postcopy: Register shared ufd with postcopy
>   vhost+postcopy: Transmit 'listen' to client
>   vhost+postcopy: Register new regions with the ufd
>   vhost+postcopy: Send address back to qemu
>   vhost+postcopy: Stash RAMBlock and offset
>   vhost+postcopy: Send requests to source for shared pages
>   vhost+postcopy: Resolve client address
>   postcopy: wake shared
>   postcopy: postcopy_notify_shared_wake
>   vhost+postcopy: Add vhost waker
>   vhost+postcopy: Call wakeups
>   vub+postcopy: madvises
>   vhost+postcopy: Lock around set_mem_table
>   vhu: enable = false on get_vring_base
>   vhost: Add VHOST_USER_POSTCOPY_END message
>   vhost+postcopy: Wire up POSTCOPY_END notify
>   postcopy: Allow shared memory
>   vhost-user: Claim support for postcopy
> 
>  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
>  contrib/libvhost-user/libvhost-user.h |   8 +
>  exec.c                                |  44 +++--
>  hw/virtio/trace-events                |  13 ++
>  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
>  include/exec/cpu-common.h             |   3 +
>  include/exec/ram_addr.h               |   2 +
>  migration/migration.c                 |   3 +
>  migration/migration.h                 |   8 +
>  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
>  migration/postcopy-ram.h              |  69 +++++++
>  migration/ram.c                       |   5 +
>  migration/ram.h                       |   1 +
>  migration/savevm.c                    |  13 ++
>  migration/trace-events                |   6 +
>  trace-events                          |   3 +
>  vl.c                                  |   4 +-
>  17 files changed, 926 insertions(+), 84 deletions(-)
> 
> -- 
> 2.13.0


* Re: [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy
  2017-06-28 19:00 ` [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy Dr. David Alan Gilbert (git)
@ 2017-07-04 14:09   ` Maxime Coquelin
  2017-07-07 11:39     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-04 14:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Tell QEMU we understand the protocol features needed for postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   contrib/libvhost-user/libvhost-user.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index c1716d1a62..1c46aecfb3 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -797,7 +797,8 @@ vu_set_vring_err_exec(VuDev *dev, VhostUserMsg *vmsg)
>   static bool
>   vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>   {
> -    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
> +    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
> +                        1ULL << VHOST_USER_PROTOCOL_F_POSTCOPY;

Maybe advertising VHOST_USER_PROTOCOL_F_POSTCOPY could be done only
if the userfaultfd syscall is supported?
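For instance, a runtime probe along these lines (a sketch for a
Linux-only build; the helper name is made up, not part of the patch):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* True if the running kernel supports the userfaultfd syscall. */
    static bool have_userfaultfd(void)
    {
        int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

        if (ufd < 0) {
            return false;
        }
        close(ufd);
        return true;
    }

with VHOST_USER_PROTOCOL_F_POSTCOPY OR'd into the feature mask only when
it returns true.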

>   
>       if (dev->iface->get_protocol_features) {
>           features |= dev->iface->get_protocol_features(dev);
> 


* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-06-28 19:00 ` [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table Dr. David Alan Gilbert (git)
@ 2017-07-04 19:34   ` Maxime Coquelin
  2017-07-07 11:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-04 19:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> 
> **HACK - better solution needed **
> We have the situation where:
> 
>       qemu                      bridge
> 
>       send set_mem_table
>                                map memory
>    a)                          mark area with UFD
>                                send reply with map addresses
>    b)                          start using
>    c) receive reply
> 
>    As soon as (a) happens qemu might start seeing faults
> from memory accesses (but doesn't until b); but it can't
> process those faults until (c) when it's received the
> mmap addresses.
> 
> Make the fault handler spin until it gets the reply in (c).
> 
> At the very least this needs some proper locks, but preferably
> we need to split the message.

Yes, maybe the slave channel could be used to send the ufds with
a dedicated request? The backend would set the reply-ack flag, so that
it starts accessing the guest memory only when Qemu is ready to handle
faults.

Note that the slave channel support has not been implemented in Qemu's
libvhost-user yet, but this is something I can do if we feel the need.

Maxime


* Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
@ 2017-07-04 19:38   ` Maxime Coquelin
  2017-07-04 21:59   ` Michael S. Tsirkin
  1 sibling, 0 replies; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-04 19:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When we receive a GET_VRING_BASE message set enable = false
> to stop any new received packets modifying the ring.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Maxime
> ---
>   contrib/libvhost-user/libvhost-user.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index ceddeac74f..d37052b7b0 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
>       vmsg->size = sizeof(vmsg->payload.state);
>   
>       dev->vq[index].started = false;
> +    dev->vq[index].enable = false;
>       if (dev->iface->queue_set_started) {
>           dev->iface->queue_set_started(dev, index, false);
>       }
> 


* Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
  2017-07-04 19:38   ` Maxime Coquelin
@ 2017-07-04 21:59   ` Michael S. Tsirkin
  2017-07-05 17:16     ` Dr. David Alan Gilbert
  2017-08-18 19:19     ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-04 21:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When we receive a GET_VRING_BASE message set enable = false
> to stop any new received packets modifying the ring.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I think I already reviewed a similar patch.
Spec says:
Client must only process each ring when it is started.

IMHO the real fix is to fix the client to check the started
flag before processing the ring.

> ---
>  contrib/libvhost-user/libvhost-user.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index ceddeac74f..d37052b7b0 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
>      vmsg->size = sizeof(vmsg->payload.state);
>  
>      dev->vq[index].started = false;
> +    dev->vq[index].enable = false;
>      if (dev->iface->queue_set_started) {
>          dev->iface->queue_set_started(dev, index, false);
>      }
> -- 
> 2.13.0


* Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-07-04 21:59   ` Michael S. Tsirkin
@ 2017-07-05 17:16     ` Dr. David Alan Gilbert
  2017-07-05 23:28       ` Michael S. Tsirkin
  2017-08-18 19:19     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-05 17:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When we receive a GET_VRING_BASE message set enable = false
> > to stop any new received packets modifying the ring.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I think I already reviewed a similar patch.

Yes you replied to my off-list mail; I hadn't got
around to fixing it yet.

> Spec says:
> Client must only process each ring when it is started.

but in that reply you said the spec said 

  Client must only pass data between the ring and the
  backend, when the ring is enabled.

So does the spec say 'started' or 'enabled'?
(Pointer to the spec?)

> IMHO the real fix is to fix the client to check the started
> flag before processing the ring.

Yep I can do that.  I was curious however whether it was
specified as 'started' or 'enabled' or both.

Dave

> > ---
> >  contrib/libvhost-user/libvhost-user.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index ceddeac74f..d37052b7b0 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
> >      vmsg->size = sizeof(vmsg->payload.state);
> >  
> >      dev->vq[index].started = false;
> > +    dev->vq[index].enable = false;
> >      if (dev->iface->queue_set_started) {
> >          dev->iface->queue_set_started(dev, index, false);
> >      }
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-07-05 17:16     ` Dr. David Alan Gilbert
@ 2017-07-05 23:28       ` Michael S. Tsirkin
  0 siblings, 0 replies; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-05 23:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

On Wed, Jul 05, 2017 at 06:16:17PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > When we receive a GET_VRING_BASE message set enable = false
> > > to stop any new received packets modifying the ring.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > I think I already reviewed a similar patch.
> 
> Yes you replied to my off-list mail; I hadn't got
> around to fixing it yet.
> 
> > Spec says:
> > Client must only process each ring when it is started.
> 
> but in that reply you said the spec said 
> 
>   Client must only pass data between the ring and the
>   backend, when the ring is enabled.
> 
> So does the spec say 'started' or 'enabled'?

Both. Ring processing is gated by the ring being started;
passing actual data also requires the ring to be enabled.
With a ring started but disabled you drop all packets.
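In libvhost-user terms that would look roughly like this (a sketch; the
helper name is made up, and it assumes libvhost-user.h's VuDev/VuVirtq):

    /* Hypothetical guard: the ring may be processed only when started,
     * and data passed between ring and backend only when it is also
     * enabled. */
    static bool vu_ring_can_pass_data(VuDev *dev, int index)
    {
        VuVirtq *vq = &dev->vq[index];

        return vq->started && vq->enable;
    }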

> (Pointer to the spec?)

It's part of the QEMU source:
docs/interop/vhost-user.txt


> > IMHO the real fix is to fix the client to check the started
> > flag before processing the ring.
> 
> Yep I can do that.  I was curious however whether it was
> specified as 'started' or 'enabled' or both.
> 
> Dave
> 
> > > ---
> > >  contrib/libvhost-user/libvhost-user.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > > index ceddeac74f..d37052b7b0 100644
> > > --- a/contrib/libvhost-user/libvhost-user.c
> > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
> > >      vmsg->size = sizeof(vmsg->payload.state);
> > >  
> > >      dev->vq[index].started = false;
> > > +    dev->vq[index].enable = false;
> > >      if (dev->iface->queue_set_started) {
> > >          dev->iface->queue_set_started(dev, index, false);
> > >      }
> > > -- 
> > > 2.13.0
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-03 11:03   ` Marc-André Lureau
  2017-07-03 11:48     ` Dr. David Alan Gilbert
@ 2017-07-07 10:51     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-07 10:51 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, a.perevalov, maxime.coquelin, mst, quintela, peterx,
	lvivier, aarcange

* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
> 
> On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
> 
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> >
> > Marc-André asked if I had a git with it all applied; so here we are:
> > https://github.com/dagrh/qemu/commits/vhost
> > git@github.com:dagrh/qemu.git on the vhost branch
> >
> >
> I started looking at the series, but I am not familiar with ufd/postcopy.
> Could you update vhost-user.txt to describe the new messages? Also,
> make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
> regression?) Thanks

OK, figured that one out; it was a cleanup path for the postcopy
notifier trying to remove the notifier when, because we were testing a
failure path, the notifier hadn't been added in the first place.

(That was really nasty to find; for some reason those tests refuse to
generate core dumps, so I ended up with a while loop doing gdb --pid
$(pgrep....) to catch the qemu that was about to segfault.)

Dave

> -- 
> Marc-André Lureau
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy
  2017-07-04 14:09   ` Maxime Coquelin
@ 2017-07-07 11:39     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-07 11:39 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Tell QEMU we understand the protocol features needed for postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   contrib/libvhost-user/libvhost-user.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index c1716d1a62..1c46aecfb3 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -797,7 +797,8 @@ vu_set_vring_err_exec(VuDev *dev, VhostUserMsg *vmsg)
> >   static bool
> >   vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
> >   {
> > -    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
> > +    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
> > +                        1ULL << VHOST_USER_PROTOCOL_F_POSTCOPY;
> 
> Maybe advertising VHOST_USER_PROTOCOL_F_POSTCOPY could be done only
> if the userfaultfd syscall is supported?

Done.

Dave

> >       if (dev->iface->get_protocol_features) {
> >           features |= dev->iface->get_protocol_features(dev);
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-07-04 19:34   ` Maxime Coquelin
@ 2017-07-07 11:53     ` Dr. David Alan Gilbert
  2017-07-07 12:52       ` Maxime Coquelin
  2017-10-03 13:23       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-07 11:53 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> > 
> > **HACK - better solution needed **
> > We have the situation where:
> > 
> >       qemu                      bridge
> > 
> >       send set_mem_table
> >                                map memory
> >    a)                          mark area with UFD
> >                                send reply with map addresses
> >    b)                          start using
> >    c) receive reply
> > 
> >    As soon as (a) happens qemu might start seeing faults
> > from memory accesses (but doesn't until b); but it can't
> > process those faults until (c) when it's received the
> > mmap addresses.
> > 
> > Make the fault handler spin until it gets the reply in (c).
> > 
> > At the very least this needs some proper locks, but preferably
> > we need to split the message.
> 
> Yes, maybe the slave channel could be used to send the ufds with
> a dedicated request? The backend would set the reply-ack flag, so that
> it starts accessing the guest memory only when Qemu is ready to handle
> faults.

Yes, that would make life a lot easier.

> Note that the slave channel support has not been implemented in Qemu's
> libvhost-user yet, but this is something I can do if we feel the need.

Can you tell me a bit about how the slave channel works?

Dave

> Maxime
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-03 17:55 ` Michael S. Tsirkin
@ 2017-07-07 12:01   ` Dr. David Alan Gilbert
  2017-07-07 15:35     ` Michael S. Tsirkin
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-07 12:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHUST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > QEMU then reads the clients UFD in it's fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and can carry on.
> > 
> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> > 
> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> 
> Not urgent to fix.
> 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> 
> And it can happen under a lock quite easily.
> What exactly is proposed here?
> Maybe we want to reuse the new channel that the IOMMU uses.

There's no fundamental reason to get deadlocks as long as you
get it right; the qemu thread that processes the user-faults
is a separate, independent thread, so once it's going the client
can do whatever it likes and it will get woken up without
intervention.
Some care is needed around the postcopy-end; reception of the
message that tells you to drop the userfault enables (which
frees anything that hasn't been woken) must be allowed to happen
for postcopy to complete; we take care that QEMU's fault
thread lives on until that message is acknowledged.

I'm more worried about how this will work in a full packet switch
when one vhost-user client for an incoming migration stalls
the whole switch unless care is taken about the design.
How do we figure out whether this is going to fly on a full stack?
That's my main reason for getting this WIP set out here to
get comments.

> >    There's a nasty hack of a lock around the set_mem_table message.
> 
> Yes.
> 
> >    I've not looked at the recent IOMMU code.
> > 
> >    Some cleanup and a lot of corner cases need thinking about.
> > 
> >    There are probably plenty of unknown issues as well.
> 
> At the protocol level, I'd like to rename the feature to
> USER_PAGEFAULT. Client does not really know anything about
> copies, it's all internal to qemu.
> Spec can document that it's used by qemu for postcopy.

OK, tbh I suspect that using it for anything else would be tricky
without adding more protocol features for that other use case.

Dave

> > Test setup:
> >   I'm running on one host at the moment, with the guest
> >   scping a large file from the host as it migrates.
> >   The setup is based on one I found in the vhost-user setups.
> >   You'll need a recent kernel for the shared memory support
> >   in userfaultfd, and userfault isn't that happy if a process
> >   using shared memory core's - so make sure you have the
> >   latest fixes.
> > 
> > SESS=vhost
> > ulimit -c unlimited
> > tmux -L $SESS new-session -d
> > tmux -L $SESS set-option -g history-limit 30000
> > # Start a router using the system qemu
> > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca
> > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > tmux -L $SESS set-option -g set-remain-on-exit on
> > # Start source vhost bridge
> > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe
> > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/
> > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
> > # Start dest vhost bridge
> > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.
> > 1:5556 2>dst-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend
> > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm
> > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
> > tmux -L $SESS send-keys -t source "migrate_set_speed 20M
> > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on
> > 
> > then once booted:
> > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> > 
> > 
> > Dave
> > 
> > Dr. David Alan Gilbert (29):
> >   RAMBlock/migration: Add migration flags
> >   migrate: Update ram_block_discard_range for shared
> >   qemu_ram_block_host_offset
> >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> >   postcopy: use UFFDIO_ZEROPAGE only when available
> >   postcopy: Add notifier chain
> >   postcopy: Add vhost-user flag for postcopy and check it
> >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> >   vhub: Support sending fds back to qemu
> >   vhub: Open userfaultfd
> >   postcopy: Allow registering of fd handler
> >   vhost+postcopy: Register shared ufd with postcopy
> >   vhost+postcopy: Transmit 'listen' to client
> >   vhost+postcopy: Register new regions with the ufd
> >   vhost+postcopy: Send address back to qemu
> >   vhost+postcopy: Stash RAMBlock and offset
> >   vhost+postcopy: Send requests to source for shared pages
> >   vhost+postcopy: Resolve client address
> >   postcopy: wake shared
> >   postcopy: postcopy_notify_shared_wake
> >   vhost+postcopy: Add vhost waker
> >   vhost+postcopy: Call wakeups
> >   vub+postcopy: madvises
> >   vhost+postcopy: Lock around set_mem_table
> >   vhu: enable = false on get_vring_base
> >   vhost: Add VHOST_USER_POSTCOPY_END message
> >   vhost+postcopy: Wire up POSTCOPY_END notify
> >   postcopy: Allow shared memory
> >   vhost-user: Claim support for postcopy
> > 
> >  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> >  contrib/libvhost-user/libvhost-user.h |   8 +
> >  exec.c                                |  44 +++--
> >  hw/virtio/trace-events                |  13 ++
> >  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
> >  include/exec/cpu-common.h             |   3 +
> >  include/exec/ram_addr.h               |   2 +
> >  migration/migration.c                 |   3 +
> >  migration/migration.h                 |   8 +
> >  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
> >  migration/postcopy-ram.h              |  69 +++++++
> >  migration/ram.c                       |   5 +
> >  migration/ram.h                       |   1 +
> >  migration/savevm.c                    |  13 ++
> >  migration/trace-events                |   6 +
> >  trace-events                          |   3 +
> >  vl.c                                  |   4 +-
> >  17 files changed, 926 insertions(+), 84 deletions(-)
> > 
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-07-07 11:53     ` Dr. David Alan Gilbert
@ 2017-07-07 12:52       ` Maxime Coquelin
  2017-10-03 13:23       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-07 12:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 07/07/2017 01:53 PM, Dr. David Alan Gilbert wrote:
> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>
>>
>> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
>>>
>>> **HACK - better solution needed **
>>> We have the situation where:
>>>
>>>        qemu                      bridge
>>>
>>>        send set_mem_table
>>>                                 map memory
>>>     a)                          mark area with UFD
>>>                                 send reply with map addresses
>>>     b)                          start using
>>>     c) receive reply
>>>
>>>     As soon as (a) happens qemu might start seeing faults
>>> from memory accesses (but doesn't until b); but it can't
>>> process those faults until (c) when it's received the
>>> mmap addresses.
>>>
>>> Make the fault handler spin until it gets the reply in (c).
>>>
>>> At the very least this needs some proper locks, but preferably
>>> we need to split the message.
>>
>> Yes, maybe the slave channel could be used to send the ufds with
>> a dedicated request? The backend would set the reply-ack flag, so that
>> it starts accessing the guest memory only when Qemu is ready to handle
>> faults.
> 
> Yes, that would make life a lot easier.
> 
>> Note that the slave channel support has not been implemented in Qemu's
>> libvhost-user yet, but this is something I can do if we feel the need.
> 
> Can you tell me a bit about how the slave channel works?

When the backend advertises the VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol
feature, Qemu creates a new channel using socketpair() and passes one of
the file descriptors to the backend using a dedicated request.

Then, the backend can send requests to Qemu using the same protocol that
Qemu uses to send requests to the backend. So, as on the "master"
channel, the backend can set the VHOST_USER_F_NEED_REPLY flag on a
request, so that it can wait for Qemu to ack (or nack) that the request
has been handled.

It is currently only used by the backend to send IOTLB miss requests.
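On the Qemu side the setup amounts to something like this (a rough
sketch; the message name follows the spec of the time, and the helper is
illustrative, not the actual implementation):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Create the slave channel: sv[0] stays in Qemu, sv[1] is sent to
     * the backend as an SCM_RIGHTS fd on the existing vhost-user
     * socket, in a VHOST_USER_SET_SLAVE_REQ_FD message. */
    static int make_slave_channel(int sv[2])
    {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            perror("socketpair");
            return -1;
        }
        return 0;
    }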

Note that you need to be careful regarding deadlocks, as libvhost-user
is single-threaded.

More info may be found in docs/interop/vhost-user.txt
(docs/specs/vhost-user.txt in older versions)
Maxime
> Dave
> 
>> Maxime
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-07 12:01   ` Dr. David Alan Gilbert
@ 2017-07-07 15:35     ` Michael S. Tsirkin
  2017-07-07 17:26       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-07 15:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > >    Take care of deadlocking; any thread in the client that
> > >    accesses a userfault protected page can stall.
> > 
> > And it can happen under a lock quite easily.
> > What exactly is proposed here?
> > Maybe we want to reuse the new channel that the IOMMU uses.
> 
> There's no fundamental reason to get deadlocks as long as you
> get it right; the qemu thread that processes the user-faults
> is a separate, independent thread, so once it's going the client
> can do whatever it likes and it will get woken up without
> intervention.

You take a lock for the channel, then access guest memory.
Then the thread that gets messages from qemu can't get onto
the channel to mark the range as populated.

> Some care is needed around the postcopy-end; reception of the
> message that tells you to drop the userfault enables (which
> frees anything that hasn't been woken) must be allowed to happen
> for postcopy to complete; we take care that QEMU's fault
> thread lives on until that message is acknowledged.
>
> I'm more worried about how this will work in a full packet switch
> when one vhost-user client for an incoming migration stalls
> the whole switch unless care is taken about the design.
> How do we figure out whether this is going to fly on a full stack?

It's a performance issue though. The client could run in a separate
thread for a while until migration finishes.
We need to make sure there's explicit documentation
that tells clients at what point they might block.

> That's my main reason for getting this WIP set out here to
> get comments.

What will happen if QEMU dies? Is there a way to unblock the client?

> > >    There's a nasty hack of a lock around the set_mem_table message.
> > 
> > Yes.
> > 
> > >    I've not looked at the recent IOMMU code.
> > > 
> > >    Some cleanup and a lot of corner cases need thinking about.
> > > 
> > >    There are probably plenty of unknown issues as well.
> > 
> > At the protocol level, I'd like to rename the feature to
> > USER_PAGEFAULT. Client does not really know anything about
> > copies, it's all internal to qemu.
> > Spec can document that it's used by qemu for postcopy.
> 
> OK, tbh I suspect that using it for anything else would be tricky
> without adding more protocol features for that other use case.
> 
> Dave

Why exactly? Why would the client have to know it's migration?

-- 
MST


* Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
  2017-07-07 15:35     ` Michael S. Tsirkin
@ 2017-07-07 17:26       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-07 17:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > > >    Take care of deadlocking; any thread in the client that
> > > >    accesses a userfault protected page can stall.
> > > 
> > > And it can happen under a lock quite easily.
> > > What exactly is proposed here?
> > > Maybe we want to reuse the new channel that the IOMMU uses.
> > 
> > There's no fundamental reason to get deadlocks as long as you
> > get it right; the qemu thread that processes the user-faults
> > is a separate, independent thread, so once it's going the client
> > can do whatever it likes and it will get woken up without
> > intervention.
> 
> You take a lock for the channel, then access guest memory.
> Then the thread that gets messages from qemu can't get onto
> the channel to mark the range as populated.

It doesn't need to get the message from qemu to know it's populated
though; qemu performs a WAKE ioctl on the userfaultfd to cause
it to wake, so there's no action needed by the client.
(If it does need to take a lock then yes we have a problem).
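For reference, the fault thread's read side is roughly this (a
simplified sketch of the usual userfaultfd pattern, not the series'
actual code):

    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Read one event from the client's ufd; the faulting client thread
     * stays blocked until a later UFFDIO_COPY/UFFDIO_WAKE on this ufd. */
    static void handle_one_fault(int ufd)
    {
        struct uffd_msg msg;

        if (read(ufd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
            msg.event == UFFD_EVENT_PAGEFAULT) {
            /* msg.arg.pagefault.address is the faulting address;
             * ... request that page from the migration source ... */
        }
    }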

> > Some care is needed around the postcopy-end; reception of the
> > message that tells you to drop the userfault enables (which
> > frees anything that hasn't been woken) must be allowed to happen
> > for postcopy to complete; we take care that QEMU's fault
> > thread lives on until that message is acknowledged.
> >
> > I'm more worried about how this will work in a full packet switch
> > when one vhost-user client for an incoming migration stalls
> > the whole switch unless care is taken about the design.
> > How do we figure out whether this is going to fly on a full stack?
> 
> It's a performance issue though. The client could run in a separate
> thread for a while until migration finishes.
> We need to make sure there's explicit documentation
> that tells clients at what point they might block.

Right.

> > That's my main reason for getting this WIP set out here to
> > get comments.
> 
> What will happen if QEMU dies? Is there a way to unblock the client?

If the client can detect this and close its userfaultfd then yes;
of course that detection has to be done in a thread that cannot itself
be blocked by anything related to the userfaultfd.

> > > >    There's a nasty hack of a lock around the set_mem_table message.
> > > 
> > > Yes.
> > > 
> > > >    I've not looked at the recent IOMMU code.
> > > > 
> > > >    Some cleanup and a lot of corner cases need thinking about.
> > > > 
> > > >    There are probably plenty of unknown issues as well.
> > > 
> > > At the protocol level, I'd like to rename the feature to
> > > USER_PAGEFAULT. Client does not really know anything about
> > > copies, it's all internal to qemu.
> > > Spec can document that it's used by qemu for postcopy.
> > 
> > OK, tbh I suspect that using it for anything else would be tricky
> > without adding more protocol features for that other use case.
> > 
> > Dave
> 
> Why exactly? Why would the client have to know it's migration?

It's more the sequence I worry about; we're reliant on
making sure that the userfaultfd is registered with the RAM before
it's ever accessed, and we unregister at the end.
This all keys in with migration requesting registration at the right
point before loading the devices.

Dave

> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags
  2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
@ 2017-07-10  9:28   ` Peter Xu
  2017-07-12 16:48     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-10  9:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:19PM +0100, Dr. David Alan Gilbert (git) wrote:

[...]

> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index af5bf26080..0cb6c5cb73 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -32,6 +32,8 @@ struct RAMBlock {
>      ram_addr_t max_length;
>      void (*resized)(const char*, uint64_t length, void *host);
>      uint32_t flags;
> +    /* These flags are owned by migration, initialised to 0 */
> +    uint32_t migration_flags;

Since we have RAMBlock.flags, would it be possible to use that
directly? Currently it only uses 3 bits.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared
  2017-06-28 19:00 ` [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared Dr. David Alan Gilbert (git)
@ 2017-07-10 10:03   ` Peter Xu
  2017-08-24 16:59     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-10 10:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:20PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The choice of call to discard a block is getting more complicated
> for other cases.  We use fallocate PUNCH_HOLE in any file-backed case;
> it works for both hugepages and tmpfs.
> We use MADV_DONTNEED for the non-hugepage cases where they're either
> anonymous or private.
> 
> Care should be taken when trying other backing files.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c       | 28 ++++++++++++++++------------
>  trace-events |  3 +++
>  2 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 69fc5c9b07..4e61226a16 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3557,6 +3557,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>      }
>  
>      if ((start + length) <= rb->used_length) {
> +        bool need_madvise, need_fallocate;
>          uint8_t *host_endaddr = host_startaddr + length;
>          if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
>              error_report("ram_block_discard_range: Unaligned end address: %p",
> @@ -3566,23 +3567,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>  
>          errno = ENOTSUP; /* If we are missing MADVISE etc */
>  
> -        if (rb->page_size == qemu_host_page_size) {
> -#if defined(CONFIG_MADVISE)
> -            /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> -             * freeing the page.
> -             */
> -            ret = madvise(host_startaddr, length, MADV_DONTNEED);
> -#endif
> -        } else {
> -            /* Huge page case  - unfortunately it can't do DONTNEED, but
> -             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> -             * huge page file.
> -             */
> +        /* The logic here is messy;
> +         *    madvise DONTNEED fails for hugepages
> +         *    fallocate works on hugepages and shmem
> +         */
> +        need_madvise = (rb->page_size == qemu_host_page_size) &&
> +                       (rb->fd == -1 || !(rb->flags & RAM_SHARED));
> +        need_fallocate = rb->fd != -1;
> +        if (ret == -1 && need_fallocate) {

(ret will always be -1 when we reach here?)

>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>              ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                              start, length);
>  #endif
>          }
> +        if (need_madvise && (!need_fallocate || (ret == 0))) {
> +#if defined(CONFIG_MADVISE)
> +            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> +#endif
> +        }
> +        trace_ram_block_discard_range(rb->idstr, host_startaddr,
> +                                      need_madvise, need_fallocate, ret);

How about making the check easier, like:

  if (rb->page_size != qemu_host_page_size ||
      rb->flags & RAM_SHARED) {
      /* Both huge pages and shared memory have a valid rb->fd */
      assert(rb->fd >= 0);
      fallocate(rb->fd, ...);
  } else {
      madvise();
  }

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available
  2017-06-28 19:00 ` [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available Dr. David Alan Gilbert (git)
@ 2017-07-10 10:19   ` Peter Xu
  2017-07-12 16:54     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-10 10:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:23PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Use the recently added migration flag to hold whether
> each RAMBlock has the UFFDIO_ZEROPAGE capability, use it
> when it's available.
> 
> This allows the use of postcopy on tmpfs as well as hugepage
> backed files.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.h    |  4 ++++
>  migration/postcopy-ram.c | 12 +++++++++---
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index d9a268a3af..d109635d08 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -20,6 +20,10 @@
>  #include "exec/cpu-common.h"
>  #include "qemu/coroutine_int.h"
>  
> +/* Migration flags to be set using qemu_ram_set_migration_flags */
> +/* Postcopy can atomically zero pages in this RAMBlock */
> +#define QEMU_MIGFLAG_POSTCOPY_ZERO   0x00000001
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index be2a8f8e02..96338a8070 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -408,6 +408,12 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>          error_report("%s userfault: Region doesn't support COPY", __func__);
>          return -1;
>      }
> +    if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
> +        RAMBlock *rb = qemu_ram_block_by_name(block_name);
> +        qemu_ram_set_migration_flags(rb, qemu_ram_get_migration_flags(rb) |
> +                                         QEMU_MIGFLAG_POSTCOPY_ZERO);

Shall we use atomic_or() inside qemu_ram_set_migration_flags()? Then
no need to fetch, and we'll be thread safe as well?
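
Something like this, say (a rough sketch, assuming atomic_or() from
qemu/atomic.h and the migration_flags field added in patch 01):

    void qemu_ram_set_migration_flags(RAMBlock *rb, uint32_t flags)
    {
        /* One atomic read-modify-write; no separate fetch needed, and
         * concurrent callers can't lose each other's bits. */
        atomic_or(&rb->migration_flags, flags);
    }

so the caller above would reduce to a single
qemu_ram_set_migration_flags(rb, QEMU_MIGFLAG_POSTCOPY_ZERO).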

> +    }
> +
>  
>      return 0;
>  }
> @@ -620,11 +626,11 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>                               RAMBlock *rb)
>  {
> +    size_t pagesize = qemu_ram_pagesize(rb);
>      trace_postcopy_place_page_zero(host);
>  
> -    if (qemu_ram_pagesize(rb) == getpagesize()) {
> -        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
> -                                rb)) {
> +    if (qemu_ram_get_migration_flags(rb) & QEMU_MIGFLAG_POSTCOPY_ZERO) {

IIUC, _UFFDIO_ZEROPAGE is not supported on huge pages. If so, would
it be worth a comment here?

> +        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
>              int e = errno;
>              error_report("%s: %s zero host: %p",
>                           __func__, strerror(e), host);
> -- 
> 2.13.0
> 

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain
  2017-06-28 19:00 ` [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain Dr. David Alan Gilbert (git)
@ 2017-07-10 10:31   ` Peter Xu
  2017-07-12 17:14     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-10 10:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:24PM +0100, Dr. David Alan Gilbert (git) wrote:

[...]

> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 78a3591322..d688411674 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -114,4 +114,30 @@ PostcopyState postcopy_state_get(void);
>  /* Set the state and return the old state */
>  PostcopyState postcopy_state_set(PostcopyState new_state);
>  
> +/*
> + * To be called once at the start before any device initialisation

initialization?

> + */
> +void postcopy_infrastructure_init(void);
> +
> +/* Add a notifier to a list to be called when checking whether the devices
> + * can support postcopy.
> + * Its data is a *PostcopyNotifyData
> + * It should return 0 if OK, or a negative value on failure.
> + * On failure it must set the data->errp to an error.
> + *
> + */
> +enum PostcopyNotifyReason {
> +    POSTCOPY_NOTIFY_PROBE = 0,
> +};
> +
> +struct PostcopyNotifyData {
> +    enum PostcopyNotifyReason reason;
> +    Error **errp;
> +};
> +
> +void postcopy_add_notifier(NotifierWithReturn *nn);
> +void postcopy_remove_notifier(NotifierWithReturn *n);
> +/* Call the notifier list set by postcopy_add_notifier */
> +int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
> +
>  #endif
> diff --git a/vl.c b/vl.c
> index a2bd69f4e0..b6c660a703 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -93,8 +93,9 @@ int main(int argc, char **argv)
>  #include "sysemu/dma.h"
>  #include "hw/audio/soundhw.h"
>  #include "audio/audio.h"
> -#include "sysemu/cpus.h"
>  #include "migration/colo.h"
> +#include "migration/postcopy-ram.h"
> +#include "sysemu/cpus.h"

(just curious: is moving sysemu/cpus.h intended?)

>  #include "sysemu/kvm.h"
>  #include "sysemu/hax.h"
>  #include "qapi/qobject-input-visitor.h"
> @@ -3060,6 +3061,7 @@ int main(int argc, char **argv, char **envp)
>      module_call_init(MODULE_INIT_OPTS);
>  
>      runstate_init();
> +    postcopy_infrastructure_init();
>  
>      if (qcrypto_init(&err) < 0) {
>          error_reportf_err(err, "cannot initialize crypto: ");
> -- 
> 2.13.0
> 

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset
  2017-06-28 19:00 ` [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset Dr. David Alan Gilbert (git)
@ 2017-07-11  3:31   ` Peter Xu
  2017-07-14 17:15     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-11  3:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Stash the RAMBlock and offset for later use in looking up
> addresses.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/trace-events |  1 +
>  hw/virtio/vhost-user.c | 11 +++++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index f7be340a45..1fd194363a 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -3,6 +3,7 @@
>  # hw/virtio/vhost-user.c
>  vhost_user_postcopy_listen(void) ""
>  vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
> +vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
>  
>  # hw/virtio/virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 6be3e7ff2d..3185af7a45 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -133,6 +133,11 @@ struct vhost_user {
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
>      uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> +    RAMBlock          *region_rb[VHOST_MEMORY_MAX_NREGIONS];
> +    /* The offset from the start of the RAMBlock to the start of the
> +     * vhost region.
> +     */
> +    ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];

Here the array size is VHOST_MEMORY_MAX_NREGIONS, while...

>  };
>  
>  static bool ioeventfd_enabled(void)
> @@ -324,8 +329,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>          assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
>          mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
>                                       &offset);
> +        u->region_rb_offset[i] = offset;
> +        u->region_rb[i] = mr->ram_block;

... can i>=VHOST_MEMORY_MAX_NREGIONS here? Or do we only need to note
this down if fd > 0 below?  Thanks,

>          fd = memory_region_get_fd(mr);
>          if (fd > 0) {
> +            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
> +                                                  reg->memory_size,
> +                                                  reg->guest_phys_addr,
> +                                                  reg->userspace_addr, offset);
>              msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
>              msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
>              msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
> -- 
> 2.13.0
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
  2017-06-28 19:00 ` [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups Dr. David Alan Gilbert (git)
@ 2017-07-11  4:22   ` Peter Xu
  2017-07-12 15:00     ` Andrea Arcangeli
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-11  4:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Cause the vhost-user client to be woken up whenever:
>   a) We place a page in postcopy mode

Just to make sure I understand it correctly - UFFDIO_COPY will only
wake up the waiters on the same userfaultfd context, so we don't need
to wake up QEMU userfaultfd (vcpu threads), but we need to explicitly
wake up other ufds/threads, like vhost-user backends. Am I right?

Thanks,

>   b) We get a fault and the page has already been received

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
  2017-07-11  4:22   ` Peter Xu
@ 2017-07-12 15:00     ` Andrea Arcangeli
  2017-07-14  2:45       ` Peter Xu
  2017-07-14 14:18       ` Michael S. Tsirkin
  0 siblings, 2 replies; 87+ messages in thread
From: Andrea Arcangeli @ 2017-07-12 15:00 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier

On Tue, Jul 11, 2017 at 12:22:32PM +0800, Peter Xu wrote:
> On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Cause the vhost-user client to be woken up whenever:
> >   a) We place a page in postcopy mode
> 
> Just to make sure I understand it correctly - UFFDIO_COPY will only
> wake up the waiters on the same userfaultfd context, so we don't need
> to wake up QEMU userfaultfd (vcpu threads), but we need to explicitly
> wake up other ufds/threads, like vhost-user backends. Am I right?

Yes.

Every "uffd" represents one and only one "mm" (i.e. a process). So
there is no way a single UFFDIO_COPY can wake the faults happening on
a process different from the "mm" the uffd is associated with.

vhost-bridge, being a different process, requires a UFFDIO_WAKE on the
uffd it passed to qemu, in addition to the UFFDIO_COPY that, as you
said, implicitly wakes the userfaults happening on the qemu process
(vcpus, iothread, dataplane etc.).

On a side note there's a way not to wake userfaults implicitly in
UFFDIO_COPY in case you want to wake userfaults in batches but nobody
uses that for now (uffdio_copy.mode |= UFFDIO_COPY_MODE_DONTWAKE).
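
(For concreteness, the batched pattern looks roughly like this: a
sketch using the uffdio_copy/uffdio_range structures from
<linux/userfaultfd.h>, where uffd, host_addr, page_buf, pagesize and
nr_pages are assumed locals:

    struct uffdio_copy copy = {
        .dst  = (unsigned long)host_addr,
        .src  = (unsigned long)page_buf,
        .len  = pagesize,
        .mode = UFFDIO_COPY_MODE_DONTWAKE,  /* place page, don't wake yet */
    };
    if (ioctl(uffd, UFFDIO_COPY, &copy) == 0) {
        /* ...place more pages the same way, then wake the whole batch: */
        struct uffdio_range range = {
            .start = (unsigned long)host_addr,
            .len   = pagesize * nr_pages,
        };
        ioctl(uffd, UFFDIO_WAKE, &range);
    }
)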

It'd be theoretically nice to optimize away the additional enter/exit
kernel introduced by the UFFDIO_WAKE and the translation table as
well.

What we could do is to add a UFFDIO_BIND that takes an "fd" as
parameter to the ioctl to bind the two uffd together. Then we could
push logical offsets in addition to the virtual address ranges when
calling UFFDIO_REGISTER_LOGICAL (the logical offsets would then match
the guest physical addresses) so that the UFFDIO_COPY_LOGICAL would
then be able to get a logical range to wakeup that the kernel would
translate into virtual addresses for all uffds bind together. Pushing
offsets into UFFDIO_REGISTER was David's idea.

That would eliminate the enter/exit kernel for the explicit
UFFDIO_WAKE and calling a single UFFDIO_COPY would be enough.

Alternatively we should make the uffd work based on file offsets
instead of virtual addresses but that would involve changes to
filesystems and it only would move the needle on top of tmpfs
(shared=on/off no difference) and hugetlbfs. It would be enough for
vhost-bridge.

Usually the uffd fault lives at the higher level of the virtual memory
subsystem and never deals with file offsets so if we can get away with
logical ranges per-uffd for UFFDIO_REGISTER and UFFDIO_COPY, it may be
simpler and easier to extend automatically to all memory types
supported by uffd (including anon which has no file offset).

No major improvement is to be expected by such an enhancement though
so it's not very high priority to implement. It's not even clear if
the complexity is worth it. Doing one more syscall per page I think
might be measurable only on very fast network. The current way of
operation where uffd are independent of each other and the translation
table is transferred by userland means is quite optimal already and
much simpler. Furthermore for hugetlbfs the performance difference
most certainly wouldn't be measurable, as the enter/exit kernel would
be diluted by a factor of 512 compared to 4k userfaults.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags
  2017-07-10  9:28   ` Peter Xu
@ 2017-07-12 16:48     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-12 16:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:19PM +0100, Dr. David Alan Gilbert (git) wrote:
> 
> [...]
> 
> > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> > index af5bf26080..0cb6c5cb73 100644
> > --- a/include/exec/ram_addr.h
> > +++ b/include/exec/ram_addr.h
> > @@ -32,6 +32,8 @@ struct RAMBlock {
> >      ram_addr_t max_length;
> >      void (*resized)(const char*, uint64_t length, void *host);
> >      uint32_t flags;
> > +    /* These flags are owned by migration, initialised to 0 */
> > +    uint32_t migration_flags;
> 
> Since we have RAMBlock.flags, would it be possible to use that
> directly? Currently it only uses 3 bits.  Thanks,

OK, gone - we now have:
bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
void qemu_ram_set_uf_zeroable(RAMBlock *rb);

which work on the new RAM_UF_ZEROPAGE flag.
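
For reference, a minimal sketch of what those could look like
(assuming RAM_UF_ZEROPAGE is a spare bit in RAMBlock.flags, and the
setter is only called from the fault-handling setup path, so a plain
|= is enough):

    bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
    {
        return rb->flags & RAM_UF_ZEROPAGE;
    }

    void qemu_ram_set_uf_zeroable(RAMBlock *rb)
    {
        rb->flags |= RAM_UF_ZEROPAGE;
    }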

Dave


> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available
  2017-07-10 10:19   ` Peter Xu
@ 2017-07-12 16:54     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-12 16:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:23PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Use the recently added migration flag to hold whether
> > each RAMBlock has the UFFDIO_ZEROPAGE capability, use it
> > when it's available.
> > 
> > This allows the use of postcopy on tmpfs as well as hugepage
> > backed files.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  migration/migration.h    |  4 ++++
> >  migration/postcopy-ram.c | 12 +++++++++---
> >  2 files changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.h b/migration/migration.h
> > index d9a268a3af..d109635d08 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -20,6 +20,10 @@
> >  #include "exec/cpu-common.h"
> >  #include "qemu/coroutine_int.h"
> >  
> > +/* Migration flags to be set using qemu_ram_set_migration_flags */
> > +/* Postcopy can atomically zero pages in this RAMBlock */
> > +#define QEMU_MIGFLAG_POSTCOPY_ZERO   0x00000001
> > +
> >  /* State for the incoming migration */
> >  struct MigrationIncomingState {
> >      QEMUFile *from_src_file;
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index be2a8f8e02..96338a8070 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -408,6 +408,12 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> >          error_report("%s userfault: Region doesn't support COPY", __func__);
> >          return -1;
> >      }
> > +    if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
> > +        RAMBlock *rb = qemu_ram_block_by_name(block_name);
> > +        qemu_ram_set_migration_flags(rb, qemu_ram_get_migration_flags(rb) |
> > +                                         QEMU_MIGFLAG_POSTCOPY_ZERO);
> 
> Shall we use atomic_or() inside qemu_ram_set_migration_flags()? Then
> no need to fetch, and we'll be thread safe as well?

I've changed it to a simple |=  in the new qemu_ram_set_uf_zeroable
that works on the rb->flags field.

> > +    }
> > +
> >  
> >      return 0;
> >  }
> > @@ -620,11 +626,11 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
> >                               RAMBlock *rb)
> >  {
> > +    size_t pagesize = qemu_ram_pagesize(rb);
> >      trace_postcopy_place_page_zero(host);
> >  
> > -    if (qemu_ram_pagesize(rb) == getpagesize()) {
> > -        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
> > -                                rb)) {
> > +    if (qemu_ram_get_migration_flags(rb) & QEMU_MIGFLAG_POSTCOPY_ZERO) {
> 
> IIUC, _UFFDIO_ZEROPAGE is not supported on huge pages. If so, would
> it be worth a comment here?

Added.

Dave

> > +        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
> >              int e = errno;
> >              error_report("%s: %s zero host: %p",
> >                           __func__, strerror(e), host);
> > -- 
> > 2.13.0
> > 
> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain
  2017-07-10 10:31   ` Peter Xu
@ 2017-07-12 17:14     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-12 17:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:24PM +0100, Dr. David Alan Gilbert (git) wrote:
> 
> [...]
> 
> > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> > index 78a3591322..d688411674 100644
> > --- a/migration/postcopy-ram.h
> > +++ b/migration/postcopy-ram.h
> > @@ -114,4 +114,30 @@ PostcopyState postcopy_state_get(void);
> >  /* Set the state and return the old state */
> >  PostcopyState postcopy_state_set(PostcopyState new_state);
> >  
> > +/*
> > + * To be called once at the start before any device initialisation
> 
> initialization?

en_US vs en_GB :-)

> > + */
> > +void postcopy_infrastructure_init(void);
> > +
> > +/* Add a notifier to a list to be called when checking whether the devices
> > + * can support postcopy.
> > + * Its data is a *PostcopyNotifyData
> > + * It should return 0 if OK, or a negative value on failure.
> > + * On failure it must set the data->errp to an error.
> > + *
> > + */
> > +enum PostcopyNotifyReason {
> > +    POSTCOPY_NOTIFY_PROBE = 0,
> > +};
> > +
> > +struct PostcopyNotifyData {
> > +    enum PostcopyNotifyReason reason;
> > +    Error **errp;
> > +};
> > +
> > +void postcopy_add_notifier(NotifierWithReturn *nn);
> > +void postcopy_remove_notifier(NotifierWithReturn *n);
> > +/* Call the notifier list set by postcopy_add_notifier */
> > +int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
> > +
> >  #endif
> > diff --git a/vl.c b/vl.c
> > index a2bd69f4e0..b6c660a703 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -93,8 +93,9 @@ int main(int argc, char **argv)
> >  #include "sysemu/dma.h"
> >  #include "hw/audio/soundhw.h"
> >  #include "audio/audio.h"
> > -#include "sysemu/cpus.h"
> >  #include "migration/colo.h"
> > +#include "migration/postcopy-ram.h"
> > +#include "sysemu/cpus.h"
> 
> (just curious: is moving sysemu/cpus.h intended?)

No! Removed.

Dave
> 
> >  #include "sysemu/kvm.h"
> >  #include "sysemu/hax.h"
> >  #include "qapi/qobject-input-visitor.h"
> > @@ -3060,6 +3061,7 @@ int main(int argc, char **argv, char **envp)
> >      module_call_init(MODULE_INIT_OPTS);
> >  
> >      runstate_init();
> > +    postcopy_infrastructure_init();
> >  
> >      if (qcrypto_init(&err) < 0) {
> >          error_reportf_err(err, "cannot initialize crypto: ");
> > -- 
> > 2.13.0
> > 
> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
  2017-07-12 15:00     ` Andrea Arcangeli
@ 2017-07-14  2:45       ` Peter Xu
  2017-07-14 14:18       ` Michael S. Tsirkin
  1 sibling, 0 replies; 87+ messages in thread
From: Peter Xu @ 2017-07-14  2:45 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier

On Wed, Jul 12, 2017 at 05:00:04PM +0200, Andrea Arcangeli wrote:
> On Tue, Jul 11, 2017 at 12:22:32PM +0800, Peter Xu wrote:
> > On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Cause the vhost-user client to be woken up whenever:
> > >   a) We place a page in postcopy mode
> > 
> > Just to make sure I understand it correctly - UFFDIO_COPY will only
> > wake up the waiters on the same userfaultfd context, so we don't need
> > to wake up QEMU userfaultfd (vcpu threads), but we need to explicitly
> > wake up other ufds/threads, like vhost-user backends. Am I right?
> 
> Yes.
> 
> Every "uffd" represents one and only one "mm" (i.e. a process). So
> there is no way a single UFFDIO_COPY can wake the faults happening on
> a process different from the "mm" the uffd is associated with.
> 
> vhost-bridge, being a different process, requires a UFFDIO_WAKE on the
> uffd it passed to qemu, in addition to the UFFDIO_COPY that, as you
> said, implicitly wakes the userfaults happening on the qemu process
> (vcpus, iothread, dataplane etc.).
> 
> On a side note there's a way not to wake userfaults implicitly in
> UFFDIO_COPY in case you want to wake userfaults in batches but nobody
> uses that for now (uffdio_copy.mode |= UFFDIO_COPY_MODE_DONTWAKE).
> 
> It'd be theoretically nice to optimize away the additional enter/exit
> kernel introduced by the UFFDIO_WAKE and the translation table as
> well.
> 
> What we could do is to add a UFFDIO_BIND that takes an "fd" as
> parameter to the ioctl to bind the two uffd together. Then we could
> push logical offsets in addition to the virtual address ranges when
> calling UFFDIO_REGISTER_LOGICAL (the logical offsets would then match
> the guest physical addresses) so that the UFFDIO_COPY_LOGICAL would
> then be able to get a logical range to wakeup that the kernel would
> translate into virtual addresses for all uffds bind together. Pushing
> offsets into UFFDIO_REGISTER was David's idea.
> 
> That would eliminate the enter/exit kernel for the explicit
> UFFDIO_WAKE and calling a single UFFDIO_COPY would be enough.
> 
> Alternatively we should make the uffd work based on file offsets
> instead of virtual addresses but that would involve changes to
> filesystems and it only would move the needle on top of tmpfs
> (shared=on/off no difference) and hugetlbfs. It would be enough for
> vhost-bridge.

Really glad to know these ideas.

> 
> Usually the uffd fault lives at the higher level of the virtual memory
> subsystem and never deals with file offsets so if we can get away with
> logical ranges per-uffd for UFFDIO_REGISTER and UFFDIO_COPY, it may be
> simpler and easier to extend automatically to all memory types
> supported by uffd (including anon which has no file offset).
> 
> No major improvement is to be expected by such an enhancement though
> so it's not very high priority to implement. It's not even clear if
> the complexity is worth it. Doing one more syscall per page I think
> might be measurable only on very fast network. The current way of
> operation where uffd are independent of each other and the translation
> table is transferred by userland means is quite optimal already and
> much simpler. Furthermore for hugetlbfs the performance difference
> most certainly wouldn't be measurable, as the enter/exit kernel would
> be diluted by a factor of 512 compared to 4k userfaults.

Indeed, performance-critical scenarios should be using huge pages, and
that means the extra WAKE will have an even smaller impact.

Thanks Andrea!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
  2017-07-12 15:00     ` Andrea Arcangeli
  2017-07-14  2:45       ` Peter Xu
@ 2017-07-14 14:18       ` Michael S. Tsirkin
  1 sibling, 0 replies; 87+ messages in thread
From: Michael S. Tsirkin @ 2017-07-14 14:18 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Xu, lvivier, qemu-devel, quintela, a.perevalov,
	Dr. David Alan Gilbert (git),
	maxime.coquelin, marcandre.lureau

On Wed, Jul 12, 2017 at 05:00:04PM +0200, Andrea Arcangeli wrote:
> What we could do is to add a UFFDIO_BIND that takes an "fd" as
> parameter to the ioctl to bind the two uffd together. Then we could
> push logical offsets in addition to the virtual address ranges when
> calling UFFDIO_REGISTER_LOGICAL (the logical offsets would then match
> the guest physical addresses) so that the UFFDIO_COPY_LOGICAL would
> then be able to get a logical range to wakeup that the kernel would
> translate into virtual addresses for all uffds bind together. Pushing
> offsets into UFFDIO_REGISTER was David's idea.

I think it was mine originally just in an off-list discussion.
To me it seems cleaner to do UFFDIO_DUP which gives you
a new fd bound to the current one, though.

> That would eliminate the enter/exit kernel for the explicit
> UFFDIO_WAKE and calling a single UFFDIO_COPY would be enough.
> 
> Alternatively we should make the uffd work based on file offsets
> instead of virtual addresses but that would involve changes to
> filesystems and it only would move the needle on top of tmpfs
> (shared=on/off no difference) and hugetlbfs. It would be enough for
> vhost-bridge.
> 
> Usually the uffd fault lives at the higher level of the virtual memory
> subsystem and never deals with file offsets so if we can get away with
> logical ranges per-uffd for UFFDIO_REGISTER and UFFDIO_COPY, it may be
> simpler and easier to extend automatically to all memory types
> supported by uffd (including anon which has no file offset).
> 
> No major improvement is to be expected by such an enhancement though
> so it's not very high priority to implement. It's not even clear if
> the complexity is worth it. Doing one more syscall per page I think
> might be measurable only on very fast network. The current way of
> operation where uffd are independent of each other and the translation
> table is transferred by userland means is quite optimal already and
> much simpler. Furthermore for hugetlbfs the performance difference
> most certainly wouldn't be measurable, as the enter/exit kernel would
> be diluted by a factor of 512 compared to 4k userfaults.
> 
> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset
  2017-07-11  3:31   ` Peter Xu
@ 2017-07-14 17:15     ` Dr. David Alan Gilbert
  2017-07-17  2:59       ` Peter Xu
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-14 17:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Stash the RAMBlock and offset for later use in looking up
> > addresses.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  hw/virtio/trace-events |  1 +
> >  hw/virtio/vhost-user.c | 11 +++++++++++
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index f7be340a45..1fd194363a 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -3,6 +3,7 @@
> >  # hw/virtio/vhost-user.c
> >  vhost_user_postcopy_listen(void) ""
> >  vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
> > +vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
> >  
> >  # hw/virtio/virtio.c
> >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 6be3e7ff2d..3185af7a45 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -133,6 +133,11 @@ struct vhost_user {
> >      NotifierWithReturn postcopy_notifier;
> >      struct PostCopyFD  postcopy_fd;
> >      uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > +    RAMBlock          *region_rb[VHOST_MEMORY_MAX_NREGIONS];
> > +    /* The offset from the start of the RAMBlock to the start of the
> > +     * vhost region.
> > +     */
> > +    ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];
> 
> Here the array size is VHOST_MEMORY_MAX_NREGIONS, while...
> 
> >  };
> >  
> >  static bool ioeventfd_enabled(void)
> > @@ -324,8 +329,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> >          assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> >          mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> >                                       &offset);
> > +        u->region_rb_offset[i] = offset;
> > +        u->region_rb[i] = mr->ram_block;
> 
> ... can i>=VHOST_MEMORY_MAX_NREGIONS here? Or do we only need to note
> this down if fd > 0 below?  Thanks,

I don't *think* so - I mean:
    for (i = 0; i < dev->mem->nregions; ++i) {

so if that's the maximum number of regions and that's the number of
regions we should be safe???

Dave

> >          fd = memory_region_get_fd(mr);
> >          if (fd > 0) {
> > +            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
> > +                                                  reg->memory_size,
> > +                                                  reg->guest_phys_addr,
> > +                                                  reg->userspace_addr, offset);
> >              msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
> >              msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
> >              msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
> > -- 
> > 2.13.0
> > 
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset
  2017-07-14 17:15     ` Dr. David Alan Gilbert
@ 2017-07-17  2:59       ` Peter Xu
  2017-08-17 17:29         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Xu @ 2017-07-17  2:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

On Fri, Jul 14, 2017 at 06:15:54PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Wed, Jun 28, 2017 at 08:00:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Stash the RAMBlock and offset for later use in looking up
> > > addresses.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  hw/virtio/trace-events |  1 +
> > >  hw/virtio/vhost-user.c | 11 +++++++++++
> > >  2 files changed, 12 insertions(+)
> > > 
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index f7be340a45..1fd194363a 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -3,6 +3,7 @@
> > >  # hw/virtio/vhost-user.c
> > >  vhost_user_postcopy_listen(void) ""
> > >  vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
> > > +vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
> > >  
> > >  # hw/virtio/virtio.c
> > >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 6be3e7ff2d..3185af7a45 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -133,6 +133,11 @@ struct vhost_user {
> > >      NotifierWithReturn postcopy_notifier;
> > >      struct PostCopyFD  postcopy_fd;
> > >      uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > > +    RAMBlock          *region_rb[VHOST_MEMORY_MAX_NREGIONS];
> > > +    /* The offset from the start of the RAMBlock to the start of the
> > > +     * vhost region.
> > > +     */
> > > +    ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];
> > 
> > Here the array size is VHOST_MEMORY_MAX_NREGIONS, while...
> > 
> > >  };
> > >  
> > >  static bool ioeventfd_enabled(void)
> > > @@ -324,8 +329,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> > >          assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > >          mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > >                                       &offset);
> > > +        u->region_rb_offset[i] = offset;
> > > +        u->region_rb[i] = mr->ram_block;
> > 
> > ... can i>=VHOST_MEMORY_MAX_NREGIONS here? Or do we only need to note
> > this down if fd > 0 below?  Thanks,
> 
> I don't *think* so - I mean:
>     for (i = 0; i < dev->mem->nregions; ++i) {
> 
> so if that's the maximum number of regions and that's the number of
> regions we should be safe???

That's my concern - looks like dev->mem->nregions can be bigger than
that? At least I didn't really see a restriction on its size. The size
is changed in the following stack:

  vhost_region_add
    vhost_set_memory
      vhost_dev_assign_memory

And it's dynamically extended, without checks.

Indeed in the function vhost_user_set_mem_table() we have:

    int fds[VHOST_MEMORY_MAX_NREGIONS];

But we are safe iiuc because we also have assertions to protect:

    assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
    fds[fd_num++] = fd;

Do we at least need that assert?
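
For illustration, a guard of this shape (hypothetical placement, near
the top of vhost_user_set_mem_table()) would make the bound explicit
instead of relying on the later assert:

    if (dev->mem->nregions > VHOST_MEMORY_MAX_NREGIONS) {
        error_report("%s: too many memory regions: %d > %d", __func__,
                     (int)dev->mem->nregions, VHOST_MEMORY_MAX_NREGIONS);
        return -1;
    }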

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd
  2017-06-28 19:00 ` [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd Dr. David Alan Gilbert (git)
@ 2017-07-24 12:10   ` Maxime Coquelin
  2017-07-26 17:12     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-24 12:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Open a userfaultfd (on a postcopy_advise) and send it back in
> the reply to the qemu for it to monitor.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   contrib/libvhost-user/libvhost-user.c | 24 +++++++++++++++++++++---
>   contrib/libvhost-user/libvhost-user.h |  3 +++
>   2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index e3a32755cf..62e97f6b84 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -15,6 +15,7 @@
>   
>   #include <qemu/osdep.h>
>   #include <sys/eventfd.h>
> +#include <sys/syscall.h>
>   #include <linux/vhost.h>
>   
>   #include "qemu/atomic.h"
> @@ -774,11 +775,28 @@ vu_set_vring_enable_exec(VuDev *dev, VhostUserMsg *vmsg)
>   static bool
>   vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
>   {
> -    /* TODO: Open ufd, pass it back in the request
> -    /* TODO: Add addresses
> -     */
> +    struct uffdio_api api_struct;
> +
> +    dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> +    /* TODO: Add addresses */
>       vmsg->payload.u64 = 0xcafe;
>       vmsg->size = sizeof(vmsg->payload.u64);
> +
> +    if (dev->postcopy_ufd == -1) {
> +        vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
> +        return false;

I think we may want to reply with something even in case of error.

Indeed, if something goes wrong on the backend side, Qemu will remain
blocked waiting for the reply.
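
For instance, something along these lines (a rough, hypothetical
sketch; the error value the reply would carry, and how Qemu would
detect it, aren't defined by this series):

    if (dev->postcopy_ufd == -1) {
        vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
        /* Hypothetical: still reply so QEMU's read doesn't block
         * forever; the error marker value is made up here. */
        vmsg->payload.u64 = (uint64_t)-1;
        vmsg->size = sizeof(vmsg->payload.u64);
        return true; /* = send an (error) reply */
    }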

> +    }
> +    api_struct.api = UFFD_API;
> +    api_struct.features = 0;
> +    if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
> +        vu_panic(dev, "Failed UFFDIO_API: %s", strerror(errno));
> +        close(dev->postcopy_ufd);
> +        return false;
Ditto
> +    }
> +    /* TODO: Stash feature flags somewhere */
> +    /* Return a ufd to the QEMU */
> +    vmsg->fd_num = 1;
> +    vmsg->fds[0] = dev->postcopy_ufd;
>       return true; /* = send a reply */
>   }
>   
> diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> index 8bb35582ea..3e65a962da 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -231,6 +231,9 @@ struct VuDev {
>        * re-initialize */
>       vu_panic_cb panic;
>       const VuDevIface *iface;
> +
> +    /* Postcopy data */
> +    int postcopy_ufd;
>   };
>   
>   typedef struct VuVirtqElement {
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client
  2017-06-28 19:00 ` [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client Dr. David Alan Gilbert (git)
@ 2017-07-24 14:36   ` Maxime Coquelin
  2017-07-26 17:42     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-24 14:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b98fbe4834..1f70f5760f 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -67,6 +67,7 @@ typedef enum VhostUserRequest {
>       VHOST_USER_SET_SLAVE_REQ_FD = 21,
>       VHOST_USER_IOTLB_MSG = 22,
>       VHOST_USER_POSTCOPY_ADVISE  = 23,
> +    VHOST_USER_POSTCOPY_LISTEN  = 24,
>       VHOST_USER_MAX
>   } VhostUserRequest;
>   
> @@ -771,6 +772,25 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>       return 0;
>   }
>   
> +/*
> + * Called at the switch to postcopy on reception of the 'listen' command.
> + */
> +static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
> +{
> +    VhostUserMsg msg = {
> +        .request = VHOST_USER_POSTCOPY_LISTEN,
> +        .flags = VHOST_USER_VERSION,
> +    };

I think it should use the REPLY_ACK feature when available, for two reasons:
1. The backend could reply nack if nregions is already set.
2. When leaving vhost_user_postcopy_listen(), the message may well
    not have been handled yet by the backend.

> +    trace_vhost_user_postcopy_listen();
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        error_setg(errp, "Failed to send postcopy_listen to vhost");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>   static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>                                           void *opaque)
>   {
> @@ -793,6 +813,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>       case POSTCOPY_NOTIFY_INBOUND_ADVISE:
>           return vhost_user_postcopy_advise(dev, pnd->errp);
>   
> +    case POSTCOPY_NOTIFY_INBOUND_LISTEN:
> +        return vhost_user_postcopy_listen(dev, pnd->errp);
> +
>       default:
>           /* We ignore notifications we don't know */
>           break;

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd
  2017-06-28 19:00 ` [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd Dr. David Alan Gilbert (git)
@ 2017-07-24 15:22   ` Maxime Coquelin
  2017-07-24 17:50     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-24 15:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When new regions are sent to the client using SET_MEM_TABLE, register
> them with the userfaultfd.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   contrib/libvhost-user/libvhost-user.c | 33 +++++++++++++++++++++++++++++++++
>   1 file changed, 33 insertions(+)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index 6de339fb7a..be7470e3a9 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -450,6 +450,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>                      dev_region->mmap_addr);
>           }
>   
> +        if (dev->postcopy_listening) {
> +            /* We should already have an open ufd; we need to mark each
> +             * memory range as ufd.
> +             * Note: Do we need any madvises? Well it's not been accessed
> +             * yet; still, probably need no-THP to be safe, discard to be safe?
> +             */
> +            struct uffdio_register reg_struct;
> +            /* Note: We might need to go back to using mmap_addr and
> +             * len + mmap_offset for huge pages, but then we do hope not to
> +             * see accesses in that area below the offset
> +             */
> +            reg_struct.range.start = (uintptr_t)(dev_region->mmap_addr +
> +                                                 dev_region->mmap_offset);
> +            reg_struct.range.len = dev_region->size;
> +            reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
> +
> +            if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
> +                vu_panic(dev, "%s: Failed to userfault region %d: (ufd=%d)%s\n",
> +                         __func__, i, strerror(errno), dev->postcopy_ufd);
Aren't the two last args swapped?

> +                continue;
> +            }
> +            if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
> +                vu_panic(dev, "%s Region (%d) doesn't support COPY",
> +                         __func__, i);
> +                continue;
> +            }
> +            DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
> +                    __func__, i, reg_struct.range.start, reg_struct.range.len);
> +            /* TODO: Stash 'zero' support flags somewhere */
> +            /* TODO: Get address back to QEMU */
> +
> +        }
> +
>           close(vmsg->fds[i]);
>       }
>   
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu
  2017-06-28 19:00 ` [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu Dr. David Alan Gilbert (git)
@ 2017-07-24 17:31   ` Maxime Coquelin
  0 siblings, 0 replies; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-24 17:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

<snip>

> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 1f70f5760f..6be3e7ff2d 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -19,6 +19,7 @@
>   #include "qemu/sockets.h"
>   #include "migration/migration.h"
>   #include "migration/postcopy-ram.h"
> +#include "trace.h"
>   
>   #include <sys/ioctl.h>
>   #include <sys/socket.h>
> @@ -131,6 +132,7 @@ struct vhost_user {
>       int slave_fd;
>       NotifierWithReturn postcopy_notifier;
>       struct PostCopyFD  postcopy_fd;
> +    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
>   };
>   
>   static bool ioeventfd_enabled(void)
> @@ -298,6 +300,7 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>   static int vhost_user_set_mem_table(struct vhost_dev *dev,
>                                       struct vhost_memory *mem)
>   {
> +    struct vhost_user *u = dev->opaque;
>       int fds[VHOST_MEMORY_MAX_NREGIONS];
>       int i, fd;
>       size_t fd_num = 0;
> @@ -348,6 +351,57 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>           return -1;
>       }
>   
> +    if (u->postcopy_fd.handler) {
> +        VhostUserMsg msg_reply;
> +        int region_i, reply_i;
> +        if (vhost_user_read(dev, &msg_reply) < 0) {
> +            return -1;
> +        }
> +
> +        if (msg_reply.request != VHOST_USER_SET_MEM_TABLE) {
> +            error_report("%s: Received unexpected msg type."
> +                         "Expected %d received %d", __func__,
> +                         VHOST_USER_SET_MEM_TABLE, msg_reply.request);
> +            return -1;
> +        }
> +        /* We're using the same structure, just reusing one of the
> +         * fields, so it should be the same size.
> +         */
> +        if (msg_reply.size != msg.size) {
> +            error_report("%s: Unexpected size for postcopy reply "
> +                         "%d vs %d", __func__, msg_reply.size, msg.size);
> +            return -1;
> +        }
> +
> +        memset(u->postcopy_client_bases, 0,
> +               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +
> +        /* They're in the same order as the regions that were sent
> +         * but some of the regions were skipped (above) if they
> +         * didn't have fd's
> +        */
> +        for (reply_i = 0, region_i = 0;
> +             region_i < dev->mem->nregions;
> +             region_i++) {
> +            if (reply_i < fd_num &&
> +                msg_reply.payload.memory.regions[region_i].guest_phys_addr ==
> +                dev->mem->regions[region_i].guest_phys_addr) {
> +                u->postcopy_client_bases[region_i] =
> +                    msg_reply.payload.memory.regions[reply_i].userspace_addr;
> +                trace_vhost_user_set_mem_table_postcopy(
> +                    msg_reply.payload.memory.regions[reply_i].userspace_addr,
> +                    msg.payload.memory.regions[reply_i].userspace_addr,
> +                    reply_i, region_i);
> +                reply_i++;
> +            }
> +        }
> +        if (reply_i != fd_num) {
> +            error_report("%s: postcopy reply not fully consumed "
> +                         "%d vs %zd",
> +                         __func__, reply_i, fd_num);
> +            return -1;
> +        }
> +    }
>       if (reply_supported) {
>           return process_message_reply(dev, &msg);
>       }
> 

If the backend supports the reply-ack feature, the VHOST_USER_F_NEED_REPLY
flag is set on the message earlier in this function.

In this case, when postcopy is also enabled, it means that two replies
will have to be sent by the backend for this request.

Maybe it would be better not to set VHOST_USER_F_NEED_REPLY in this
specific case? Of course, it should be documented in the spec.
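
For illustration, that special case could look something like this (a
sketch, assuming u->postcopy_fd.handler is the postcopy-enabled check
used earlier in this function, and VHOST_USER_NEED_REPLY_MASK as
defined in hw/virtio/vhost-user.c):

    if (reply_supported && !u->postcopy_fd.handler) {
        /* Postcopy already expects its own reply to SET_MEM_TABLE;
         * don't ask the backend for a second ack on top of it. */
        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
    }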

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd
  2017-07-24 15:22   ` Maxime Coquelin
@ 2017-07-24 17:50     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-24 17:50 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When new regions are sent to the client using SET_MEM_TABLE, register
> > them with the userfaultfd.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   contrib/libvhost-user/libvhost-user.c | 33 +++++++++++++++++++++++++++++++++
> >   1 file changed, 33 insertions(+)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index 6de339fb7a..be7470e3a9 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -450,6 +450,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> >                      dev_region->mmap_addr);
> >           }
> > +        if (dev->postcopy_listening) {
> > +            /* We should already have an open ufd need to mark each memory
> > +             * range as ufd.
> > +             * Note: Do we need any madvises? Well it's not been accessed
> > +             * yet, still probably need no THP to be safe, discard to be safe?
> > +             */
> > +            struct uffdio_register reg_struct;
> > +            /* Note: We might need to go back to using mmap_addr and
> > +             * len + mmap_offset for * huge pages, but then we do hope not to
> > +             * see accesses in that area below the offset
> > +             */
> > +            reg_struct.range.start = (uintptr_t)(dev_region->mmap_addr +
> > +                                                 dev_region->mmap_offset);
> > +            reg_struct.range.len = dev_region->size;
> > +            reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
> > +
> > +            if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
> > +                vu_panic(dev, "%s: Failed to userfault region %d: (ufd=%d)%s\n",
> > +                         __func__, i, strerror(errno), dev->postcopy_ufd);
> Aren't the two last args swapped?

Fixed.
Thanks for spotting that; adding a GCC_FMT_ATTR(2, 3) before vu_panic
doesn't show any more problems from my changes, but there are a couple
of existing cases that look wrong; I think they just need a * before
them, but I'm not 100% sure.

Dave

> > +                continue;
> > +            }
> > +            if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
> > +                vu_panic(dev, "%s Region (%d) doesn't support COPY",
> > +                         __func__, i);
> > +                continue;
> > +            }
> > +            DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
> > +                    __func__, i, reg_struct.range.start, reg_struct.range.len);
> > +            /* TODO: Stash 'zero' support flags somewhere */
> > +            /* TODO: Get address back to QEMU */
> > +
> > +        }
> > +
> >           close(vmsg->fds[i]);
> >       }
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd
  2017-07-24 12:10   ` Maxime Coquelin
@ 2017-07-26 17:12     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-26 17:12 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Open a userfaultfd (on a postcopy_advise) and send it back in
> > the reply to the qemu for it to monitor.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   contrib/libvhost-user/libvhost-user.c | 24 +++++++++++++++++++++---
> >   contrib/libvhost-user/libvhost-user.h |  3 +++
> >   2 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index e3a32755cf..62e97f6b84 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -15,6 +15,7 @@
> >   #include <qemu/osdep.h>
> >   #include <sys/eventfd.h>
> > +#include <sys/syscall.h>
> >   #include <linux/vhost.h>
> >   #include "qemu/atomic.h"
> > @@ -774,11 +775,28 @@ vu_set_vring_enable_exec(VuDev *dev, VhostUserMsg *vmsg)
> >   static bool
> >   vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
> >   {
> > -    /* TODO: Open ufd, pass it back in the request
> > -    /* TODO: Add addresses
> > -     */
> > +    struct uffdio_api api_struct;
> > +
> > +    dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> > +    /* TODO: Add addresses */
> >       vmsg->payload.u64 = 0xcafe;
> >       vmsg->size = sizeof(vmsg->payload.u64);
> > +
> > +    if (dev->postcopy_ufd == -1) {
> > +        vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
> > +        return false;
> 
> I think we may want to reply with something even in case of error.
> 
> Indeed, if something goes wrong on the backend side, Qemu will remain
> blocked waiting for the reply.

Fixed.

> > +    }
> > +    api_struct.api = UFFD_API;
> > +    api_struct.features = 0;
> > +    if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
> > +        vu_panic(dev, "Failed UFFDIO_API: %s", strerror(errno));
> > +        close(dev->postcopy_ufd);
> > +        return false;
> Ditto

Fixed.
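
For reference, a guess at the shape of that fix (a sketch, not the
actual respin): always fall through to sending the reply, attaching no
fd on failure, so QEMU never blocks waiting for an answer:

    static bool
    vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
    {
        struct uffdio_api api_struct;

        dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        vmsg->payload.u64 = 0xcafe;
        vmsg->size = sizeof(vmsg->payload.u64);

        if (dev->postcopy_ufd == -1) {
            vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
            goto out;           /* still reply, so QEMU doesn't block */
        }
        api_struct.api = UFFD_API;
        api_struct.features = 0;
        if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
            vu_panic(dev, "Failed UFFDIO_API: %s", strerror(errno));
            close(dev->postcopy_ufd);
            dev->postcopy_ufd = -1;
            goto out;
        }
        /* TODO: Stash feature flags somewhere */

    out:
        /* Return the ufd to the QEMU, or no fd at all on failure */
        if (dev->postcopy_ufd == -1) {
            vmsg->fd_num = 0;
        } else {
            vmsg->fd_num = 1;
            vmsg->fds[0] = dev->postcopy_ufd;
        }
        return true; /* = send a reply */
    }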

Thanks,

Dave

> > +    }
> > +    /* TODO: Stash feature flags somewhere */
> > +    /* Return a ufd to the QEMU */
> > +    vmsg->fd_num = 1;
> > +    vmsg->fds[0] = dev->postcopy_ufd;
> >       return true; /* = send a reply */
> >   }
> > diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> > index 8bb35582ea..3e65a962da 100644
> > --- a/contrib/libvhost-user/libvhost-user.h
> > +++ b/contrib/libvhost-user/libvhost-user.h
> > @@ -231,6 +231,9 @@ struct VuDev {
> >        * re-initialize */
> >       vu_panic_cb panic;
> >       const VuDevIface *iface;
> > +
> > +    /* Postcopy data */
> > +    int postcopy_ufd;
> >   };
> >   typedef struct VuVirtqElement {
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client
  2017-07-24 14:36   ` Maxime Coquelin
@ 2017-07-26 17:42     ` Dr. David Alan Gilbert
  2017-07-26 18:03       ` Maxime Coquelin
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-26 17:42 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index b98fbe4834..1f70f5760f 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -67,6 +67,7 @@ typedef enum VhostUserRequest {
> >       VHOST_USER_SET_SLAVE_REQ_FD = 21,
> >       VHOST_USER_IOTLB_MSG = 22,
> >       VHOST_USER_POSTCOPY_ADVISE  = 23,
> > +    VHOST_USER_POSTCOPY_LISTEN  = 24,
> >       VHOST_USER_MAX
> >   } VhostUserRequest;
> > @@ -771,6 +772,25 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
> >       return 0;
> >   }
> > +/*
> > + * Called at the switch to postcopy on reception of the 'listen' command.
> > + */
> > +static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
> > +{
> > +    VhostUserMsg msg = {
> > +        .request = VHOST_USER_POSTCOPY_LISTEN,
> > +        .flags = VHOST_USER_VERSION,
> > +    };
> 
> I think it should use the REPLY_ACK feature when available, for two reasons:
> 1. The backend could reply nack if nregions is already set.
> 2. When leaving vhost_user_postcopy_listen(), the message might well
>    not have been handled yet by the backend.

OK, I can do that.  What are the rules on features like that?   Can I
just assume that if you've got POSTCOPY then you'll ack for new messages
we add?
The current contrib/libvhost-user code doesn't seem to do acks yet, but
I can add it here quite easily.

Dave

> > +    trace_vhost_user_postcopy_listen();
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        error_setg(errp, "Failed to send postcopy_listen to vhost");
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >   static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
> >                                           void *opaque)
> >   {
> > @@ -793,6 +813,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
> >       case POSTCOPY_NOTIFY_INBOUND_ADVISE:
> >           return vhost_user_postcopy_advise(dev, pnd->errp);
> > +    case POSTCOPY_NOTIFY_INBOUND_LISTEN:
> > +        return vhost_user_postcopy_listen(dev, pnd->errp);
> > +
> >       default:
> >           /* We ignore notifications we don't know */
> >           break;
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client
  2017-07-26 17:42     ` Dr. David Alan Gilbert
@ 2017-07-26 18:03       ` Maxime Coquelin
  0 siblings, 0 replies; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-26 18:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 07/26/2017 07:42 PM, Dr. David Alan Gilbert wrote:
> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>
>>
>> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index b98fbe4834..1f70f5760f 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -67,6 +67,7 @@ typedef enum VhostUserRequest {
>>>        VHOST_USER_SET_SLAVE_REQ_FD = 21,
>>>        VHOST_USER_IOTLB_MSG = 22,
>>>        VHOST_USER_POSTCOPY_ADVISE  = 23,
>>> +    VHOST_USER_POSTCOPY_LISTEN  = 24,
>>>        VHOST_USER_MAX
>>>    } VhostUserRequest;
>>> @@ -771,6 +772,25 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>>>        return 0;
>>>    }
>>> +/*
>>> + * Called at the switch to postcopy on reception of the 'listen' command.
>>> + */
>>> +static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
>>> +{
>>> +    VhostUserMsg msg = {
>>> +        .request = VHOST_USER_POSTCOPY_LISTEN,
>>> +        .flags = VHOST_USER_VERSION,
>>> +    };
>>
>> I think it should use the REPLY_ACK feature when available, for two reasons:
>> 1. The backend could reply nack if nregions is already set.
>> 2. When leaving vhost_user_postcopy_listen(), the message might well
>>     not have been handled yet by the backend.
> 
> OK, I can do that.  What are the rules on features like that?   Can I
> just assume that if you've got POSTCOPY then you'll ack for new messages
> we add?

You can have a look at vhost_user_net_set_mtu() for example.
Before sending the message, just set the VHOST_USER_NEED_REPLY_MASK bit
in the message's flags if VHOST_USER_PROTOCOL_F_REPLY_ACK is advertised
by the backend.

After having sent the message, instead of returning directly, wait for
the ack/nack by calling process_message_reply() if
VHOST_USER_PROTOCOL_F_REPLY_ACK is advertised.
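
Concretely, a sketch only, modelled on vhost_user_net_set_mtu() and
reusing the existing helpers in hw/virtio/vhost-user.c; the listen
message would become something like:

    static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
    {
        /* Only ask for an ack if the backend advertised reply-ack */
        bool reply_supported = virtio_has_feature(dev->protocol_features,
                                                  VHOST_USER_PROTOCOL_F_REPLY_ACK);
        VhostUserMsg msg = {
            .request = VHOST_USER_POSTCOPY_LISTEN,
            .flags = VHOST_USER_VERSION,
        };

        if (reply_supported) {
            msg.flags |= VHOST_USER_NEED_REPLY_MASK;
        }

        if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
            error_setg(errp, "Failed to send postcopy_listen to vhost");
            return -1;
        }

        /* Block until the backend acks (or nacks) the message */
        if (reply_supported) {
            return process_message_reply(dev, &msg);
        }

        return 0;
    }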

> The current contrib/libvhost-user code doesn't seem to do acks yet, but
> I can add it here quite easily.

Yes, I agree it is not implemented in libvhost-user, but we will be able
to test it with the DPDK prototype.

Thanks,
Maxime

> Dave
> 
>>> +    trace_vhost_user_postcopy_listen();
>>> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>>> +        error_setg(errp, "Failed to send postcopy_listen to vhost");
>>> +        return -1;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>    static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>>>                                            void *opaque)
>>>    {
>>> @@ -793,6 +813,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>>>        case POSTCOPY_NOTIFY_INBOUND_ADVISE:
>>>            return vhost_user_postcopy_advise(dev, pnd->errp);
>>> +    case POSTCOPY_NOTIFY_INBOUND_LISTEN:
>>> +        return vhost_user_postcopy_listen(dev, pnd->errp);
>>> +
>>>        default:
>>>            /* We ignore notifications we don't know */
>>>            break;
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message
  2017-06-28 19:00 ` [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message Dr. David Alan Gilbert (git)
@ 2017-07-27 11:35   ` Maxime Coquelin
  2017-08-24 14:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-07-27 11:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> This message is sent just before the end of postcopy to get the
> client to stop using userfault since we won't respond to any more
> requests.  It should close userfaultfd so that any other pages
> get mapped to the backing file automatically by the kernel, since
> at this point we know we've received everything.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   contrib/libvhost-user/libvhost-user.c | 23 +++++++++++++++++++++++
>   contrib/libvhost-user/libvhost-user.h |  1 +
>   hw/virtio/vhost-user.c                |  1 +
>   3 files changed, 25 insertions(+)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index d37052b7b0..c1716d1a62 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -68,6 +68,7 @@ vu_request_to_string(int req)
>           REQ(VHOST_USER_INPUT_GET_CONFIG),
>           REQ(VHOST_USER_POSTCOPY_ADVISE),
>           REQ(VHOST_USER_POSTCOPY_LISTEN),
> +        REQ(VHOST_USER_POSTCOPY_END),
>           REQ(VHOST_USER_MAX),
>       };
>   #undef REQ
> @@ -889,6 +890,26 @@ vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
>   
>       return false;
>   }
> +
> +static bool
> +vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
> +{
> +    fprintf(stderr, "%s: Entry\n", __func__);
> +    dev->postcopy_listening = false;
> +    if (dev->postcopy_ufd > 0) {
> +        close(dev->postcopy_ufd);
> +        dev->postcopy_ufd = -1;
> +        fprintf(stderr, "%s: Done close\n", __func__);
> +    }
> +
> +    vmsg->fd_num = 0;
> +    vmsg->payload.u64 = 0;
> +    vmsg->size = sizeof(vmsg->payload.u64);
> +    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
> +    fprintf(stderr, "%s: exit\n", __func__);
> +    return true;
> +}
> +

This is what reply-ack is meant for, so to avoid code duplication,
maybe Qemu could set the VHOST_USER_NEED_REPLY_MASK bit when the
reply-ack feature is supported.

I'm wondering if we shouldn't consider adding reply-ack feature support
to libvhost-user, and making postcopy support depend on this feature.

Cheers,
Maxime

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
  2017-06-28 19:00 ` [Qemu-devel] [RFC 23/29] vub+postcopy: madvises Dr. David Alan Gilbert (git)
@ 2017-08-07  4:49   ` Alexey Perevalov
  2017-08-08 17:06     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Alexey Perevalov @ 2017-08-07  4:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange

On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Clear the area and turn off THP.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
>   1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index 0658b6e847..ceddeac74f 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>           }
>   
>           if (dev->postcopy_listening) {
> +            int ret;
>               /* We should already have an open ufd need to mark each memory
>                * range as ufd.
> -             * Note: Do we need any madvises? Well it's not been accessed
> -             * yet, still probably need no THP to be safe, discard to be safe?
>                */
> +
> +            /* Discard any mapping we have here; note I can't use MADV_REMOVE
> +             * or fallocate to make the hole since I don't want to lose
> +             * data that's already arrived in the shared process.
> +             * TODO: How to do hugepage
> +             */
Hi David, frankly speaking, I'm stuck with my solution, and I also have
other issues, but here I can suggest a solution for hugepages. I think
we could transmit a received-pages bitmap in VHOST_USER_SET_MEM_TABLE
(VhostUserMemoryRegion), but that would raise a compatibility issue; or
we could introduce a special message type for it and send it before
VHOST_USER_SET_MEM_TABLE. It would then be possible to do the fallocate
on the basis of the received bitmap, just skipping already-copied pages.
If you wish, I could send patches, rebased on yours, to do this.

> +            ret = madvise((void *)dev_region->mmap_addr,
> +                          dev_region->size + dev_region->mmap_offset,
> +                          MADV_DONTNEED);
> +            if (ret) {
> +                fprintf(stderr,
> +                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
> +                        __func__, i, strerror(errno));
> +            }
> +            /* Turn off transparent hugepages so we dont get lose wakeups
> +             * in neighbouring pages.
> +             * TODO: Turn this backon later.
> +             */
> +            ret = madvise((void *)dev_region->mmap_addr,
> +                          dev_region->size + dev_region->mmap_offset,
> +                          MADV_NOHUGEPAGE);
> +            if (ret) {
> +                /* Note: This can happen legally on kernels that are configured
> +                 * without madvise'able hugepages
> +                 */
> +                fprintf(stderr,
> +                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
> +                        __func__, i, strerror(errno));
> +            }
>               struct uffdio_register reg_struct;
>               /* Note: We might need to go back to using mmap_addr and
>                * len + mmap_offset for * huge pages, but then we do hope not to


-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
  2017-08-07  4:49   ` Alexey Perevalov
@ 2017-08-08 17:06     ` Dr. David Alan Gilbert
  2017-08-09 11:02       ` Alexey Perevalov
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-08 17:06 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Clear the area and turn off THP.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
> >   1 file changed, 30 insertions(+), 2 deletions(-)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index 0658b6e847..ceddeac74f 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> >           }
> >           if (dev->postcopy_listening) {
> > +            int ret;
> >               /* We should already have an open ufd need to mark each memory
> >                * range as ufd.
> > -             * Note: Do we need any madvises? Well it's not been accessed
> > -             * yet, still probably need no THP to be safe, discard to be safe?
> >                */
> > +
> > +            /* Discard any mapping we have here; note I can't use MADV_REMOVE
> > +             * or fallocate to make the hole since I don't want to lose
> > +             * data that's already arrived in the shared process.
> > +             * TODO: How to do hugepage
> > +             */
> Hi David, frankly speaking, I'm stuck with my solution, and I also have
> other issues, but here I can suggest a solution for hugepages. I think
> we could transmit a received-pages bitmap in VHOST_USER_SET_MEM_TABLE
> (VhostUserMemoryRegion), but that would raise a compatibility issue; or
> we could introduce a special message type for it and send it before
> VHOST_USER_SET_MEM_TABLE. It would then be possible to do the fallocate
> on the basis of the received bitmap, just skipping already-copied pages.
> If you wish, I could send patches, rebased on yours, to do this.

What we found works is that actually we don't need to do a discard -
since we've only just done the mmap of the arena, nothing will be
occupying it on the shared client, so we don't need to discard.

We've had a postcopy migrate work now, with a few hacks we're still
cleaning up, both on vhost-user-bridge and dpdk; so I'll get this
updated and reposted.

Dave

> > +            ret = madvise((void *)dev_region->mmap_addr,
> > +                          dev_region->size + dev_region->mmap_offset,
> > +                          MADV_DONTNEED);
> > +            if (ret) {
> > +                fprintf(stderr,
> > +                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
> > +                        __func__, i, strerror(errno));
> > +            }
> > +            /* Turn off transparent hugepages so we don't lose wakeups
> > +             * in neighbouring pages.
> > +             * TODO: Turn this back on later.
> > +             */
> > +            ret = madvise((void *)dev_region->mmap_addr,
> > +                          dev_region->size + dev_region->mmap_offset,
> > +                          MADV_NOHUGEPAGE);
> > +            if (ret) {
> > +                /* Note: This can happen legally on kernels that are configured
> > +                 * without madvise'able hugepages
> > +                 */
> > +                fprintf(stderr,
> > +                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
> > +                        __func__, i, strerror(errno));
> > +            }
> >               struct uffdio_register reg_struct;
> >               /* Note: We might need to go back to using mmap_addr and
> >                * len + mmap_offset for huge pages, but then we do hope not to
> 
> 
> -- 
> Best regards,
> Alexey Perevalov
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
  2017-08-08 17:06     ` Dr. David Alan Gilbert
@ 2017-08-09 11:02       ` Alexey Perevalov
  2017-08-10  8:55         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Alexey Perevalov @ 2017-08-09 11:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange

On 08/08/2017 08:06 PM, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
>> On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> Clear the area and turn off THP.
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> ---
>>>    contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
>>>    1 file changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
>>> index 0658b6e847..ceddeac74f 100644
>>> --- a/contrib/libvhost-user/libvhost-user.c
>>> +++ b/contrib/libvhost-user/libvhost-user.c
>>> @@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>>>            }
>>>            if (dev->postcopy_listening) {
>>> +            int ret;
>>>                /* We should already have an open ufd need to mark each memory
>>>                 * range as ufd.
>>> -             * Note: Do we need any madvises? Well it's not been accessed
>>> -             * yet, still probably need no THP to be safe, discard to be safe?
>>>                 */
>>> +
>>> +            /* Discard any mapping we have here; note I can't use MADV_REMOVE
>>> +             * or fallocate to make the hole since I don't want to lose
>>> +             * data that's already arrived in the shared process.
>>> +             * TODO: How to do hugepage
>>> +             */
>> Hi David, frankly speaking, I'm stuck with my solution, and I also have
>> other issues, but here I can suggest a solution for hugepages. I think
>> we could transmit a received-pages bitmap in VHOST_USER_SET_MEM_TABLE
>> (VhostUserMemoryRegion), but that would raise a compatibility issue; or
>> we could introduce a special message type for it and send it before
>> VHOST_USER_SET_MEM_TABLE. It would then be possible to do the fallocate
>> on the basis of the received bitmap, just skipping already-copied pages.
>> If you wish, I could send patches, rebased on yours, to do this.
> What we found works is that actually we don't need to do a discard -
> since we've only just done the mmap of the arena, nothing will be
> occupying it on the shared client, so we don't need to discard.
Looks like yes; I checked on a kernel from Andrea's git, and there is
no EEXIST error any more in the case where the client doesn't
fallocate.

>
> We've had a postcopy migrate work now, with a few hacks we're still
> cleaning up, both on vhost-user-bridge and dpdk; so I'll get this
> updated and reposted.
In your patch series the vring is disabled on VHOST_USER_GET_VRING_BASE,
which is sent when the vhost-user server wants to stop the vring.
QEMU enables the vring as soon as the virtual machine is started, so I
didn't see an explicit vring disable for a vring being migrated.
So a migrating vring is protected just by uffd_register, isn't it? And a
PMD thread (or any vhost-user thread accessing the migrating vring) will
wait for the page copy in that case, right?

>
> Dave
>
>>> +            ret = madvise((void *)dev_region->mmap_addr,
>>> +                          dev_region->size + dev_region->mmap_offset,
>>> +                          MADV_DONTNEED);
>>> +            if (ret) {
>>> +                fprintf(stderr,
>>> +                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
>>> +                        __func__, i, strerror(errno));
>>> +            }
>>> +            /* Turn off transparent hugepages so we don't lose wakeups
>>> +             * in neighbouring pages.
>>> +             * TODO: Turn this back on later.
>>> +             */
>>> +            ret = madvise((void *)dev_region->mmap_addr,
>>> +                          dev_region->size + dev_region->mmap_offset,
>>> +                          MADV_NOHUGEPAGE);
>>> +            if (ret) {
>>> +                /* Note: This can happen legally on kernels that are configured
>>> +                 * without madvise'able hugepages
>>> +                 */
>>> +                fprintf(stderr,
>>> +                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
>>> +                        __func__, i, strerror(errno));
>>> +            }
>>>                struct uffdio_register reg_struct;
>>>                /* Note: We might need to go back to using mmap_addr and
>>>                 * len + mmap_offset for huge pages, but then we do hope not to
>>
>> -- 
>> Best regards,
>> Alexey Perevalov
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
>

-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
  2017-08-09 11:02       ` Alexey Perevalov
@ 2017-08-10  8:55         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-10  8:55 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: qemu-devel, marcandre.lureau, maxime.coquelin, mst, quintela,
	peterx, lvivier, aarcange

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> On 08/08/2017 08:06 PM, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Clear the area and turn off THP.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >    contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
> > > >    1 file changed, 30 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > > > index 0658b6e847..ceddeac74f 100644
> > > > --- a/contrib/libvhost-user/libvhost-user.c
> > > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > > @@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> > > >            }
> > > >            if (dev->postcopy_listening) {
> > > > +            int ret;
> > > >                /* We should already have an open ufd need to mark each memory
> > > >                 * range as ufd.
> > > > -             * Note: Do we need any madvises? Well it's not been accessed
> > > > -             * yet, still probably need no THP to be safe, discard to be safe?
> > > >                 */
> > > > +
> > > > +            /* Discard any mapping we have here; note I can't use MADV_REMOVE
> > > > +             * or fallocate to make the hole since I don't want to lose
> > > > +             * data that's already arrived in the shared process.
> > > > +             * TODO: How to do hugepage
> > > > +             */
> > > Hi David, frankly speaking, I'm stuck with my solution, and I also have
> > > other issues, but here I can suggest a solution for hugepages. I think
> > > we could transmit a received-pages bitmap in VHOST_USER_SET_MEM_TABLE
> > > (VhostUserMemoryRegion), but that would raise a compatibility issue; or
> > > we could introduce a special message type for it and send it before
> > > VHOST_USER_SET_MEM_TABLE. It would then be possible to do the fallocate
> > > on the basis of the received bitmap, just skipping already-copied pages.
> > > If you wish, I could send patches, rebased on yours, to do this.
> > What we found works is that actually we don't need to do a discard -
> > since we've only just done the mmap of the arena, nothing will be
> > occupying it on the shared client, so we don't need to discard.
> Looks like yes; I checked on a kernel from Andrea's git, and there is
> no EEXIST error any more in the case where the client doesn't
> fallocate.
> 
> > 
> > We've had a postcopy migrate work now, with a few hacks we're still
> > cleaning up, both on vhost-user-bridge and dpdk; so I'll get this
> > updated and reposted.
> In your patch series the vring is disabled on VHOST_USER_GET_VRING_BASE,
> which is sent when the vhost-user server wants to stop the vring.
> QEMU enables the vring as soon as the virtual machine is started, so I
> didn't see an explicit vring disable for a vring being migrated.
> So a migrating vring is protected just by uffd_register, isn't it? And a
> PMD thread (or any vhost-user thread accessing the migrating vring) will
> wait for the page copy in that case, right?

Yes, I believe that's the case, although I don't know the structure of
dpdk well enough to know the effect of that.

Dave

> 
> > 
> > Dave
> > 
> > > > +            ret = madvise((void *)dev_region->mmap_addr,
> > > > +                          dev_region->size + dev_region->mmap_offset,
> > > > +                          MADV_DONTNEED);
> > > > +            if (ret) {
> > > > +                fprintf(stderr,
> > > > +                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
> > > > +                        __func__, i, strerror(errno));
> > > > +            }
> > > > +            /* Turn off transparent hugepages so we don't lose wakeups
> > > > +             * in neighbouring pages.
> > > > +             * TODO: Turn this back on later.
> > > > +             */
> > > > +            ret = madvise((void *)dev_region->mmap_addr,
> > > > +                          dev_region->size + dev_region->mmap_offset,
> > > > +                          MADV_NOHUGEPAGE);
> > > > +            if (ret) {
> > > > +                /* Note: This can happen legally on kernels that are configured
> > > > +                 * without madvise'able hugepages
> > > > +                 */
> > > > +                fprintf(stderr,
> > > > +                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
> > > > +                        __func__, i, strerror(errno));
> > > > +            }
> > > >                struct uffdio_register reg_struct;
> > > >                /* Note: We might need to go back to using mmap_addr and
> > > >                 * len + mmap_offset for huge pages, but then we do hope not to
> > > 
> > > -- 
> > > Best regards,
> > > Alexey Perevalov
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > 
> > 
> 
> -- 
> Best regards,
> Alexey Perevalov
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset
  2017-07-03 17:44   ` Michael S. Tsirkin
@ 2017-08-14 17:27     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-14 17:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:21PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Utility to give the offset of a host pointer within a RAMBlock
> > (assuming we already know it's in that RAMBlock)
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  exec.c                    | 6 ++++++
> >  include/exec/cpu-common.h | 1 +
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/exec.c b/exec.c
> > index 4e61226a16..a1499b9bee 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2218,6 +2218,12 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
> >      return ramblock_ptr(block, addr);
> >  }
> >  
> > +/* Return the offset of a hostpointer within a ramblock */
> > +ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
> > +{
> > +    return (uint8_t *)host - (uint8_t *)rb->host;
> > +}
> > +
> 
> I'd also assert that it's within that block.

Done
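
Presumably ending up something like this (a sketch of the reworked
helper, not the actual follow-up patch):

    /* Return the offset of a host pointer within a ramblock */
    ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
    {
        ram_addr_t res = (uintptr_t)host - (uintptr_t)rb->host;

        /* Catch callers passing a pointer outside this RAMBlock */
        assert(res < rb->used_length);
        return res;
    }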

> >  /*
> >   * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
> >   * in that RAMBlock.
> > diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> > index 4af179b543..fa1ec22d66 100644
> > --- a/include/exec/cpu-common.h
> > +++ b/include/exec/cpu-common.h
> > @@ -66,6 +66,7 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr);
> >  RAMBlock *qemu_ram_block_by_name(const char *name);
> >  RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> >                                     ram_addr_t *offset);
> > +ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host);
> >  void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
> >  void qemu_ram_unset_idstr(RAMBlock *block);
> >  const char *qemu_ram_get_idstr(RAMBlock *rb);
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset
  2017-07-17  2:59       ` Peter Xu
@ 2017-08-17 17:29         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-17 17:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Jul 14, 2017 at 06:15:54PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Wed, Jun 28, 2017 at 08:00:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Stash the RAMBlock and offset for later use looking up
> > > > addresses.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  hw/virtio/trace-events |  1 +
> > > >  hw/virtio/vhost-user.c | 11 +++++++++++
> > > >  2 files changed, 12 insertions(+)
> > > > 
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index f7be340a45..1fd194363a 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -3,6 +3,7 @@
> > > >  # hw/virtio/vhost-user.c
> > > >  vhost_user_postcopy_listen(void) ""
> > > >  vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:%"PRIx64" for hva: %"PRIx64" reply %d region %d"
> > > > +vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:%"PRIx64" GPA:%"PRIx64" QVA/userspace:%"PRIx64" RB offset:%"PRIx64
> > > >  
> > > >  # hw/virtio/virtio.c
> > > >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 6be3e7ff2d..3185af7a45 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -133,6 +133,11 @@ struct vhost_user {
> > > >      NotifierWithReturn postcopy_notifier;
> > > >      struct PostCopyFD  postcopy_fd;
> > > >      uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > > > +    RAMBlock          *region_rb[VHOST_MEMORY_MAX_NREGIONS];
> > > > +    /* The offset from the start of the RAMBlock to the start of the
> > > > +     * vhost region.
> > > > +     */
> > > > +    ram_addr_t         region_rb_offset[VHOST_MEMORY_MAX_NREGIONS];
> > > 
> > > Here the array size is VHOST_MEMORY_MAX_NREGIONS, while...
> > > 
> > > >  };
> > > >  
> > > >  static bool ioeventfd_enabled(void)
> > > > @@ -324,8 +329,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> > > >          assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > >          mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > >                                       &offset);
> > > > +        u->region_rb_offset[i] = offset;
> > > > +        u->region_rb[i] = mr->ram_block;
> > > 
> > > ... can i>=VHOST_MEMORY_MAX_NREGIONS here? Or do we only need to note
> > > this down if fd > 0 below?  Thanks,
> > 
> > I don't *think* so - I mean:
> >     for (i = 0; i < dev->mem->nregions; ++i) {
> > 
> > so if that's the maximum number of regions and that's the number of
> > regions we should be safe???
> 
> That's my concern - looks like dev->mem->nregions can be bigger than
> that? At least I didn't really see a restriction on its size. The size
> is changed in the following call stack:
> 
>   vhost_region_add
>     vhost_set_memory
>       vhost_dev_assign_memory
> 
> And it's dynamically extended, without checks.
> 
> Indeed in the function vhost_user_set_mem_table() we have:
> 
>     int fds[VHOST_MEMORY_MAX_NREGIONS];
> 
> But we are safe iiuc because we also have assertions to protect:
> 
>     assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
>     fds[fd_num++] = fd;
> 
> Do we at least need that assert?

I think you're right that it can validly be larger than MAX_NREGIONS;
the idea is that dev->mem->regions can be larger, with the limit
applying only to the number of regions that actually have fds.

I'll rework the structure.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base
  2017-07-04 21:59   ` Michael S. Tsirkin
  2017-07-05 17:16     ` Dr. David Alan Gilbert
@ 2017-08-18 19:19     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-18 19:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin,
	quintela, peterx, lvivier, aarcange

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When we receive a GET_VRING_BASE message set enable = false
> > to stop any new received packets modifying the ring.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I think I already reviewed a similar patch.
> Spec says:
> Client must only process each ring when it is started.
> 
> IMHO the real fix is to fix the client to check the started
> flag before processing the ring.

Done, I added a vu_queue_started to match vu_queue_enabled
and then used it.
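
Roughly, a sketch mirroring vu_queue_enabled(), not necessarily the
exact final code:

    bool
    vu_queue_started(const VuDev *dev, const VuVirtq *vq)
    {
        return vq->started;
    }

with the ring-processing paths then bailing out early, e.g.
if (!vu_queue_started(dev, vq)) { return; }, before touching the ring.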

Dave

> > ---
> >  contrib/libvhost-user/libvhost-user.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index ceddeac74f..d37052b7b0 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
> >      vmsg->size = sizeof(vmsg->payload.state);
> >  
> >      dev->vq[index].started = false;
> > +    dev->vq[index].enable = false;
> >      if (dev->iface->queue_set_started) {
> >          dev->iface->queue_set_started(dev, index, false);
> >      }
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message
  2017-07-27 11:35   ` Maxime Coquelin
@ 2017-08-24 14:53     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-24 14:53 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > This message is sent just before the end of postcopy to get the
> > client to stop using userfault since we won't respond to any more
> > requests.  It should close userfaultfd so that any other pages
> > get mapped to the backing file automatically by the kernel, since
> > at this point we know we've received everything.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   contrib/libvhost-user/libvhost-user.c | 23 +++++++++++++++++++++++
> >   contrib/libvhost-user/libvhost-user.h |  1 +
> >   hw/virtio/vhost-user.c                |  1 +
> >   3 files changed, 25 insertions(+)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index d37052b7b0..c1716d1a62 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -68,6 +68,7 @@ vu_request_to_string(int req)
> >           REQ(VHOST_USER_INPUT_GET_CONFIG),
> >           REQ(VHOST_USER_POSTCOPY_ADVISE),
> >           REQ(VHOST_USER_POSTCOPY_LISTEN),
> > +        REQ(VHOST_USER_POSTCOPY_END),
> >           REQ(VHOST_USER_MAX),
> >       };
> >   #undef REQ
> > @@ -889,6 +890,26 @@ vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
> >       return false;
> >   }
> > +
> > +static bool
> > +vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
> > +{
> > +    fprintf(stderr, "%s: Entry\n", __func__);
> > +    dev->postcopy_listening = false;
> > +    if (dev->postcopy_ufd > 0) {
> > +        close(dev->postcopy_ufd);
> > +        dev->postcopy_ufd = -1;
> > +        fprintf(stderr, "%s: Done close\n", __func__);
> > +    }
> > +
> > +    vmsg->fd_num = 0;
> > +    vmsg->payload.u64 = 0;
> > +    vmsg->size = sizeof(vmsg->payload.u64);
> > +    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
> > +    fprintf(stderr, "%s: exit\n", __func__);
> > +    return true;
> > +}
> > +
> 
> This is what reply-ack is meant for, so to avoid code duplication,
> maybe Qemu could set the VHOST_USER_NEED_REPLY_MASK bit when the
> reply-ack feature is supported.
> 
> I'm wondering if we shouldn't consider adding reply-ack feature support
> to libvhost-user, and making postcopy support depend on this feature.

Yes, that would make sense; for the moment I'm adding what I think are
the same replies in the places I need acks, and setting
VHOST_USER_NEED_REPLY_MASK, so for the messages where I need it the
semantics should be the same as that.

Dave

> Cheers,
> Maxime
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared
  2017-07-10 10:03   ` Peter Xu
@ 2017-08-24 16:59     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-24 16:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, a.perevalov, marcandre.lureau, maxime.coquelin, mst,
	quintela, lvivier, aarcange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:20PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The choice of call to discard a block is getting more complicated
> > for other cases.  We use fallocate PUNCH_HOLE in any file case;
> > it works for both hugepages and tmpfs.
> > We use DONTNEED in the non-hugepage cases where the mapping is
> > either anonymous or private.
> > 
> > Care should be taken when trying other backing files.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  exec.c       | 28 ++++++++++++++++------------
> >  trace-events |  3 +++
> >  2 files changed, 19 insertions(+), 12 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 69fc5c9b07..4e61226a16 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -3557,6 +3557,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >      }
> >  
> >      if ((start + length) <= rb->used_length) {
> > +        bool need_madvise, need_fallocate;
> >          uint8_t *host_endaddr = host_startaddr + length;
> >          if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
> >              error_report("ram_block_discard_range: Unaligned end address: %p",
> > @@ -3566,23 +3567,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >  
> >          errno = ENOTSUP; /* If we are missing MADVISE etc */
> >  
> > -        if (rb->page_size == qemu_host_page_size) {
> > -#if defined(CONFIG_MADVISE)
> > -            /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> > -             * freeing the page.
> > -             */
> > -            ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > -#endif
> > -        } else {
> > -            /* Huge page case  - unfortunately it can't do DONTNEED, but
> > -             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> > -             * huge page file.
> > -             */
> > +        /* The logic here is messy;
> > +         *    madvise DONTNEED fails for hugepages
> > +         *    fallocate works on hugepages and shmem
> > +         */
> > +        need_madvise = (rb->page_size == qemu_host_page_size) &&
> > +                       (rb->fd == -1 || !(rb->flags & RAM_SHARED));
> > +        need_fallocate = rb->fd != -1;
> > +        if (ret == -1 && need_fallocate) {
> 
> (ret will always be -1 when reach here?)

Yes, I was just making the code independent of order.

> >  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> >              ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >                              start, length);
> >  #endif
> >          }
> > +        if (need_madvise && (!need_fallocate || (ret == 0))) {
> > +#if defined(CONFIG_MADVISE)
> > +            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> > +#endif
> > +        }
> > +        trace_ram_block_discard_range(rb->idstr, host_startaddr,
> > +                                      need_madvise, need_fallocate, ret);
> 
> How about making the check easier, like:
> 
>   if (rb->page_size != qemu_host_page_size ||
>       rb->flags & RAM_SHARED) {
>       /* Either huge pages or shared memory will contain rb->fd */
>       assert(rb->fd);
>       fallocate(rb->fd, ...);
>   } else {
>       madvise();
>   }

I've reworked this.
There are situations where you want both (I think!): for shared memory
that's not hugepage-backed, you do a fallocate to clear the underlying
storage, and then an madvise to force the local mappings to be cleared.
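
A sketch of the reworked ordering (an illustration, not the final
patch): punch the hole in the backing file first, then drop the local
mapping:

    /* Anything file-backed (hugetlbfs or shmem): punch a hole to drop
     * the underlying storage.  Non-huge mappings additionally get
     * madvise(DONTNEED) so the local mapping is dropped too. */
    need_fallocate = rb->fd != -1;
    need_madvise = (rb->page_size == qemu_host_page_size);
    if (need_fallocate) {
    #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
        ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        start, length);
    #endif
    }
    if (need_madvise && (!need_fallocate || ret == 0)) {
    #if defined(CONFIG_MADVISE)
        ret = madvise(host_startaddr, length, MADV_DONTNEED);
    #endif
    }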

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-07-07 11:53     ` Dr. David Alan Gilbert
  2017-07-07 12:52       ` Maxime Coquelin
@ 2017-10-03 13:23       ` Dr. David Alan Gilbert
  2017-10-06 12:22         ` Maxime Coquelin
  1 sibling, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-03 13:23 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> > 
> > 
> > On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> > > 
> > > **HACK - better solution needed **
> > > We have the situation where:
> > > 
> > >       qemu                      bridge
> > > 
> > >       send set_mem_table
> > >                                map memory
> > >    a)                          mark area with UFD
> > >                                send reply with map addresses
> > >    b)                          start using
> > >    c) receive reply
> > > 
> > >    As soon as (a) happens qemu might start seeing faults
> > > from memory accesses (but doesn't until b); but it can't
> > > process those faults until (c) when it's received the
> > > mmap addresses.
> > > 
> > > Make the fault handler spin until it gets the reply in (c).
> > > 
> > > At the very least this needs some proper locks, but preferably
> > > we need to split the message.
> > 
> > Yes, maybe the slave channel could be used to send the ufds with
> > a dedicated request? The backend would set the reply-ack flag, so that
> > it starts accessing the guest memory only when Qemu is ready to handle
> > faults.
> 
> Yes, that would make life a lot easier.
> 
> > Note that the slave channel support has not been implemented in Qemu's
> > libvhost-user yet, but this is something I can do if we feel the need.
> 
> Can you tell me a bit about how the slave channel works?

I've looked at the slave-channel; and I'm worried that it's not suitable
for this case.
The problem is that 'slave_read' is wired to a fd_handler that I think
is serviced by the main thread, and while postcopy is running I don't
want to rely on the operation of the main thread (since it could be
blocked by a page fault).
I could still use an explicit ack at that point though over the main
channel I think (or use the slave synchronously?).

Dave

> Dave
> 
> > Maxime
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-10-03 13:23       ` Dr. David Alan Gilbert
@ 2017-10-06 12:22         ` Maxime Coquelin
  2017-10-09 12:12           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 87+ messages in thread
From: Maxime Coquelin @ 2017-10-06 12:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 10/03/2017 03:23 PM, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>>
>>>
>>> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
>>>>
>>>> **HACK - better solution needed **
>>>> We have the situation where:
>>>>
>>>>        qemu                      bridge
>>>>
>>>>        send set_mem_table
>>>>                                 map memory
>>>>     a)                          mark area with UFD
>>>>                                 send reply with map addresses
>>>>     b)                          start using
>>>>     c) receive reply
>>>>
>>>>     As soon as (a) happens qemu might start seeing faults
>>>> from memory accesses (but doesn't until b); but it can't
>>>> process those faults until (c) when it's received the
>>>> mmap addresses.
>>>>
>>>> Make the fault handler spin until it gets the reply in (c).
>>>>
>>>> At the very least this needs some proper locks, but preferably
>>>> we need to split the message.
>>>
>>> Yes, maybe the slave channel could be used to send the ufds with
>>> a dedicated request? The backend would set the reply-ack flag, so that
>>> it starts accessing the guest memory only when Qemu is ready to handle
>>> faults.
>>
>> Yes, that would make life a lot easier.
>>
>>> Note that the slave channel support has not been implemented in Qemu's
>>> libvhost-user yet, but this is something I can do if we feel the need.
>>
>> Can you tell me a bit about how the slave channel works?
> 
> I've looked at the slave-channel; and I'm worried that it's not suitable
> for this case.
> The problem is that 'slave_read' is wired to a fd_handler that I think
> is serviced by the main thread,
I confirm, this is serviced by the main thread.

> and while postcopy is running I don't
> want to rely on the operation of the main thread (since it could be
> blocked by a page fault).

IIUC, you mean QEMU being blocked by a page fault.
In this case, I don't think this is an issue, because QEMU doesn't rely
on the backend to handle the page fault, so the slave request can be
handled only once QEMU has handled the fault.

Maybe I am missing something?

> I could still use an explicit ack at that point though over the main
> channel I think (or use the slave synchronously?).

Can you please elaborate? I'm not sure I understand what you mean.

Thanks,
Maxime

> Dave
> 
>> Dave
>>
>>> Maxime
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-10-06 12:22         ` Maxime Coquelin
@ 2017-10-09 12:12           ` Dr. David Alan Gilbert
  2017-10-12  7:22             ` Maxime Coquelin
  0 siblings, 1 reply; 87+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-09 12:12 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange

* Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> 
> 
> On 10/03/2017 03:23 PM, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
> > > > 
> > > > 
> > > > On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> > > > > 
> > > > > **HACK - better solution needed **
> > > > > We have the situation where:
> > > > > 
> > > > >        qemu                      bridge
> > > > > 
> > > > >        send set_mem_table
> > > > >                                 map memory
> > > > >     a)                          mark area with UFD
> > > > >                                 send reply with map addresses
> > > > >     b)                          start using
> > > > >     c) receive reply
> > > > > 
> > > > >     As soon as (a) happens qemu might start seeing faults
> > > > > from memory accesses (but doesn't until b); but it can't
> > > > > process those faults until (c) when it's received the
> > > > > mmap addresses.
> > > > > 
> > > > > Make the fault handler spin until it gets the reply in (c).
> > > > > 
> > > > > At the very least this needs some proper locks, but preferably
> > > > > we need to split the message.
> > > > 
> > > > Yes, maybe the slave channel could be used to send the ufds with
> > > > a dedicated request? The backend would set the reply-ack flag, so that
> > > > it starts accessing the guest memory only when Qemu is ready to handle
> > > > faults.
> > > 
> > > Yes, that would make life a lot easier.
> > > 
> > > > Note that the slave channel support has not been implemented in Qemu's
> > > > libvhost-user yet, but this is something I can do if we feel the need.
> > > 
> > > Can you tell me a bit about how the slave channel works?
> > 
> > I've looked at the slave-channel; and I'm worried that it's not suitable
> > for this case.
> > The problem is that 'slave_read' is wired to a fd_handler that I think
> > is serviced by the main thread,
> I confirm, this is serviced by the main thread.
> 
> > and while postcopy is running I don't
> > want to rely on the operation of the main thread (since it could be
> > blocked by a page fault).
> 
> IIUC, you mean QEMU being blocked by a page fault.

Yes.

> In this case, I don't think this is an issue, because QEMU doesn't rely
> on the backend to handle the page fault, so the slave request can be
> handled only once QEMU has handled the fault.
> 
> Maybe I am missing something?

It feels delicate; with the vhost client blocked waiting for the ack
from qemu to the registration reply on the slave, and some other part
blocked by a page fault, it seems likely we'd hit deadlocks, even if I
can't put my finger on one.

> > I could still use an explicit ack at that point though over the main
> > channel I think (or use the slave synchronously?).
> 
> Can you please elaborate? I'm not sure I understand what you mean.

In the world I'm currently working on I've got it just using the main
channel but:
   settable -> client
   settable-results -> qemu
   ack -> client

all over the main channel with each side waiting for the other.

Dave

> 
> Thanks,
> Maxime
> 
> > Dave
> > 
> > > Dave
> > > 
> > > > Maxime
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table
  2017-10-09 12:12           ` Dr. David Alan Gilbert
@ 2017-10-12  7:22             ` Maxime Coquelin
  0 siblings, 0 replies; 87+ messages in thread
From: Maxime Coquelin @ 2017-10-12  7:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, a.perevalov, marcandre.lureau, mst, quintela, peterx,
	lvivier, aarcange



On 10/09/2017 02:12 PM, Dr. David Alan Gilbert wrote:
> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>
>>
>> On 10/03/2017 03:23 PM, Dr. David Alan Gilbert wrote:
>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>>>>
>>>>>
>>>>> On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:
>>>>>> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
>>>>>>
>>>>>> **HACK - better solution needed **
>>>>>> We have the situation where:
>>>>>>
>>>>>>         qemu                      bridge
>>>>>>
>>>>>>         send set_mem_table
>>>>>>                                  map memory
>>>>>>      a)                          mark area with UFD
>>>>>>                                  send reply with map addresses
>>>>>>      b)                          start using
>>>>>>      c) receive reply
>>>>>>
>>>>>>      As soon as (a) happens qemu might start seeing faults
>>>>>> from memory accesses (though in practice it doesn't until (b));
>>>>>> but it can't process those faults until (c), when it has
>>>>>> received the mmap addresses.
>>>>>>
>>>>>> Make the fault handler spin until it gets the reply in (c).
>>>>>>
>>>>>> At the very least this needs some proper locks, but preferably
>>>>>> we need to split the message.
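
(The spin amounts to something like the sketch below -- hypothetical
names, an illustration of the idea rather than the patch itself:)

#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

static atomic_bool set_mem_table_replied;

/* main thread: runs when the set_mem_table reply -- step (c) -- arrives */
void record_client_addresses(void)
{
    /* stash the client's mapped addresses somewhere, then publish */
    atomic_store(&set_mem_table_replied, true);
}

/* fault thread: called before translating any fault on a shared region */
void wait_for_mem_table_reply(void)
{
    while (!atomic_load(&set_mem_table_replied)) {
        sched_yield();      /* spin until (c) has been processed */
    }
    /* now safe to look up the client mapping and request the page */
}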
>>>>>
>>>>> Yes, maybe the slave channel could be used to send the ufds with
>>>>> a dedicated request? The backend would set the reply-ack flag, so that
>>>>> it starts accessing the guest memory only when Qemu is ready to handle
>>>>> faults.
>>>>
>>>> Yes, that would make life a lot easier.
>>>>
>>>>> Note that the slave channel support has not been implemented in Qemu's
>>>>> libvhost-user yet, but this is something I can do if we feel the need.
>>>>
>>>> Can you tell me a bit about how the slave channel works?
>>>
>>> I've looked at the slave channel, and I'm worried that it's not suitable
>>> for this case.
>>> The problem is that 'slave_read' is wired to an fd_handler that I think
>>> is serviced by the main thread,
>> I confirm, this is serviced by the main thread.
>>
>>> and while postcopy is running I don't
>>> want to rely on the operation of the main thread (since it could be
>>> blocked by a page fault).
>>
>> IIUC, you mean QEMU being blocked by a page fault.
> 
> Yes.
> 
>> In this case, I don't think this is an issue: QEMU doesn't rely on
>> the backend to handle the page fault, so the slave request will simply
>> be handled once QEMU has handled the fault.
>>
>> Maybe I am missing something?
> 
> It feels delicate; with the vhost client blocked waiting for the ack
> from qemu to the registration reply on the slave channel, and some
> other part blocked by a page fault, it sounds likely that we'd hit a
> deadlock even if I can't put my finger on a specific one.

Right, it is hard to be sure there is no risk of deadlock.

>>> I could still use an explicit ack at that point over the main
>>> channel, I think (or use the slave synchronously?).
>>
>> Can you please elaborate, I'm not sure to understand what you mean.
> 
> In the version I'm currently working on I've got it just using the main
> channel:
>     set_mem_table -> client
>     set_mem_table results (mapped addresses) -> qemu
>     ack -> client
> 
> all over the main channel, with each side waiting for the other.

Ok, thanks for the clarification.
Maxime

> Dave
> 
>>
>> Thanks,
>> Maxime
>>
>>> Dave
>>>
>>>> Dave
>>>>
>>>>> Maxime
>>>> --
>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


end of thread

Thread overview: 87+ messages
2017-06-28 19:00 [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 01/29] RAMBlock/migration: Add migration flags Dr. David Alan Gilbert (git)
2017-07-10  9:28   ` Peter Xu
2017-07-12 16:48     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 02/29] migrate: Update ram_block_discard_range for shared Dr. David Alan Gilbert (git)
2017-07-10 10:03   ` Peter Xu
2017-08-24 16:59     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 03/29] qemu_ram_block_host_offset Dr. David Alan Gilbert (git)
2017-07-03 17:44   ` Michael S. Tsirkin
2017-08-14 17:27     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 04/29] migration/ram: ramblock_recv_bitmap_test_byte_offset Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 05/29] postcopy: use UFFDIO_ZEROPAGE only when available Dr. David Alan Gilbert (git)
2017-07-10 10:19   ` Peter Xu
2017-07-12 16:54     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 06/29] postcopy: Add notifier chain Dr. David Alan Gilbert (git)
2017-07-10 10:31   ` Peter Xu
2017-07-12 17:14     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 07/29] postcopy: Add vhost-user flag for postcopy and check it Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 08/29] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 09/29] vhub: Support sending fds back to qemu Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 10/29] vhub: Open userfaultfd Dr. David Alan Gilbert (git)
2017-07-24 12:10   ` Maxime Coquelin
2017-07-26 17:12     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 11/29] postcopy: Allow registering of fd handler Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 12/29] vhost+postcopy: Register shared ufd with postcopy Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 13/29] vhost+postcopy: Transmit 'listen' to client Dr. David Alan Gilbert (git)
2017-07-24 14:36   ` Maxime Coquelin
2017-07-26 17:42     ` Dr. David Alan Gilbert
2017-07-26 18:03       ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 14/29] vhost+postcopy: Register new regions with the ufd Dr. David Alan Gilbert (git)
2017-07-24 15:22   ` Maxime Coquelin
2017-07-24 17:50     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 15/29] vhost+postcopy: Send address back to qemu Dr. David Alan Gilbert (git)
2017-07-24 17:31   ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 16/29] vhost+postcopy: Stash RAMBlock and offset Dr. David Alan Gilbert (git)
2017-07-11  3:31   ` Peter Xu
2017-07-14 17:15     ` Dr. David Alan Gilbert
2017-07-17  2:59       ` Peter Xu
2017-08-17 17:29         ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 17/29] vhost+postcopy: Send requests to source for shared pages Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 18/29] vhost+postcopy: Resolve client address Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 19/29] postcopy: wake shared Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 20/29] postcopy: postcopy_notify_shared_wake Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 21/29] vhost+postcopy: Add vhost waker Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups Dr. David Alan Gilbert (git)
2017-07-11  4:22   ` Peter Xu
2017-07-12 15:00     ` Andrea Arcangeli
2017-07-14  2:45       ` Peter Xu
2017-07-14 14:18       ` Michael S. Tsirkin
2017-06-28 19:00 ` [Qemu-devel] [RFC 23/29] vub+postcopy: madvises Dr. David Alan Gilbert (git)
2017-08-07  4:49   ` Alexey Perevalov
2017-08-08 17:06     ` Dr. David Alan Gilbert
2017-08-09 11:02       ` Alexey Perevalov
2017-08-10  8:55         ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table Dr. David Alan Gilbert (git)
2017-07-04 19:34   ` Maxime Coquelin
2017-07-07 11:53     ` Dr. David Alan Gilbert
2017-07-07 12:52       ` Maxime Coquelin
2017-10-03 13:23       ` Dr. David Alan Gilbert
2017-10-06 12:22         ` Maxime Coquelin
2017-10-09 12:12           ` Dr. David Alan Gilbert
2017-10-12  7:22             ` Maxime Coquelin
2017-06-28 19:00 ` [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base Dr. David Alan Gilbert (git)
2017-07-04 19:38   ` Maxime Coquelin
2017-07-04 21:59   ` Michael S. Tsirkin
2017-07-05 17:16     ` Dr. David Alan Gilbert
2017-07-05 23:28       ` Michael S. Tsirkin
2017-08-18 19:19     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 26/29] vhost: Add VHOST_USER_POSTCOPY_END message Dr. David Alan Gilbert (git)
2017-07-27 11:35   ` Maxime Coquelin
2017-08-24 14:53     ` Dr. David Alan Gilbert
2017-06-28 19:00 ` [Qemu-devel] [RFC 27/29] vhost+postcopy: Wire up POSTCOPY_END notify Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 28/29] postcopy: Allow shared memory Dr. David Alan Gilbert (git)
2017-06-28 19:00 ` [Qemu-devel] [RFC 29/29] vhost-user: Claim support for postcopy Dr. David Alan Gilbert (git)
2017-07-04 14:09   ` Maxime Coquelin
2017-07-07 11:39     ` Dr. David Alan Gilbert
2017-06-29 18:55 ` [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram Dr. David Alan Gilbert
2017-07-03 11:03   ` Marc-André Lureau
2017-07-03 11:48     ` Dr. David Alan Gilbert
2017-07-07 10:51     ` Dr. David Alan Gilbert
     [not found] ` <CGME20170703135859eucas1p1edc55e3318a3079b026bed81e0ae0388@eucas1p1.samsung.com>
2017-07-03 13:58   ` Alexey
2017-07-03 16:49     ` Dr. David Alan Gilbert
2017-07-03 17:42       ` Alexey
2017-07-03 17:55 ` Michael S. Tsirkin
2017-07-07 12:01   ` Dr. David Alan Gilbert
2017-07-07 15:35     ` Michael S. Tsirkin
2017-07-07 17:26       ` Dr. David Alan Gilbert
