qemu-devel.nongnu.org archive mirror
* [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating
@ 2020-02-26 15:52 David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Paul Durrant, Alex Williamson, Michael S. Tsirkin, Shannon Zhao,
	Igor Mammedov, Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

This is the follow-up to
    "[PATCH RFC] memory: Don't allow to resize RAM while migrating" [1]

This series contains some (slightly modified) patches also contained in:
    "[PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations
     under POSIX" [2]
That series will be based on this series. The last patch (#13) in this
series could be moved to the other series, but I decided to include it in
here for now (similar context).

I realized that resizing RAM blocks while the guest is being migrated
(precopy: resize while still running on the source, postcopy: resize
 while already running on the target) is buggy. In case of precopy, we
can simply cancel migration. Postcopy handling is more involved. Resizing
can currently happen during a guest reboot, triggered by ACPI rebuilds.

Along with the fixes, this series contains some cleanups.

[1] https://lkml.kernel.org/r/20200213172016.196609-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200212134254.11073-1-david@redhat.com

I am using the prototype of virtio-mem to test (which also makes use of
resizable allocations). Things I was able to reproduce:
- Resize while still running on the migration source. Migration is canceled.
-- Test case for "migration/ram: Handle RAM block resizes during precopy"
- Resize (grow+shrink) on the migration target during postcopy migration
  in the precopy stage (when syncing RAM blocks).
-- Test case for "migration/ram: Discard RAM when growing RAM blocks after
   ram_postcopy_incoming_init()" and overall RAM size synchronization in
   the precopy stage.
- Resize (grow+shrink) on the migration target during postcopy migration
  while already running on the target.
-- Test case for "migration/ram: Handle RAM block resizes during postcopy"
-- Test case for "migration/ram: Tolerate partially changed mappings in
   postcopy code" - I can see that -ENOENT is actually triggered and that
   migration still succeeds.

In addition, I ran the avocado-vt migration tests and the usual QEMU checks.

v2 -> v3:
- Rebased on current master
- Added RBs
- "migration/ram: Tolerate partially changed mappings in postcopy code"
-- Extended the comment for the uffdio unregister part.

v1 -> v2:
- "util: vfio-helpers: Factor out and fix processing of existing ram
   blocks"
-- Stringify error
- "migration/ram: Handle RAM block resizes during precopy"
-- Simplified check if we're migrating on the source
- "exec: Relax range check in ram_block_discard_range()"
-- Added to make discard during resizes actually work
- "migration/ram: Discard new RAM when growing RAM blocks after
   ram_postcopy_incoming_init()"
-- Better checks if in the right postcopy mode.
-- Better patch subject/description/comments
- "migration/ram: Handle RAM block resizes during postcopy"
-- Better comments
-- Adapt to changed postcopy checks
- "migrate/ram: Get rid of "place_source" in ram_load_postcopy()"
-- Dropped, as broken
- "migration/ram: Tolerate partially changed mappings in postcopy code"
-- Better comment / description. Clarify that no implicit wakeup will
   happen
-- Warn on EINVAL (older kernels)
-- Wake up any waiter explicitly

David Hildenbrand (13):
  util: vfio-helpers: Factor out and fix processing of existing ram
    blocks
  stubs/ram-block: Remove stubs that are no longer needed
  numa: Teach ram block notifiers about resizeable ram blocks
  numa: Make all callbacks of ram block notifiers optional
  migration/ram: Handle RAM block resizes during precopy
  exec: Relax range check in ram_block_discard_range()
  migration/ram: Discard RAM when growing RAM blocks after
    ram_postcopy_incoming_init()
  migration/ram: Simplify host page handling in ram_load_postcopy()
  migration/ram: Consolidate variable reset after placement in
    ram_load_postcopy()
  migration/ram: Handle RAM block resizes during postcopy
  migration/multifd: Print used_length of memory block
  migration/ram: Use offset_in_ramblock() in range checks
  migration/ram: Tolerate partially changed mappings in postcopy code

 exec.c                     |  27 +++++--
 hw/core/numa.c             |  41 +++++++++--
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   1 +
 include/exec/memory.h      |  10 +--
 include/exec/ramblock.h    |  10 +++
 include/exec/ramlist.h     |  13 ++--
 migration/migration.c      |   9 ++-
 migration/migration.h      |   1 +
 migration/multifd.c        |   2 +-
 migration/postcopy-ram.c   |  54 +++++++++++++-
 migration/ram.c            | 144 ++++++++++++++++++++++++++++---------
 stubs/ram-block.c          |  20 ------
 target/i386/hax-mem.c      |   5 +-
 target/i386/sev.c          |  18 ++---
 util/vfio-helpers.c        |  41 ++++-------
 16 files changed, 285 insertions(+), 118 deletions(-)

-- 
2.24.1




* [PATCH v3 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 02/13] stubs/ram-block: Remove stubs that are no longer needed David Hildenbrand
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Paolo Bonzini, Richard Henderson

Factor it out into common code when a new notifier is registered, just
as done with the memory region notifier. This allows us to have the
logic about how to process existing ram blocks at a central place (which
will be extended soon).

Just like when adding a new ram block, we have to register the max_length
for now: we don't have a way to get notified about resizes yet, so memory
beyond used_length would otherwise not be mapped when growing the ram block.

Note: Currently, ram blocks are only "fake resized". All memory
(max_length) is accessible.

We can get rid of a bunch of functions in stubs/ram-block.c. Print the
warning from inside qemu_vfio_ram_block_added().
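
For illustration only, here is a minimal, self-contained sketch of the
"replay existing blocks on registration" idea described above. It is not
the QEMU code: Block, Notifier, notifier_add(), block_add() and
count_mapped() are hypothetical stand-ins for RAMBlock, RAMBlockNotifier
and the related helpers, and the fixed-size array replaces the real RAM
block list. As in this patch, the callback is assumed to be mandatory and
always receives max_length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a RAM block. */
typedef struct Block {
    void *host;          /* NULL if the block has no host mapping */
    size_t used_length;
    size_t max_length;
} Block;

/* Hypothetical, simplified stand-in for a RAM block notifier. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*added)(Notifier *n, void *host, size_t size);
    size_t mapped_bytes; /* bookkeeping for this example only */
    Notifier *next;
};

#define MAX_BLOCKS 8
static Block blocks[MAX_BLOCKS];
static size_t nr_blocks;
static Notifier *notifiers;

/* Register a notifier and replay every block that already exists. */
static void notifier_add(Notifier *n)
{
    n->next = notifiers;
    notifiers = n;

    for (size_t i = 0; i < nr_blocks; i++) {
        if (blocks[i].host) {
            /* Report max_length: resizes are not reported yet. */
            n->added(n, blocks[i].host, blocks[i].max_length);
        }
    }
}

/* Add a block and tell all registered notifiers about it. */
static void block_add(void *host, size_t used, size_t max)
{
    assert(nr_blocks < MAX_BLOCKS);
    blocks[nr_blocks++] = (Block){ host, used, max };

    if (!host) {
        return;
    }
    for (Notifier *n = notifiers; n; n = n->next) {
        n->added(n, host, max);
    }
}

/* Example callback: sum up the sizes it was asked to map. */
static void count_mapped(Notifier *n, void *host, size_t size)
{
    (void)host;
    n->mapped_bytes += size;
}
```

With this layout, a late registrant sees the same "added" events as an
early one, so callers no longer need their own loop over existing blocks.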

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    |  5 +++++
 hw/core/numa.c            | 14 ++++++++++++++
 include/exec/cpu-common.h |  1 +
 util/vfio-helpers.c       | 29 ++++++++---------------------
 4 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 231d6e5641..b2a65b793f 100644
--- a/exec.c
+++ b/exec.c
@@ -1965,6 +1965,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
     return rb->used_length;
 }
 
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
+{
+    return rb->max_length;
+}
+
 bool qemu_ram_is_shared(RAMBlock *rb)
 {
     return rb->flags & RAM_SHARED;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 316bc50d75..dc5e5b4046 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -854,9 +854,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
     }
 }
 
+static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
+{
+    const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    void *host = qemu_ram_get_host_addr(rb);
+    RAMBlockNotifier *notifier = opaque;
+
+    if (host) {
+        notifier->ram_block_added(notifier, host, max_size);
+    }
+    return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+    /* Notify about all existing ram blocks. */
+    qemu_ram_foreach_block(ram_block_notify_add_single, n);
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index b47e5630e7..09decb8d93 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index ddd9a96e76..260570ae19 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -376,8 +376,14 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
                                       void *host, size_t size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+    int ret;
+
     trace_qemu_vfio_ram_block_added(s, host, size);
-    qemu_vfio_dma_map(s, host, size, false, NULL);
+    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    if (ret) {
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size,
+                     strerror(-ret));
+    }
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -390,33 +396,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
     }
 }
 
-static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
-{
-    void *host_addr = qemu_ram_get_host_addr(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
-    int ret;
-    QEMUVFIOState *s = opaque;
-
-    if (!host_addr) {
-        return 0;
-    }
-    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
-    if (ret) {
-        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-                host_addr, (uint64_t)length);
-    }
-    return 0;
-}
-
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
-    ram_block_notifier_add(&s->ram_notifier);
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
-    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
+    ram_block_notifier_add(&s->ram_notifier);
 }
 
 /**
-- 
2.24.1




* [PATCH v3 02/13] stubs/ram-block: Remove stubs that are no longer needed
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 03/13] numa: Teach ram block notifiers about resizeable ram blocks David Hildenbrand
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Current code no longer needs these stubs to compile. Let's just remove
them.

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 stubs/ram-block.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/stubs/ram-block.c b/stubs/ram-block.c
index 73c0a3ee08..10855b52dd 100644
--- a/stubs/ram-block.c
+++ b/stubs/ram-block.c
@@ -2,21 +2,6 @@
 #include "exec/ramlist.h"
 #include "exec/cpu-common.h"
 
-void *qemu_ram_get_host_addr(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
-{
-    return 0;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 }
@@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 }
-
-int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
-{
-    return 0;
-}
-- 
2.24.1




* [PATCH v3 03/13] numa: Teach ram block notifiers about resizeable ram blocks
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 02/13] stubs/ram-block: Remove stubs that are no longer needed David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 04/13] numa: Make all callbacks of ram block notifiers optional David Hildenbrand
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Stabellini, Eduardo Habkost, Juan Quintela,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Paul Durrant, Igor Mammedov, Michael S. Tsirkin, xen-devel,
	Anthony Perard, Paolo Bonzini, Richard Henderson

Ram block notifiers are currently not aware of resizes. Especially to
handle resizes during migration, but also to implement actually resizeable
ram blocks (make everything between used_length and max_length
inaccessible), we want to teach ram block notifiers about resizeable
ram.

Introduce the basic infrastructure but keep using max_size in the
existing notifiers. Supply the max_size when adding and removing ram
blocks. Also, notify on resizes.

Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: xen-devel@lists.xenproject.org
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                     | 13 +++++++++++--
 hw/core/numa.c             | 22 +++++++++++++++++-----
 hw/i386/xen/xen-mapcache.c |  7 ++++---
 include/exec/ramlist.h     | 13 +++++++++----
 target/i386/hax-mem.c      |  5 +++--
 target/i386/sev.c          | 18 ++++++++++--------
 util/vfio-helpers.c        | 16 ++++++++--------
 7 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index b2a65b793f..3ee1498761 100644
--- a/exec.c
+++ b/exec.c
@@ -2078,6 +2078,8 @@ static int memory_try_enable_merging(void *addr, size_t len)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const ram_addr_t oldsize = block->used_length;
+
     assert(block);
 
     newsize = HOST_PAGE_ALIGN(newsize);
@@ -2102,6 +2104,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return -EINVAL;
     }
 
+    /* Notify before modifying the ram block and touching the bitmaps. */
+    if (block->host) {
+        ram_block_notify_resize(block->host, oldsize, newsize);
+    }
+
     cpu_physical_memory_clear_dirty_range(block->offset, block->used_length);
     block->used_length = newsize;
     cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
@@ -2268,7 +2275,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
             qemu_madvise(new_block->host, new_block->max_length,
                          QEMU_MADV_DONTFORK);
         }
-        ram_block_notify_add(new_block->host, new_block->max_length);
+        ram_block_notify_add(new_block->host, new_block->used_length,
+                             new_block->max_length);
     }
 }
 
@@ -2448,7 +2456,8 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     if (block->host) {
-        ram_block_notify_remove(block->host, block->max_length);
+        ram_block_notify_remove(block->host, block->used_length,
+                                block->max_length);
     }
 
     qemu_mutex_lock_ramlist();
diff --git a/hw/core/numa.c b/hw/core/numa.c
index dc5e5b4046..fe6ca5c50d 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -857,11 +857,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 {
     const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    const ram_addr_t size = qemu_ram_get_used_length(rb);
     void *host = qemu_ram_get_host_addr(rb);
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
-        notifier->ram_block_added(notifier, host, max_size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
 }
@@ -878,20 +879,31 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
     QLIST_REMOVE(n, next);
 }
 
-void ram_block_notify_add(void *host, size_t size)
+void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_added(notifier, host, size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
 }
 
-void ram_block_notify_remove(void *host, size_t size)
+void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_removed(notifier, host, size);
+        notifier->ram_block_removed(notifier, host, size, max_size);
+    }
+}
+
+void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_resized) {
+            notifier->ram_block_resized(notifier, host, old_size, new_size);
+        }
     }
 }
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed44b..d6dcea65d1 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -169,7 +169,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 
     if (entry->vaddr_base != NULL) {
         if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-            ram_block_notify_remove(entry->vaddr_base, entry->size);
+            ram_block_notify_remove(entry->vaddr_base, entry->size,
+                                    entry->size);
         }
         if (munmap(entry->vaddr_base, entry->size) != 0) {
             perror("unmap fails");
@@ -211,7 +212,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
     }
 
     if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-        ram_block_notify_add(vaddr_base, size);
+        ram_block_notify_add(vaddr_base, size, size);
     }
 
     entry->vaddr_base = vaddr_base;
@@ -452,7 +453,7 @@ static void xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)
     }
 
     pentry->next = entry->next;
-    ram_block_notify_remove(entry->vaddr_base, entry->size);
+    ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
     if (munmap(entry->vaddr_base, entry->size) != 0) {
         perror("unmap fails");
         exit(-1);
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index bc4faa1b00..293c0ddabe 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -65,15 +65,20 @@ void qemu_mutex_lock_ramlist(void);
 void qemu_mutex_unlock_ramlist(void);
 
 struct RAMBlockNotifier {
-    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size);
-    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size);
+    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size,
+                            size_t max_size);
+    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size,
+                              size_t max_size);
+    void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size,
+                              size_t new_size);
     QLIST_ENTRY(RAMBlockNotifier) next;
 };
 
 void ram_block_notifier_add(RAMBlockNotifier *n);
 void ram_block_notifier_remove(RAMBlockNotifier *n);
-void ram_block_notify_add(void *host, size_t size);
-void ram_block_notify_remove(void *host, size_t size);
+void ram_block_notify_add(void *host, size_t size, size_t max_size);
+void ram_block_notify_remove(void *host, size_t size, size_t max_size);
+void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
 
 void ram_block_dump(Monitor *mon);
 
diff --git a/target/i386/hax-mem.c b/target/i386/hax-mem.c
index 6bb5a24917..454d7fb212 100644
--- a/target/i386/hax-mem.c
+++ b/target/i386/hax-mem.c
@@ -293,7 +293,8 @@ static MemoryListener hax_memory_listener = {
     .priority = 10,
 };
 
-static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                                size_t max_size)
 {
     /*
      * We must register each RAM block with the HAXM kernel module, or
@@ -304,7 +305,7 @@ static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
      * host physical pages for the RAM block as part of this registration
      * process, hence the name hax_populate_ram().
      */
-    if (hax_populate_ram((uint64_t)(uintptr_t)host, size) < 0) {
+    if (hax_populate_ram((uint64_t)(uintptr_t)host, max_size) < 0) {
         fprintf(stderr, "HAX failed to populate RAM\n");
         abort();
     }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 024bb24e51..6b4cee24a2 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -129,7 +129,8 @@ sev_set_guest_state(SevState new_state)
 }
 
 static void
-sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                    size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -146,19 +147,20 @@ sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_register_region(host, size);
+    trace_kvm_memcrypt_register_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, &range);
     if (r) {
         error_report("%s: failed to register region (%p+%#zx) error '%s'",
-                     __func__, host, size, strerror(errno));
+                     __func__, host, max_size, strerror(errno));
         exit(1);
     }
 }
 
 static void
-sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+                      size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -175,13 +177,13 @@ sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_unregister_region(host, size);
+    trace_kvm_memcrypt_unregister_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, &range);
     if (r) {
         error_report("%s: failed to unregister region (%p+%#zx)",
-                     __func__, host, size);
+                     __func__, host, max_size);
     }
 }
 
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 260570ae19..9ec01bfe26 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,26 +372,26 @@ fail_container:
     return ret;
 }
 
-static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
-                                      void *host, size_t size)
+static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
+                                      size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, size);
-    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, max_size);
+    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
     if (ret) {
-        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size,
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, max_size,
                      strerror(-ret));
     }
 }
 
-static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
-                                        void *host, size_t size)
+static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
+                                        size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, size);
+        trace_qemu_vfio_ram_block_removed(s, host, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
-- 
2.24.1




* [PATCH v3 04/13] numa: Make all callbacks of ram block notifiers optional
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 03/13] numa: Teach ram block notifiers about resizeable ram blocks David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Let's make add/remove optional. We want to introduce a RAM block
notifier for RAM migration that is only interested in resizes.
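
For illustration only, a minimal sketch of what optional callbacks allow:
a notifier that sets only its resize hook and silently ignores add/remove
events. This is not the QEMU code; Notifier, notifier_register(),
notify_add(), notify_remove(), notify_resize() and record_resize() are
hypothetical names, and the singly linked list stands in for the real
notifier list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier with three optional callbacks. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*added)(Notifier *n, void *host, size_t size, size_t max_size);
    void (*removed)(Notifier *n, void *host, size_t size, size_t max_size);
    void (*resized)(Notifier *n, void *host, size_t old_size,
                    size_t new_size);
    size_t last_old, last_new; /* bookkeeping for this example only */
    int resize_events;
    Notifier *next;
};

static Notifier *notifier_list;

static void notifier_register(Notifier *n)
{
    n->next = notifier_list;
    notifier_list = n;
}

/* Each dispatcher skips notifiers that left the callback unset (NULL). */
static void notify_add(void *host, size_t size, size_t max_size)
{
    for (Notifier *n = notifier_list; n; n = n->next) {
        if (n->added) {
            n->added(n, host, size, max_size);
        }
    }
}

static void notify_remove(void *host, size_t size, size_t max_size)
{
    for (Notifier *n = notifier_list; n; n = n->next) {
        if (n->removed) {
            n->removed(n, host, size, max_size);
        }
    }
}

static void notify_resize(void *host, size_t old_size, size_t new_size)
{
    for (Notifier *n = notifier_list; n; n = n->next) {
        if (n->resized) {
            n->resized(n, host, old_size, new_size);
        }
    }
}

/* Example callback: remember the most recent resize. */
static void record_resize(Notifier *n, void *host, size_t old_size,
                          size_t new_size)
{
    (void)host;
    n->last_old = old_size;
    n->last_new = new_size;
    n->resize_events++;
}
```

A notifier initialized as { .resized = record_resize } leaves the other
two pointers NULL, so add and remove events pass it by without a crash.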

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/numa.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index fe6ca5c50d..37ce175e13 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -870,8 +870,11 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+
     /* Notify about all existing ram blocks. */
-    qemu_ram_foreach_block(ram_block_notify_add_single, n);
+    if (n->ram_block_added) {
+        qemu_ram_foreach_block(ram_block_notify_add_single, n);
+    }
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
@@ -884,7 +887,9 @@ void ram_block_notify_add(void *host, size_t size, size_t max_size)
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_added(notifier, host, size, max_size);
+        if (notifier->ram_block_added) {
+            notifier->ram_block_added(notifier, host, size, max_size);
+        }
     }
 }
 
@@ -893,7 +898,9 @@ void ram_block_notify_remove(void *host, size_t size, size_t max_size)
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_removed(notifier, host, size, max_size);
+        if (notifier->ram_block_removed) {
+            notifier->ram_block_removed(notifier, host, size, max_size);
+        }
     }
 }
 
-- 
2.24.1




* [PATCH v3 05/13] migration/ram: Handle RAM block resizes during precopy
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 04/13] numa: Make all callbacks of ram block notifiers optional David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Michael S. Tsirkin, Shannon Zhao, Igor Mammedov, Paolo Bonzini,
	Alex Bennée, Richard Henderson

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the used_length of ram blocks and does
not expect this to change at random points in time.

In the case of precopy, the ram block size must not change on the source
after syncing the RAM block list in ram_save_setup(), that is, as long as
the guest is still running on the source.

Resizing can be triggered by the guest *after* (but not during) a reset,
in ACPI code:
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.

Update the documentation. Postcopy will be handled separately.
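
For illustration only, a minimal sketch of the "cancel on resize during
precopy" policy described above. It is not the code added to
migration/ram.c: MigState, Migration, mig_cancel() and on_block_resized()
are hypothetical names, and the state machine is reduced to three states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced migration state. */
typedef enum { MIG_IDLE, MIG_PRECOPY_ACTIVE, MIG_CANCELLED } MigState;

typedef struct {
    MigState state;
    const char *cancel_reason;
} Migration;

static Migration mig = { MIG_IDLE, NULL };

/* Cancel the migration and record why; the guest keeps running. */
static void mig_cancel(const char *reason)
{
    mig.state = MIG_CANCELLED;
    mig.cancel_reason = reason;
    fprintf(stderr, "migration cancelled: %s\n", reason);
}

/*
 * Resize hook: while precopy is active on the source, any real size
 * change invalidates the block list synced at setup time, so give up.
 */
static void on_block_resized(void *host, size_t old_size, size_t new_size)
{
    (void)host;
    if (mig.state == MIG_PRECOPY_ACTIVE && old_size != new_size) {
        mig_cancel("RAM block resized during precopy");
    }
}
```

Outside an active precopy migration the hook does nothing, so resizes
after a guest reset keep working as before when no migration is running.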

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shannon Zhao <shannon.zhao@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                |  5 +++--
 include/exec/memory.h | 10 ++++++----
 migration/migration.c |  9 +++++++--
 migration/migration.h |  1 +
 migration/ram.c       | 31 +++++++++++++++++++++++++++++++
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index 3ee1498761..d30a5d297a 100644
--- a/exec.c
+++ b/exec.c
@@ -2069,8 +2069,9 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
-/* Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+/*
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * As memory core doesn't know how is memory accessed, it is up to
  * resize callback to update device state and/or add assertions to detect
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c..b9b9470a56 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_SHARED     (1 << 1)
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
+ * Resizing RAM while migrating can result in the migration being canceled.
  */
 #define RAM_RESIZEABLE (1 << 2)
 
@@ -843,7 +843,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
  *                                     RAM.  Accesses into the region will
  *                                     modify memory directly.  Only an initial
  *                                     portion of this RAM is actually used.
- *                                     The used size can change across reboots.
+ *                                     Changing the size while migrating
+ *                                     can result in the migration being
+ *                                     canceled.
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
@@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
 
 /* memory_region_ram_resize: Resize a RAM region.
  *
- * Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * @mr: a memory region created with @memory_region_init_resizeable_ram.
  * @newsize: the new size the region
diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..ac9751dbe5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -175,13 +175,18 @@ void migration_object_init(void)
     }
 }
 
+void migration_cancel(void)
+{
+    migrate_fd_cancel(current_migration);
+}
+
 void migration_shutdown(void)
 {
     /*
      * Cancel the current migration - that will (eventually)
      * stop the migration using this structure
      */
-    migrate_fd_cancel(current_migration);
+    migration_cancel();
     object_unref(OBJECT(current_migration));
 }
 
@@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migrate_fd_cancel(migrate_get_current());
+    migration_cancel();
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 8473ddfc88..79fd74afa5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
+void migration_cancel(void);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..39c7d1c4a6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -52,6 +52,7 @@
 #include "migration/colo.h"
 #include "block.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "savevm.h"
 #include "qemu/iov.h"
 #include "multifd.h"
@@ -3710,8 +3711,38 @@ static SaveVMHandlers savevm_ram_handlers = {
     .resume_prepare = ram_resume_prepare,
 };
 
+static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
+                                      size_t old_size, size_t new_size)
+{
+    ram_addr_t offset;
+    Error *err = NULL;
+    RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
+
+    if (ramblock_is_ignored(rb)) {
+        return;
+    }
+
+    if (!migration_is_idle()) {
+        /*
+         * Precopy code on the source cannot deal with the size of RAM blocks
+         * changing at random points in time - especially after sending the
+         * RAM block sizes to the migration stream, they must no longer change.
+         * Abort and indicate a proper reason.
+         */
+        error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
+        migrate_set_error(migrate_get_current(), err);
+        error_free(err);
+        migration_cancel();
+    }
+}
+
+static RAMBlockNotifier ram_mig_ram_notifier = {
+    .ram_block_resized = ram_mig_ram_block_resized,
+};
+
 void ram_mig_init(void)
 {
     qemu_mutex_init(&XBZRLE.lock);
     register_savevm_live("ram", 0, 4, &savevm_ram_handlers, &ram_state);
+    ram_block_notifier_add(&ram_mig_ram_notifier);
 }
-- 
2.24.1




* [PATCH v3 06/13] exec: Relax range check in ram_block_discard_range()
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We want to make use of ram_block_discard_range() in the RAM block resize
callback when growing a RAM block, *before* used_length is changed.
Let's relax the check. We always have a reserved mapping for the whole
max_length, so we cannot corrupt unrelated data.
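
A minimal standalone sketch of the relaxed check, for illustration only.
ToyBlock and toy_discard_range_ok() are hypothetical names; the comparison
is written in an overflow-safe form, which is a choice of this sketch and
not what the patch itself does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two lengths of a resizable RAM block. */
typedef struct {
    uint64_t used_length; /* portion currently in use */
    uint64_t max_length;  /* size of the reserved mapping */
} ToyBlock;

/*
 * Relaxed check: the range only has to fit into the reserved mapping,
 * so it may cover memory between used_length and max_length.
 */
static bool toy_discard_range_ok(const ToyBlock *rb, uint64_t start,
                                 size_t length)
{
    assert(rb->used_length <= rb->max_length);
    return start <= rb->max_length && length <= rb->max_length - start;
}
```

With used_length = 4096 and max_length = 8192, discarding 4096 bytes at
offset 4096 passes this check, although it lies entirely beyond used_length.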

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index d30a5d297a..9d351a7492 100644
--- a/exec.c
+++ b/exec.c
@@ -3876,7 +3876,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
         goto err;
     }
 
-    if ((start + length) <= rb->used_length) {
+    if ((start + length) <= rb->max_length) {
         bool need_madvise, need_fallocate;
         if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
             error_report("ram_block_discard_range: Unaligned length: %zx",
@@ -3943,7 +3943,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     } else {
         error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
                      "/%zx/" RAM_ADDR_FMT")",
-                     rb->idstr, start, length, rb->used_length);
+                     rb->idstr, start, length, rb->max_length);
     }
 
 err:
-- 
2.24.1




* [PATCH v3 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init()
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-02-26 15:52 ` [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when
synchronizing the RAM block state with the migration source), the resized
part would not get discarded. Let's do that when we are notified
about a resize while postcopy has been advised, but is not listening
yet. With precopy, the process is as follows:

1. VM created
- RAM blocks are created
2. Incoming migration started
- Postcopy is advised
- All pages in RAM blocks are discarded
3. Precopy starts
- RAM blocks are resized to match the size on the migration source.
- RAM pages from precopy stream are loaded
- Uffd handler is registered, postcopy starts listening
4. Guest started, postcopy running
- Pagefaults get resolved, pages get placed
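
For illustration only: when a block grows in step 3, the part that still
has to be discarded is the range from the old size up to the new size. The
sketch below computes that range; ToyRange and toy_range_to_discard() are
hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a byte range inside a RAM block. */
typedef struct {
    uint64_t start;
    uint64_t length;
} ToyRange;

/*
 * Range that was not covered by the initial discard: only non-empty
 * when the block grew. Shrinking needs no additional discard.
 */
static ToyRange toy_range_to_discard(uint64_t old_size, uint64_t new_size)
{
    ToyRange r = { 0, 0 };

    if (old_size < new_size) {
        r.start = old_size;
        r.length = new_size - old_size;
    }
    /* A non-empty range always ends exactly at the new size. */
    assert(r.length == 0 || r.start + r.length == new_size);
    return r;
}
```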

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 39c7d1c4a6..d5a4d69e1c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3714,6 +3714,7 @@ static SaveVMHandlers savevm_ram_handlers = {
 static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
                                       size_t old_size, size_t new_size)
 {
+    PostcopyState ps = postcopy_state_get();
     ram_addr_t offset;
     Error *err = NULL;
     RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
@@ -3734,6 +3735,35 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
         error_free(err);
         migration_cancel();
     }
+
+    switch (ps) {
+    case POSTCOPY_INCOMING_ADVISE:
+        /*
+         * Update what ram_postcopy_incoming_init()->init_range() does at the
+         * time postcopy was advised. Syncing RAM blocks with the source will
+         * result in RAM resizes.
+         */
+        if (old_size < new_size) {
+            if (ram_discard_range(rb->idstr, old_size, new_size - old_size)) {
+                error_report("RAM block '%s' discard of resized RAM failed",
+                             rb->idstr);
+            }
+        }
+        break;
+    case POSTCOPY_INCOMING_NONE:
+    case POSTCOPY_INCOMING_RUNNING:
+    case POSTCOPY_INCOMING_END:
+        /*
+         * Once our guest is running, postcopy does no longer care about
+         * resizes. When growing, the new memory was not available on the
+         * source, no handler needed.
+         */
+        break;
+    default:
+        error_report("RAM block '%s' resized during postcopy state: %d",
+                     rb->idstr, ps);
+        exit(-1);
+    }
 }
 
 static RAMBlockNotifier ram_mig_ram_notifier = {
-- 
2.24.1




* [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
@ 2020-02-26 15:52 ` David Hildenbrand
  2020-03-06 16:05   ` Dr. David Alan Gilbert
  2020-02-26 15:53 ` [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Add two new helper functions. This will come in handy once we want to
handle RAM block resizes while postcopy is active.
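
The arithmetic behind the two helpers, as a standalone illustration. The
toy_* functions are hypothetical and operate on plain addresses instead of
a RAMBlock; the sketch assumes the host page size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of the host page containing addr. */
static uintptr_t toy_host_page_start(uintptr_t addr, uintptr_t page_size)
{
    /* The mask trick below is only valid for power-of-two page sizes. */
    assert(page_size && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Offset of addr within its host page. */
static uintptr_t toy_host_page_offset(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}
```

For a 2 MiB host page (0x200000) and the address 0x40203000, the page starts
at 0x40200000 and the offset within it is 0x3000; start plus offset always
gives back the original address.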

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d5a4d69e1c..f815f4e532 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static void *host_page_from_ram_block_offset(RAMBlock *block,
+                                             ram_addr_t offset)
+{
+    /* Note: Explicitly no check against offset_in_ramblock(). */
+    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
+                                   block->page_size);
+}
+
+static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
+                                                         ram_addr_t offset)
+{
+    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
+}
+
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
                                                  ram_addr_t offset)
 {
@@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
     MigrationIncomingState *mis = migration_incoming_get_current();
     /* Temporary page that is later 'placed' */
     void *postcopy_host_page = mis->postcopy_tmp_page;
-    void *this_host = NULL;
+    void *host_page = NULL;
     bool all_zero = false;
     int target_pages = 0;
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
-        void *host = NULL;
         void *page_buffer = NULL;
         void *place_source = NULL;
         RAMBlock *block = NULL;
@@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
             block = ram_block_from_stream(f, flags);
+            if (!block) {
+                ret = -EINVAL;
+                break;
+            }
 
-            host = host_from_ram_block_offset(block, addr);
-            if (!host) {
+            if (!offset_in_ramblock(block, addr)) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
@@ -3163,21 +3179,18 @@ static int ram_load_postcopy(QEMUFile *f)
              * of a host page in one chunk.
              */
             page_buffer = postcopy_host_page +
-                          ((uintptr_t)host & (block->page_size - 1));
+                          host_page_offset_from_ram_block_offset(block, addr);
             /* If all TP are zero then we can optimise the place */
             if (target_pages == 1) {
                 all_zero = true;
-                this_host = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-                                                    block->page_size);
-            } else {
+                host_page = host_page_from_ram_block_offset(block, addr);
+            } else if (host_page != host_page_from_ram_block_offset(block,
+                                                                    addr)) {
                 /* not the 1st TP within the HP */
-                if (QEMU_ALIGN_DOWN((uintptr_t)host, block->page_size) !=
-                    (uintptr_t)this_host) {
-                    error_report("Non-same host page %p/%p",
-                                  host, this_host);
-                    ret = -EINVAL;
-                    break;
-                }
+                error_report("Non-same host page %p/%p", host_page,
+                             host_page_from_ram_block_offset(block, addr));
+                ret = -EINVAL;
+                break;
             }
 
             /*
@@ -3257,16 +3270,11 @@ static int ram_load_postcopy(QEMUFile *f)
         }
 
         if (!ret && place_needed) {
-            /* This gets called at the last target page in the host page */
-            void *place_dest = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-                                                       block->page_size);
-
             if (all_zero) {
-                ret = postcopy_place_page_zero(mis, place_dest,
-                                               block);
+                ret = postcopy_place_page_zero(mis, host_page, block);
             } else {
-                ret = postcopy_place_page(mis, place_dest,
-                                          place_source, block);
+                ret = postcopy_place_page(mis, host_page, place_source,
+                                          block);
             }
         }
     }
-- 
2.24.1




* [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-02-26 15:52 ` [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
@ 2020-02-26 15:53 ` David Hildenbrand
  2020-03-06 16:30   ` Dr. David Alan Gilbert
  2020-02-26 15:53 ` [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Let's consolidate resetting place_needed, target_pages and all_zero in a
single place, right after a host page was placed.
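
For illustration only, a simplified model of the loop state with a single
reset point after placement. ToyLoadState and toy_load_target_page() are
hypothetical; real placement is replaced by counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stream state while assembling host pages. */
typedef struct {
    bool place_needed;  /* a complete host page is ready to be placed */
    bool all_zero;      /* all target pages seen so far were zero */
    int target_pages;   /* target pages collected for this host page */
    int placed;         /* host pages placed in total */
    int placed_zero;    /* host pages placed as zero pages */
} ToyLoadState;

/* Feed one target page; tp_per_hp is target pages per host page. */
static void toy_load_target_page(ToyLoadState *s, bool page_is_zero,
                                 int tp_per_hp)
{
    assert(tp_per_hp > 0);

    s->target_pages++;
    if (!page_is_zero) {
        s->all_zero = false;
    }
    if (s->target_pages == tp_per_hp) {
        s->place_needed = true;
    }

    if (s->place_needed) {
        s->placed++;
        if (s->all_zero) {
            s->placed_zero++;
        }
        /* Single reset point, right after placement. */
        s->place_needed = false;
        s->target_pages = 0;
        s->all_zero = true; /* assume zero until proven otherwise */
    }
}
```

The state has to start out with all_zero set to true, matching the changed
initializer in the patch.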

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f815f4e532..1a5ff07997 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3126,7 +3126,7 @@ static int ram_load_postcopy(QEMUFile *f)
     /* Temporary page that is later 'placed' */
     void *postcopy_host_page = mis->postcopy_tmp_page;
     void *host_page = NULL;
-    bool all_zero = false;
+    bool all_zero = true;
     int target_pages = 0;
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
@@ -3152,7 +3152,6 @@ static int ram_load_postcopy(QEMUFile *f)
         addr &= TARGET_PAGE_MASK;
 
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
-        place_needed = false;
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
             block = ram_block_from_stream(f, flags);
@@ -3180,9 +3179,7 @@ static int ram_load_postcopy(QEMUFile *f)
              */
             page_buffer = postcopy_host_page +
                           host_page_offset_from_ram_block_offset(block, addr);
-            /* If all TP are zero then we can optimise the place */
             if (target_pages == 1) {
-                all_zero = true;
                 host_page = host_page_from_ram_block_offset(block, addr);
             } else if (host_page != host_page_from_ram_block_offset(block,
                                                                     addr)) {
@@ -3199,7 +3196,6 @@ static int ram_load_postcopy(QEMUFile *f)
              */
             if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
                 place_needed = true;
-                target_pages = 0;
             }
             place_source = postcopy_host_page;
         }
@@ -3276,6 +3272,10 @@ static int ram_load_postcopy(QEMUFile *f)
                 ret = postcopy_place_page(mis, host_page, place_source,
                                           block);
             }
+            place_needed = false;
+            target_pages = 0;
+            /* Assume we have a zero page until we detect something different */
+            all_zero = true;
         }
     }
 
-- 
2.24.1




* [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-02-26 15:53 ` [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
@ 2020-02-26 15:53 ` David Hildenbrand
  2020-03-06 16:56   ` Dr. David Alan Gilbert
  2020-02-26 15:53 ` [PATCH v3 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Michael S. Tsirkin, Shannon Zhao, Igor Mammedov, Paolo Bonzini,
	Alex Bennée, Richard Henderson

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the used_length of RAM blocks and does
not expect this to change at random points in time.

In the case of postcopy, relying on used_length is racy as soon as the
guest is running. Also, when used_length changes we might leave the
uffd handler registered for some memory regions, reject valid pages
when migrating and fail when sending the recv bitmap to the source.

Resizing can be triggered *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Let's remember the original used_length in a separate variable and
use it in relevant postcopy code. Make sure to update it when we resize
during precopy, when synchronizing the RAM block sizes with the source.
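
A standalone sketch of the idea, for illustration only. ToyRamBlock and the
toy_* functions are hypothetical; the model assumes that resizes before the
guest runs update the remembered length, while later resizes only change
used_length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two lengths tracked on the destination. */
typedef struct {
    uint64_t used_length;     /* may change once the guest is running */
    uint64_t postcopy_length; /* used_length as known by the source */
} ToyRamBlock;

/* Postcopy advised: remember the current used_length. */
static void toy_postcopy_advise(ToyRamBlock *rb)
{
    assert(rb);
    rb->postcopy_length = rb->used_length;
}

/* Resize: only syncs postcopy_length while the guest is not running. */
static void toy_resize(ToyRamBlock *rb, uint64_t new_size, bool guest_running)
{
    assert(rb);
    rb->used_length = new_size;
    if (!guest_running) {
        rb->postcopy_length = new_size;
    }
}

/* Incoming pages are validated against the remembered length. */
static bool toy_incoming_offset_ok(const ToyRamBlock *rb, uint64_t addr)
{
    return addr < rb->postcopy_length;
}
```

After a shrink while the guest is running, pages beyond the new used_length
but below postcopy_length are still accepted in this model, which mirrors
the reasoning given in the commit message.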

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shannon Zhao <shannon.zhao@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/ramblock.h  | 10 ++++++++++
 migration/postcopy-ram.c | 15 ++++++++++++---
 migration/ram.c          | 11 +++++++++--
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 07d50864d8..664701b759 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -59,6 +59,16 @@ struct RAMBlock {
      */
     unsigned long *clear_bmap;
     uint8_t clear_bmap_shift;
+
+    /*
+     * RAM block length that corresponds to the used_length on the migration
+     * source (after RAM block sizes were synchronized). Especially, after
+     * starting to run the guest, used_length and postcopy_length can differ.
+     * Used to register/unregister uffd handlers and as the size of the received
+     * bitmap. Receiving any page beyond this length will bail out, as it
+     * could not have been valid on the source.
+     */
+    ram_addr_t postcopy_length;
 };
 #endif
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a36402722b..c68caf4e42 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -17,6 +17,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "migration.h"
 #include "qemu-file.h"
@@ -31,6 +32,7 @@
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "hw/boards.h"
+#include "exec/ramblock.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
     ram_addr_t length = qemu_ram_get_used_length(rb);
     trace_postcopy_init_range(block_name, host_addr, offset, length);
 
+    /*
+     * Save the used_length before running the guest. In case we have to
+     * resize RAM blocks when syncing RAM block sizes from the source during
+     * precopy, we'll update it manually via the ram block notifier.
+     */
+    rb->postcopy_length = length;
+
     /*
      * We need the whole of RAM to be truly empty for postcopy, so things
      * like ROMs and any data tables built during init must be zero'd
@@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
     const char *block_name = qemu_ram_get_idstr(rb);
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t offset = qemu_ram_get_offset(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
+    ram_addr_t length = rb->postcopy_length;
     MigrationIncomingState *mis = opaque;
     struct uffdio_range range_struct;
     trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
@@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
     const char *block_name = qemu_ram_get_idstr(rb);
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t offset = qemu_ram_get_offset(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
+    ram_addr_t length = rb->postcopy_length;
     trace_postcopy_nhp_range(block_name, host_addr, offset, length);
 
     /*
@@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
     struct uffdio_register reg_struct;
 
     reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
-    reg_struct.range.len = qemu_ram_get_used_length(rb);
+    reg_struct.range.len = rb->postcopy_length;
     reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
     /* Now tell our userfault_fd that it's responsible for this area */
diff --git a/migration/ram.c b/migration/ram.c
index 1a5ff07997..ee5c3d5784 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
         return -1;
     }
 
-    nbits = block->used_length >> TARGET_PAGE_BITS;
+    nbits = block->postcopy_length >> TARGET_PAGE_BITS;
 
     /*
      * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
@@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
                 break;
             }
 
-            if (!offset_in_ramblock(block, addr)) {
+            /*
+             * Relying on used_length is racy and can result in false positives.
+             * We might place pages beyond used_length in case RAM was shrunk
+             * while in postcopy, which is fine - trying to place via
+             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
+             */
+            if (!block->host || addr >= block->postcopy_length) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
@@ -3757,6 +3763,7 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
                              rb->idstr);
             }
         }
+        rb->postcopy_length = new_size;
         break;
     case POSTCOPY_INCOMING_NONE:
     case POSTCOPY_INCOMING_RUNNING:
-- 
2.24.1




* [PATCH v3 11/13] migration/multifd: Print used_length of memory block
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-02-26 15:53 ` [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
@ 2020-02-26 15:53 ` David Hildenbrand
  2020-03-06 16:57   ` Dr. David Alan Gilbert
  2020-02-26 15:53 ` [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
  2020-02-26 15:53 ` [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We actually want to print the used_length, against which we check.

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/multifd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b3e8ae9bcc..dd9e88c5f1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -222,7 +222,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         if (offset > (block->used_length - qemu_target_page_size())) {
             error_setg(errp, "multifd: offset too long %" PRIu64
                        " (max " RAM_ADDR_FMT ")",
-                       offset, block->max_length);
+                       offset, block->used_length);
             return -1;
         }
         p->pages->iov[i].iov_base = block->host + offset;
-- 
2.24.1




* [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-02-26 15:53 ` [PATCH v3 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
@ 2020-02-26 15:53 ` David Hildenbrand
  2020-03-06 16:59   ` Dr. David Alan Gilbert
  2020-02-26 15:53 ` [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We never read or write beyond the used_length of memory blocks when
migrating. Make this clearer by using offset_in_ramblock() consistently.
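
For illustration only: a request [start, start + len) stays inside a block
exactly when its last byte does. The sketch below models that check with
hypothetical names (ToyBlk, toy_offset_in_block(), toy_request_overruns())
and assumes the offset check is "block has host memory and offset <
used_length". Note the len > 0 precondition: for len == 0 the expression
start + len - 1 would wrap around for start == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of a RAM block. */
typedef struct {
    bool has_host;        /* block is backed by host memory */
    uint64_t used_length; /* currently used size in bytes */
} ToyBlk;

/* Modeled offset check: valid only below used_length. */
static bool toy_offset_in_block(const ToyBlk *b, uint64_t offset)
{
    return b && b->has_host && offset < b->used_length;
}

/* A request overruns the block if its last byte is outside of it. */
static bool toy_request_overruns(const ToyBlk *b, uint64_t start,
                                 uint64_t len)
{
    assert(len > 0); /* start + len - 1 is only meaningful for len > 0 */
    return !toy_offset_in_block(b, start + len - 1);
}
```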

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ee5c3d5784..5cc9993899 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1309,8 +1309,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
         *again = false;
         return false;
     }
-    if ((((ram_addr_t)pss->page) << TARGET_PAGE_BITS)
-        >= pss->block->used_length) {
+    if (!offset_in_ramblock(pss->block,
+                            ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
         /* Didn't find anything in this RAM Block */
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
@@ -1514,7 +1514,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         rs->last_req_rb = ramblock;
     }
     trace_ram_save_queue_pages(ramblock->idstr, start, len);
-    if (start+len > ramblock->used_length) {
+    if (!offset_in_ramblock(ramblock, start + len - 1)) {
         error_report("%s request overrun start=" RAM_ADDR_FMT " len="
                      RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
                      __func__, start, len, ramblock->used_length);
@@ -3325,8 +3325,8 @@ static void colo_flush_ram_cache(void)
         while (block) {
             offset = migration_bitmap_find_dirty(ram_state, block, offset);
 
-            if (((ram_addr_t)offset) << TARGET_PAGE_BITS
-                >= block->used_length) {
+            if (!offset_in_ramblock(block,
+                                    ((ram_addr_t)offset) << TARGET_PAGE_BITS)) {
                 offset = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-- 
2.24.1




* [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-02-26 15:53 ` [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
@ 2020-02-26 15:53 ` David Hildenbrand
  2020-02-26 16:06   ` Peter Xu
  12 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Paolo Bonzini, Richard Henderson

When we partially change mappings (esp., mmap over parts of an existing
mmap like qemu_ram_remap() does) where we have a userfaultfd handler
registered, the handler will implicitly be unregistered from the parts that
changed.

Trying to place pages onto mappings where there is no longer a handler
registered will fail. Let's make sure that any waiter is woken up - we
have to do that manually.

Let's also document how UFFDIO_UNREGISTER will handle this scenario.

This is mainly a preparation for RAM blocks with resizable allocations,
where the mapping of the invalid RAM range will change. The source will
keep sending pages that are outside of the new (shrunk) RAM size. We have
to treat these pages as if they had been migrated, but can essentially
simply drop the content (ignore the placement error).

Keep printing a warning when we hit EINVAL, to avoid hiding other
(programming) issues. ENOENT is unambiguous, so no warning is needed there.
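
To make the intended error handling easier to follow, here is a minimal
standalone sketch of the decision taken after a failed page placement. The
PlaceAction enum and sketch_classify_place() are hypothetical names used only
for illustration; the actual patch performs the wake-up inline in
qemu_ufd_copy_ioctl() and does not use a helper like this.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of a UFFDIO_COPY/UFFDIO_ZEROPAGE attempt. */
typedef enum {
    PLACE_OK,          /* page was placed */
    PLACE_WAKE,        /* no handler left (ENOENT): wake waiters silently */
    PLACE_WAKE_WARN,   /* EINVAL: wake waiters, but warn first */
    PLACE_FAIL         /* any other error: report failure to the caller */
} PlaceAction;

/*
 * ret is the ioctl() return value, err the errno observed after it.
 * ENOENT and EINVAL are both treated as "mapping changed", but only
 * EINVAL gets a warning, because it may also indicate a programming error.
 */
static PlaceAction sketch_classify_place(int ret, int err)
{
    if (!ret) {
        return PLACE_OK;
    }
    if (err == ENOENT) {
        return PLACE_WAKE;
    }
    if (err == EINVAL) {
        return PLACE_WAKE_WARN;
    }
    return PLACE_FAIL;
}

/* Self-check covering the four cases described above. */
void sketch_check_place_cases(void)
{
    assert(sketch_classify_place(0, 0) == PLACE_OK);
    assert(sketch_classify_place(-1, ENOENT) == PLACE_WAKE);
    assert(sketch_classify_place(-1, EINVAL) == PLACE_WAKE_WARN);
    assert(sketch_classify_place(-1, EFAULT) == PLACE_FAIL);
}
```

In both wake cases the page is afterwards treated as received if the wake-up
itself succeeds, which matches dropping the content sent by the source.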

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/postcopy-ram.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c68caf4e42..f39c6304de 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -506,6 +506,14 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
     range_struct.start = (uintptr_t)host_addr;
     range_struct.len = length;
 
+    /*
+     * In case the mapping was partially changed since we enabled userfault
+     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
+     * for the mappings that changed. Unregistering will, however, still work
+     * and ignore mappings without a registered handler. There could only be
+     * an issue if we would suddenly encounter a mapping that's incompatible
+     * with UFFD - which cannot happen within a single RAM block.
+     */
     if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
         error_report("%s: userfault unregister %s", __func__, strerror(errno));
 
@@ -1180,6 +1188,17 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     return 0;
 }
 
+static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
+                               uint64_t pagesize)
+{
+    struct uffdio_range range = {
+        .start = (uint64_t)(uintptr_t)host_addr,
+        .len = pagesize,
+    };
+
+    return ioctl(userfault_fd, UFFDIO_WAKE, &range);
+}
+
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
                                void *from_addr, uint64_t pagesize, RAMBlock *rb)
 {
@@ -1198,6 +1217,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
         zero_struct.mode = 0;
         ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
     }
+
+    /*
+     * When the mapping gets partially changed (e.g., qemu_ram_remap()) before
+     * we try to place a page, the userfaultfd handler will be removed for the
+     * changed mappings and placing pages will fail. We can safely ignore this,
+     * because mappings that changed on the destination don't need data from the
+     * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that page
+     * (unlikely but possible). Waking up waiters is always possible, even
+     * without a registered userfaultfd handler.
+     *
+     * Old kernels report EINVAL, new kernels report ENOENT in case there is
+     * no longer a userfaultfd handler for a mapping.
+     */
+    if (ret && (errno == ENOENT || errno == EINVAL)) {
+        if (errno == EINVAL) {
+            warn_report("%s: Failed to place page %p. Waking up any waiters.",
+                         __func__, host_addr);
+        }
+        ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);
+    }
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
-- 
2.24.1




* Re: [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-26 15:53 ` [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
@ 2020-02-26 16:06   ` Peter Xu
  2020-02-26 16:08     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2020-02-26 16:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Wed, Feb 26, 2020 at 04:53:04PM +0100, David Hildenbrand wrote:
> When we partially change mappings (esp., mmap over parts of an existing
> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> registered, the handler will implicitly be unregistered from the parts that
> changed.
> 
> Trying to place pages onto mappings where there is no longer a handler
> registered will fail. Let's make sure that any waiter is woken up - we
> have to do that manually.
> 
> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> 
> This is mainly a preparation for RAM blocks with resizable allocations,
> where the mapping of the invalid RAM range will change. The source will
> keep sending pages that are outside of the new (shrunk) RAM size. We have
> to treat these pages like they would have been migrated, but can
> essentially simply drop the content (ignore the placement error).
> 
> Keep printing a warning when we hit EINVAL, to avoid hiding other
> (programming) issues. ENOENT is unique.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-26 16:06   ` Peter Xu
@ 2020-02-26 16:08     ` David Hildenbrand
  2020-02-26 16:26       ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 16:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On 26.02.20 17:06, Peter Xu wrote:
> On Wed, Feb 26, 2020 at 04:53:04PM +0100, David Hildenbrand wrote:
>> When we partially change mappings (esp., mmap over parts of an existing
>> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
>> registered, the handler will implicitly be unregistered from the parts that
>> changed.
>>
>> Trying to place pages onto mappings where there is no longer a handler
>> registered will fail. Let's make sure that any waiter is woken up - we
>> have to do that manually.
>>
>> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
>>
>> This is mainly a preparation for RAM blocks with resizable allocations,
>> where the mapping of the invalid RAM range will change. The source will
>> keep sending pages that are outside of the new (shrunk) RAM size. We have
>> to treat these pages like they would have been migrated, but can
>> essentially simply drop the content (ignore the placement error).
>>
>> Keep printing a warning when we hit EINVAL, to avoid hiding other
>> (programming) issues. ENOENT is unique.
>>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 

Thanks a lot!

BTW, while I am playing with userfaultfd, I already have patches to
factor out all uffd handling from postcopy code into utils/uffd.c

My list of patches does not seem to get any smaller :(

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-26 16:08     ` David Hildenbrand
@ 2020-02-26 16:26       ` Peter Xu
  2020-02-26 16:34         ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2020-02-26 16:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Wed, Feb 26, 2020 at 05:08:08PM +0100, David Hildenbrand wrote:
> On 26.02.20 17:06, Peter Xu wrote:
> > On Wed, Feb 26, 2020 at 04:53:04PM +0100, David Hildenbrand wrote:
> >> When we partially change mappings (esp., mmap over parts of an existing
> >> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> >> registered, the handler will implicitly be unregistered from the parts that
> >> changed.
> >>
> >> Trying to place pages onto mappings where there is no longer a handler
> >> registered will fail. Let's make sure that any waiter is woken up - we
> >> have to do that manually.
> >>
> >> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> >>
> >> This is mainly a preparation for RAM blocks with resizable allocations,
> >> where the mapping of the invalid RAM range will change. The source will
> >> keep sending pages that are outside of the new (shrunk) RAM size. We have
> >> to treat these pages like they would have been migrated, but can
> >> essentially simply drop the content (ignore the placement error).
> >>
> >> Keep printing a warning when we hit EINVAL, to avoid hiding other
> >> (programming) issues. ENOENT is unique.
> >>
> >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Andrea Arcangeli <aarcange@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> 
> Thanks a lot!
> 
> BTW, while I am playing with userfaultfd, I already have patches to
> factor out all uffd handling from postcopy code into utils/uffd.c
> 
> My list of patches does not seem to get any smaller :(

Simply because you're working on more things? :)

Thanks for working on this (and this is far better than the exit()
version, IMHO)!

-- 
Peter Xu




* Re: [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-26 16:26       ` Peter Xu
@ 2020-02-26 16:34         ` David Hildenbrand
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2020-02-26 16:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On 26.02.20 17:26, Peter Xu wrote:
> On Wed, Feb 26, 2020 at 05:08:08PM +0100, David Hildenbrand wrote:
>> On 26.02.20 17:06, Peter Xu wrote:
>>> On Wed, Feb 26, 2020 at 04:53:04PM +0100, David Hildenbrand wrote:
>>>> When we partially change mappings (esp., mmap over parts of an existing
>>>> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
>>>> registered, the handler will implicitly be unregistered from the parts that
>>>> changed.
>>>>
>>>> Trying to place pages onto mappings where there is no longer a handler
>>>> registered will fail. Let's make sure that any waiter is woken up - we
>>>> have to do that manually.
>>>>
>>>> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
>>>>
>>>> This is mainly a preparation for RAM blocks with resizable allocations,
>>>> where the mapping of the invalid RAM range will change. The source will
>>>> keep sending pages that are outside of the new (shrunk) RAM size. We have
>>>> to treat these pages like they would have been migrated, but can
>>>> essentially simply drop the content (ignore the placement error).
>>>>
>>>> Keep printing a warning when we hit EINVAL, to avoid hiding other
>>>> (programming) issues. ENOENT is unique.
>>>>
>>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>> Cc: Juan Quintela <quintela@redhat.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>
>>
>> Thanks a lot!
>>
>> BTW, while I am playing with userfaultfd, I already have patches to
>> factor out all uffd handling from postcopy code into utils/uffd.c
>>
>> My list of patches does not seem to get any smaller :(
> 
> Simply because you're working on more things? :)

virtio-mem has been a steady source of huge refactorings (both in QEMU
and the kernel). At least on the kernel side, an end might be in sight :)

> 
> Thanks for working on this (and this is far better than the exit()
> version, IMHO)!

Thanks for insisting on fixing it instead of working around it!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-02-26 15:52 ` [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
@ 2020-03-06 16:05   ` Dr. David Alan Gilbert
  2020-03-06 16:20     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 16:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> Add two new helper functions. This will come in handy once we want to
> handle ram block resizes while postcopy is active.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 23 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index d5a4d69e1c..f815f4e532 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>      return block->host + offset;
>  }
>  
> +static void *host_page_from_ram_block_offset(RAMBlock *block,
> +                                             ram_addr_t offset)
> +{
> +    /* Note: Explicitly no check against offset_in_ramblock(). */
> +    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
> +                                   block->page_size);
> +}
> +
> +static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
> +                                                         ram_addr_t offset)
> +{
> +    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
> +}
> +
>  static inline void *colo_cache_from_block_offset(RAMBlock *block,
>                                                   ram_addr_t offset)
>  {
> @@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      /* Temporary page that is later 'placed' */
>      void *postcopy_host_page = mis->postcopy_tmp_page;
> -    void *this_host = NULL;
> +    void *host_page = NULL;
>      bool all_zero = false;
>      int target_pages = 0;
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr;
> -        void *host = NULL;
>          void *page_buffer = NULL;
>          void *place_source = NULL;
>          RAMBlock *block = NULL;
> @@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
>              block = ram_block_from_stream(f, flags);
> +            if (!block) {
> +                ret = -EINVAL;

Could we have an error_report there? At the moment it would trigger
the one below.

Other than that,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> +                break;
> +            }
>  
> -            host = host_from_ram_block_offset(block, addr);
> -            if (!host) {
> +            if (!offset_in_ramblock(block, addr)) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
> @@ -3163,21 +3179,18 @@ static int ram_load_postcopy(QEMUFile *f)
>               * of a host page in one chunk.
>               */
>              page_buffer = postcopy_host_page +
> -                          ((uintptr_t)host & (block->page_size - 1));
> +                          host_page_offset_from_ram_block_offset(block, addr);
>              /* If all TP are zero then we can optimise the place */
>              if (target_pages == 1) {
>                  all_zero = true;
> -                this_host = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
> -                                                    block->page_size);
> -            } else {
> +                host_page = host_page_from_ram_block_offset(block, addr);
> +            } else if (host_page != host_page_from_ram_block_offset(block,
> +                                                                    addr)) {
>                  /* not the 1st TP within the HP */
> -                if (QEMU_ALIGN_DOWN((uintptr_t)host, block->page_size) !=
> -                    (uintptr_t)this_host) {
> -                    error_report("Non-same host page %p/%p",
> -                                  host, this_host);
> -                    ret = -EINVAL;
> -                    break;
> -                }
> +                error_report("Non-same host page %p/%p", host_page,
> +                             host_page_from_ram_block_offset(block, addr));
> +                ret = -EINVAL;
> +                break;
>              }
>  
>              /*
> @@ -3257,16 +3270,11 @@ static int ram_load_postcopy(QEMUFile *f)
>          }
>  
>          if (!ret && place_needed) {
> -            /* This gets called at the last target page in the host page */
> -            void *place_dest = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
> -                                                       block->page_size);
> -
>              if (all_zero) {
> -                ret = postcopy_place_page_zero(mis, place_dest,
> -                                               block);
> +                ret = postcopy_place_page_zero(mis, host_page, block);
>              } else {
> -                ret = postcopy_place_page(mis, place_dest,
> -                                          place_source, block);
> +                ret = postcopy_place_page(mis, host_page, place_source,
> +                                          block);
>              }
>          }
>      }
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-03-06 16:05   ` Dr. David Alan Gilbert
@ 2020-03-06 16:20     ` David Hildenbrand
  2020-03-06 18:47       ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-03-06 16:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

On 06.03.20 17:05, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Add two new helper functions. This will come in handy once we want to
>> handle ram block resizes while postcopy is active.
>>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
>>  1 file changed, 31 insertions(+), 23 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index d5a4d69e1c..f815f4e532 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>>      return block->host + offset;
>>  }
>>  
>> +static void *host_page_from_ram_block_offset(RAMBlock *block,
>> +                                             ram_addr_t offset)
>> +{
>> +    /* Note: Explicitly no check against offset_in_ramblock(). */
>> +    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
>> +                                   block->page_size);
>> +}
>> +
>> +static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
>> +                                                         ram_addr_t offset)
>> +{
>> +    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
>> +}
>> +
>>  static inline void *colo_cache_from_block_offset(RAMBlock *block,
>>                                                   ram_addr_t offset)
>>  {
>> @@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
>>      MigrationIncomingState *mis = migration_incoming_get_current();
>>      /* Temporary page that is later 'placed' */
>>      void *postcopy_host_page = mis->postcopy_tmp_page;
>> -    void *this_host = NULL;
>> +    void *host_page = NULL;
>>      bool all_zero = false;
>>      int target_pages = 0;
>>  
>>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>>          ram_addr_t addr;
>> -        void *host = NULL;
>>          void *page_buffer = NULL;
>>          void *place_source = NULL;
>>          RAMBlock *block = NULL;
>> @@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
>>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
>>              block = ram_block_from_stream(f, flags);
>> +            if (!block) {
>> +                ret = -EINVAL;
> 
> Could we have an error_report there? At the moment it would trigger
> the one below.

Makes sense, I'll add one!

> 
> Other than that,
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  2020-02-26 15:53 ` [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
@ 2020-03-06 16:30   ` Dr. David Alan Gilbert
  2020-03-06 19:09     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 16:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> Let's consolidate resetting the variables.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Thanks, I think that's actually fixing a case where huge zero pages
weren't placed as zero pages?

Dave

> ---
>  migration/ram.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index f815f4e532..1a5ff07997 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3126,7 +3126,7 @@ static int ram_load_postcopy(QEMUFile *f)
>      /* Temporary page that is later 'placed' */
>      void *postcopy_host_page = mis->postcopy_tmp_page;
>      void *host_page = NULL;
> -    bool all_zero = false;
> +    bool all_zero = true;
>      int target_pages = 0;
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> @@ -3152,7 +3152,6 @@ static int ram_load_postcopy(QEMUFile *f)
>          addr &= TARGET_PAGE_MASK;
>  
>          trace_ram_load_postcopy_loop((uint64_t)addr, flags);
> -        place_needed = false;
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
>              block = ram_block_from_stream(f, flags);
> @@ -3180,9 +3179,7 @@ static int ram_load_postcopy(QEMUFile *f)
>               */
>              page_buffer = postcopy_host_page +
>                            host_page_offset_from_ram_block_offset(block, addr);
> -            /* If all TP are zero then we can optimise the place */
>              if (target_pages == 1) {
> -                all_zero = true;
>                  host_page = host_page_from_ram_block_offset(block, addr);
>              } else if (host_page != host_page_from_ram_block_offset(block,
>                                                                      addr)) {
> @@ -3199,7 +3196,6 @@ static int ram_load_postcopy(QEMUFile *f)
>               */
>              if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
>                  place_needed = true;
> -                target_pages = 0;
>              }
>              place_source = postcopy_host_page;
>          }
> @@ -3276,6 +3272,10 @@ static int ram_load_postcopy(QEMUFile *f)
>                  ret = postcopy_place_page(mis, host_page, place_source,
>                                            block);
>              }
> +            place_needed = false;
> +            target_pages = 0;
> +            /* Assume we have a zero page until we detect something different */
> +            all_zero = true;
>          }
>      }
>  
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-26 15:53 ` [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
@ 2020-03-06 16:56   ` Dr. David Alan Gilbert
  2020-03-06 18:45     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> Resizing while migrating is dangerous and does not work as expected.
> The whole migration code works on the used_length of ram blocks and does
> not expect this to change at random points in time.
> 
> In the case of postcopy, relying on used_length is racy as soon as the
> guest is running. Also, when used_length changes we might leave the
> uffd handler registered for some memory regions, reject valid pages
> when migrating and fail when sending the recv bitmap to the source.
> 
> Resizing can be triggered *after* (but not during) a reset in
> ACPI code by the guest
> - hw/arm/virt-acpi-build.c:acpi_ram_update()
> - hw/i386/acpi-build.c:acpi_ram_update()
> 
> Let's remember the original used_length in a separate variable and
> use it in relevant postcopy code. Make sure to update it when we resize
> during precopy, when synchronizing the RAM block sizes with the source.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Shannon Zhao <shannon.zhao@linaro.org>
> Cc: Alex Bennée <alex.bennee@linaro.org>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/exec/ramblock.h  | 10 ++++++++++
>  migration/postcopy-ram.c | 15 ++++++++++++---
>  migration/ram.c          | 11 +++++++++--
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 07d50864d8..664701b759 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -59,6 +59,16 @@ struct RAMBlock {
>       */
>      unsigned long *clear_bmap;
>      uint8_t clear_bmap_shift;
> +
> +    /*
> +     * RAM block length that corresponds to the used_length on the migration
> +     * source (after RAM block sizes were synchronized). Especially, after
> +     * starting to run the guest, used_length and postcopy_length can differ.
> +     * Used to register/unregister uffd handlers and as the size of the received
> +     * bitmap. Receiving any page beyond this length will bail out, as it
> +     * could not have been valid on the source.
> +     */
> +    ram_addr_t postcopy_length;
>  };
>  #endif
>  #endif
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index a36402722b..c68caf4e42 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -17,6 +17,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/rcu.h"
>  #include "exec/target_page.h"
>  #include "migration.h"
>  #include "qemu-file.h"
> @@ -31,6 +32,7 @@
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  #include "hw/boards.h"
> +#include "exec/ramblock.h"
>  
>  /* Arbitrary limit on size of each discard command,
>   * keeps them around ~200 bytes
> @@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
>      ram_addr_t length = qemu_ram_get_used_length(rb);
>      trace_postcopy_init_range(block_name, host_addr, offset, length);
>  
> +    /*
> +     * Save the used_length before running the guest. In case we have to
> +     * resize RAM blocks when syncing RAM block sizes from the source during
> +     * precopy, we'll update it manually via the ram block notifier.
> +     */
> +    rb->postcopy_length = length;
> +
>      /*
>       * We need the whole of RAM to be truly empty for postcopy, so things
>       * like ROMs and any data tables built during init must be zero'd
> @@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>      const char *block_name = qemu_ram_get_idstr(rb);
>      void *host_addr = qemu_ram_get_host_addr(rb);
>      ram_addr_t offset = qemu_ram_get_offset(rb);
> -    ram_addr_t length = qemu_ram_get_used_length(rb);
> +    ram_addr_t length = rb->postcopy_length;
>      MigrationIncomingState *mis = opaque;
>      struct uffdio_range range_struct;
>      trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
> @@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
>      const char *block_name = qemu_ram_get_idstr(rb);
>      void *host_addr = qemu_ram_get_host_addr(rb);
>      ram_addr_t offset = qemu_ram_get_offset(rb);
> -    ram_addr_t length = qemu_ram_get_used_length(rb);
> +    ram_addr_t length = rb->postcopy_length;
>      trace_postcopy_nhp_range(block_name, host_addr, offset, length);
>  
>      /*
> @@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
>      struct uffdio_register reg_struct;
>  
>      reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
> -    reg_struct.range.len = qemu_ram_get_used_length(rb);
> +    reg_struct.range.len = rb->postcopy_length;
>      reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
>  
>      /* Now tell our userfault_fd that it's responsible for this area */
> diff --git a/migration/ram.c b/migration/ram.c
> index 1a5ff07997..ee5c3d5784 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>          return -1;
>      }
>  
> -    nbits = block->used_length >> TARGET_PAGE_BITS;
> +    nbits = block->postcopy_length >> TARGET_PAGE_BITS;
>  
>      /*
>       * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
>                  break;
>              }
>  
> -            if (!offset_in_ramblock(block, addr)) {
> +            /*
> +             * Relying on used_length is racy and can result in false positives.
> +             * We might place pages beyond used_length in case RAM was shrunk
> +             * while in postcopy, which is fine - trying to place via
> +             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
> +             */

Is this actually safe? Imagine that the region had got shrunk, would it
still be mmap'd in there - or could there now be a space where something
else might have landed in?

> +            if (!block->host || addr >= block->postcopy_length) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
> @@ -3757,6 +3763,7 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
>                               rb->idstr);
>              }
>          }
> +        rb->postcopy_length = new_size;
>          break;
>      case POSTCOPY_INCOMING_NONE:
>      case POSTCOPY_INCOMING_RUNNING:
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 11/13] migration/multifd: Print used_length of memory block
  2020-02-26 15:53 ` [PATCH v3 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
@ 2020-03-06 16:57   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 16:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> We actually want to print the used_length, against which we check.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/multifd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b3e8ae9bcc..dd9e88c5f1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -222,7 +222,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          if (offset > (block->used_length - qemu_target_page_size())) {
>              error_setg(errp, "multifd: offset too long %" PRIu64
>                         " (max " RAM_ADDR_FMT ")",
> -                       offset, block->max_length);
> +                       offset, block->used_length);
>              return -1;
>          }
>          p->pages->iov[i].iov_base = block->host + offset;
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks
  2020-02-26 15:53 ` [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
@ 2020-03-06 16:59   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 16:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> We never read or write beyond the used_length of memory blocks when
> migrating. Make this clearer by using offset_in_ramblock() consistently.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ee5c3d5784..5cc9993899 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1309,8 +1309,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
>          *again = false;
>          return false;
>      }
> -    if ((((ram_addr_t)pss->page) << TARGET_PAGE_BITS)
> -        >= pss->block->used_length) {
> +    if (!offset_in_ramblock(pss->block,
> +                            ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
>          /* Didn't find anything in this RAM Block */
>          pss->page = 0;
>          pss->block = QLIST_NEXT_RCU(pss->block, next);
> @@ -1514,7 +1514,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
>          rs->last_req_rb = ramblock;
>      }
>      trace_ram_save_queue_pages(ramblock->idstr, start, len);
> -    if (start+len > ramblock->used_length) {
> +    if (!offset_in_ramblock(ramblock, start + len - 1)) {
>          error_report("%s request overrun start=" RAM_ADDR_FMT " len="
>                       RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
>                       __func__, start, len, ramblock->used_length);
> @@ -3325,8 +3325,8 @@ static void colo_flush_ram_cache(void)
>          while (block) {
>              offset = migration_bitmap_find_dirty(ram_state, block, offset);
>  
> -            if (((ram_addr_t)offset) << TARGET_PAGE_BITS
> -                >= block->used_length) {
> +            if (!offset_in_ramblock(block,
> +                                    ((ram_addr_t)offset) << TARGET_PAGE_BITS)) {
>                  offset = 0;
>                  block = QLIST_NEXT_RCU(block, next);
>              } else {
> -- 
> 2.24.1
> 
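For reference, a small sketch of what the converted checks boil down to,
assuming offset_in_ramblock() amounts to "the block has a host mapping and
offset < used_length". MockRAMBlock and the function names below are
hypothetical stand-ins for illustration, not the QEMU definitions. In this
model the old and new forms of the ram_save_queue_pages() check agree
whenever len > 0; a zero-length request at start 0 is one corner where they
differ, because start + len - 1 wraps around. Whether any caller can
actually send such a request is not something this sketch addresses.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ram_addr_t;

/* Hypothetical, stripped-down stand-in for QEMU's RAMBlock. */
struct MockRAMBlock {
    uint8_t *host;
    ram_addr_t used_length;
};

/* Assumed semantics: an offset is valid only below used_length. */
bool mock_offset_in_ramblock(const struct MockRAMBlock *b, ram_addr_t offset)
{
    return b && b->host && offset < b->used_length;
}

/* Old-style check: the request overruns if its end exceeds used_length. */
bool queue_overrun_old(const struct MockRAMBlock *b,
                       ram_addr_t start, ram_addr_t len)
{
    return start + len > b->used_length;
}

/* New-style check: the last byte of the request must lie inside the block. */
bool queue_overrun_new(const struct MockRAMBlock *b,
                       ram_addr_t start, ram_addr_t len)
{
    /* For start == 0 and len == 0 this wraps to the maximum ram_addr_t. */
    return !mock_offset_in_ramblock(b, start + len - 1);
}
```

With used_length = 0x4000, a request (start=0x3000, len=0x1000) is accepted
by both forms and (start=0x3000, len=0x1001) is rejected by both.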
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-03-06 16:56   ` Dr. David Alan Gilbert
@ 2020-03-06 18:45     ` David Hildenbrand
  2020-03-06 18:51       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-03-06 18:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On 06.03.20 17:56, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Resizing while migrating is dangerous and does not work as expected.
>> The whole migration code works on the used_length of ram blocks and does
>> not expect this to change at random points in time.
>>
>> In the case of postcopy, relying on used_length is racy as soon as the
>> guest is running. Also, when used_length changes we might leave the
>> uffd handler registered for some memory regions, reject valid pages
>> when migrating and fail when sending the recv bitmap to the source.
>>
>> Resizing can be triggered *after* (but not during) a reset in
>> ACPI code by the guest
>> - hw/arm/virt-acpi-build.c:acpi_ram_update()
>> - hw/i386/acpi-build.c:acpi_ram_update()
>>
>> Let's remember the original used_length in a separate variable and
>> use it in relevant postcopy code. Make sure to update it when we resize
>> during precopy, when synchronizing the RAM block sizes with the source.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Richard Henderson <richard.henderson@linaro.org>
>> Cc: Shannon Zhao <shannon.zhao@linaro.org>
>> Cc: Alex Bennée <alex.bennee@linaro.org>
>> Cc: Peter Xu <peterx@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  include/exec/ramblock.h  | 10 ++++++++++
>>  migration/postcopy-ram.c | 15 ++++++++++++---
>>  migration/ram.c          | 11 +++++++++--
>>  3 files changed, 31 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 07d50864d8..664701b759 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -59,6 +59,16 @@ struct RAMBlock {
>>       */
>>      unsigned long *clear_bmap;
>>      uint8_t clear_bmap_shift;
>> +
>> +    /*
>> +     * RAM block length that corresponds to the used_length on the migration
>> +     * source (after RAM block sizes were synchronized). Especially, after
>> +     * starting to run the guest, used_length and postcopy_length can differ.
>> +     * Used to register/unregister uffd handlers and as the size of the received
>> +     * bitmap. Receiving any page beyond this length will bail out, as it
>> +     * could not have been valid on the source.
>> +     */
>> +    ram_addr_t postcopy_length;
>>  };
>>  #endif
>>  #endif
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index a36402722b..c68caf4e42 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -17,6 +17,7 @@
>>   */
>>  
>>  #include "qemu/osdep.h"
>> +#include "qemu/rcu.h"
>>  #include "exec/target_page.h"
>>  #include "migration.h"
>>  #include "qemu-file.h"
>> @@ -31,6 +32,7 @@
>>  #include "qemu/error-report.h"
>>  #include "trace.h"
>>  #include "hw/boards.h"
>> +#include "exec/ramblock.h"
>>  
>>  /* Arbitrary limit on size of each discard command,
>>   * keeps them around ~200 bytes
>> @@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
>>      ram_addr_t length = qemu_ram_get_used_length(rb);
>>      trace_postcopy_init_range(block_name, host_addr, offset, length);
>>  
>> +    /*
>> +     * Save the used_length before running the guest. In case we have to
>> +     * resize RAM blocks when syncing RAM block sizes from the source during
>> +     * precopy, we'll update it manually via the ram block notifier.
>> +     */
>> +    rb->postcopy_length = length;
>> +
>>      /*
>>       * We need the whole of RAM to be truly empty for postcopy, so things
>>       * like ROMs and any data tables built during init must be zero'd
>> @@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>>      const char *block_name = qemu_ram_get_idstr(rb);
>>      void *host_addr = qemu_ram_get_host_addr(rb);
>>      ram_addr_t offset = qemu_ram_get_offset(rb);
>> -    ram_addr_t length = qemu_ram_get_used_length(rb);
>> +    ram_addr_t length = rb->postcopy_length;
>>      MigrationIncomingState *mis = opaque;
>>      struct uffdio_range range_struct;
>>      trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
>> @@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
>>      const char *block_name = qemu_ram_get_idstr(rb);
>>      void *host_addr = qemu_ram_get_host_addr(rb);
>>      ram_addr_t offset = qemu_ram_get_offset(rb);
>> -    ram_addr_t length = qemu_ram_get_used_length(rb);
>> +    ram_addr_t length = rb->postcopy_length;
>>      trace_postcopy_nhp_range(block_name, host_addr, offset, length);
>>  
>>      /*
>> @@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
>>      struct uffdio_register reg_struct;
>>  
>>      reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
>> -    reg_struct.range.len = qemu_ram_get_used_length(rb);
>> +    reg_struct.range.len = rb->postcopy_length;
>>      reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
>>  
>>      /* Now tell our userfault_fd that it's responsible for this area */
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 1a5ff07997..ee5c3d5784 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>>          return -1;
>>      }
>>  
>> -    nbits = block->used_length >> TARGET_PAGE_BITS;
>> +    nbits = block->postcopy_length >> TARGET_PAGE_BITS;
>>  
>>      /*
>>       * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
>> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
>>                  break;
>>              }
>>  
>> -            if (!offset_in_ramblock(block, addr)) {
>> +            /*
>> +             * Relying on used_length is racy and can result in false positives.
>> +             * We might place pages beyond used_length in case RAM was shrunk
>> +             * while in postcopy, which is fine - trying to place via
>> +             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
>> +             */
> 
> Is this actually safe? Imagine that the region had got shrunk, would it
> still be mmap'd in there - or could there now be a space where something
> else might have landed in?

Yes, it's safe. The mapping of resizeable RAM blocks currently does not
change when they are resized. See patch #13 for how this is handled when
the mapping actually changes (preparation for resizeable allocations [1]).

[1]
https://lore.kernel.org/qemu-devel/20200305142945.216465-1-david@redhat.com/
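
To make the difference concrete, here is a minimal sketch of the two range
checks, assuming a stripped-down stand-in for RAMBlock. MockRAMBlock and the
helper names are made up for illustration and are not QEMU code. It only
models the bookkeeping: a shrink during postcopy moves used_length, while
postcopy_length stays at the value synchronized with the source. It does not
model the mmap or uffd side at all.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ram_addr_t;

/* Hypothetical, stripped-down stand-in for QEMU's RAMBlock. */
struct MockRAMBlock {
    uint8_t *host;
    ram_addr_t used_length;     /* may change once the guest is running */
    ram_addr_t postcopy_length; /* length synchronized with the source */
};

/* Pre-patch style check: depends on the current used_length. */
bool page_ok_used_length(const struct MockRAMBlock *rb, ram_addr_t addr)
{
    return rb->host && addr < rb->used_length;
}

/* Post-patch style check: depends only on postcopy_length. */
bool page_ok_postcopy_length(const struct MockRAMBlock *rb, ram_addr_t addr)
{
    return rb->host && addr < rb->postcopy_length;
}

/* Model a guest-triggered shrink during postcopy: only used_length moves. */
void mock_shrink_during_postcopy(struct MockRAMBlock *rb, ram_addr_t new_size)
{
    rb->used_length = new_size;
}
```

With both lengths at 0x8000 and a shrink to 0x4000, a page at offset 0x6000
fails the used_length check but still passes the postcopy_length check. A
page at offset 0x8000 fails both.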

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-03-06 16:20     ` David Hildenbrand
@ 2020-03-06 18:47       ` David Hildenbrand
  2020-03-06 18:48         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-03-06 18:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

On 06.03.20 17:20, David Hildenbrand wrote:
> On 06.03.20 17:05, Dr. David Alan Gilbert wrote:
>> * David Hildenbrand (david@redhat.com) wrote:
>>> Add two new helper functions. This will come in handy once we want to
>>> handle ram block resizes while postcopy is active.
>>>
>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>> Cc: Juan Quintela <quintela@redhat.com>
>>> Cc: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
>>>  1 file changed, 31 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index d5a4d69e1c..f815f4e532 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>>>      return block->host + offset;
>>>  }
>>>  
>>> +static void *host_page_from_ram_block_offset(RAMBlock *block,
>>> +                                             ram_addr_t offset)
>>> +{
>>> +    /* Note: Explicitly no check against offset_in_ramblock(). */
>>> +    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
>>> +                                   block->page_size);
>>> +}
>>> +
>>> +static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
>>> +                                                         ram_addr_t offset)
>>> +{
>>> +    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
>>> +}
>>> +
>>>  static inline void *colo_cache_from_block_offset(RAMBlock *block,
>>>                                                   ram_addr_t offset)
>>>  {
>>> @@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
>>>      MigrationIncomingState *mis = migration_incoming_get_current();
>>>      /* Temporary page that is later 'placed' */
>>>      void *postcopy_host_page = mis->postcopy_tmp_page;
>>> -    void *this_host = NULL;
>>> +    void *host_page = NULL;
>>>      bool all_zero = false;
>>>      int target_pages = 0;
>>>  
>>>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>>>          ram_addr_t addr;
>>> -        void *host = NULL;
>>>          void *page_buffer = NULL;
>>>          void *place_source = NULL;
>>>          RAMBlock *block = NULL;
>>> @@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
>>>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>>>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
>>>              block = ram_block_from_stream(f, flags);
>>> +            if (!block) {
>>> +                ret = -EINVAL;
>>
>> Could we have an error_report there, at the moment it would trigger
>> the one below.
> 
> Makes sense, I'll add one!

My memory kicks in: This was dropped on purpose. ram_block_from_stream()
will print proper errors already.

Cheers!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-03-06 18:47       ` David Hildenbrand
@ 2020-03-06 18:48         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 18:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> On 06.03.20 17:20, David Hildenbrand wrote:
> > On 06.03.20 17:05, Dr. David Alan Gilbert wrote:
> >> * David Hildenbrand (david@redhat.com) wrote:
> > >>> Add two new helper functions. This will come in handy once we want to
> >>> handle ram block resizes while postcopy is active.
> >>>
> >>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>> Cc: Juan Quintela <quintela@redhat.com>
> >>> Cc: Peter Xu <peterx@redhat.com>
> >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>> ---
> >>>  migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
> >>>  1 file changed, 31 insertions(+), 23 deletions(-)
> >>>
> >>> diff --git a/migration/ram.c b/migration/ram.c
> >>> index d5a4d69e1c..f815f4e532 100644
> >>> --- a/migration/ram.c
> >>> +++ b/migration/ram.c
> >>> @@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
> >>>      return block->host + offset;
> >>>  }
> >>>  
> >>> +static void *host_page_from_ram_block_offset(RAMBlock *block,
> >>> +                                             ram_addr_t offset)
> >>> +{
> >>> +    /* Note: Explicitly no check against offset_in_ramblock(). */
> >>> +    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
> >>> +                                   block->page_size);
> >>> +}
> >>> +
> >>> +static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
> >>> +                                                         ram_addr_t offset)
> >>> +{
> >>> +    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
> >>> +}
> >>> +
> >>>  static inline void *colo_cache_from_block_offset(RAMBlock *block,
> >>>                                                   ram_addr_t offset)
> >>>  {
> >>> @@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
> >>>      MigrationIncomingState *mis = migration_incoming_get_current();
> >>>      /* Temporary page that is later 'placed' */
> >>>      void *postcopy_host_page = mis->postcopy_tmp_page;
> >>> -    void *this_host = NULL;
> >>> +    void *host_page = NULL;
> >>>      bool all_zero = false;
> >>>      int target_pages = 0;
> >>>  
> >>>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >>>          ram_addr_t addr;
> >>> -        void *host = NULL;
> >>>          void *page_buffer = NULL;
> >>>          void *place_source = NULL;
> >>>          RAMBlock *block = NULL;
> >>> @@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
> >>>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
> >>>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
> >>>              block = ram_block_from_stream(f, flags);
> >>> +            if (!block) {
> >>> +                ret = -EINVAL;
> >>
> >> Could we have an error_report there, at the moment it would trigger
> >> the one below.
> > 
> > Makes sense, I'll add one!
> 
> My memory kicks in: This was dropped on purpose. ram_block_from_stream()
> will print proper errors already.

OK!

Dave

> Cheers!
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-03-06 18:45     ` David Hildenbrand
@ 2020-03-06 18:51       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 18:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> On 06.03.20 17:56, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> Resizing while migrating is dangerous and does not work as expected.
> > >> The whole migration code works on the used_length of ram blocks and does
> >> not expect this to change at random points in time.
> >>
> >> In the case of postcopy, relying on used_length is racy as soon as the
> >> guest is running. Also, when used_length changes we might leave the
> >> uffd handler registered for some memory regions, reject valid pages
> >> when migrating and fail when sending the recv bitmap to the source.
> >>
> > >> Resizing can be triggered *after* (but not during) a reset in
> >> ACPI code by the guest
> >> - hw/arm/virt-acpi-build.c:acpi_ram_update()
> >> - hw/i386/acpi-build.c:acpi_ram_update()
> >>
> >> Let's remember the original used_length in a separate variable and
> >> use it in relevant postcopy code. Make sure to update it when we resize
> >> during precopy, when synchronizing the RAM block sizes with the source.
> >>
> >> Reviewed-by: Peter Xu <peterx@redhat.com>
> >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>
> >> Cc: Eduardo Habkost <ehabkost@redhat.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Igor Mammedov <imammedo@redhat.com>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Richard Henderson <richard.henderson@linaro.org>
> >> Cc: Shannon Zhao <shannon.zhao@linaro.org>
> >> Cc: Alex Bennée <alex.bennee@linaro.org>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  include/exec/ramblock.h  | 10 ++++++++++
> >>  migration/postcopy-ram.c | 15 ++++++++++++---
> >>  migration/ram.c          | 11 +++++++++--
> >>  3 files changed, 31 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> >> index 07d50864d8..664701b759 100644
> >> --- a/include/exec/ramblock.h
> >> +++ b/include/exec/ramblock.h
> >> @@ -59,6 +59,16 @@ struct RAMBlock {
> >>       */
> >>      unsigned long *clear_bmap;
> >>      uint8_t clear_bmap_shift;
> >> +
> >> +    /*
> >> +     * RAM block length that corresponds to the used_length on the migration
> >> +     * source (after RAM block sizes were synchronized). Especially, after
> >> +     * starting to run the guest, used_length and postcopy_length can differ.
> >> +     * Used to register/unregister uffd handlers and as the size of the received
> >> +     * bitmap. Receiving any page beyond this length will bail out, as it
> >> +     * could not have been valid on the source.
> >> +     */
> >> +    ram_addr_t postcopy_length;
> >>  };
> >>  #endif
> >>  #endif
> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >> index a36402722b..c68caf4e42 100644
> >> --- a/migration/postcopy-ram.c
> >> +++ b/migration/postcopy-ram.c
> >> @@ -17,6 +17,7 @@
> >>   */
> >>  
> >>  #include "qemu/osdep.h"
> >> +#include "qemu/rcu.h"
> >>  #include "exec/target_page.h"
> >>  #include "migration.h"
> >>  #include "qemu-file.h"
> >> @@ -31,6 +32,7 @@
> >>  #include "qemu/error-report.h"
> >>  #include "trace.h"
> >>  #include "hw/boards.h"
> >> +#include "exec/ramblock.h"
> >>  
> >>  /* Arbitrary limit on size of each discard command,
> >>   * keeps them around ~200 bytes
> >> @@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
> >>      ram_addr_t length = qemu_ram_get_used_length(rb);
> >>      trace_postcopy_init_range(block_name, host_addr, offset, length);
> >>  
> >> +    /*
> >> +     * Save the used_length before running the guest. In case we have to
> >> +     * resize RAM blocks when syncing RAM block sizes from the source during
> >> +     * precopy, we'll update it manually via the ram block notifier.
> >> +     */
> >> +    rb->postcopy_length = length;
> >> +
> >>      /*
> >>       * We need the whole of RAM to be truly empty for postcopy, so things
> >>       * like ROMs and any data tables built during init must be zero'd
> >> @@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
> >>      const char *block_name = qemu_ram_get_idstr(rb);
> >>      void *host_addr = qemu_ram_get_host_addr(rb);
> >>      ram_addr_t offset = qemu_ram_get_offset(rb);
> >> -    ram_addr_t length = qemu_ram_get_used_length(rb);
> >> +    ram_addr_t length = rb->postcopy_length;
> >>      MigrationIncomingState *mis = opaque;
> >>      struct uffdio_range range_struct;
> >>      trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
> >> @@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
> >>      const char *block_name = qemu_ram_get_idstr(rb);
> >>      void *host_addr = qemu_ram_get_host_addr(rb);
> >>      ram_addr_t offset = qemu_ram_get_offset(rb);
> >> -    ram_addr_t length = qemu_ram_get_used_length(rb);
> >> +    ram_addr_t length = rb->postcopy_length;
> >>      trace_postcopy_nhp_range(block_name, host_addr, offset, length);
> >>  
> >>      /*
> >> @@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
> >>      struct uffdio_register reg_struct;
> >>  
> >>      reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
> >> -    reg_struct.range.len = qemu_ram_get_used_length(rb);
> >> +    reg_struct.range.len = rb->postcopy_length;
> >>      reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
> >>  
> >>      /* Now tell our userfault_fd that it's responsible for this area */
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index 1a5ff07997..ee5c3d5784 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> >>          return -1;
> >>      }
> >>  
> >> -    nbits = block->used_length >> TARGET_PAGE_BITS;
> >> +    nbits = block->postcopy_length >> TARGET_PAGE_BITS;
> >>  
> >>      /*
> >>       * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> >> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
> >>                  break;
> >>              }
> >>  
> >> -            if (!offset_in_ramblock(block, addr)) {
> >> +            /*
> >> +             * Relying on used_length is racy and can result in false positives.
> >> +             * We might place pages beyond used_length in case RAM was shrunk
> >> +             * while in postcopy, which is fine - trying to place via
> >> +             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
> >> +             */
> > 
> > Is this actually safe? Imagine that the region had got shrunk, would it
> > still be mmap'd in there - or could there now be a space where something
> > else might have landed in?
> 
> > Yes, it's safe. The mapping of resizeable RAM blocks currently does not
> > change when they are resized. See patch #13 for how this is handled when
> > the mapping actually changes (preparation for resizeable allocations [1]).

OK, in that case,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> [1]
> https://lore.kernel.org/qemu-devel/20200305142945.216465-1-david@redhat.com/
> 
> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  2020-03-06 16:30   ` Dr. David Alan Gilbert
@ 2020-03-06 19:09     ` David Hildenbrand
  2020-03-06 19:11       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2020-03-06 19:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

On 06.03.20 17:30, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Let's consolidate resetting the variables.
>>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Thanks, I think that's actually fixing a case where huge zero pages
> weren't placed as zero pages?

I don't see it :) Can you point out in which receive sequence it would
go wrong?

We used to set "all_zero = true" when processing the first sub-page.
Now, we set "all_zero = true" before we start to process the first sub-page.
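
As a sanity check of that claim, a minimal sketch with both flows side by
side. This is a simplified model, not the real ram_load_postcopy():
place_old(), place_new(), is_zero[] and placed_zero[] are made-up names, and
it assumes the stream delivers complete host pages (n is a multiple of
per_host, and per_host >= 1).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * is_zero[i] says whether the i-th received target page is a zero page.
 * Every per_host consecutive target pages form one host page.
 * placed_zero[h] records whether host page h would be placed as a zero page.
 */

/* Old flow: all_zero is set to true while handling the first sub-page. */
void place_old(const bool *is_zero, size_t n, size_t per_host,
               bool *placed_zero)
{
    bool all_zero = false;
    size_t target_pages = 0;

    for (size_t i = 0; i < n; i++) {
        target_pages++;
        if (target_pages == 1) {
            all_zero = true;
        }
        if (!is_zero[i]) {
            all_zero = false;
        }
        if (target_pages == per_host) {
            placed_zero[i / per_host] = all_zero;
            target_pages = 0;
        }
    }
}

/* New flow: all_zero is already true before the first sub-page is seen. */
void place_new(const bool *is_zero, size_t n, size_t per_host,
               bool *placed_zero)
{
    bool all_zero = true;
    size_t target_pages = 0;

    for (size_t i = 0; i < n; i++) {
        target_pages++;
        if (!is_zero[i]) {
            all_zero = false;
        }
        if (target_pages == per_host) {
            placed_zero[i / per_host] = all_zero;
            /* Consolidated reset after placement. */
            all_zero = true;
            target_pages = 0;
        }
    }
}
```

For any sequence of complete host pages, both functions fill placed_zero[]
identically in this model, which matches the "no functional change" reading.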

Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  2020-03-06 19:09     ` David Hildenbrand
@ 2020-03-06 19:11       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-06 19:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel, Peter Xu,
	Paolo Bonzini, Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> On 06.03.20 17:30, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> Let's consolidate resetting the variables.
> >>
> >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > Thanks, I think that's actually fixing a case where huge zero pages
> > weren't placed as zero pages?
> 
> I don't see it :) Can you point out in which receive sequence it would
> go wrong?
> 
> We used to set "all_zero = true" when processing the first sub-page.
> Now, we set "all_zero = true" before we start to process the first sub-page.

No, you're right - no change.

Dave

> Thanks!
> 
> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




end of thread, other threads:[~2020-03-06 19:12 UTC | newest]

Thread overview: 30+ messages
2020-02-26 15:52 [PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 02/13] stubs/ram-block: Remove stubs that are no longer needed David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 03/13] numa: Teach ram block notifiers about resizeable ram blocks David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 04/13] numa: Make all callbacks of ram block notifiers optional David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
2020-02-26 15:52 ` [PATCH v3 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
2020-03-06 16:05   ` Dr. David Alan Gilbert
2020-03-06 16:20     ` David Hildenbrand
2020-03-06 18:47       ` David Hildenbrand
2020-03-06 18:48         ` Dr. David Alan Gilbert
2020-02-26 15:53 ` [PATCH v3 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
2020-03-06 16:30   ` Dr. David Alan Gilbert
2020-03-06 19:09     ` David Hildenbrand
2020-03-06 19:11       ` Dr. David Alan Gilbert
2020-02-26 15:53 ` [PATCH v3 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
2020-03-06 16:56   ` Dr. David Alan Gilbert
2020-03-06 18:45     ` David Hildenbrand
2020-03-06 18:51       ` Dr. David Alan Gilbert
2020-02-26 15:53 ` [PATCH v3 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
2020-03-06 16:57   ` Dr. David Alan Gilbert
2020-02-26 15:53 ` [PATCH v3 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
2020-03-06 16:59   ` Dr. David Alan Gilbert
2020-02-26 15:53 ` [PATCH v3 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
2020-02-26 16:06   ` Peter Xu
2020-02-26 16:08     ` David Hildenbrand
2020-02-26 16:26       ` Peter Xu
2020-02-26 16:34         ` David Hildenbrand
