* [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
@ 2020-02-21 16:41 David Hildenbrand
  2020-02-21 16:41 ` [PATCH v2 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Paul Durrant, Alex Williamson, Michael S. Tsirkin, Shannon Zhao,
	Igor Mammedov, Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

This is the follow-up to
    "[PATCH RFC] memory: Don't allow to resize RAM while migrating" [1]

This series includes some (slightly modified) patches also contained in:
    "[PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations
     under POSIX" [2]
That series will be based on this series. The last patch (#13) in this
series could be moved to the other series, but I decided to include it
here for now (similar context).

I realized that resizing RAM blocks while the guest is being migrated
(precopy: resize while still running on the source, postcopy: resize
 while already running on the target) is buggy. In case of precopy, we
can simply cancel migration. Postcopy handling is more involved. Resizing
can currently happen during a guest reboot, triggered by ACPI rebuilds.
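
For reference, the resize trigger looks roughly like this (a simplified
sketch along the lines of hw/i386/acpi-build.c:acpi_ram_update(); details
differ in the actual code): on reboot, the ACPI table blob is rebuilt and
the resizeable RAM block backing it is resized to fit the new contents.

    static void acpi_ram_update(MemoryRegion *mr, GArray *data)
    {
        uint32_t size = acpi_data_len(data);

        /* This resize is what can race with an ongoing migration. */
        memory_region_ram_resize(mr, size, &error_abort);

        /* Rewrite the blob contents and mark them dirty. */
        memcpy(memory_region_get_ram_ptr(mr), data->data, size);
        memory_region_set_dirty(mr, 0, size);
    }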

Along with the fixes, some cleanups.

[1] https://lkml.kernel.org/r/20200213172016.196609-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200212134254.11073-1-david@redhat.com

I was now able to actually test resizing while migrating. I am using the
virtio-mem prototype for testing (which also makes use of resizable
allocations). Things I was able to reproduce:
- Resize while still running on the migration source. Migration is canceled
-- Test case for "migraton/ram: Handle RAM block resizes during precopy"
- Resize (grow+shrink) on the migration target during postcopy migration
  (when syncing RAM blocks), while not yet running on the target
-- Test case for "migration/ram: Discard new RAM when growing RAM blocks
   and the VM is stopped", and overall RAM size synchronization. Seems to
   work just fine.
- Resize (grow+shrink) on the migration target during postcopy migration
  while already running on the target.
-- Test case for "migration/ram: Handle RAM block resizes during postcopy"
-- Test case for "migration/ram: Tolerate partially changed mappings in
   postcopy code" - I can see that -ENOENT is actually triggered and that
   migration succeeds. Migration seems to work just fine.

In addition, I ran the avocado-vt migration tests and the usual QEMU checks.

v1 -> v2:
- "util: vfio-helpers: Factor out and fix processing of existing ram
   blocks"
-- Stringify error
- "migraton/ram: Handle RAM block resizes during precopy"
-- Simplified check if we're migrating on the source
- "exec: Relax range check in ram_block_discard_range()"
-- Added to make discard during resizes actually work
- "migration/ram: Discard new RAM when growing RAM blocks after
   ram_postcopy_incoming_init()"
-- Better checks if in the right postcopy mode.
-- Better patch subject/description/comments
- "migration/ram: Handle RAM block resizes during postcopy"
-- Better comments
-- Adapt to changed postcopy checks
- "migrate/ram: Get rid of "place_source" in ram_load_postcopy()"
-- Dropped, as broken
- "migration/ram: Tolerate partially changed mappings in postcopy code"
-- Better comment / description. Clarify that no implicit wakeup will
   happen
-- Warn on EINVAL (older kernels)
-- Wake up any waiter explicitly


David Hildenbrand (13):
  util: vfio-helpers: Factor out and fix processing of existing ram
    blocks
  stubs/ram-block: Remove stubs that are no longer needed
  numa: Teach ram block notifiers about resizeable ram blocks
  numa: Make all callbacks of ram block notifiers optional
  migration/ram: Handle RAM block resizes during precopy
  exec: Relax range check in ram_block_discard_range()
  migration/ram: Discard RAM when growing RAM blocks after
    ram_postcopy_incoming_init()
  migration/ram: Simplify host page handling in ram_load_postcopy()
  migration/ram: Consolidate variable reset after placement in
    ram_load_postcopy()
  migration/ram: Handle RAM block resizes during postcopy
  migration/multifd: Print used_length of memory block
  migration/ram: Use offset_in_ramblock() in range checks
  migration/ram: Tolerate partially changed mappings in postcopy code

 exec.c                     |  27 +++++--
 hw/core/numa.c             |  41 +++++++++--
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   1 +
 include/exec/memory.h      |  10 +--
 include/exec/ramblock.h    |  10 +++
 include/exec/ramlist.h     |  13 ++--
 migration/migration.c      |   9 ++-
 migration/migration.h      |   1 +
 migration/multifd.c        |   2 +-
 migration/postcopy-ram.c   |  52 +++++++++++++-
 migration/ram.c            | 144 ++++++++++++++++++++++++++++---------
 stubs/ram-block.c          |  20 ------
 target/i386/hax-mem.c      |   5 +-
 target/i386/sev.c          |  18 ++---
 util/vfio-helpers.c        |  41 ++++-------
 16 files changed, 283 insertions(+), 118 deletions(-)

-- 
2.24.1




* [PATCH v2 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-21 16:41 ` [PATCH v2 02/13] stubs/ram-block: Remove stubs that are no longer needed David Hildenbrand
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Paolo Bonzini, Richard Henderson

Factor the processing of existing ram blocks out into common code that
runs when a new notifier is registered, just as done for memory region
notifiers. This allows us to have the logic about how to process existing
ram blocks at a central place (which will be extended soon).

Just like when adding a new ram block, we have to register the max_length
for now: we don't have a way to get notified about resizes yet, so
otherwise some memory would not be mapped when growing the ram block.

Note: Currently, ram blocks are only "fake resized". All memory
(max_length) is accessible.

We can get rid of a bunch of functions in stubs/ram-block.c. Print the
warning from inside qemu_vfio_ram_block_added().

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    |  5 +++++
 hw/core/numa.c            | 14 ++++++++++++++
 include/exec/cpu-common.h |  1 +
 util/vfio-helpers.c       | 29 ++++++++---------------------
 4 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 8e9cc3b47c..dfd43d27c6 100644
--- a/exec.c
+++ b/exec.c
@@ -2016,6 +2016,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
     return rb->used_length;
 }
 
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
+{
+    return rb->max_length;
+}
+
 bool qemu_ram_is_shared(RAMBlock *rb)
 {
     return rb->flags & RAM_SHARED;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 0d1b4be76a..6599c69e05 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
     }
 }
 
+static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
+{
+    const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    void *host = qemu_ram_get_host_addr(rb);
+    RAMBlockNotifier *notifier = opaque;
+
+    if (host) {
+        notifier->ram_block_added(notifier, host, max_size);
+    }
+    return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+    /* Notify about all existing ram blocks. */
+    qemu_ram_foreach_block(ram_block_notify_add_single, n);
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 81753bbb34..9760ac9068 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index ddd9a96e76..260570ae19 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -376,8 +376,14 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
                                       void *host, size_t size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+    int ret;
+
     trace_qemu_vfio_ram_block_added(s, host, size);
-    qemu_vfio_dma_map(s, host, size, false, NULL);
+    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    if (ret) {
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size,
+                     strerror(-ret));
+    }
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -390,33 +396,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
     }
 }
 
-static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
-{
-    void *host_addr = qemu_ram_get_host_addr(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
-    int ret;
-    QEMUVFIOState *s = opaque;
-
-    if (!host_addr) {
-        return 0;
-    }
-    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
-    if (ret) {
-        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-                host_addr, (uint64_t)length);
-    }
-    return 0;
-}
-
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
-    ram_block_notifier_add(&s->ram_notifier);
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
-    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
+    ram_block_notifier_add(&s->ram_notifier);
 }
 
 /**
-- 
2.24.1




* [PATCH v2 02/13] stubs/ram-block: Remove stubs that are no longer needed
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
  2020-02-21 16:41 ` [PATCH v2 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-21 16:41   ` [Xen-devel] " David Hildenbrand
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Current code no longer needs these stubs to compile. Let's just remove
them.

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 stubs/ram-block.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/stubs/ram-block.c b/stubs/ram-block.c
index 73c0a3ee08..10855b52dd 100644
--- a/stubs/ram-block.c
+++ b/stubs/ram-block.c
@@ -2,21 +2,6 @@
 #include "exec/ramlist.h"
 #include "exec/cpu-common.h"
 
-void *qemu_ram_get_host_addr(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
-{
-    return 0;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 }
@@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 }
-
-int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
-{
-    return 0;
-}
-- 
2.24.1




* [PATCH v2 03/13] numa: Teach ram block notifiers about resizeable ram blocks
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
@ 2020-02-21 16:41   ` David Hildenbrand
  2020-02-21 16:41 ` [PATCH v2 02/13] stubs/ram-block: Remove stubs that are no longer needed David Hildenbrand
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Stabellini, Eduardo Habkost, Juan Quintela,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Paul Durrant, Igor Mammedov, Michael S. Tsirkin, xen-devel,
	Anthony Perard, Paolo Bonzini, Richard Henderson

Ram block notifiers are currently not aware of resizes. We want to teach
them about resizeable ram, especially to handle resizes during migration,
but also to implement actually resizeable ram blocks (where everything
between used_length and max_length is inaccessible).

Introduce the basic infrastructure but keep using max_size in the
existing notifiers. Supply the max_size when adding and removing ram
blocks. Also, notify on resizes.

Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: xen-devel@lists.xenproject.org
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                     | 13 +++++++++++--
 hw/core/numa.c             | 22 +++++++++++++++++-----
 hw/i386/xen/xen-mapcache.c |  7 ++++---
 include/exec/ramlist.h     | 13 +++++++++----
 target/i386/hax-mem.c      |  5 +++--
 target/i386/sev.c          | 18 ++++++++++--------
 util/vfio-helpers.c        | 16 ++++++++--------
 7 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index dfd43d27c6..b75250e773 100644
--- a/exec.c
+++ b/exec.c
@@ -2129,6 +2129,8 @@ static int memory_try_enable_merging(void *addr, size_t len)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const ram_addr_t oldsize = block->used_length;
+
     assert(block);
 
     newsize = HOST_PAGE_ALIGN(newsize);
@@ -2153,6 +2155,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return -EINVAL;
     }
 
+    /* Notify before modifying the ram block and touching the bitmaps. */
+    if (block->host) {
+        ram_block_notify_resize(block->host, oldsize, newsize);
+    }
+
     cpu_physical_memory_clear_dirty_range(block->offset, block->used_length);
     block->used_length = newsize;
     cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
@@ -2312,7 +2319,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
         qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
         /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
         qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_DONTFORK);
-        ram_block_notify_add(new_block->host, new_block->max_length);
+        ram_block_notify_add(new_block->host, new_block->used_length,
+                             new_block->max_length);
     }
 }
 
@@ -2492,7 +2500,8 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     if (block->host) {
-        ram_block_notify_remove(block->host, block->max_length);
+        ram_block_notify_remove(block->host, block->used_length,
+                                block->max_length);
     }
 
     qemu_mutex_lock_ramlist();
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 6599c69e05..e28ad24fcd 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -902,11 +902,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 {
     const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    const ram_addr_t size = qemu_ram_get_used_length(rb);
     void *host = qemu_ram_get_host_addr(rb);
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
-        notifier->ram_block_added(notifier, host, max_size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
 }
@@ -923,20 +924,31 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
     QLIST_REMOVE(n, next);
 }
 
-void ram_block_notify_add(void *host, size_t size)
+void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_added(notifier, host, size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
 }
 
-void ram_block_notify_remove(void *host, size_t size)
+void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_removed(notifier, host, size);
+        notifier->ram_block_removed(notifier, host, size, max_size);
+    }
+}
+
+void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_resized) {
+            notifier->ram_block_resized(notifier, host, old_size, new_size);
+        }
     }
 }
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed44b..d6dcea65d1 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -169,7 +169,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 
     if (entry->vaddr_base != NULL) {
         if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-            ram_block_notify_remove(entry->vaddr_base, entry->size);
+            ram_block_notify_remove(entry->vaddr_base, entry->size,
+                                    entry->size);
         }
         if (munmap(entry->vaddr_base, entry->size) != 0) {
             perror("unmap fails");
@@ -211,7 +212,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
     }
 
     if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-        ram_block_notify_add(vaddr_base, size);
+        ram_block_notify_add(vaddr_base, size, size);
     }
 
     entry->vaddr_base = vaddr_base;
@@ -452,7 +453,7 @@ static void xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)
     }
 
     pentry->next = entry->next;
-    ram_block_notify_remove(entry->vaddr_base, entry->size);
+    ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
     if (munmap(entry->vaddr_base, entry->size) != 0) {
         perror("unmap fails");
         exit(-1);
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index bc4faa1b00..293c0ddabe 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -65,15 +65,20 @@ void qemu_mutex_lock_ramlist(void);
 void qemu_mutex_unlock_ramlist(void);
 
 struct RAMBlockNotifier {
-    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size);
-    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size);
+    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size,
+                            size_t max_size);
+    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size,
+                              size_t max_size);
+    void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size,
+                              size_t new_size);
     QLIST_ENTRY(RAMBlockNotifier) next;
 };
 
 void ram_block_notifier_add(RAMBlockNotifier *n);
 void ram_block_notifier_remove(RAMBlockNotifier *n);
-void ram_block_notify_add(void *host, size_t size);
-void ram_block_notify_remove(void *host, size_t size);
+void ram_block_notify_add(void *host, size_t size, size_t max_size);
+void ram_block_notify_remove(void *host, size_t size, size_t max_size);
+void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
 
 void ram_block_dump(Monitor *mon);
 
diff --git a/target/i386/hax-mem.c b/target/i386/hax-mem.c
index 6bb5a24917..454d7fb212 100644
--- a/target/i386/hax-mem.c
+++ b/target/i386/hax-mem.c
@@ -293,7 +293,8 @@ static MemoryListener hax_memory_listener = {
     .priority = 10,
 };
 
-static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                                size_t max_size)
 {
     /*
      * We must register each RAM block with the HAXM kernel module, or
@@ -304,7 +305,7 @@ static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
      * host physical pages for the RAM block as part of this registration
      * process, hence the name hax_populate_ram().
      */
-    if (hax_populate_ram((uint64_t)(uintptr_t)host, size) < 0) {
+    if (hax_populate_ram((uint64_t)(uintptr_t)host, max_size) < 0) {
         fprintf(stderr, "HAX failed to populate RAM\n");
         abort();
     }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 024bb24e51..6b4cee24a2 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -129,7 +129,8 @@ sev_set_guest_state(SevState new_state)
 }
 
 static void
-sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                    size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -146,19 +147,20 @@ sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_register_region(host, size);
+    trace_kvm_memcrypt_register_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, &range);
     if (r) {
         error_report("%s: failed to register region (%p+%#zx) error '%s'",
-                     __func__, host, size, strerror(errno));
+                     __func__, host, max_size, strerror(errno));
         exit(1);
     }
 }
 
 static void
-sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+                      size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -175,13 +177,13 @@ sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_unregister_region(host, size);
+    trace_kvm_memcrypt_unregister_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, &range);
     if (r) {
         error_report("%s: failed to unregister region (%p+%#zx)",
-                     __func__, host, size);
+                     __func__, host, max_size);
     }
 }
 
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 260570ae19..9ec01bfe26 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,26 +372,26 @@ fail_container:
     return ret;
 }
 
-static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
-                                      void *host, size_t size)
+static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
+                                      size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, size);
-    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, max_size);
+    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
     if (ret) {
-        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size,
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, max_size,
                      strerror(-ret));
     }
 }
 
-static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
-                                        void *host, size_t size)
+static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
+                                        size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, size);
+        trace_qemu_vfio_ram_block_removed(s, host, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
-- 
2.24.1





* [PATCH v2 04/13] numa: Make all callbacks of ram block notifiers optional
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-02-21 16:41   ` [Xen-devel] " David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-21 16:41 ` [PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Let's make the add/remove callbacks optional. We want to introduce a RAM
block notifier for RAM migration that is only interested in resizes.
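
With this in place, a notifier that only cares about resizes can leave
the other callbacks NULL; the next patch in this series does exactly
that. A minimal sketch:

    static void resize_only(RAMBlockNotifier *n, void *host,
                            size_t old_size, size_t new_size)
    {
        /* invoked on resizes only; NULL add/remove callbacks are skipped */
    }

    static RAMBlockNotifier resize_notifier = {
        .ram_block_resized = resize_only,
    };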

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/numa.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index e28ad24fcd..4270b268c8 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -915,8 +915,11 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+
     /* Notify about all existing ram blocks. */
-    qemu_ram_foreach_block(ram_block_notify_add_single, n);
+    if (n->ram_block_added) {
+        qemu_ram_foreach_block(ram_block_notify_add_single, n);
+    }
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
@@ -929,7 +932,9 @@ void ram_block_notify_add(void *host, size_t size, size_t max_size)
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_added(notifier, host, size, max_size);
+        if (notifier->ram_block_added) {
+            notifier->ram_block_added(notifier, host, size, max_size);
+        }
     }
 }
 
@@ -938,7 +943,9 @@ void ram_block_notify_remove(void *host, size_t size, size_t max_size)
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_removed(notifier, host, size, max_size);
+        if (notifier->ram_block_removed) {
+            notifier->ram_block_removed(notifier, host, size, max_size);
+        }
     }
 }
 
-- 
2.24.1




* [PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-02-21 16:41 ` [PATCH v2 04/13] numa: Make all callbacks of ram block notifiers optional David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-24 22:27   ` Peter Xu
  2020-02-21 16:41 ` [PATCH v2 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Michael S. Tsirkin, Shannon Zhao, Igor Mammedov, Paolo Bonzini,
	Alex Bennée, Richard Henderson

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the used_length of ram blocks and does
not expect this to change at random points in time.

In the case of precopy, the ram block size must not change on the source
after syncing the RAM block list in ram_save_setup(), that is, as long as
the guest is still running on the source.

Resizing can be triggered *after* (but not during) a reset in ACPI code
by the guest:
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.

Update the documentation. Postcopy will be handled separately.

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shannon Zhao <shannon.zhao@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                |  5 +++--
 include/exec/memory.h | 10 ++++++----
 migration/migration.c |  9 +++++++--
 migration/migration.h |  1 +
 migration/ram.c       | 31 +++++++++++++++++++++++++++++++
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index b75250e773..8b015821d6 100644
--- a/exec.c
+++ b/exec.c
@@ -2120,8 +2120,9 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
-/* Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+/*
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * As memory core doesn't know how is memory accessed, it is up to
  * resize callback to update device state and/or add assertions to detect
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e85b7de99a..de111347e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_SHARED     (1 << 1)
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
+ * Resizing RAM while migrating can result in the migration being canceled.
  */
 #define RAM_RESIZEABLE (1 << 2)
 
@@ -843,7 +843,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
  *                                     RAM.  Accesses into the region will
  *                                     modify memory directly.  Only an initial
  *                                     portion of this RAM is actually used.
- *                                     The used size can change across reboots.
+ *                                     Changing the size while migrating
+ *                                     can result in the migration being
+ *                                     canceled.
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
@@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
 
 /* memory_region_ram_resize: Resize a RAM region.
  *
- * Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * @mr: a memory region created with @memory_region_init_resizeable_ram.
  * @newsize: the new size the region
diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..ac9751dbe5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -175,13 +175,18 @@ void migration_object_init(void)
     }
 }
 
+void migration_cancel(void)
+{
+    migrate_fd_cancel(current_migration);
+}
+
 void migration_shutdown(void)
 {
     /*
      * Cancel the current migration - that will (eventually)
      * stop the migration using this structure
      */
-    migrate_fd_cancel(current_migration);
+    migration_cancel();
     object_unref(OBJECT(current_migration));
 }
 
@@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migrate_fd_cancel(migrate_get_current());
+    migration_cancel();
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 8473ddfc88..79fd74afa5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
+void migration_cancel(void);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..39c7d1c4a6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -52,6 +52,7 @@
 #include "migration/colo.h"
 #include "block.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "savevm.h"
 #include "qemu/iov.h"
 #include "multifd.h"
@@ -3710,8 +3711,38 @@ static SaveVMHandlers savevm_ram_handlers = {
     .resume_prepare = ram_resume_prepare,
 };
 
+static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
+                                      size_t old_size, size_t new_size)
+{
+    ram_addr_t offset;
+    Error *err = NULL;
+    RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
+
+    if (ramblock_is_ignored(rb)) {
+        return;
+    }
+
+    if (!migration_is_idle()) {
+        /*
+         * Precopy code on the source cannot deal with the size of RAM blocks
+         * changing at random points in time - especially after sending the
+         * RAM block sizes to the migration stream, they must no longer change.
+         * Abort and indicate a proper reason.
+         */
+        error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
+        migrate_set_error(migrate_get_current(), err);
+        error_free(err);
+        migration_cancel();
+    }
+}
+
+static RAMBlockNotifier ram_mig_ram_notifier = {
+    .ram_block_resized = ram_mig_ram_block_resized,
+};
+
 void ram_mig_init(void)
 {
     qemu_mutex_init(&XBZRLE.lock);
     register_savevm_live("ram", 0, 4, &savevm_ram_handlers, &ram_state);
+    ram_block_notifier_add(&ram_mig_ram_notifier);
 }
-- 
2.24.1




* [PATCH v2 06/13] exec: Relax range check in ram_block_discard_range()
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-02-21 16:41 ` [PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-24 22:27   ` Peter Xu
  2020-02-21 16:41 ` [PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We want to make use of ram_block_discard_range() in the RAM block resize
callback when growing a RAM block, *before* used_length is changed.
Let's relax the check. We always have a reserved mapping for the whole
max_length, so we cannot corrupt unrelated data.
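
For context, the intended caller (added later in this series, with "rb"
being the RAMBlock that is being grown) discards the newly usable part
while used_length still holds the old size - roughly:

    /* Sketch: old_size < new_size, block->used_length not yet updated. */
    if (ram_discard_range(rb->idstr, old_size, new_size - old_size)) {
        error_report("RAM block '%s' discard of resized RAM failed",
                     rb->idstr);
    }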

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 8b015821d6..8737acedab 100644
--- a/exec.c
+++ b/exec.c
@@ -3915,7 +3915,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
         goto err;
     }
 
-    if ((start + length) <= rb->used_length) {
+    if ((start + length) <= rb->max_length) {
         bool need_madvise, need_fallocate;
         if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
             error_report("ram_block_discard_range: Unaligned length: %zx",
@@ -3982,7 +3982,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     } else {
         error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
                      "/%zx/" RAM_ADDR_FMT")",
-                     rb->idstr, start, length, rb->used_length);
+                     rb->idstr, start, length, rb->max_length);
     }
 
 err:
-- 
2.24.1




* [PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init()
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-02-21 16:41 ` [PATCH v2 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-24 22:28   ` Peter Xu
  2020-02-21 16:41 ` [PATCH v2 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when
synchronizing the RAM block state with the migration source), the resized
part would not get discarded. Let's perform the discard when being
notified about a resize while postcopy has been advised, but is not
listening yet. With precopy, the process is as follows:

1. VM created
- RAM blocks are created
2. Incoming migration started
- Postcopy is advised
- All pages in RAM blocks are discarded
3. Precopy starts
- RAM blocks are resized to match the size on the migration source.
- RAM pages from precopy stream are loaded
- Uffd handler is registered, postcopy starts listening
4. Guest started, postcopy running
- Pagefaults get resolved, pages get placed

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 39c7d1c4a6..d5a4d69e1c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3714,6 +3714,7 @@ static SaveVMHandlers savevm_ram_handlers = {
 static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
                                       size_t old_size, size_t new_size)
 {
+    PostcopyState ps = postcopy_state_get();
     ram_addr_t offset;
     Error *err = NULL;
     RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
@@ -3734,6 +3735,35 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
         error_free(err);
         migration_cancel();
     }
+
+    switch (ps) {
+    case POSTCOPY_INCOMING_ADVISE:
+        /*
+         * Update what ram_postcopy_incoming_init()->init_range() does at the
+         * time postcopy was advised. Syncing RAM blocks with the source will
+         * result in RAM resizes.
+         */
+        if (old_size < new_size) {
+            if (ram_discard_range(rb->idstr, old_size, new_size - old_size)) {
+                error_report("RAM block '%s' discard of resized RAM failed",
+                             rb->idstr);
+            }
+        }
+        break;
+    case POSTCOPY_INCOMING_NONE:
+    case POSTCOPY_INCOMING_RUNNING:
+    case POSTCOPY_INCOMING_END:
+        /*
+         * Once our guest is running, postcopy does no longer care about
+         * resizes. When growing, the new memory was not available on the
+         * source, no handler needed.
+         */
+        break;
+    default:
+        error_report("RAM block '%s' resized during postcopy state: %d",
+                     rb->idstr, ps);
+        exit(-1);
+    }
 }
 
 static RAMBlockNotifier ram_mig_ram_notifier = {
-- 
2.24.1




* [PATCH v2 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-02-21 16:41 ` [PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
@ 2020-02-21 16:41 ` David Hildenbrand
  2020-02-21 16:42 ` [PATCH v2 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Add two new helper functions. These will come in handy once we want to
handle ram block resizes while postcopy is active.

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 54 ++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d5a4d69e1c..f815f4e532 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static void *host_page_from_ram_block_offset(RAMBlock *block,
+                                             ram_addr_t offset)
+{
+    /* Note: Explicitly no check against offset_in_ramblock(). */
+    return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
+                                   block->page_size);
+}
+
+static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
+                                                         ram_addr_t offset)
+{
+    return ((uintptr_t)block->host + offset) & (block->page_size - 1);
+}
+
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
                                                  ram_addr_t offset)
 {
@@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
     MigrationIncomingState *mis = migration_incoming_get_current();
     /* Temporary page that is later 'placed' */
     void *postcopy_host_page = mis->postcopy_tmp_page;
-    void *this_host = NULL;
+    void *host_page = NULL;
     bool all_zero = false;
     int target_pages = 0;
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
-        void *host = NULL;
         void *page_buffer = NULL;
         void *place_source = NULL;
         RAMBlock *block = NULL;
@@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
             block = ram_block_from_stream(f, flags);
+            if (!block) {
+                ret = -EINVAL;
+                break;
+            }
 
-            host = host_from_ram_block_offset(block, addr);
-            if (!host) {
+            if (!offset_in_ramblock(block, addr)) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
@@ -3163,21 +3179,18 @@ static int ram_load_postcopy(QEMUFile *f)
              * of a host page in one chunk.
              */
             page_buffer = postcopy_host_page +
-                          ((uintptr_t)host & (block->page_size - 1));
+                          host_page_offset_from_ram_block_offset(block, addr);
             /* If all TP are zero then we can optimise the place */
             if (target_pages == 1) {
                 all_zero = true;
-                this_host = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-                                                    block->page_size);
-            } else {
+                host_page = host_page_from_ram_block_offset(block, addr);
+            } else if (host_page != host_page_from_ram_block_offset(block,
+                                                                    addr)) {
                 /* not the 1st TP within the HP */
-                if (QEMU_ALIGN_DOWN((uintptr_t)host, block->page_size) !=
-                    (uintptr_t)this_host) {
-                    error_report("Non-same host page %p/%p",
-                                  host, this_host);
-                    ret = -EINVAL;
-                    break;
-                }
+                error_report("Non-same host page %p/%p", host_page,
+                             host_page_from_ram_block_offset(block, addr));
+                ret = -EINVAL;
+                break;
             }
 
             /*
@@ -3257,16 +3270,11 @@ static int ram_load_postcopy(QEMUFile *f)
         }
 
         if (!ret && place_needed) {
-            /* This gets called at the last target page in the host page */
-            void *place_dest = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-                                                       block->page_size);
-
             if (all_zero) {
-                ret = postcopy_place_page_zero(mis, place_dest,
-                                               block);
+                ret = postcopy_place_page_zero(mis, host_page, block);
             } else {
-                ret = postcopy_place_page(mis, place_dest,
-                                          place_source, block);
+                ret = postcopy_place_page(mis, host_page, place_source,
+                                          block);
             }
         }
     }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-02-21 16:41 ` [PATCH v2 08/13] migration/ram: Simplify host page handling in ram_load_postcopy() David Hildenbrand
@ 2020-02-21 16:42 ` David Hildenbrand
  2020-02-21 16:42 ` [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

Let's consolidate resetting the variables.
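
As a toy model of the resulting control flow (plain C with made-up page
sizes and contents, not QEMU code), all per-host-page state is now reset
at the single point where a host page is placed:

    #include <stdbool.h>
    #include <stdio.h>

    #define TARGET_PAGE_SIZE 4096
    #define HOST_PAGE_SIZE   16384   /* 4 target pages per host page */

    int main(void)
    {
        bool place_needed = false;
        bool all_zero = true;          /* assume zero until proven otherwise */
        int target_pages = 0;

        for (int i = 0; i < 8; i++) {  /* pretend stream of 8 target pages */
            bool page_is_zero = (i % 3 != 0);  /* made-up page contents */

            target_pages++;
            if (!page_is_zero) {
                all_zero = false;
            }
            if (target_pages == HOST_PAGE_SIZE / TARGET_PAGE_SIZE) {
                place_needed = true;
            }

            if (place_needed) {
                printf("place host page (%s)\n", all_zero ? "zero" : "copy");
                /* single reset point, as in this patch */
                place_needed = false;
                target_pages = 0;
                all_zero = true;
            }
        }
        return 0;
    }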

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f815f4e532..1a5ff07997 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3126,7 +3126,7 @@ static int ram_load_postcopy(QEMUFile *f)
     /* Temporary page that is later 'placed' */
     void *postcopy_host_page = mis->postcopy_tmp_page;
     void *host_page = NULL;
-    bool all_zero = false;
+    bool all_zero = true;
     int target_pages = 0;
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
@@ -3152,7 +3152,6 @@ static int ram_load_postcopy(QEMUFile *f)
         addr &= TARGET_PAGE_MASK;
 
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
-        place_needed = false;
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
             block = ram_block_from_stream(f, flags);
@@ -3180,9 +3179,7 @@ static int ram_load_postcopy(QEMUFile *f)
              */
             page_buffer = postcopy_host_page +
                           host_page_offset_from_ram_block_offset(block, addr);
-            /* If all TP are zero then we can optimise the place */
             if (target_pages == 1) {
-                all_zero = true;
                 host_page = host_page_from_ram_block_offset(block, addr);
             } else if (host_page != host_page_from_ram_block_offset(block,
                                                                     addr)) {
@@ -3199,7 +3196,6 @@ static int ram_load_postcopy(QEMUFile *f)
              */
             if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
                 place_needed = true;
-                target_pages = 0;
             }
             place_source = postcopy_host_page;
         }
@@ -3276,6 +3272,10 @@ static int ram_load_postcopy(QEMUFile *f)
                 ret = postcopy_place_page(mis, host_page, place_source,
                                           block);
             }
+            place_needed = false;
+            target_pages = 0;
+            /* Assume we have a zero page until we detect something different */
+            all_zero = true;
         }
     }
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-02-21 16:42 ` [PATCH v2 09/13] migration/ram: Consolidate variable reset after placement " David Hildenbrand
@ 2020-02-21 16:42 ` David Hildenbrand
  2020-02-24 22:26   ` Peter Xu
  2020-02-25 16:11   ` Peter Xu
  2020-02-21 16:42 ` [PATCH v2 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu,
	Michael S. Tsirkin, Shannon Zhao, Igor Mammedov, Paolo Bonzini,
	Alex Bennée, Richard Henderson

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the used_length of RAM blocks and does
not expect this to change at random points in time.

In the case of postcopy, relying on used_length is racy as soon as the
guest is running. Also, when used_length changes we might leave the
uffd handler registered for some memory regions, reject valid pages
when migrating and fail when sending the recv bitmap to the source.

Resizing can be triggered *after* (but not during) a reset in
ACPI code by the guest:
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Let's remember the original used_length in a separate variable and
use it in relevant postcopy code. Make sure to update it when we resize
during precopy, when synchronizing the RAM block sizes with the source.
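
To illustrate the bookkeeping with made-up numbers (a toy sketch, not
QEMU code): the receive bitmap sent back to the source has to be sized
against the length the source knows about, even if used_length has
shrunk in the meantime:

    #include <stdint.h>
    #include <stdio.h>

    #define TARGET_PAGE_BITS 12

    int main(void)
    {
        uint64_t postcopy_length = 4096ULL << 20;  /* saved at ADVISE: 4G */
        uint64_t used_length     = 1024ULL << 20;  /* shrunk to 1G on reboot */

        /* The bitmap exchanged with the source must match its view ... */
        printf("recv bitmap bits: %llu\n",
               (unsigned long long)(postcopy_length >> TARGET_PAGE_BITS));
        /* ... while used_length would now yield a too-small bitmap. */
        printf("used_length bits: %llu\n",
               (unsigned long long)(used_length >> TARGET_PAGE_BITS));
        return 0;
    }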

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shannon Zhao <shannon.zhao@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/ramblock.h  | 10 ++++++++++
 migration/postcopy-ram.c | 15 ++++++++++++---
 migration/ram.c          | 11 +++++++++--
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 07d50864d8..664701b759 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -59,6 +59,16 @@ struct RAMBlock {
      */
     unsigned long *clear_bmap;
     uint8_t clear_bmap_shift;
+
+    /*
+     * RAM block length that corresponds to the used_length on the migration
+     * source (after RAM block sizes were synchronized). In particular, after
+     * starting to run the guest, used_length and postcopy_length can differ.
+     * Used to register/unregister uffd handlers and as the size of the received
+     * bitmap. Receiving any page beyond this length will bail out, as it
+     * could not have been valid on the source.
+     */
+    ram_addr_t postcopy_length;
 };
 #endif
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a36402722b..c68caf4e42 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -17,6 +17,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "migration.h"
 #include "qemu-file.h"
@@ -31,6 +32,7 @@
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "hw/boards.h"
+#include "exec/ramblock.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
     ram_addr_t length = qemu_ram_get_used_length(rb);
     trace_postcopy_init_range(block_name, host_addr, offset, length);
 
+    /*
+     * Save the used_length before running the guest. In case we have to
+     * resize RAM blocks when syncing RAM block sizes from the source during
+     * precopy, we'll update it manually via the ram block notifier.
+     */
+    rb->postcopy_length = length;
+
     /*
      * We need the whole of RAM to be truly empty for postcopy, so things
      * like ROMs and any data tables built during init must be zero'd
@@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
     const char *block_name = qemu_ram_get_idstr(rb);
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t offset = qemu_ram_get_offset(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
+    ram_addr_t length = rb->postcopy_length;
     MigrationIncomingState *mis = opaque;
     struct uffdio_range range_struct;
     trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
@@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
     const char *block_name = qemu_ram_get_idstr(rb);
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t offset = qemu_ram_get_offset(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
+    ram_addr_t length = rb->postcopy_length;
     trace_postcopy_nhp_range(block_name, host_addr, offset, length);
 
     /*
@@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
     struct uffdio_register reg_struct;
 
     reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
-    reg_struct.range.len = qemu_ram_get_used_length(rb);
+    reg_struct.range.len = rb->postcopy_length;
     reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
     /* Now tell our userfault_fd that it's responsible for this area */
diff --git a/migration/ram.c b/migration/ram.c
index 1a5ff07997..ee5c3d5784 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
         return -1;
     }
 
-    nbits = block->used_length >> TARGET_PAGE_BITS;
+    nbits = block->postcopy_length >> TARGET_PAGE_BITS;
 
     /*
      * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
@@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
                 break;
             }
 
-            if (!offset_in_ramblock(block, addr)) {
+            /*
+             * Relying on used_length is racy and can result in false positives.
+             * We might place pages beyond used_length in case RAM was shrunk
+             * while in postcopy, which is fine - trying to place via
+             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
+             */
+            if (!block->host || addr >= block->postcopy_length) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
@@ -3757,6 +3763,7 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
                              rb->idstr);
             }
         }
+        rb->postcopy_length = new_size;
         break;
     case POSTCOPY_INCOMING_NONE:
     case POSTCOPY_INCOMING_RUNNING:
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 11/13] migration/multifd: Print used_length of memory block
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-02-21 16:42 ` [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
@ 2020-02-21 16:42 ` David Hildenbrand
  2020-02-21 16:42 ` [PATCH v2 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We actually want to print the used_length, against which we check.

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/multifd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b3e8ae9bcc..dd9e88c5f1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -222,7 +222,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         if (offset > (block->used_length - qemu_target_page_size())) {
             error_setg(errp, "multifd: offset too long %" PRIu64
                        " (max " RAM_ADDR_FMT ")",
-                       offset, block->max_length);
+                       offset, block->used_length);
             return -1;
         }
         p->pages->iov[i].iov_base = block->host + offset;
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 12/13] migration/ram: Use offset_in_ramblock() in range checks
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-02-21 16:42 ` [PATCH v2 11/13] migration/multifd: Print used_length of memory block David Hildenbrand
@ 2020-02-21 16:42 ` David Hildenbrand
  2020-02-21 16:42 ` [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
  2020-02-21 18:04 ` [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating Peter Xu
  13 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Richard Henderson

We never read or write beyond the used_length of memory blocks when
migrating. Make this clearer by using offset_in_ramblock() consistently.
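
For reference, offset_in_ramblock() boils down to a used_length bound
check; a stub model (not QEMU's actual definition, which lives in its
RAM address headers) could look like:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t ram_addr_t;

    /* Stub block with only the fields relevant to the range check. */
    typedef struct {
        void *host;
        ram_addr_t used_length;
    } StubBlock;

    static bool offset_in_ramblock(StubBlock *b, ram_addr_t offset)
    {
        /* Valid only if the block is mapped and the offset lies below
         * the currently used length. */
        return b && b->host && offset < b->used_length;
    }

    int main(void)
    {
        char backing[8];
        StubBlock b = { .host = backing, .used_length = sizeof(backing) };

        return (offset_in_ramblock(&b, 7) &&
                !offset_in_ramblock(&b, 8)) ? 0 : 1;
    }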

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ee5c3d5784..5cc9993899 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1309,8 +1309,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
         *again = false;
         return false;
     }
-    if ((((ram_addr_t)pss->page) << TARGET_PAGE_BITS)
-        >= pss->block->used_length) {
+    if (!offset_in_ramblock(pss->block,
+                            ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
         /* Didn't find anything in this RAM Block */
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
@@ -1514,7 +1514,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         rs->last_req_rb = ramblock;
     }
     trace_ram_save_queue_pages(ramblock->idstr, start, len);
-    if (start+len > ramblock->used_length) {
+    if (!offset_in_ramblock(ramblock, start + len - 1)) {
         error_report("%s request overrun start=" RAM_ADDR_FMT " len="
                      RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
                      __func__, start, len, ramblock->used_length);
@@ -3325,8 +3325,8 @@ static void colo_flush_ram_cache(void)
         while (block) {
             offset = migration_bitmap_find_dirty(ram_state, block, offset);
 
-            if (((ram_addr_t)offset) << TARGET_PAGE_BITS
-                >= block->used_length) {
+            if (!offset_in_ramblock(block,
+                                    ((ram_addr_t)offset) << TARGET_PAGE_BITS)) {
                 offset = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-02-21 16:42 ` [PATCH v2 12/13] migration/ram: Use offset_in_ramblock() in range checks David Hildenbrand
@ 2020-02-21 16:42 ` David Hildenbrand
  2020-02-24 22:49   ` Peter Xu
  2020-02-21 18:04 ` [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating Peter Xu
  13 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-21 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Paolo Bonzini, Richard Henderson

When we partially change mappings (esp., mmap over parts of an existing
mmap like qemu_ram_remap() does) where we have a userfaultfd handler
registered, the handler will implicitly be unregistered from the parts that
changed.

Trying to place pages onto mappings where there is no longer a handler
registered will fail. Let's make sure that any waiter is woken up - we
have to do that manually.

Let's also document how UFFDIO_UNREGISTER will handle this scenario.

This is mainly a preparation for RAM blocks with resizable allocations,
where the mapping of the invalid RAM range will change. The source will
keep sending pages that are outside of the new (shrunk) RAM size. We have
to treat these pages like they would have been migrated, but can
essentially simply drop the content (ignore the placement error).

Keep printing a warning on EINVAL, to avoid hiding other (programming)
issues. ENOENT is unique to this case, so it can be handled silently.
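
To make the scenario concrete, here is a minimal standalone reproducer
of the kernel behavior described above (Linux only, error handling
elided, not QEMU code):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API };
        char *area, *src;

        ioctl(ufd, UFFDIO_API, &api);

        area = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct uffdio_register reg = {
            .range = { .start = (unsigned long)area, .len = 2 * page },
            .mode = UFFDIO_REGISTER_MODE_MISSING,
        };
        ioctl(ufd, UFFDIO_REGISTER, &reg);

        /* mmap over the second page: its registration is implicitly
         * dropped, like an mmap over parts of an existing mmap would. */
        mmap(area + page, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

        /* Placing a page there now fails: ENOENT on new kernels,
         * EINVAL on old ones. */
        src = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct uffdio_copy copy = {
            .dst = (unsigned long)(area + page),
            .src = (unsigned long)src,
            .len = page,
        };
        if (ioctl(ufd, UFFDIO_COPY, &copy)) {
            printf("UFFDIO_COPY failed: %s\n", strerror(errno));
        }

        /* Waking any waiter still works, which the patch relies on. */
        struct uffdio_range range = {
            .start = (unsigned long)(area + page), .len = page,
        };
        printf("UFFDIO_WAKE returned %d\n", ioctl(ufd, UFFDIO_WAKE, &range));
        return 0;
    }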

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c68caf4e42..f023830b9a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
     range_struct.start = (uintptr_t)host_addr;
     range_struct.len = length;
 
+    /*
+     * In case the mapping was partially changed since we enabled userfault
+     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
+     * for the mappings that changed. Unregistering will, however, still work
+     * and ignore mappings without a registered handler.
+     */
     if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
         error_report("%s: userfault unregister %s", __func__, strerror(errno));
 
@@ -1180,6 +1186,17 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     return 0;
 }
 
+static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
+                               uint64_t pagesize)
+{
+    struct uffdio_range range = {
+        .start = (uint64_t)(uintptr_t)host_addr,
+        .len = pagesize,
+    };
+
+    return ioctl(userfault_fd, UFFDIO_WAKE, &range);
+}
+
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
                                void *from_addr, uint64_t pagesize, RAMBlock *rb)
 {
@@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
         zero_struct.mode = 0;
         ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
     }
+
+    /*
+     * When the mapping gets partially changed (e.g., qemu_ram_remap()) before
+     * we try to place a page, the userfaultfd handler will be removed for the
+     * changed mappings and placing pages will fail. We can safely ignore this,
+     * because mappings that changed on the destination don't need data from the
+     * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that page
+     * (unlikely but possible). Waking up waiters is always possible, even
+     * without a registered userfaultfd handler.
+     *
+     * Old kernels report EINVAL, new kernels report ENOENT in case there is
+     * no longer a userfaultfd handler for a mapping.
+     */
+    if (ret && (errno == ENOENT || errno == EINVAL)) {
+        if (errno == EINVAL) {
+            warn_report("%s: Failed to place page %p. Waking up any waiters.",
+                         __func__, host_addr);
+        }
+        ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);
+    }
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-21 16:41 [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-02-21 16:42 ` [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
@ 2020-02-21 18:04 ` Peter Xu
  2020-02-24  9:09   ` David Hildenbrand
  13 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-21 18:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
> I was now able to actually test resizing while migrating. I am using the
> prototype of virtio-mem to test (which also makes use of resizable
> allocations). Things I was able to reproduce:

The test cases cover quite a lot.  Thanks for doing that.

> - Resize while still running on the migration source. Migration is canceled
> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"

> - Resize (grow+shrink) on the migration target during postcopy migration
>   (when syncing RAM blocks), while not yet running on the target
> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>    and the VM is stopped", and overall RAM size synchronization. Seems to
>    work just fine.

This won't be able to trigger without virtio-mem, right?

And I'm also curious about how to test this even with virtio-mem.  Is
that a QMP command to extend/shrink virtio-mem?

> - Resize (grow+shrink) on the migration tagret during postcopy migration
>   while already running on the target.
> -- Test case for "migration/ram: Handle RAM block resizes during postcopy"
> -- Test case for "migration/ram: Tolerate partially changed mappings in
>    postcopy code" - I can see that -ENOENT is actually triggered and that
>    migration succeeds. Migration seems to work just fine.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-21 18:04 ` [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating Peter Xu
@ 2020-02-24  9:09   ` David Hildenbrand
  2020-02-24 17:45     ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-24  9:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On 21.02.20 19:04, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
>> I was now able to actually test resizing while migrating. I am using the
>> prototype of virtio-mem to test (which also makes use of resizable
>> allocations). Things I was able to reproduce:
> 
> The test cases cover quite a lot.  Thanks for doing that.
> 
>> - Resize while still running on the migration source. Migration is canceled
>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
> 
>> - Resize (grow+shrink) on the migration target during postcopy migration
>>   (when syncing RAM blocks), while not yet running on the target
>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>>    and the VM is stopped", and overall RAM size synchronization. Seems to
>>    work just fine.
> 
> This won't be able to trigger without virtio-mem, right?

AFAIK all cases can also be triggered without virtio-mem (just not that
easily :) ). This case would be "RAM block is bigger on the source than
on the destination".

> 
> And I'm also curious on how to test this even with virtio-mem.  Is
> that a QMP command to extend/shrink virtio-mem?

Currently, there is a single qom property that can be modified via
QMP/HMP - "requested-size". With resizable memory backends,
increasing the requested size will also implicitly grow the RAM block.
Shrinking the requested size will currently result in shrinking the RAM
block on the next reboot.

So, to trigger growing of a RAM block (assuming requested-size was
smaller before, e.g., 1000M)

echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON

To trigger shrinking (assuming requested-size was bigger before)

echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
echo 'system_reset' | sudo nc -U $MON


Placing these at the right spots during a migration makes it possible
to test this very reliably.
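
For example, one possible sequence (illustrative only; socket paths,
the device name, and the exact timing are assumptions):

echo "migrate_set_capability postcopy-ram on" | sudo nc -U $SRC_MON
echo "migrate_set_capability postcopy-ram on" | sudo nc -U $DST_MON
echo "migrate -d tcp:dst:4444" | sudo nc -U $SRC_MON
# once precopy is underway, switch to postcopy:
echo "migrate_start_postcopy" | sudo nc -U $SRC_MON
# then grow on the destination while postcopy is active:
echo "qom-set vm1 requested-size 6000M" | sudo nc -U $DST_MON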

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24  9:09   ` David Hildenbrand
@ 2020-02-24 17:45     ` Peter Xu
  2020-02-24 18:44       ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-24 17:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
> On 21.02.20 19:04, Peter Xu wrote:
> > On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
> >> I was now able to actually test resizing while migrating. I am using the
> >> prototype of virtio-mem to test (which also makes use of resizable
> >> allocations). Things I was able to reproduce:
> > 
> > The test cases cover quite a lot.  Thanks for doing that.
> > 
> >> - Resize while still running on the migration source. Migration is canceled
> >> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
> > 
> >> - Resize (grow+shrink) on the migration target during postcopy migration
> >>   (when syncing RAM blocks), while not yet running on the target
> >> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
> >>    and the VM is stopped", and overall RAM size synchronization. Seems to
> >>    work just fine.
> > 
> > This won't be able to trigger without virtio-mem, right?
> 
> AFAIK all cases can also be triggered without virtio-mem (not just that
> easily :) ). This case would be "RAM block is bigger on source than on
> destination.".
> 
> > 
> > And I'm also curious on how to test this even with virtio-mem.  Is
> > that a QMP command to extend/shrink virtio-mem?
> 
> Currently, there is a single qom property that can be modifed via
> QMP/HMP - "requested-size". With resizable resizable memory backends,
> increasing the requested size will also implicitly grow the RAM block.
> Shrinking the requested size will currently result in shrinking the RAM
> block on the next reboot.
> 
> So, to trigger growing of a RAM block (assuming requested-size was
> smaller before, e.g., 1000M)
> 
> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
> 
> To trigger shrinking (assuming requested-size was bigger before)
> 
> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
> echo 'system_reset' | sudo nc -U $MON
> 
> 
> Placing these at the right spots during a migration allows to test this
> very reliably.

I see, thanks for the context.  The question was mainly about when
you say "during postcopy migration (when syncing RAM blocks), while
not yet running on the target" - it's not easy to do so imho, because:

  - it's a very short transition period between precopy and postcopy,
    so I was curious about how you made sure that the grow/shrink
    happened exactly during that period

  - during the period, IIUC it was still in the main thread, which
    means logically QEMU should not be able to respond to any QMP/HMP
    command at all...  So even if you send a command, I think it'll
    only be executed later after the transition completes

  - this I'm not sure, but ... even for virtio-mem, the resizing can
    only happen after the guest acks it, right?  During the precopy to
    postcopy transition period, the VM is stopped, AFAICT, so
    logically we can't trigger resizing during the transition

So it's really a question/matter of whether we still even need to
consider that transition period for resizing events in postcopy.
Maybe we don't even need to.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 17:45     ` Peter Xu
@ 2020-02-24 18:44       ` David Hildenbrand
  2020-02-24 18:59         ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-24 18:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On 24.02.20 18:45, Peter Xu wrote:
> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
>> On 21.02.20 19:04, Peter Xu wrote:
>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
>>>> I was now able to actually test resizing while migrating. I am using the
>>>> prototype of virtio-mem to test (which also makes use of resizable
>>>> allocations). Things I was able to reproduce:
>>>
>>> The test cases cover quite a lot.  Thanks for doing that.
>>>
>>>> - Resize while still running on the migration source. Migration is canceled
>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
>>>
>>>> - Resize (grow+shrink) on the migration target during postcopy migration
>>>>   (when syncing RAM blocks), while not yet running on the target
>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>>>>    and the VM is stopped", and overall RAM size synchronization. Seems to
>>>>    work just fine.
>>>
>>> This won't be able to trigger without virtio-mem, right?
>>
>> AFAIK all cases can also be triggered without virtio-mem (not just that
>> easily :) ). This case would be "RAM block is bigger on source than on
>> destination.".
>>
>>>
>>> And I'm also curious on how to test this even with virtio-mem.  Is
>>> that a QMP command to extend/shrink virtio-mem?
>>
>> Currently, there is a single qom property that can be modifed via
>> QMP/HMP - "requested-size". With resizable resizable memory backends,
>> increasing the requested size will also implicitly grow the RAM block.
>> Shrinking the requested size will currently result in shrinking the RAM
>> block on the next reboot.
>>
>> So, to trigger growing of a RAM block (assuming requested-size was
>> smaller before, e.g., 1000M)
>>
>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
>>
>> To trigger shrinking (assuming requested-size was bigger before)
>>
>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
>> echo 'system_reset' | sudo nc -U $MON
>>
>>
>> Placing these at the right spots during a migration allows to test this
>> very reliably.
> 
> I see, thanks for the context.  The question was majorly about when
> you say "during postcopy migration (when syncing RAM blocks), while
> not yet running on the target" - it's not easy to do so imho, because:

This case is very easy to trigger, even with ACPI. Simply have a RAM
block on the source be bigger than one on the target. The sync code
(migration/ram.c:qemu_ram_resize()) will perform the resize during
precopy. Postcopy fails to discard the additional memory.

Maybe my description was confusing. But this really just triggers when

- Postcopy is advised and discards memory on all RAM blocks
- Precopy grows the RAM block when syncing the RAM block sizes with the
  source

Postcopy fails to discard the new RAM.

> 
>   - it's a very short transition period between precopy and postcopy,
>     so I was curious about how you made sure that the grow/shrink
>     happened exactly during that period
> 
>   - during the period, IIUC it was still in the main thread, which
>     means logically QEMU should not be able to respond to any QMP/HMP
>     command at all...  So even if you send a command, I think it'll
>     only be executed later after the transition completes
> 
>   - this I'm not sure, but ... even for virtio-mem, the resizing can
>     only happen after guest ack it, right?  During the precopy to
>     postcopy transition period, the VM is stopped, AFAICT, so
>     logically we can't trigger resizing during the transition
> 
> So it's really a question/matter of whether we still even need to
> consider that transition period for resizing event for postcopy.
> Maybe we don't even need to.

It's synchronous and not a race. So it does matter very much :)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 18:44       ` David Hildenbrand
@ 2020-02-24 18:59         ` David Hildenbrand
  2020-02-24 19:18           ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-24 18:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On 24.02.20 19:44, David Hildenbrand wrote:
> On 24.02.20 18:45, Peter Xu wrote:
>> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
>>> On 21.02.20 19:04, Peter Xu wrote:
>>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
>>>>> I was now able to actually test resizing while migrating. I am using the
>>>>> prototype of virtio-mem to test (which also makes use of resizable
>>>>> allocations). Things I was able to reproduce:
>>>>
>>>> The test cases cover quite a lot.  Thanks for doing that.
>>>>
>>>>> - Resize while still running on the migration source. Migration is canceled
>>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
>>>>
>>>>> - Resize (grow+shrink) on the migration target during postcopy migration
>>>>>   (when syncing RAM blocks), while not yet running on the target
>>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>>>>>    and the VM is stopped", and overall RAM size synchronization. Seems to
>>>>>    work just fine.
>>>>
>>>> This won't be able to trigger without virtio-mem, right?
>>>
>>> AFAIK all cases can also be triggered without virtio-mem (not just that
>>> easily :) ). This case would be "RAM block is bigger on source than on
>>> destination.".
>>>
>>>>
>>>> And I'm also curious on how to test this even with virtio-mem.  Is
>>>> that a QMP command to extend/shrink virtio-mem?
>>>
>>> Currently, there is a single qom property that can be modifed via
>>> QMP/HMP - "requested-size". With resizable resizable memory backends,
>>> increasing the requested size will also implicitly grow the RAM block.
>>> Shrinking the requested size will currently result in shrinking the RAM
>>> block on the next reboot.
>>>
>>> So, to trigger growing of a RAM block (assuming requested-size was
>>> smaller before, e.g., 1000M)
>>>
>>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
>>>
>>> To trigger shrinking (assuming requested-size was bigger before)
>>>
>>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
>>> echo 'system_reset' | sudo nc -U $MON
>>>
>>>
>>> Placing these at the right spots during a migration allows to test this
>>> very reliably.
>>
>> I see, thanks for the context.  The question was majorly about when
>> you say "during postcopy migration (when syncing RAM blocks), while
>> not yet running on the target" - it's not easy to do so imho, because:
> 
> This case is very easy to trigger, even with acpi. Simply have a ram
> block on the source be bigger than one on the target. The sync code
> (migration/ram.c:qemu_ram_resize()) will perform the resize during
> precopy. Postcopy misses to discard the additional memory.
> 
> Maybe my description was confusing. But this really just triggers when
> 
> - Postcopy is advised and discards memory on all ram blocks
> - Precopy grows the RAM block when syncing the RAM block sizes with the
> source
> 
> Postcopy misses to discard the new RAM.
> 
>>
>>   - it's a very short transition period between precopy and postcopy,
>>     so I was curious about how you made sure that the grow/shrink
>>     happened exactly during that period
>>
>>   - during the period, IIUC it was still in the main thread, which
>>     means logically QEMU should not be able to respond to any QMP/HMP
>>     command at all...  So even if you send a command, I think it'll
>>     only be executed later after the transition completes
>>
>>   - this I'm not sure, but ... even for virtio-mem, the resizing can
>>     only happen after guest ack it, right?  During the precopy to
>>     postcopy transition period, the VM is stopped, AFAICT, so
>>     logically we can't trigger resizing during the transition

Regarding that question: Resizes will happen without guest interaction
(e.g., during a reboot, or when increasing the requested size). In the
future, there are theoretical plans to have resizes that can be
triggered by guest interaction/request to some extent as well.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 18:59         ` David Hildenbrand
@ 2020-02-24 19:18           ` Peter Xu
  2020-02-24 19:34             ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-24 19:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Mon, Feb 24, 2020 at 07:59:10PM +0100, David Hildenbrand wrote:
> On 24.02.20 19:44, David Hildenbrand wrote:
> > On 24.02.20 18:45, Peter Xu wrote:
> >> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
> >>> On 21.02.20 19:04, Peter Xu wrote:
> >>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
> >>>>> I was now able to actually test resizing while migrating. I am using the
> >>>>> prototype of virtio-mem to test (which also makes use of resizable
> >>>>> allocations). Things I was able to reproduce:
> >>>>
> >>>> The test cases cover quite a lot.  Thanks for doing that.
> >>>>
> >>>>> - Resize while still running on the migration source. Migration is canceled
> >>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
> >>>>
> >>>>> - Resize (grow+shrink) on the migration target during postcopy migration
> >>>>>   (when syncing RAM blocks), while not yet running on the target
> >>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
> >>>>>    and the VM is stopped", and overall RAM size synchronization. Seems to
> >>>>>    work just fine.
> >>>>
> >>>> This won't be able to trigger without virtio-mem, right?
> >>>
> >>> AFAIK all cases can also be triggered without virtio-mem (not just that
> >>> easily :) ). This case would be "RAM block is bigger on source than on
> >>> destination.".
> >>>
> >>>>
> >>>> And I'm also curious on how to test this even with virtio-mem.  Is
> >>>> that a QMP command to extend/shrink virtio-mem?
> >>>
> >>> Currently, there is a single qom property that can be modifed via
> >>> QMP/HMP - "requested-size". With resizable resizable memory backends,
> >>> increasing the requested size will also implicitly grow the RAM block.
> >>> Shrinking the requested size will currently result in shrinking the RAM
> >>> block on the next reboot.
> >>>
> >>> So, to trigger growing of a RAM block (assuming requested-size was
> >>> smaller before, e.g., 1000M)
> >>>
> >>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
> >>>
> >>> To trigger shrinking (assuming requested-size was bigger before)
> >>>
> >>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
> >>> echo 'system_reset' | sudo nc -U $MON
> >>>
> >>>
> >>> Placing these at the right spots during a migration allows to test this
> >>> very reliably.
> >>
> >> I see, thanks for the context.  The question was majorly about when
> >> you say "during postcopy migration (when syncing RAM blocks), while
> >> not yet running on the target" - it's not easy to do so imho, because:
> > 
> > This case is very easy to trigger, even with acpi. Simply have a ram
> > block on the source be bigger than one on the target. The sync code
> > (migration/ram.c:qemu_ram_resize()) will perform the resize during
> > precopy. Postcopy misses to discard the additional memory.

But when resizing happens during precopy, we should cancel this
migration directly?  Hmm?...

> > 
> > Maybe my description was confusing. But this really just triggers when
> > 
> > - Postcopy is advised and discards memory on all ram blocks
> > - Precopy grows the RAM block when syncing the RAM block sizes with the
> > source
> > 
> > Postcopy misses to discard the new RAM.
> > 
> >>
> >>   - it's a very short transition period between precopy and postcopy,
> >>     so I was curious about how you made sure that the grow/shrink
> >>     happened exactly during that period
> >>
> >>   - during the period, IIUC it was still in the main thread, which
> >>     means logically QEMU should not be able to respond to any QMP/HMP
> >>     command at all...  So even if you send a command, I think it'll
> >>     only be executed later after the transition completes
> >>
> >>   - this I'm not sure, but ... even for virtio-mem, the resizing can
> >>     only happen after guest ack it, right?  During the precopy to
> >>     postcopy transition period, the VM is stopped, AFAICT, so
> >>     logically we can't trigger resizing during the transition
> 
> Regarding that question: Resizes will happen without guest interaction
> (e.g., during a reboot, or when increasing the requested size). In the
> future, there are theoretical plans to have resizes that can be
> triggered by guest interaction/request to some extend as well.

I see.  I was thinking about the shrinking case, which should probably
need an acknowledgement from the guest, but yes increasing seems to be fine
even without it.  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 19:18           ` Peter Xu
@ 2020-02-24 19:34             ` David Hildenbrand
  2020-02-24 20:04               ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-24 19:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	David Hildenbrand, Stefan Hajnoczi, Juan Quintela,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Michael S. Tsirkin, Shannon Zhao,
	Igor Mammedov, Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson



> Am 24.02.2020 um 20:19 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Mon, Feb 24, 2020 at 07:59:10PM +0100, David Hildenbrand wrote:
>>> On 24.02.20 19:44, David Hildenbrand wrote:
>>> On 24.02.20 18:45, Peter Xu wrote:
>>>> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
>>>>> On 21.02.20 19:04, Peter Xu wrote:
>>>>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
>>>>>>> I was now able to actually test resizing while migrating. I am using the
>>>>>>> prototype of virtio-mem to test (which also makes use of resizable
>>>>>>> allocations). Things I was able to reproduce:
>>>>>> 
>>>>>> The test cases cover quite a lot.  Thanks for doing that.
>>>>>> 
>>>>>>> - Resize while still running on the migration source. Migration is canceled
>>>>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
>>>>>> 
>>>>>>> - Resize (grow+shrink) on the migration target during postcopy migration
>>>>>>>  (when syncing RAM blocks), while not yet running on the target
>>>>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>>>>>>>   and the VM is stopped", and overall RAM size synchronization. Seems to
>>>>>>>   work just fine.
>>>>>> 
>>>>>> This won't be able to trigger without virtio-mem, right?
>>>>> 
>>>>> AFAIK all cases can also be triggered without virtio-mem (not just that
>>>>> easily :) ). This case would be "RAM block is bigger on source than on
>>>>> destination.".
>>>>> 
>>>>>> 
>>>>>> And I'm also curious on how to test this even with virtio-mem.  Is
>>>>>> that a QMP command to extend/shrink virtio-mem?
>>>>> 
>>>>> Currently, there is a single qom property that can be modifed via
>>>>> QMP/HMP - "requested-size". With resizable resizable memory backends,
>>>>> increasing the requested size will also implicitly grow the RAM block.
>>>>> Shrinking the requested size will currently result in shrinking the RAM
>>>>> block on the next reboot.
>>>>> 
>>>>> So, to trigger growing of a RAM block (assuming requested-size was
>>>>> smaller before, e.g., 1000M)
>>>>> 
>>>>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
>>>>> 
>>>>> To trigger shrinking (assuming requested-size was bigger before)
>>>>> 
>>>>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
>>>>> echo 'system_reset' | sudo nc -U $MON
>>>>> 
>>>>> 
>>>>> Placing these at the right spots during a migration allows to test this
>>>>> very reliably.
>>>> 
>>>> I see, thanks for the context.  The question was majorly about when
>>>> you say "during postcopy migration (when syncing RAM blocks), while
>>>> not yet running on the target" - it's not easy to do so imho, because:
>>> 
>>> This case is very easy to trigger, even with acpi. Simply have a ram
>>> block on the source be bigger than one on the target. The sync code
>>> (migration/ram.c:qemu_ram_resize()) will perform the resize during
>>> precopy. Postcopy misses to discard the additional memory.
> 
> But when resizing happens during precopy, we should cancel this
> migration directly?  Hmm?...

?

We are talking about the migration target, not the source. Please have a look at the RAM block size sync code I mentioned. That's probably faster than me having to explain it (and obviously failing to do so :) ).



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 19:34             ` David Hildenbrand
@ 2020-02-24 20:04               ` Peter Xu
  2020-02-24 20:54                 ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-24 20:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	Juan Quintela, Stefan Hajnoczi, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Shannon Zhao, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Mon, Feb 24, 2020 at 08:34:16PM +0100, David Hildenbrand wrote:
> 
> 
> > Am 24.02.2020 um 20:19 schrieb Peter Xu <peterx@redhat.com>:
> > 
> > On Mon, Feb 24, 2020 at 07:59:10PM +0100, David Hildenbrand wrote:
> >>> On 24.02.20 19:44, David Hildenbrand wrote:
> >>> On 24.02.20 18:45, Peter Xu wrote:
> >>>> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
> >>>>> On 21.02.20 19:04, Peter Xu wrote:
> >>>>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
> >>>>>>> I was now able to actually test resizing while migrating. I am using the
> >>>>>>> prototype of virtio-mem to test (which also makes use of resizable
> >>>>>>> allocations). Things I was able to reproduce:
> >>>>>> 
> >>>>>> The test cases cover quite a lot.  Thanks for doing that.
> >>>>>> 
> >>>>>>> - Resize while still running on the migration source. Migration is canceled
> >>>>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
> >>>>>> 
> >>>>>>> - Resize (grow+shrink) on the migration target during postcopy migration

[2]

> >>>>>>>  (when syncing RAM blocks), while not yet running on the target
> >>>>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
> >>>>>>>   and the VM is stopped", and overall RAM size synchronization. Seems to
> >>>>>>>   work just fine.
> >>>>>> 
> >>>>>> This won't be able to trigger without virtio-mem, right?
> >>>>> 
> >>>>> AFAIK all cases can also be triggered without virtio-mem (not just that
> >>>>> easily :) ). This case would be "RAM block is bigger on source than on
> >>>>> destination.".
> >>>>> 
> >>>>>> 
> >>>>>> And I'm also curious on how to test this even with virtio-mem.  Is
> >>>>>> that a QMP command to extend/shrink virtio-mem?
> >>>>> 
> >>>>> Currently, there is a single qom property that can be modifed via
> >>>>> QMP/HMP - "requested-size". With resizable resizable memory backends,
> >>>>> increasing the requested size will also implicitly grow the RAM block.
> >>>>> Shrinking the requested size will currently result in shrinking the RAM
> >>>>> block on the next reboot.
> >>>>> 
> >>>>> So, to trigger growing of a RAM block (assuming requested-size was
> >>>>> smaller before, e.g., 1000M)
> >>>>> 
> >>>>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
> >>>>> 
> >>>>> To trigger shrinking (assuming requested-size was bigger before)
> >>>>> 
> >>>>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
> >>>>> echo 'system_reset' | sudo nc -U $MON
> >>>>> 
> >>>>> 
> >>>>> Placing these at the right spots during a migration allows to test this
> >>>>> very reliably.
> >>>> 
> >>>> I see, thanks for the context.  The question was majorly about when
> >>>> you say "during postcopy migration (when syncing RAM blocks), while
> >>>> not yet running on the target" - it's not easy to do so imho, because:
> >>> 
> >>> This case is very easy to trigger, even with acpi. Simply have a ram
> >>> block on the source be bigger than one on the target. The sync code
> >>> (migration/ram.c:qemu_ram_resize()) will perform the resize during

[1]

> >>> precopy. Postcopy misses to discard the additional memory.
> > 
> > But when resizing happens during precopy, we should cancel this
> > migration directly?  Hmm?...
> 
> ?
> 
> We are talking about the migration target, not the source. Please have a look at the RAM block size sync code I mentioned. That‘s probably faster than me having to explain it (and obviously failing to do so :) ).

OK finally I noticed you meant migration/ram.c:ram_load_precopy() [1]
not qemu_ram_resize().  And at [2] I think you meant during precopy
migration, not postcopy.  Those are probably the things that confused
me.  And yes we need to consider this case.  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
  2020-02-24 20:04               ` Peter Xu
@ 2020-02-24 20:54                 ` David Hildenbrand
  0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-24 20:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Stefano Stabellini, Eduardo Habkost,
	David Hildenbrand, Stefan Hajnoczi, Juan Quintela,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paul Durrant, Alex Williamson, Michael S. Tsirkin, Shannon Zhao,
	Igor Mammedov, Anthony Perard, Paolo Bonzini, Alex Bennée,
	Richard Henderson



> Am 24.02.2020 um 21:04 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Mon, Feb 24, 2020 at 08:34:16PM +0100, David Hildenbrand wrote:
>> 
>> 
>>>> Am 24.02.2020 um 20:19 schrieb Peter Xu <peterx@redhat.com>:
>>> 
>>> On Mon, Feb 24, 2020 at 07:59:10PM +0100, David Hildenbrand wrote:
>>>>> On 24.02.20 19:44, David Hildenbrand wrote:
>>>>> On 24.02.20 18:45, Peter Xu wrote:
>>>>>> On Mon, Feb 24, 2020 at 10:09:19AM +0100, David Hildenbrand wrote:
>>>>>>> On 21.02.20 19:04, Peter Xu wrote:
>>>>>>>> On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
>>>>>>>>> I was now able to actually test resizing while migrating. I am using the
>>>>>>>>> prototype of virtio-mem to test (which also makes use of resizable
>>>>>>>>> allocations). Things I was able to reproduce:
>>>>>>>> 
>>>>>>>> The test cases cover quite a lot.  Thanks for doing that.
>>>>>>>> 
>>>>>>>>> - Resize while still running on the migration source. Migration is canceled
>>>>>>>>> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"
>>>>>>>> 
>>>>>>>>> - Resize (grow+shrink) on the migration target during postcopy migration
> 
> [2]
> 
>>>>>>>>> (when syncing RAM blocks), while not yet running on the target
>>>>>>>>> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>>>>>>>>>  and the VM is stopped", and overall RAM size synchronization. Seems to
>>>>>>>>>  work just fine.
>>>>>>>> 
> >>>>>>>> This can't be triggered without virtio-mem, right?
>>>>>>> 
> >>>>>>> AFAIK all cases can also be triggered without virtio-mem (just not that
> >>>>>>> easily :) ). This case would be "RAM block is bigger on the source than
> >>>>>>> on the destination".
>>>>>>> 
>>>>>>>> 
> >>>>>>>> And I'm also curious about how to test this even with virtio-mem.  Is
> >>>>>>>> there a QMP command to extend/shrink virtio-mem?
>>>>>>> 
> >>>>>>> Currently, there is a single QOM property that can be modified via
> >>>>>>> QMP/HMP - "requested-size". With resizable memory backends,
>>>>>>> increasing the requested size will also implicitly grow the RAM block.
>>>>>>> Shrinking the requested size will currently result in shrinking the RAM
>>>>>>> block on the next reboot.
>>>>>>> 
>>>>>>> So, to trigger growing of a RAM block (assuming requested-size was
>>>>>>> smaller before, e.g., 1000M)
>>>>>>> 
>>>>>>> echo "qom-set vm1 requested-size 6000M" | sudo nc -U $MON
>>>>>>> 
>>>>>>> To trigger shrinking (assuming requested-size was bigger before)
>>>>>>> 
>>>>>>> echo "qom-set vm1 requested-size 100M" | sudo nc -U $MON
>>>>>>> echo 'system_reset' | sudo nc -U $MON
>>>>>>> 
>>>>>>> 
> >>>>>>> Placing these at the right spots during a migration allows testing this
> >>>>>>> very reliably.
>>>>>> 
> >>>>>> I see, thanks for the context.  The question was mainly about when
> >>>>>> you say "during postcopy migration (when syncing RAM blocks), while
> >>>>>> not yet running on the target" - it's not easy to do so IMHO, because:
>>>>> 
> >>>>> This case is very easy to trigger, even with ACPI. Simply have a RAM
> >>>>> block on the source be bigger than the one on the target. The sync code
>>>>> (migration/ram.c:qemu_ram_resize()) will perform the resize during
> 
> [1]
> 
> >>>>> precopy. Postcopy fails to discard the additional memory.
>>> 
>>> But when resizing happens during precopy, we should cancel this
>>> migration directly?  Hmm?...
>> 
>> ?
>> 
>> We are talking about the migration target, not the source. Please have a look at the RAM block size sync code I mentioned. That's probably faster than me having to explain it (and obviously failing to do so :) ).
> 
> OK finally I noticed you meant migration/ram.c:ram_load_precopy() [1]
> not qemu_ram_resize().

Right, the single invocation of qemu_ram_resize() in that file/function.
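
From memory, it is the hunk that syncs the destination RAM block size
with the source - roughly something like this, not copied verbatim from
the tree:

    if (length != block->used_length) {
        Error *local_err = NULL;

        /* Resize the destination RAM block to match the source. */
        ret = qemu_ram_resize(block, length, &local_err);
        if (local_err) {
            error_report_err(local_err);
        }
    }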

> And at [2] I think you meant during precopy
> migration, not postcopy.

The precopy stage when postcopy was advised. Yes, it's confusing :)

> Those are probably the things that made me
> confused.  And yes we need to consider this case.  Thanks,

Thanks for having a look!

> 
> -- 
> Peter Xu
> 




* Re: [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-21 16:42 ` [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
@ 2020-02-24 22:26   ` Peter Xu
  2020-02-25  7:28     ` David Hildenbrand
  2020-02-25 16:11   ` Peter Xu
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-24 22:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Shannon Zhao, Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Fri, Feb 21, 2020 at 05:42:01PM +0100, David Hildenbrand wrote:

[...]

> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
>                  break;
>              }
>  
> -            if (!offset_in_ramblock(block, addr)) {
> +            /*
> +             * Relying on used_length is racy and can result in false positives.
> +             * We might place pages beyond used_length in case RAM was shrunk
> +             * while in postcopy, which is fine - trying to place via
> +             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
> +             */
> +            if (!block->host || addr >= block->postcopy_length) {

I'm wondering whether we can even avoid the -ENOENT failure of
UFFDIO_COPY.  With the postcopy_length you introduced, I think it's
the case when addr >= used_length && addr < postcopy_length, right?
Can we skip those?
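
Something like the below, maybe (untested, just to illustrate the idea;
a real skip would also have to consume the page data from the stream
rather than just ignore it):

    if (addr >= block->used_length && addr < block->postcopy_length) {
        /* Shrunk on the destination; nothing can be waiting on this page. */
        skip_this_page = true;   /* hypothetical local, not in the patch */
    }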

Thanks,

>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
> @@ -3757,6 +3763,7 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
>                               rb->idstr);
>              }
>          }
> +        rb->postcopy_length = new_size;
>          break;
>      case POSTCOPY_INCOMING_NONE:
>      case POSTCOPY_INCOMING_RUNNING:
> -- 
> 2.24.1
> 

-- 
Peter Xu




* Re: [PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy
  2020-02-21 16:41 ` [PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy David Hildenbrand
@ 2020-02-24 22:27   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2020-02-24 22:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Shannon Zhao, Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Fri, Feb 21, 2020 at 05:41:56PM +0100, David Hildenbrand wrote:
> Resizing while migrating is dangerous and does not work as expected.
> The whole migration code works on the usable_length of ram blocks and does
> not expect this to change at random points in time.
> 
> In the case of precopy, the RAM block size must not change on the source
> after syncing the RAM block list in ram_save_setup(), as long as the
> guest is still running on the source.
> 
> Resizing can be triggered *after* (but not during) a reset in
> ACPI code by the guest
> - hw/arm/virt-acpi-build.c:acpi_ram_update()
> - hw/i386/acpi-build.c:acpi_ram_update()
> 
> Use the ram block notifier to get notified about resizes. Let's simply
> cancel migration and indicate the reason. We'll continue running on the
> source. No harm done.
> 
> Update the documentation. Postcopy will be handled separately.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Shannon Zhao <shannon.zhao@linaro.org>
> Cc: Alex Bennée <alex.bennee@linaro.org>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 06/13] exec: Relax range check in ram_block_discard_range()
  2020-02-21 16:41 ` [PATCH v2 06/13] exec: Relax range check in ram_block_discard_range() David Hildenbrand
@ 2020-02-24 22:27   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2020-02-24 22:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Fri, Feb 21, 2020 at 05:41:57PM +0100, David Hildenbrand wrote:
> We want to make use of ram_block_discard_range() in the RAM block resize
> callback when growing a RAM block, *before* used_length is changed.
> Let's relax the check. We always have a reserved mapping for the whole
> max_length, so we cannot corrupt unrelated data.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init()
  2020-02-21 16:41 ` [PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() David Hildenbrand
@ 2020-02-24 22:28   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2020-02-24 22:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Fri, Feb 21, 2020 at 05:41:58PM +0100, David Hildenbrand wrote:
> In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when
> synchronizing the RAM block state with the migration source), the resized
> part would not get discarded. Let's perform that when being notified
> about a resize while postcopy has been advised, but is not listening
> yet. With precopy, the process is as follows:
> 
> 1. VM created
> - RAM blocks are created
> 2. Incoming migration started
> - Postcopy is advised
> - All pages in RAM blocks are discarded
> 3. Precopy starts
> - RAM blocks are resized to match the size on the migration source.
> - RAM pages from precopy stream are loaded
> - Uffd handler is registered, postcopy starts listening
> 4. Guest started, postcopy running
> - Pagefaults get resolved, pages get placed
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-21 16:42 ` [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code David Hildenbrand
@ 2020-02-24 22:49   ` Peter Xu
  2020-02-25  7:44     ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-24 22:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
> When we partially change mappings (esp., mmap over parts of an existing
> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> registered, the handler will implicitly be unregistered from the parts that
> changed.
> 
> Trying to place pages onto mappings where there is no longer a handler
> registered will fail. Let's make sure that any waiter is woken up - we
> have to do that manually.
> 
> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> 
> This is mainly a preparation for RAM blocks with resizable allocations,
> where the mapping of the invalid RAM range will change. The source will
> keep sending pages that are outside of the new (shrunk) RAM size. We have
> to treat these pages like they would have been migrated, but can
> essentially simply drop the content (ignore the placement error).
> 
> Keep printing a warning on EINVAL, to avoid hiding other (programming)
> issues. ENOENT is unique.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index c68caf4e42..f023830b9a 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>      range_struct.start = (uintptr_t)host_addr;
>      range_struct.len = length;
>  
> +    /*
> +     * In case the mapping was partially changed since we enabled userfault
> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
> +     * for the mappings that changed. Unregistering will, however, still work
> +     * and ignore mappings without a registered handler.
> +     */

Ideally we should still only unregister what we have registered.
After all, we do have this information: we know what we registered,
and we know what has been unmapped (in your new resize() hook, when
postcopy_state==RUNNING).

An extreme example is when we register pages in range [A, B),
then shrink it to [A, C), and then map something else within [C, B)
(note, with virtio-mem logically B can be very big and C can be very
small, so [C, B) can cover quite some address space). Then if:

  - [C, B) memory type is not compatible with uffd, or

  - [C, B) could be registered with uffd again due to some other
    reason (so far QEMU should not have such a reason)

Then the unregister could fail or misbehave, IMHO.  Another benefit is
that...

>      if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
>          error_report("%s: userfault unregister %s", __func__, strerror(errno));
>  
> @@ -1180,6 +1186,17 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> +static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
> +                               uint64_t pagesize)
> +{
> +    struct uffdio_range range = {
> +        .start = (uint64_t)(uintptr_t)host_addr,
> +        .len = pagesize,
> +    };
> +
> +    return ioctl(userfault_fd, UFFDIO_WAKE, &range);
> +}
> +
>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>                                 void *from_addr, uint64_t pagesize, RAMBlock *rb)
>  {
> @@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>          zero_struct.mode = 0;
>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>      }
> +
> +    /*
> +     * When the mapping gets partially changed (e.g., qemu_ram_remap()) before
> +     * we try to place a page, the userfaultfd handler will be removed for the
> +     * changed mappings and placing pages will fail. We can safely ignore this,
> +     * because mappings that changed on the destination don't need data from the
> +     * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that page
> +     * (unlikely but possible). Waking up waiters is always possible, even
> +     * without a registered userfaultfd handler.
> +     *
> +     * Old kernels report EINVAL, new kernels report ENOENT in case there is
> +     * no longer a userfaultfd handler for a mapping.
> +     */
> +    if (ret && (errno == ENOENT || errno == EINVAL)) {
> +        if (errno == EINVAL) {
> +            warn_report("%s: Failed to place page %p. Waking up any waiters.",
> +                         __func__, host_addr);
> +        }
> +        ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);

... if with the above information (taking notes on where we registered
uffd), I think we don't need to capture the error, but can simply skip
those outliers.
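
For example (a sketch of the idea only; uffd_registered_length is a
hypothetical per-RAMBlock field that this series does not introduce):

    ram_addr_t offset = (uintptr_t)host_addr - (uintptr_t)rb->host;

    /* Skip pages beyond what we actually registered with userfaultfd. */
    if (offset >= rb->uffd_registered_length) {
        /* No fault handler there, so nobody can be waiting on the page. */
        return 0;
    }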

Thanks,

> +    }
>      if (!ret) {
>          ramblock_recv_bitmap_set_range(rb, host_addr,
>                                         pagesize / qemu_target_page_size());
> -- 
> 2.24.1
> 

-- 
Peter Xu




* Re: [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-24 22:26   ` Peter Xu
@ 2020-02-25  7:28     ` David Hildenbrand
  0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2020-02-25  7:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Shannon Zhao, Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On 24.02.20 23:26, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:42:01PM +0100, David Hildenbrand wrote:
> 
> [...]
> 
>> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
>>                  break;
>>              }
>>  
>> -            if (!offset_in_ramblock(block, addr)) {
>> +            /*
>> +             * Relying on used_length is racy and can result in false positives.
>> +             * We might place pages beyond used_length in case RAM was shrunk
>> +             * while in postcopy, which is fine - trying to place via
>> +             * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
>> +             */
>> +            if (!block->host || addr >= block->postcopy_length) {
> 
> I'm thinking whether we can even avoid the -ENOENT failure of
> UFFDIO_COPY.  With the postcopy_length you introduced, I think it's
> the case when addr >= used_length && addr < postcopy_length, right?
> Can we skip those?

1. Recall that any check against used_length is completely racy. So no,
it's not that easy. used_length cannot be trusted at all here. It
should never be accessed from asynchronous postcopy code.

2. There is one theoretical case with resizable allocations: Assume you
first shrink and then grow again. You would have some addr < used_length
where you cannot (and don't want to) place pages.


Note: Before discovering the nice -ENOENT handling, I had a second
variable, postcopy_place_length, stored in RAM blocks that would be:

- Initialized to postcopy_length
- Synchronized by a mutex
- Changed inside the resize callback on any resizes to
-- postcopy_place_length = min(postcopy_place_length, newsize)

But TBH, I find using -ENOENT much more elegant. It was designed to
handle mmap changes like this.
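
In code, that discarded alternative would have been roughly (a sketch
from memory, never posted; postcopy_place_mutex is made up):

    /* In the resize callback, shrink the length we are allowed to place. */
    qemu_mutex_lock(&rb->postcopy_place_mutex);
    rb->postcopy_place_length = MIN(rb->postcopy_place_length, new_size);
    qemu_mutex_unlock(&rb->postcopy_place_mutex);

with the placement path checking addr against postcopy_place_length
under the same mutex.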

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-24 22:49   ` Peter Xu
@ 2020-02-25  7:44     ` David Hildenbrand
  2020-02-25 14:27       ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2020-02-25  7:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On 24.02.20 23:49, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
>> When we partially change mappings (esp., mmap over parts of an existing
>> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
>> registered, the handler will implicitly be unregistered from the parts that
>> changed.
>>
>> Trying to place pages onto mappings where there is no longer a handler
>> registered will fail. Let's make sure that any waiter is woken up - we
>> have to do that manually.
>>
>> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
>>
>> This is mainly a preparation for RAM blocks with resizable allocations,
>> where the mapping of the invalid RAM range will change. The source will
>> keep sending pages that are outside of the new (shrunk) RAM size. We have
>> to treat these pages like they would have been migrated, but can
>> essentially simply drop the content (ignore the placement error).
>>
>> Keep printing a warning on EINVAL, to avoid hiding other (programming)
>> issues. ENOENT is unique.
>>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index c68caf4e42..f023830b9a 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>>      range_struct.start = (uintptr_t)host_addr;
>>      range_struct.len = length;
>>  
>> +    /*
>> +     * In case the mapping was partially changed since we enabled userfault
>> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
>> +     * for the mappings that changed. Unregistering will, however, still work
>> +     * and ignore mappings without a registered handler.
>> +     */
> 
> Ideally we should still only unregister what we have registered.
> After all we do have this information because we know what we
> registered, we know what has unmapped (in your new resize() hook, when
> postcopy_state==RUNNING).

Not in the case of qemu_ram_remap(). And whatever you propose will
require synchronization (see my other mail) and more complicated
handling than this. uffd allows you to handle races with mmap changes in
a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).

> 
> An extreme example is when we register with pages in range [A, B),
> then shrink it to [A, C), then we mapped something else within [C, B)
> (note, with virtio-mem logically B can be very big and C can be very
> small, it means [B, C) can cover quite some address space). Then if:
> 
>   - [C, B) memory type is not compatible with uffd, or

That will never happen in the near future. Without resizable allocations:
- All memory is either anonymous or from a single fd

In addition, right now, only anonymous memory can be used for resizable
RAM. However, with resizable allocations we could have:
- All used_length memory is either anonymous or from a single fd
- All remaining memory is either anonymous or from a single fd

Everything else does not make any sense IMHO and I don't think this is
relevant long term. You cannot arbitrarily map things into the
used_length part of a RAMBlock. That would contradict its page_size
and its fd. E.g., you would break qemu_ram_remap().

> 
>   - [C, B) could be registered with uffd again due to some other
>     reason (so far QEMU should not have such a reason)

Any code that wants to make use of uffd properly has to synchronize
against postcopy code either way IMHO. It just doesn't work otherwise.

E.g., once I use it to protect unplugged memory in virtio-mem
(something I am looking into right now - and teaching QEMU not to touch
all RAMBlock memory is complicated :) ), virtio-mem would unregister any
uffd handler when notified that postcopy will start, and re-register it
after postcopy has finished.

[...]


>>  
>> +static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
>> +                               uint64_t pagesize)
>> +{
>> +    struct uffdio_range range = {
>> +        .start = (uint64_t)(uintptr_t)host_addr,
>> +        .len = pagesize,
>> +    };
>> +
>> +    return ioctl(userfault_fd, UFFDIO_WAKE, &range);
>> +}
>> +
>>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>                                 void *from_addr, uint64_t pagesize, RAMBlock *rb)
>>  {
>> @@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>          zero_struct.mode = 0;
>>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>>      }
>> +
>> +    /*
>> +     * When the mapping gets partially changed (e.g., qemu_ram_remap()) before
>> +     * we try to place a page, the userfaultfd handler will be removed for the
>> +     * changed mappings and placing pages will fail. We can safely ignore this,
>> +     * because mappings that changed on the destination don't need data from the
>> +     * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that page
>> +     * (unlikely but possible). Waking up waiters is always possible, even
>> +     * without a registered userfaultfd handler.
>> +     *
>> +     * Old kernels report EINVAL, new kernels report ENOENT in case there is
>> +     * no longer a userfaultfd handler for a mapping.
>> +     */
>> +    if (ret && (errno == ENOENT || errno == EINVAL)) {
>> +        if (errno == EINVAL) {
>> +            warn_report("%s: Failed to place page %p. Waking up any waiters.",
>> +                         __func__, host_addr);
>> +        }
>> +        ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);
> 
> ... if with above information (takes notes on where we registered
> uffd), I think we don't need to capture error, but we can simply skip
> those outliers.

I think we could skip them in general. Nobody should be touching that
memory. But debugging, e.g., a SEGFAULT is easier than debugging some
sleeping thread IMHO.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-25  7:44     ` David Hildenbrand
@ 2020-02-25 14:27       ` Peter Xu
  2020-02-25 15:37         ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2020-02-25 14:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Tue, Feb 25, 2020 at 08:44:56AM +0100, David Hildenbrand wrote:
> On 24.02.20 23:49, Peter Xu wrote:
> > On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
> >> When we partially change mappings (esp., mmap over parts of an existing
> >> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> >> registered, the handler will implicitly be unregistered from the parts that
> >> changed.
> >>
> >> Trying to place pages onto mappings where there is no longer a handler
> >> registered will fail. Let's make sure that any waiter is woken up - we
> >> have to do that manually.
> >>
> >> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> >>
> >> This is mainly a preparation for RAM blocks with resizable allocations,
> >> where the mapping of the invalid RAM range will change. The source will
> >> keep sending pages that are outside of the new (shrunk) RAM size. We have
> >> to treat these pages like they would have been migrated, but can
> >> essentially simply drop the content (ignore the placement error).
> >>
> >> Keep printing a warning on EINVAL, to avoid hiding other (programming)
> >> issues. ENOENT is unique.
> >>
> >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Andrea Arcangeli <aarcange@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 37 insertions(+)
> >>
> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >> index c68caf4e42..f023830b9a 100644
> >> --- a/migration/postcopy-ram.c
> >> +++ b/migration/postcopy-ram.c
> >> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
> >>      range_struct.start = (uintptr_t)host_addr;
> >>      range_struct.len = length;
> >>  
> >> +    /*
> >> +     * In case the mapping was partially changed since we enabled userfault
> >> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
> >> +     * for the mappings that changed. Unregistering will, however, still work
> >> +     * and ignore mappings without a registered handler.
> >> +     */
> > 
> > Ideally we should still only unregister what we have registered.
> > After all, we do have this information: we know what we registered,
> > and we know what has been unmapped (in your new resize() hook, when
> > postcopy_state==RUNNING).
> 
> Not in the case of qemu_ram_remap(). And whatever you propose will
> require synchronization (see my other mail) and more complicated
> handling than this. uffd allows you to handle races with mmap changes in
> a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).

All writers to the new postcopy_length should hold the BQL already.
The only one left is the last cleanup_range(), where we can take the BQL
for a while.  However...

> 
> > 
> > An extreme example is when we register pages in range [A, B),
> > then shrink it to [A, C), and then map something else within [C, B)
> > (note, with virtio-mem logically B can be very big and C can be very
> > small, so [C, B) can cover quite some address space). Then if:
> > 
> >   - [C, B) memory type is not compatible with uffd, or
> 
> That will never happen in the near future. Without resizable allocations:
> - All memory is either anonymous or from a single fd
> 
> In addition, right now, only anonymous memory can be used for resizable
> RAM. However, with resizable allocations we could have:
> - All used_length memory is either anonymous or from a single fd
> - All remaining memory is either anonymous or from a single fd
> 
> Everything else does not make any sense IMHO and I don't think this is
> relevant long term. You cannot arbitrarily map things into the
> > used_length part of a RAMBlock. That would contradict its page_size
> and its fd. E.g., you would break qemu_ram_remap().

... I think this persuaded me. :) You are right they can still be
protected until max_length with PROT_NONE.  Would you mind adding some of
the above to the comment above the unregister of uffd?

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
  2020-02-25 14:27       ` Peter Xu
@ 2020-02-25 15:37         ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2020-02-25 15:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrea Arcangeli, Eduardo Habkost, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Tue, Feb 25, 2020 at 09:27:40AM -0500, Peter Xu wrote:
> On Tue, Feb 25, 2020 at 08:44:56AM +0100, David Hildenbrand wrote:
> > On 24.02.20 23:49, Peter Xu wrote:
> > > On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
> > >> When we partially change mappings (esp., mmap over parts of an existing
> > >> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> > >> registered, the handler will implicitly be unregistered from the parts that
> > >> changed.
> > >>
> > >> Trying to place pages onto mappings where there is no longer a handler
> > >> registered will fail. Let's make sure that any waiter is woken up - we
> > >> have to do that manually.
> > >>
> > >> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> > >>
> > >> This is mainly a preparation for RAM blocks with resizable allocations,
> > >> where the mapping of the invalid RAM range will change. The source will
> > >> keep sending pages that are outside of the new (shrunk) RAM size. We have
> > >> to treat these pages like they would have been migrated, but can
> > >> essentially simply drop the content (ignore the placement error).
> > >>
> > >> Keep printing a warning on EINVAL, to avoid hiding other (programming)
> > >> issues. ENOENT is unique.
> > >>
> > >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >> Cc: Juan Quintela <quintela@redhat.com>
> > >> Cc: Peter Xu <peterx@redhat.com>
> > >> Cc: Andrea Arcangeli <aarcange@redhat.com>
> > >> Signed-off-by: David Hildenbrand <david@redhat.com>
> > >> ---
> > >>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
> > >>  1 file changed, 37 insertions(+)
> > >>
> > >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > >> index c68caf4e42..f023830b9a 100644
> > >> --- a/migration/postcopy-ram.c
> > >> +++ b/migration/postcopy-ram.c
> > >> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
> > >>      range_struct.start = (uintptr_t)host_addr;
> > >>      range_struct.len = length;
> > >>  
> > >> +    /*
> > >> +     * In case the mapping was partially changed since we enabled userfault
> > >> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
> > >> +     * for the mappings that changed. Unregistering will, however, still work
> > >> +     * and ignore mappings without a registered handler.
> > >> +     */
> > > 
> > > Ideally we should still only unregister what we have registered.
> > > After all, we do have this information: we know what we registered,
> > > and we know what has been unmapped (in your new resize() hook, when
> > > postcopy_state==RUNNING).
> > 
> > Not in the case of qemu_ram_remap(). And whatever you propose will
> > require synchronization (see my other mail) and more complicated
> > handling than this. uffd allows you to handle races with mmap changes in
> > a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).
> 
> All writers to the new postcopy_length should hold the BQL already.
> The only one left is the last cleanup_range(), where we can take the BQL
> for a while.  However...
> 
> > 
> > > 
> > > An extreme example is when we register pages in range [A, B),
> > > then shrink it to [A, C), and then map something else within [C, B)
> > > (note, with virtio-mem logically B can be very big and C can be very
> > > small, so [C, B) can cover quite some address space). Then if:
> > > 
> > >   - [C, B) memory type is not compatible with uffd, or
> > 
> > That will never happen in the near future. Without resizable allocations:
> > - All memory is either anonymous or from a single fd
> > 
> > In addition, right now, only anonymous memory can be used for resizable
> > RAM. However, with resizable allocations we could have:
> > - All used_length memory is either anonymous or from a single fd
> > - All remaining memory is either anonymous or from a single fd
> > 
> > Everything else does not make any sense IMHO and I don't think this is
> > relevant long term. You cannot arbitrarily map things into the
> > used_length part of a RAMBlock. That would contradict its page_size
> > and its fd. E.g., you would break qemu_ram_remap().
> 
> ... I think this persuaded me. :) You are right they can still be
> protected until max_length with PROT_NONE.  Would you mind add some of
> the above into the comment above unregister of uffd?

Sorry, please ignore the PROT_NONE part.  Maybe just mention something
like "the used_length will either be equal to or larger than the
userfaultfd registered range (with the same memory type), so it will be
safe to unregister a range even bigger than what we have registered".
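
I.e., above the UFFDIO_UNREGISTER call, something along the lines of
(wording suggestion only):

    /*
     * used_length is always equal to or larger than the range we
     * registered with userfaultfd (and of the same memory type), so
     * unregistering a bigger range than what we registered is safe.
     */

Thanks,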

-- 
Peter Xu




* Re: [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy
  2020-02-21 16:42 ` [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy David Hildenbrand
  2020-02-24 22:26   ` Peter Xu
@ 2020-02-25 16:11   ` Peter Xu
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Xu @ 2020-02-25 16:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Juan Quintela, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Shannon Zhao, Igor Mammedov, Paolo Bonzini, Alex Bennée,
	Richard Henderson

On Fri, Feb 21, 2020 at 05:42:01PM +0100, David Hildenbrand wrote:
> Resizing while migrating is dangerous and does not work as expected.
> The whole migration code works on the usable_length of ram blocks and does
> not expect this to change at random points in time.
> 
> In the case of postcopy, relying on used_length is racy as soon as the
> guest is running. Also, when used_length changes, we might leave the
> uffd handler registered for some memory regions, reject valid pages
> when migrating and fail when sending the recv bitmap to the source.
> 
> Resizing can be triggered *after* (but not during) a reset in
> ACPI code by the guest
> - hw/arm/virt-acpi-build.c:acpi_ram_update()
> - hw/i386/acpi-build.c:acpi_ram_update()
> 
> Let's remember the original used_length in a separate variable and
> use it in relevant postcopy code. Make sure to update it when we resize
> during precopy, when synchronizing the RAM block sizes with the source.
> 
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Shannon Zhao <shannon.zhao@linaro.org>
> Cc: Alex Bennée <alex.bennee@linaro.org>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



