* [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager
@ 2021-06-16 16:29 David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 1/6] memory: Introduce replay_discarded callback for RamDiscardManager David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

This is based on another series which introduces the RamDiscardManager
infrastructure: "[PATCH RESEND v7 00/13] virtio-mem: vfio support" [1]

Based-on: 20210413095531.25603-1-david@redhat.com

virtio-mem exposes a dynamic amount of memory within RAMBlocks by
coordinating with the VM. Memory within a RAMBlock can either get
plugged and consequently used by the VM, or unplugged and consequently no
longer used by the VM. Logical unplug is realized by discarding the
physical memory backing for virtual memory ranges, similar to memory
ballooning.

However, the important differences to virtio-balloon are:

a) A virtio-mem device only operates on its assigned memory region /
   RAMBlock ("device memory")
b) Initially, all device memory is logically unplugged
c) Virtual machines will never accidentally reuse memory that is currently
   logically unplugged. The spec defines most accesses to unplugged memory
   as "undefined behavior" -- except reading unplugged memory, which is
   currently expected to work, but that will change in the future.
d) The (un)plug granularity is in the range of megabytes -- "memory blocks"
e) The state (plugged/unplugged) of a memory block is always known and
   properly tracked.

Whenever memory blocks within the RAMBlock get (un)plugged, changes are
communicated via the RamDiscardManager to other QEMU subsystems, most
prominently vfio which updates the DMA mapping accordingly. "Unplugging"
corresponds to "discarding" and "plugging" corresponds to "populating".
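
For illustration, a minimal sketch of how such a subsystem can hook into
these notifications, assuming the RamDiscardListener API from the
prerequisite series [1]; the callback prototypes and the exact
ram_discard_listener_init() signature below are taken from that series and
may differ in detail:

static int example_notify_populate(RamDiscardListener *rdl,
                                   MemoryRegionSection *section)
{
    /* e.g., vfio would create the DMA mapping for the now-plugged range */
    return 0;
}

static void example_notify_discard(RamDiscardListener *rdl,
                                   MemoryRegionSection *section)
{
    /* e.g., vfio would remove the DMA mapping for the now-unplugged range */
}

    /* During setup, with "rdl" embedded in the subsystem's state: */
    ram_discard_listener_init(&rdl, example_notify_populate,
                              example_notify_discard,
                              true /* double_discard_supported */);
    ram_discard_manager_register_listener(rdm, &rdl, &section);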

While migrating (precopy/postcopy), the state of such memory blocks cannot
change. We never want to migrate such logically unplugged memory, because
it can result in unintended memory consumption both on the source (when
reading memory from some memory backends) and on the destination (when
writing memory). Further, migration time can be reduced significantly by
skipping logically unplugged blocks, and we avoid populating unnecessary
page tables in Linux.

Right now, virtio-mem reuses the free page hinting infrastructure during
precopy to exclude all logically unplugged ("discarded") parts from the
migration stream. However, there are some scenarios that are not handled
properly and need fixing. Further, there are some ugly corner cases in
postcopy code and background snapshotting code that similarly have to
handle such special RAMBlocks.

Let's reuse the RamDiscardManager infrastructure to essentially handle
precopy, postcopy and background snapshots cleanly, which means:

a) In precopy code, always clearing all dirty bits from the bitmap that
   correspond to discarded ranges whenever we update the dirty bitmap. This
   ensures that logically unplugged memory never gets migrated (a condensed
   sketch follows after this list).
b) In postcopy code, placing a zeropage when requested to handle a page
   falling into a discarded range -- because the source will never send it.
c) In background snapshot code, never populating discarded ranges, not even
   with the shared zeropage, to avoid unintended memory consumption,
   especially in the future with hugetlb and shmem.
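
A condensed sketch of (a), consulting the RamDiscardManager to drop the
dirty bits of all discarded ranges of a RAMBlock; the full implementation,
including the accounting of cleared bits, is in patch 3 below:

static void dirty_bitmap_exclude_section(MemoryRegionSection *section,
                                         void *opaque)
{
    RAMBlock *rb = section->mr->ram_block;
    const unsigned long start = section->offset_within_region >> TARGET_PAGE_BITS;
    const unsigned long npages = int128_get64(section->size) >> TARGET_PAGE_BITS;

    /* Discarded ranges must never be migrated: clear their dirty bits. */
    bitmap_clear(rb->bmap, start, npages);
}

static void dirty_bitmap_exclude_discarded(RAMBlock *rb)
{
    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
        MemoryRegionSection section = {
            .mr = rb->mr,
            .offset_within_region = 0,
            .size = int128_make64(qemu_ram_get_used_length(rb)),
        };

        ram_discard_manager_replay_discarded(rdm, &section,
                                             dirty_bitmap_exclude_section,
                                             NULL);
    }
}

Patch 3 invokes this wherever bits get added to the bitmap -- at bitmap
setup, on every dirty bitmap sync, and when reloading the bitmap during
postcopy recovery -- which covers the cases the free page hinting approach
missed.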

Detail: When realizing a virtio-mem device, it will register the RAM
        for migration via vmstate_register_ram(). Further, it will
        set itself as the RamDiscardManager for the corresponding memory
        region of the RAMBlock via memory_region_set_ram_discard_manager().
        Last but not least, memory device code will actually map the
        memory region into guest physical address space. So migration
        code can always properly identify such RAMBlocks.
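
Concretely, migration code can identify such a RAMBlock by checking whether
its mapped memory region has a RamDiscardManager and can then query the
populated state of an individual page -- roughly as the helper introduced
in patch 5 below does:

bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset)
{
    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
        MemoryRegionSection section = {
            .mr = rb->mr,
            .offset_within_region = offset,
            .size = int128_make64(TARGET_PAGE_SIZE),
        };

        /* Anything not populated was discarded ("logically unplugged"). */
        return !ram_discard_manager_is_populated(rdm, &section);
    }
    return false;
}

The postcopy destination (patch 5) uses exactly this check to place a
zeropage locally instead of requesting a page the source would never send.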

Tested with precopy/postcopy on shmem, where even reading unpopulated
memory ranges will populate actual memory and not the shared zeropage.
Tested with background snapshots on anonymous memory, because other
backends are not supported yet (for example, Linux does not support
userfaultfd WP on shmem yet).

[1] https://lkml.kernel.org/r/20210413095531.25603-1-david@redhat.com

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>

David Hildenbrand (6):
  memory: Introduce replay_discarded callback for RamDiscardManager
  virtio-mem: Implement replay_discarded RamDiscardManager callback
  migration/ram: Handle RAMBlocks with a RamDiscardManager on the
    migration source
  virtio-mem: Drop precopy notifier
  migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
    destination
  migration/ram: Handle RAMBlocks with a RamDiscardManager on background
    snapshots

 hw/virtio/virtio-mem.c         |  92 +++++++++++++--------
 include/exec/memory.h          |  21 +++++
 include/hw/virtio/virtio-mem.h |   3 -
 migration/postcopy-ram.c       |  25 +++++-
 migration/ram.c                | 147 ++++++++++++++++++++++++++++++---
 migration/ram.h                |   1 +
 softmmu/memory.c               |  11 +++
 7 files changed, 246 insertions(+), 54 deletions(-)

-- 
2.31.1




* [PATCH v1 1/6] memory: Introduce replay_discarded callback for RamDiscardManager
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 2/6] virtio-mem: Implement replay_discarded RamDiscardManager callback David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

Introduce a replay_discarded callback similar to our existing
replay_populated callback, to be used by migration code to never migrate
discarded memory.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/memory.h | 21 +++++++++++++++++++++
 softmmu/memory.c      | 11 +++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 84e739f106..1afdba2c27 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -530,6 +530,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
 }
 
 typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
+typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -618,6 +619,21 @@ struct RamDiscardManagerClass {
                             MemoryRegionSection *section,
                             ReplayRamPopulate replay_fn, void *opaque);
 
+    /**
+     * @replay_discarded:
+     *
+     * Call the #ReplayRamDiscard callback for all discarded parts within the
+     * #MemoryRegionSection via the #RamDiscardManager.
+     *
+     * @rdm: the #RamDiscardManager
+     * @section: the #MemoryRegionSection
+     * @replay_fn: the #ReplayRamDiscard callback
+     * @opaque: pointer to forward to the callback
+     */
+    void (*replay_discarded)(const RamDiscardManager *rdm,
+                             MemoryRegionSection *section,
+                             ReplayRamDiscard replay_fn, void *opaque);
+
     /**
      * @register_listener:
      *
@@ -662,6 +678,11 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          ReplayRamPopulate replay_fn,
                                          void *opaque);
 
+void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                          MemoryRegionSection *section,
+                                          ReplayRamDiscard replay_fn,
+                                          void *opaque);
+
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
                                            MemoryRegionSection *section);
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 920ebc0f4e..ae32083873 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2073,6 +2073,17 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
     return rdmc->replay_populated(rdm, section, replay_fn, opaque);
 }
 
+void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                          MemoryRegionSection *section,
+                                          ReplayRamDiscard replay_fn,
+                                          void *opaque)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+
+    g_assert(rdmc->replay_discarded);
+    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+}
+
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
                                            MemoryRegionSection *section)
-- 
2.31.1




* [PATCH v1 2/6] virtio-mem: Implement replay_discarded RamDiscardManager callback
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 1/6] memory: Introduce replay_discarded callback for RamDiscardManager David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 3/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

Implement it similar to the replay_populated callback.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 58 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index df91e454b2..284096ec5f 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -228,6 +228,38 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
     return ret;
 }
 
+static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
+                                                 MemoryRegionSection *s,
+                                                 void *arg,
+                                                 virtio_mem_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = s->offset_within_region / vmem->bitmap_size;
+    first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
+    while (first_bit < vmem->bitmap_size) {
+        MemoryRegionSection tmp = *s;
+
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                 first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+            break;
+        }
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                       last_bit + 2);
+    }
+    return ret;
+}
+
 static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
 {
     RamDiscardListener *rdl = arg;
@@ -1170,6 +1202,31 @@ static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                             virtio_mem_rdm_replay_populated_cb);
 }
 
+static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
+                                              void *arg)
+{
+    struct VirtIOMEMReplayData *data = arg;
+
+    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    return 0;
+}
+
+static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                            MemoryRegionSection *s,
+                                            ReplayRamDiscard replay_fn,
+                                            void *opaque)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    struct VirtIOMEMReplayData data = {
+        .fn = replay_fn,
+        .opaque = opaque,
+    };
+
+    g_assert(s->mr == &vmem->memdev->mr);
+    virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                          virtio_mem_rdm_replay_discarded_cb);
+}
+
 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
                                              RamDiscardListener *rdl,
                                              MemoryRegionSection *s)
@@ -1234,6 +1291,7 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
     rdmc->is_populated = virtio_mem_rdm_is_populated;
     rdmc->replay_populated = virtio_mem_rdm_replay_populated;
+    rdmc->replay_discarded = virtio_mem_rdm_replay_discarded;
     rdmc->register_listener = virtio_mem_rdm_register_listener;
     rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
 }
-- 
2.31.1




* [PATCH v1 3/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 1/6] memory: Introduce replay_discarded callback for RamDiscardManager David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 2/6] virtio-mem: Implement replay_discarded RamDiscardManager callback David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 4/6] virtio-mem: Drop precopy notifier David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

We never want to migrate memory that corresponds to discarded ranges as
managed by a RamDiscardManager responsible for the mapped memory region of
the RAMBlock. The content of these pages is essentially stale and
without any guarantees for the VM ("logically unplugged").

Depending on the underlying memory type, even reading memory might populate
memory on the source, resulting in undesired memory consumption. Of
course, on the destination, even writing a zeropage consumes memory,
which we also want to avoid (similar to free page hinting).

Currently, virtio-mem tries to achieve that goal (not migrating "unplugged"
memory that was discarded) by going via qemu_guest_free_page_hint() -- but
that approach is hackish and incomplete.

For example, background snapshots still end up reading all memory, as
they don't do bitmap syncs. Postcopy recovery code will re-add
previously cleared bits to the dirty bitmap and migrate them.

Let's consult the RamDiscardManager whenever we add bits to the bitmap
and exclude all discarded pages.

As COLO is incompatible with discarding of RAM and inhibits it, we don't
have to bother.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 60ea913c54..03867b4971 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -832,14 +832,67 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
+static void dirty_bitmap_exclude_section(MemoryRegionSection *section,
+                                         void *opaque)
+{
+    const hwaddr offset = section->offset_within_region;
+    const hwaddr size = int128_get64(section->size);
+    const unsigned long start = offset >> TARGET_PAGE_BITS;
+    const unsigned long npages = size >> TARGET_PAGE_BITS;
+    RAMBlock *rb = section->mr->ram_block;
+    uint64_t *cleared_bits = opaque;
+
+    /* Clear the discarded part from our bitmap. */
+    *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
+    bitmap_clear(rb->bmap, start, npages);
+}
+
+/*
+ * Exclude all dirty pages that fall into a discarded range as managed by a
+ * RamDiscardManager responsible for the mapped memory region of the RAMBlock.
+ *
+ * Discarded pages ("logically unplugged") have undefined content and must
+ * never be migrated, because even reading these pages for migrating might
+ * result in undesired behavior.
+ *
+ * Returns the number of cleared bits in the dirty bitmap.
+ *
+ * Note: The result is only stable while migration (precopy/postcopy).
+ */
+static uint64_t ramblock_dirty_bitmap_exclude_discarded_pages(RAMBlock *rb)
+{
+    uint64_t cleared_bits = 0;
+
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = int128_make64(qemu_ram_get_used_length(rb)),
+        };
+
+        ram_discard_manager_replay_discarded(rdm, &section,
+                                             dirty_bitmap_exclude_section,
+                                             &cleared_bits);
+    }
+    return cleared_bits;
+}
+
 /* Called with RCU critical section */
 static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb)
 {
     uint64_t new_dirty_pages =
         cpu_physical_memory_sync_dirty_bitmap(rb, 0, rb->used_length);
+    uint64_t cleared_pages = 0;
 
-    rs->migration_dirty_pages += new_dirty_pages;
-    rs->num_dirty_pages_period += new_dirty_pages;
+    if (new_dirty_pages) {
+        cleared_pages = ramblock_dirty_bitmap_exclude_discarded_pages(rb);
+    }
+
+    rs->migration_dirty_pages += new_dirty_pages - cleared_pages;
+    if (new_dirty_pages > cleared_pages) {
+        rs->num_dirty_pages_period += new_dirty_pages - cleared_pages;
+    }
 }
 
 /**
@@ -2603,7 +2656,7 @@ static int ram_state_init(RAMState **rsp)
     return 0;
 }
 
-static void ram_list_init_bitmaps(void)
+static void ram_list_init_bitmaps(RAMState *rs)
 {
     MigrationState *ms = migrate_get_current();
     RAMBlock *block;
@@ -2638,6 +2691,10 @@ static void ram_list_init_bitmaps(void)
             bitmap_set(block->bmap, 0, pages);
             block->clear_bmap_shift = shift;
             block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
+
+            /* Exclude all discarded pages we never want to migrate. */
+            pages = ramblock_dirty_bitmap_exclude_discarded_pages(block);
+            rs->migration_dirty_pages -= pages;
         }
     }
 }
@@ -2649,7 +2706,7 @@ static void ram_init_bitmaps(RAMState *rs)
     qemu_mutex_lock_ramlist();
 
     WITH_RCU_READ_LOCK_GUARD() {
-        ram_list_init_bitmaps();
+        ram_list_init_bitmaps(rs);
         /* We don't use dirty log with background snapshots */
         if (!migrate_background_snapshot()) {
             memory_global_dirty_log_start();
@@ -4068,6 +4125,10 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
      */
     bitmap_complement(block->bmap, block->bmap, nbits);
 
+    /* Exclude all discarded pages we never want to migrate. */
+    ramblock_dirty_bitmap_exclude_discarded_pages(block);
+
+    /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare(). */
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
     /*
-- 
2.31.1




* [PATCH v1 4/6] virtio-mem: Drop precopy notifier
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-06-16 16:29 ` [PATCH v1 3/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  2021-07-02 14:53   ` Michael S. Tsirkin
  2021-06-16 16:29 ` [PATCH v1 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 6/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots David Hildenbrand
  5 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

Migration code now properly handles RAMBlocks which are indirectly managed
by a RamDiscardManager. No need for manual handling via the free page
optimization interface, let's get rid of it.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c         | 34 ----------------------------------
 include/hw/virtio/virtio-mem.h |  3 ---
 2 files changed, 37 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 284096ec5f..d5a578142b 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -776,7 +776,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     host_memory_backend_set_mapped(vmem->memdev, true);
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
     qemu_register_reset(virtio_mem_system_reset, vmem);
-    precopy_add_notifier(&vmem->precopy_notifier);
 
     /*
      * Set ourselves as RamDiscardManager before the plug handler maps the
@@ -796,7 +795,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      * found via an address space anymore. Unset ourselves.
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-    precopy_remove_notifier(&vmem->precopy_notifier);
     qemu_unregister_reset(virtio_mem_system_reset, vmem);
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
     host_memory_backend_set_mapped(vmem->memdev, false);
@@ -1089,43 +1087,11 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
     vmem->block_size = value;
 }
 
-static int virtio_mem_precopy_exclude_range_cb(const VirtIOMEM *vmem, void *arg,
-                                               uint64_t offset, uint64_t size)
-{
-    void * const host = qemu_ram_get_host_addr(vmem->memdev->mr.ram_block);
-
-    qemu_guest_free_page_hint(host + offset, size);
-    return 0;
-}
-
-static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem)
-{
-    virtio_mem_for_each_unplugged_range(vmem, NULL,
-                                        virtio_mem_precopy_exclude_range_cb);
-}
-
-static int virtio_mem_precopy_notify(NotifierWithReturn *n, void *data)
-{
-    VirtIOMEM *vmem = container_of(n, VirtIOMEM, precopy_notifier);
-    PrecopyNotifyData *pnd = data;
-
-    switch (pnd->reason) {
-    case PRECOPY_NOTIFY_AFTER_BITMAP_SYNC:
-        virtio_mem_precopy_exclude_unplugged(vmem);
-        break;
-    default:
-        break;
-    }
-
-    return 0;
-}
-
 static void virtio_mem_instance_init(Object *obj)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(obj);
 
     notifier_list_init(&vmem->size_change_notifiers);
-    vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
     QLIST_INIT(&vmem->rdl_list);
 
     object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 9a6e348fa2..a5dd6a493b 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -65,9 +65,6 @@ struct VirtIOMEM {
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
-    /* don't migrate unplugged memory */
-    NotifierWithReturn precopy_notifier;
-
     /* listeners to notify on plug/unplug activity. */
     QLIST_HEAD(, RamDiscardListener) rdl_list;
 };
-- 
2.31.1




* [PATCH v1 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-06-16 16:29 ` [PATCH v1 4/6] virtio-mem: Drop precopy notifier David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  2021-06-16 16:29 ` [PATCH v1 6/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots David Hildenbrand
  5 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

Currently, when someone (i.e., the VM) accesses discarded parts inside a
RAMBlock with a RamDiscardManager managing the corresponding mapped memory
region, postcopy will request migration of the corresponding page from the
source. The source, however, will never answer, because it refuses to
migrate such pages with undefined content ("logically unplugged"): the
pages are never dirty, and get_queued_page() will consequently skip
processing these postcopy requests.

In particular, reading discarded ("logically unplugged") ranges is supposed
to work in some setups (for example, with current virtio-mem), although it
barely ever happens. Still, not placing a page would currently stall the
VM, as it cannot make forward progress.

Let's check the state via the RamDiscardManager (the state e.g.,
of virtio-mem is migrated during precopy) and avoid sending a request
that will never get answered. Place a fresh zero page instead to keep
the VM working. This is the same behavior that would happen
automatically without userfaultfd being active, when accessing virtual
memory regions without populated pages -- "populate on demand".

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/postcopy-ram.c | 25 +++++++++++++++++++++----
 migration/ram.c          | 23 +++++++++++++++++++++++
 migration/ram.h          |  1 +
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e9697bdd2..673f60ce2b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -671,6 +671,23 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
     return ret;
 }
 
+static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
+                                 ram_addr_t start, uint64_t haddr)
+{
+    /*
+     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
+     * access, place a zeropage, which will also set the relevant bits in the
+     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
+     */
+    if (ramblock_page_is_discarded(rb, start)) {
+        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
+
+        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
+    }
+
+    return migrate_send_rp_req_pages(mis, rb, start, haddr);
+}
+
 /*
  * Callback from shared fault handlers to ask for a page,
  * the page must be specified by a RAMBlock and an offset in that rb
@@ -690,7 +707,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
+    postcopy_request_page(mis, rb, aligned_rbo, client_addr);
     return 0;
 }
 
@@ -984,8 +1001,8 @@ retry:
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
-                                            msg.arg.pagefault.address);
+            ret = postcopy_request_page(mis, rb, rb_offset,
+                                        msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
@@ -993,7 +1010,7 @@ retry:
                     goto retry;
                 } else {
                     /* This is a unavoidable fault */
-                    error_report("%s: migrate_send_rp_req_pages() get %d",
+                    error_report("%s: postcopy_request_page() get %d",
                                  __func__, ret);
                     break;
                 }
diff --git a/migration/ram.c b/migration/ram.c
index 03867b4971..54136cd76e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -878,6 +878,29 @@ static uint64_t ramblock_dirty_bitmap_exclude_discarded_pages(RAMBlock *rb)
     return cleared_bits;
 }
 
+/*
+ * Check if a page falls into a discarded range as managed by a
+ * RamDiscardManager responsible for the mapped memory region of the RAMBlock.
+ * Pages inside discarded ranges are never migrated and postcopy has to
+ * place zeropages instead.
+ *
+ * Note: The result is only stable while migration (precopy/postcopy).
+ */
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset)
+{
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = offset,
+            .size = int128_get64(TARGET_PAGE_SIZE),
+        };
+
+        return !ram_discard_manager_is_populated(rdm, &section);
+    }
+    return false;
+}
+
 /* Called with RCU critical section */
 static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb)
 {
diff --git a/migration/ram.h b/migration/ram.h
index 4833e9fd5b..6e7f12e556 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
 int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset);
 
 /* ram cache */
 int colo_init_ram_cache(void);
-- 
2.31.1




* [PATCH v1 6/6] migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots
  2021-06-16 16:29 [PATCH v1 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-06-16 16:29 ` [PATCH v1 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination David Hildenbrand
@ 2021-06-16 16:29 ` David Hildenbrand
  5 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-06-16 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Wei Yang, Michael S. Tsirkin, Pankaj Gupta, Juan Quintela,
	David Hildenbrand, Dr. David Alan Gilbert, Peter Xu,
	Marek Kedzierski, Alex Williamson, teawater, Paolo Bonzini,
	Andrey Gruzdev

We already don't ever migrate memory that corresponds to discarded ranges
as managed by a RamDiscardManager responsible for the mapped memory region
of the RAMBlock.

virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
Right now, we still populate zeropages for the whole usable part of the
RAMBlock, which is undesired because:

1. Even populating the shared zeropage will result in memory getting
   consumed for page tables.
2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
   will populate an actual, fresh page, resulting in unintended memory
   consumption.

Discarded ("logically unplugged") parts have to remain discarded. As
these pages are never part of the migration stream, there is no need to
track modifications via userfaultfd WP reliably for these parts.

Further, any writes to these ranges by the VM are invalid and the
behavior is undefined.

Note that Linux only supports userfaultfd WP on private anonymous memory
for now, which usually results in the shared zeropage getting populated.
The issue will become more relevant once userfaultfd WP supports shmem
and hugetlb.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 54136cd76e..7236230950 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1614,6 +1614,28 @@ out:
     return ret;
 }
 
+static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
+{
+    char *ptr = (char *) block->host;
+
+    for (; offset < size; offset += qemu_real_host_page_size) {
+        char tmp = *(ptr + offset);
+
+        /* Don't optimize the read out */
+        asm volatile("" : "+r" (tmp));
+    }
+}
+
+static inline int populate_section(MemoryRegionSection *section, void *opaque)
+{
+    const hwaddr size = int128_get64(section->size);
+    hwaddr offset = section->offset_within_region;
+    RAMBlock *block = section->mr->ram_block;
+
+    populate_range(block, offset, size);
+    return 0;
+}
+
 /*
  * ram_block_populate_pages: populate memory in the RAM block by reading
  *   an integer from the beginning of each page.
@@ -1623,16 +1645,31 @@ out:
  *
  * @block: RAM block to populate
  */
-static void ram_block_populate_pages(RAMBlock *block)
+static void ram_block_populate_pages(RAMBlock *rb)
 {
-    char *ptr = (char *) block->host;
-
-    for (ram_addr_t offset = 0; offset < block->used_length;
-            offset += qemu_real_host_page_size) {
-        char tmp = *(ptr + offset);
+    /*
+     * Skip populating all pages that fall into a discarded range as managed by
+     * a RamDiscardManager responsible for the mapped memory region of the
+     * RAMBlock. Such discarded ("logically unplugged") parts of a RAMBlock
+     * must not get populated automatically. We don't have to track
+     * modifications via userfaultfd WP reliably, because these pages will
+     * not be part of the migration stream either way -- see
+     * ramblock_dirty_bitmap_exclude_discarded_pages().
+     *
+     * Note: The result is only stable while migration (precopy/postcopy).
+     */
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = rb->mr->size,
+        };
 
-        /* Don't optimize the read out */
-        asm volatile("" : "+r" (tmp));
+        ram_discard_manager_replay_populated(rdm, &section,
+                                             populate_section, NULL);
+    } else {
+        populate_range(rb, 0, qemu_ram_get_used_length(rb));
     }
 }
 
-- 
2.31.1




* Re: [PATCH v1 4/6] virtio-mem: Drop precopy notifier
  2021-06-16 16:29 ` [PATCH v1 4/6] virtio-mem: Drop precopy notifier David Hildenbrand
@ 2021-07-02 14:53   ` Michael S. Tsirkin
  0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2021-07-02 14:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Juan Quintela, Pankaj Gupta, qemu-devel, Peter Xu,
	Dr. David Alan Gilbert, Alex Williamson, Marek Kedzierski,
	Paolo Bonzini, Andrey Gruzdev, teawater

On Wed, Jun 16, 2021 at 06:29:38PM +0200, David Hildenbrand wrote:
> Migration code now properly handles RAMBlocks which are indirectly managed
> by a RamDiscardManager. No need for manual handling via the free page
> optimization interface, let's get rid of it.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

but really this is all migration code and should go through that tree.

> ---
>  hw/virtio/virtio-mem.c         | 34 ----------------------------------
>  include/hw/virtio/virtio-mem.h |  3 ---
>  2 files changed, 37 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 284096ec5f..d5a578142b 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -776,7 +776,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>      host_memory_backend_set_mapped(vmem->memdev, true);
>      vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
>      qemu_register_reset(virtio_mem_system_reset, vmem);
> -    precopy_add_notifier(&vmem->precopy_notifier);
>  
>      /*
>       * Set ourselves as RamDiscardManager before the plug handler maps the
> @@ -796,7 +795,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       * found via an address space anymore. Unset ourselves.
>       */
>      memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
> -    precopy_remove_notifier(&vmem->precopy_notifier);
>      qemu_unregister_reset(virtio_mem_system_reset, vmem);
>      vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
>      host_memory_backend_set_mapped(vmem->memdev, false);
> @@ -1089,43 +1087,11 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
>      vmem->block_size = value;
>  }
>  
> -static int virtio_mem_precopy_exclude_range_cb(const VirtIOMEM *vmem, void *arg,
> -                                               uint64_t offset, uint64_t size)
> -{
> -    void * const host = qemu_ram_get_host_addr(vmem->memdev->mr.ram_block);
> -
> -    qemu_guest_free_page_hint(host + offset, size);
> -    return 0;
> -}
> -
> -static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem)
> -{
> -    virtio_mem_for_each_unplugged_range(vmem, NULL,
> -                                        virtio_mem_precopy_exclude_range_cb);
> -}
> -
> -static int virtio_mem_precopy_notify(NotifierWithReturn *n, void *data)
> -{
> -    VirtIOMEM *vmem = container_of(n, VirtIOMEM, precopy_notifier);
> -    PrecopyNotifyData *pnd = data;
> -
> -    switch (pnd->reason) {
> -    case PRECOPY_NOTIFY_AFTER_BITMAP_SYNC:
> -        virtio_mem_precopy_exclude_unplugged(vmem);
> -        break;
> -    default:
> -        break;
> -    }
> -
> -    return 0;
> -}
> -
>  static void virtio_mem_instance_init(Object *obj)
>  {
>      VirtIOMEM *vmem = VIRTIO_MEM(obj);
>  
>      notifier_list_init(&vmem->size_change_notifiers);
> -    vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
>      QLIST_INIT(&vmem->rdl_list);
>  
>      object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
> diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
> index 9a6e348fa2..a5dd6a493b 100644
> --- a/include/hw/virtio/virtio-mem.h
> +++ b/include/hw/virtio/virtio-mem.h
> @@ -65,9 +65,6 @@ struct VirtIOMEM {
>      /* notifiers to notify when "size" changes */
>      NotifierList size_change_notifiers;
>  
> -    /* don't migrate unplugged memory */
> -    NotifierWithReturn precopy_notifier;
> -
>      /* listeners to notify on plug/unplug activity. */
>      QLIST_HEAD(, RamDiscardListener) rdl_list;
>  };
> -- 
> 2.31.1



