From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Markus Armbruster" <armbru@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	xen-devel@lists.xenproject.org,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	kvm@vger.kernel.org, "Peter Xu" <peterx@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paul Durrant" <paul@xen.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Anthony Perard" <anthony.perard@citrix.com>
Subject: [PULL 15/20] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Date: Mon,  1 Nov 2021 23:09:07 +0100
Message-ID: <20211101220912.10039-16-quintela@redhat.com>
In-Reply-To: <20211101220912.10039-1-quintela@redhat.com>

From: David Hildenbrand <david@redhat.com>

Currently, when someone (i.e., the VM) accesses discarded parts inside a
RAMBlock whose mapped memory region is managed by a RamDiscardManager,
postcopy will request migration of the corresponding page from the source.
The source, however, will never answer, because it refuses to migrate
pages with undefined content ("logically unplugged"): such pages are never
dirty, and get_queued_page() will consequently skip processing these
postcopy requests.
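
For illustration, a minimal, hypothetical sketch of the source side (not
the actual migration/ram.c code) -- it assumes discarded ranges were
cleared from the source's dirty bitmap when migration started:

    /*
     * Hypothetical sketch: the source only serves postcopy requests for
     * pages still marked dirty.  Discarded ranges were cleared from the
     * dirty bitmap up front, so such a request is silently skipped and
     * no page is ever sent back.
     */
    static bool source_will_send_page(RAMBlock *rb, ram_addr_t offset)
    {
        return test_bit(offset >> TARGET_PAGE_BITS, rb->bmap);
    }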

In particular, reading discarded ("logically unplugged") ranges is supposed
to work in some setups (for example, with current virtio-mem), although it
rarely happens in practice. Still, not placing a page would currently stall
the VM, as it cannot make forward progress.

Let's check the state via the RamDiscardManager (the state of, e.g.,
virtio-mem is migrated during precopy) and avoid sending a request that
will never get answered. Place a fresh zero page instead to keep the VM
working. This is the same behavior that would happen automatically
without userfaultfd being active, when accessing virtual memory regions
without populated pages -- "populate on demand".
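
In pseudocode, the destination-side decision introduced below in
postcopy_request_page() boils down to (see the diff for the real code):

    if (ramblock_page_is_discarded(rb, start)) {
        /* The source would never answer; place a zero page instead. */
        if (!ramblock_recv_bitmap_test_byte_offset(rb, start)) {
            postcopy_place_page_zero(mis, aligned, rb);
        }
    } else {
        /* Normal postcopy path: ask the source for the page. */
        migrate_send_rp_req_pages(mis, rb, start, haddr);
    }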

For now, there are valid cases (as documented in the virtio-mem spec) where
a VM might read discarded memory; in the future, we will disallow that.
Then, we might want to handle that case differently, e.g., by warning the
user that the VM seems to be misbehaving.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.h          |  1 +
 migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++----
 migration/ram.c          | 21 +++++++++++++++++++++
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 4833e9fd5b..dda1988f3d 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
 int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 
 /* ram cache */
 int colo_init_ram_cache(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e9697bdd2..3609ce7e52 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
     return ret;
 }
 
+static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
+                                 ram_addr_t start, uint64_t haddr)
+{
+    void *aligned = (void *)(uintptr_t)ROUND_DOWN(haddr, qemu_ram_pagesize(rb));
+
+    /*
+     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
+     * access, place a zeropage, which will also set the relevant bits in the
+     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
+     *
+     * Checking a single bit is sufficient to handle pagesize > TPS as either
+     * all relevant bits are set or not.
+     */
+    assert(QEMU_IS_ALIGNED(start, qemu_ram_pagesize(rb)));
+    if (ramblock_page_is_discarded(rb, start)) {
+        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
+
+        return received ? 0 : postcopy_place_page_zero(mis, aligned, rb);
+    }
+
+    return migrate_send_rp_req_pages(mis, rb, start, haddr);
+}
+
 /*
  * Callback from shared fault handlers to ask for a page,
  * the page must be specified by a RAMBlock and an offset in that rb
@@ -690,7 +713,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
+    postcopy_request_page(mis, rb, aligned_rbo, client_addr);
     return 0;
 }
 
@@ -984,8 +1007,8 @@ retry:
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
-                                            msg.arg.pagefault.address);
+            ret = postcopy_request_page(mis, rb, rb_offset,
+                                        msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
@@ -993,7 +1016,7 @@ retry:
                     goto retry;
                 } else {
                     /* This is a unavoidable fault */
-                    error_report("%s: migrate_send_rp_req_pages() get %d",
+                    error_report("%s: postcopy_request_page() get %d",
                                  __func__, ret);
                     break;
                 }
diff --git a/migration/ram.c b/migration/ram.c
index e8c06f207c..4f629de7d0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -912,6 +912,27 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
     return cleared_bits;
 }
 
+/*
+ * Check if a host-page aligned page falls into a discarded range as managed by
+ * a RamDiscardManager responsible for the mapped memory region of the RAMBlock.
+ *
+ * Note: The result is only stable while migrating (precopy/postcopy).
+ */
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
+{
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = start,
+            .size = int128_make64(qemu_ram_pagesize(rb)),
+        };
+
+        return !ram_discard_manager_is_populated(rdm, &section);
+    }
+    return false;
+}
+
 /* Called with RCU critical section */
 static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb)
 {
-- 
2.33.1


