From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Markus Armbruster" <armbru@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	xen-devel@lists.xenproject.org,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	kvm@vger.kernel.org, "Peter Xu" <peterx@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paul Durrant" <paul@xen.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Anthony Perard" <anthony.perard@citrix.com>
Subject: [PULL 18/20] migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots
Date: Mon,  1 Nov 2021 23:09:10 +0100	[thread overview]
Message-ID: <20211101220912.10039-19-quintela@redhat.com> (raw)
In-Reply-To: <20211101220912.10039-1-quintela@redhat.com>

From: David Hildenbrand <david@redhat.com>

We already never migrate memory that corresponds to discarded ranges, as
managed by a RamDiscardManager responsible for the mapped memory region
of the RAMBlock.

virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
Right now, we still populate zeropages for the whole usable part of the
RAMBlock, which is undesired because:

1. Even populating the shared zeropage will result in memory getting
   consumed for page tables.
2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
   will populate an actual, fresh page, resulting in unintended memory
   consumption (see the sketch below).
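
The population cost can be illustrated outside QEMU with a short
sketch. The loop below approximates what populate_read_range() does
(the function name and parameters here are illustrative, not QEMU
API): it reads one byte per page so that a read fault populates the
mapping -- the shared zeropage on private anonymous memory, a real
page on hugetlbfs/shmem.

    #include <stddef.h>

    /* Touch one byte per page; each read fault populates the mapping. */
    static void populate_by_read(char *base, size_t len, size_t page_size)
    {
        for (size_t off = 0; off < len; off += page_size) {
            /* volatile read so the compiler cannot elide the access */
            *(volatile char *)(base + off);
        }
    }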

Discarded ("logically unplugged") parts have to remain discarded. As
these pages are never part of the migration stream, there is no need to
reliably track modifications to them via userfaultfd WP.

Further, any writes to these ranges by the VM are invalid and the
behavior is undefined.

Note that Linux only supports userfaultfd WP on private anonymous memory
for now, which usually results in the shared zeropage getting populated.
The issue will become more relevant once userfaultfd WP supports shmem
and hugetlb.
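
For context, here is a minimal sketch of registering a range for
userfaultfd write-protection on Linux (hypothetical helper, error
handling elided; not QEMU code). Pages must be populated before they
can be write-protected this way, which is what ram_block_populate_read()
ensures.

    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int register_wp(void *addr, unsigned long len)
    {
        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API };

        ioctl(uffd, UFFDIO_API, &api);            /* feature handshake */

        struct uffdio_register reg = {
            .range = { .start = (unsigned long)addr, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_WP,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        /* Protection itself is enabled via UFFDIO_WRITEPROTECT. */
        struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)addr, .len = len },
            .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

        return uffd;    /* read uffd events to observe write faults */
    }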

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 92c7b788ae..680a5158aa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1656,6 +1656,17 @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
     }
 }
 
+static inline int populate_read_section(MemoryRegionSection *section,
+                                        void *opaque)
+{
+    const hwaddr size = int128_get64(section->size);
+    hwaddr offset = section->offset_within_region;
+    RAMBlock *block = section->mr->ram_block;
+
+    populate_read_range(block, offset, size);
+    return 0;
+}
+
 /*
  * ram_block_populate_read: preallocate page tables and populate pages in the
  *   RAM block by reading a byte of each page.
@@ -1665,9 +1676,32 @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
  *
  * @block: RAM block to populate
  */
-static void ram_block_populate_read(RAMBlock *block)
+static void ram_block_populate_read(RAMBlock *rb)
 {
-    populate_read_range(block, 0, block->used_length);
+    /*
+     * Skip populating all pages that fall into a discarded range as managed by
+     * a RamDiscardManager responsible for the mapped memory region of the
+     * RAMBlock. Such discarded ("logically unplugged") parts of a RAMBlock
+     * must not get populated automatically. We don't have to track
+     * modifications via userfaultfd WP reliably, because these pages will
+     * not be part of the migration stream either way -- see
+     * ramblock_dirty_bitmap_exclude_discarded_pages().
+     *
+     * Note: The result is only stable while migrating (precopy/postcopy).
+     */
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = rb->mr->size,
+        };
+
+        ram_discard_manager_replay_populated(rdm, &section,
+                                             populate_read_section, NULL);
+    } else {
+        populate_read_range(rb, 0, rb->used_length);
+    }
 }
 
 /*
-- 
2.33.1
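
As a side note on the pattern above: ram_discard_manager_replay_populated()
invokes the callback once per populated section, so discarded parts are
simply never visited. A toy model of that replay contract (hypothetical,
not the QEMU API) might look like this, with populate_read_section()
playing the role of fn:

    #include <stddef.h>

    typedef int (*replay_fn)(size_t offset, size_t size, void *opaque);

    struct toy_manager {
        size_t block_size;             /* plug/unplug granularity */
        const unsigned char *plugged;  /* 1 = populated, 0 = discarded */
        size_t nblocks;
    };

    /* Invoke fn for populated ranges only; stop on the first error. */
    static int toy_replay_populated(const struct toy_manager *m,
                                    replay_fn fn, void *opaque)
    {
        for (size_t i = 0; i < m->nblocks; i++) {
            if (!m->plugged[i]) {
                continue;              /* skip discarded parts */
            }
            int ret = fn(i * m->block_size, m->block_size, opaque);
            if (ret) {
                return ret;
            }
        }
        return 0;
    }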

