* [PATCH 0/7] UFFD write-tracking migration/snapshots
@ 2020-11-18 11:22 Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 1/7] Introduce 'track-writes-ram' migration capability Andrey Gruzdev via
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Currently the only way to make an (external) live VM snapshot is to use the
existing dirty page logging migration mechanism. The main problem is that it
tends to produce a lot of page duplicates while the running VM keeps updating
pages that have already been saved. As a result, the vmstate image size is
commonly several times bigger than the non-zero part of the virtual machine's
RSS. Both the time required for RAM migration to converge and the size of the
snapshot image depend heavily on the guest memory write rate, sometimes
resulting in unacceptably long snapshot creation times and huge image sizes.

This series proposes a way to solve the aforementioned problems. It uses a
different RAM migration mechanism based on the UFFD write-protection support
introduced in the v5.7 kernel. The migration strategy is to 'freeze' the
guest RAM content using write protection and to iteratively release the
protection for memory ranges that have already been saved to the migration
stream. At the same time we read pending UFFD write fault events and save
the faulting pages out of order, with higher priority.
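
For reference, below is a minimal self-contained sketch (not part of this
series; error handling omitted, file name arbitrary) of the kernel UFFD
write-protection cycle the series builds on: populate a page, arm write
protection, catch the write fault raised by another thread, then drop the
protection at the point where a real implementation would first save the page:

  /* uffd_wp_demo.c -- build with: cc -pthread uffd_wp_demo.c (Linux >= 5.7) */
  #include <fcntl.h>
  #include <linux/userfaultfd.h>
  #include <pthread.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static void *writer(void *p)
  {
      *(volatile char *)p = 1;       /* blocks until write protection is removed */
      return NULL;
  }

  int main(void)
  {
      long psize = sysconf(_SC_PAGESIZE);
      char *mem = mmap(NULL, psize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      mem[0] = 0;                    /* populate the page before protecting it */

      int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
      struct uffdio_api api = { .api = UFFD_API,
                                .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP };
      ioctl(uffd, UFFDIO_API, &api);

      struct uffdio_register reg = {
          .range = { .start = (uintptr_t)mem, .len = psize },
          .mode  = UFFDIO_REGISTER_MODE_WP,
      };
      ioctl(uffd, UFFDIO_REGISTER, &reg);

      struct uffdio_writeprotect wp = {
          .range = { .start = (uintptr_t)mem, .len = psize },
          .mode  = UFFDIO_WRITEPROTECT_MODE_WP,    /* arm write protection */
      };
      ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

      pthread_t t;
      pthread_create(&t, NULL, writer, mem);

      struct uffd_msg msg;
      read(uffd, &msg, sizeof(msg));               /* wait for the WP fault */
      printf("write fault at 0x%llx\n",
             (unsigned long long)msg.arg.pagefault.address);

      /* a real implementation would save the faulting page here */
      wp.mode = 0;                                 /* un-protect and wake writer */
      ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
      pthread_join(t, NULL);
      return 0;
  }

The series drives the same protect/save/un-protect cycle from the migration
thread for all RAM blocks.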

How to use:
1. Enable write-tracking migration capability
   virsh qemu-monitor-command <domain> --hmp migrate_set_capability track-writes-ram on

2. Start the external migration to a file
   virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'

3. Wait for the migration to finish and check that it has reached the 'completed' state.
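
Optionally, for illustration (not strictly part of the steps above), the
migration state can be checked from the monitor and the saved file can later
be loaded back as an incoming migration; the domain name, binary and file
path below are placeholders:

   virsh qemu-monitor-command <domain> --hmp info migrate

   <qemu-binary> <usual options> -incoming "exec:cat ./vm_state"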

Andrey Gruzdev (7):
  Introduce 'track-writes-ram' migration capability.
  Introduced UFFD-WP low-level interface helpers. Implemented support
    for the whole RAM block memory protection/un-protection. Higher
    level ram_write_tracking_start() and ram_write_tracking_stop() to
    start/stop tracking memory writes on the whole VM memory.
  Support UFFD write fault processing in ram_save_iterate().
  Implementation of write-tracking migration thread.
  Implementation of vm_start() BH.
  The rest of write tracking migration code.
  Introduced simple linear scan rate limiting mechanism for write
    tracking migration.

 include/exec/memory.h |   7 +
 migration/migration.c | 344 ++++++++++++++++++++++++++++++++-
 migration/migration.h |   4 +
 migration/ram.c       | 439 +++++++++++++++++++++++++++++++++++++++++-
 migration/ram.h       |   4 +
 migration/savevm.c    |   1 -
 migration/savevm.h    |   2 +
 qapi/migration.json   |   7 +-
 8 files changed, 796 insertions(+), 12 deletions(-)

-- 
2.25.1




* [PATCH 1/7] Introduce 'track-writes-ram' migration capability.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 14:31   ` Eric Blake
  2020-11-18 11:22 ` [PATCH 2/7] Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory Andrey Gruzdev via
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c | 96 +++++++++++++++++++++++++++++++++++++++++++
 migration/migration.h |  1 +
 qapi/migration.json   |  7 +++-
 3 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 87a9b59f83..ff0364dde0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "sysemu/cpus.h"
 
 #ifdef CONFIG_VFIO
 #include "hw/vfio/vfio-common.h"
@@ -1165,6 +1166,91 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_TRACK_WRITES_RAM]) {
+        if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with postcopy-ram");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_DIRTY_BITMAPS]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with dirty-bitmaps");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with postcopy-blocktime");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_LATE_BLOCK_ACTIVATE]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with late-block-activate");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_RETURN_PATH]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with return-path");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp, "Track-writes is not compatible with multifd");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_PAUSE_BEFORE_SWITCHOVER]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with pause-before-switchover");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_AUTO_CONVERGE]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with auto-converge");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_RELEASE_RAM]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with release-ram");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_RDMA_PIN_ALL]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with rdma-pin-all");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with compression");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with XBZRLE");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_X_COLO]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with x-colo");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_VALIDATE_UUID]) {
+            error_setg(errp,
+                    "Track-writes is not compatible with validate-uuid");
+            return false;
+        }
+    }
+
     return true;
 }
 
@@ -2490,6 +2576,15 @@ bool migrate_use_block_incremental(void)
     return s->parameters.block_incremental;
 }
 
+bool migrate_track_writes_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_TRACK_WRITES_RAM];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -3783,6 +3878,7 @@ static Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
     DEFINE_PROP_MIG_CAP("x-return-path", MIGRATION_CAPABILITY_RETURN_PATH),
     DEFINE_PROP_MIG_CAP("x-multifd", MIGRATION_CAPABILITY_MULTIFD),
+    DEFINE_PROP_MIG_CAP("x-track-writes-ram", MIGRATION_CAPABILITY_TRACK_WRITES_RAM),
 
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/migration/migration.h b/migration/migration.h
index d096b77f74..339ae720e0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -341,6 +341,7 @@ int migrate_compress_wait_thread(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
+bool migrate_track_writes_ram(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index 3c75820527..03fe0547f1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -442,6 +442,11 @@
 # @validate-uuid: Send the UUID of the source to allow the destination
 #                 to ensure it is the same. (since 4.2)
 #
+# @track-writes-ram: If enabled, the migration stream will be a snapshot
+#                    of the VM exactly at the point when the migration
+#                    procedure starts. The VM RAM is saved while the VM is
+#                    running. (since 5.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
@@ -449,7 +454,7 @@
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
            'block', 'return-path', 'pause-before-switchover', 'multifd',
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
-           'x-ignore-shared', 'validate-uuid' ] }
+           'x-ignore-shared', 'validate-uuid', 'track-writes-ram'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.25.1




* [PATCH 2/7] Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 1/7] Introduce 'track-writes-ram' migration capability Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 3/7] Support UFFD write fault processing in ram_save_iterate() Andrey Gruzdev via
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 include/exec/memory.h |   7 ++
 migration/ram.c       | 267 ++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h       |   4 +
 3 files changed, 278 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0f3e6bcd5e..3d798fce16 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -139,6 +139,13 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/*
+ * UFFDIO_WRITEPROTECT is used on this RAMBlock to
+ * support 'write-tracking' migration type.
+ * Implies ram_state->ram_wt_enabled.
+ */
+#define RAM_UF_WRITEPROTECT (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/migration/ram.c b/migration/ram.c
index 7811cde643..7f273c9996 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -56,6 +56,12 @@
 #include "savevm.h"
 #include "qemu/iov.h"
 #include "multifd.h"
+#include <inttypes.h>
+#include <poll.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <linux/userfaultfd.h>
+#include "sysemu/runstate.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -298,6 +304,8 @@ struct RAMSrcPageRequest {
 struct RAMState {
     /* QEMUFile used for this migration */
     QEMUFile *f;
+    /* UFFD file descriptor, used in 'write-tracking' migration */
+    int uffdio_fd;
     /* Last block that we have visited searching for dirty pages */
     RAMBlock *last_seen_block;
     /* Last block from where we have sent data */
@@ -453,6 +461,181 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
 
+/**
+ * uffd_create_fd: create UFFD file descriptor
+ *
+ * Returns non-negative file descriptor or negative value in case of an error
+ */
+static int uffd_create_fd(void)
+{
+    int uffd;
+    struct uffdio_api api_struct;
+    uint64_t ioctl_mask = BIT(_UFFDIO_REGISTER) | BIT(_UFFDIO_UNREGISTER);
+
+    uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    if (uffd < 0) {
+        error_report("uffd_create_fd() failed: UFFD not supported");
+        return -1;
+    }
+
+    api_struct.api = UFFD_API;
+    api_struct.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+    if (ioctl(uffd, UFFDIO_API, &api_struct)) {
+        error_report("uffd_create_fd() failed: "
+                "API version not supported version=%llx errno=%i",
+                api_struct.api, errno);
+        goto fail;
+    }
+
+    if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
+        error_report("uffd_create_fd() failed: "
+                "required ioctls (REGISTER/UNREGISTER) are missing");
+        goto fail;
+    }
+
+    return uffd;
+
+fail:
+    close(uffd);
+    return -1;
+}
+
+/**
+ * uffd_close_fd: close UFFD file descriptor
+ *
+ * @uffd: UFFD file descriptor
+ */
+static void uffd_close_fd(int uffd)
+{
+    assert(uffd >= 0);
+    close(uffd);
+}
+
+/**
+ * uffd_register_memory: register memory range with UFFD
+ *
+ * Returns 0 in case of success, negative value on error
+ *
+ * @uffd: UFFD file descriptor
+ * @start: starting virtual address of memory range
+ * @length: length of memory range
+ * @track_missing: generate events on missing-page faults
+ * @track_wp: generate events on write-protected-page faults
+ */
+static int uffd_register_memory(int uffd, hwaddr start, hwaddr length,
+        bool track_missing, bool track_wp)
+{
+    struct uffdio_register uffd_register;
+
+    uffd_register.range.start = start;
+    uffd_register.range.len = length;
+    uffd_register.mode = (track_missing ? UFFDIO_REGISTER_MODE_MISSING : 0) |
+                         (track_wp ? UFFDIO_REGISTER_MODE_WP : 0);
+
+    if (ioctl(uffd, UFFDIO_REGISTER, &uffd_register)) {
+        error_report("uffd_register_memory() failed: "
+                "start=%0"PRIx64" len=%"PRIu64" mode=%llu errno=%i",
+                start, length, uffd_register.mode, errno);
+        return -1;
+    }
+
+    return 0;
+}
+
+/**
+ * uffd_protect_memory: protect/unprotect memory range for writes with UFFD
+ *
+ * Returns 0 on success or negative value in case of error
+ *
+ * @uffd: UFFD file descriptor
+ * @start: starting virtual address of memory range
+ * @length: length of memory range
+ * @wp: write-protect/unprotect
+ */
+static int uffd_protect_memory(int uffd, hwaddr start, hwaddr length, bool wp)
+{
+    struct uffdio_writeprotect uffd_writeprotect;
+    int res;
+
+    uffd_writeprotect.range.start = start;
+    uffd_writeprotect.range.len = length;
+    uffd_writeprotect.mode = (wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0);
+
+    do {
+        res = ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect);
+    } while (res < 0 && errno == EINTR);
+    if (res < 0) {
+        error_report("uffd_protect_memory() failed: "
+                "start=%0"PRIx64" len=%"PRIu64" mode=%llu errno=%i",
+                start, length, uffd_writeprotect.mode, errno);
+        return -1;
+    }
+
+    return 0;
+}
+
+__attribute__ ((unused))
+static int uffd_read_events(int uffd, struct uffd_msg *msgs, int count);
+__attribute__ ((unused))
+static bool uffd_poll_events(int uffd, int tmo);
+
+/**
+ * uffd_read_events: read pending UFFD events
+ *
+ * Returns the number of fetched messages, 0 if none are available or
+ * a negative value in case of an error
+ *
+ * @uffd: UFFD file descriptor
+ * @msgs: pointer to message buffer
+ * @count: number of messages that can fit in the buffer
+ */
+static int uffd_read_events(int uffd, struct uffd_msg *msgs, int count)
+{
+    ssize_t res;
+    do {
+        res = read(uffd, msgs, count * sizeof(struct uffd_msg));
+    } while (res < 0 && errno == EINTR);
+
+    if ((res < 0 && errno == EAGAIN)) {
+        return 0;
+    }
+    if (res < 0) {
+        error_report("uffd_read_events() failed: errno=%i", errno);
+        return -1;
+    }
+
+    return (int) (res / sizeof(struct uffd_msg));
+}
+
+/**
+ * uffd_poll_events: poll UFFD file descriptor for read
+ *
+ * Returns true if events are available for read, false otherwise
+ *
+ * @uffd: UFFD file descriptor
+ * @tmo: timeout in milliseconds, 0 for non-blocking operation,
+ *       negative value for infinite wait
+ */
+static bool uffd_poll_events(int uffd, int tmo)
+{
+    int res;
+    struct pollfd poll_fd = { .fd = uffd, .events = POLLIN, .revents = 0 };
+
+    do {
+        res = poll(&poll_fd, 1, tmo);
+    } while (res < 0 && errno == EINTR);
+
+    if (res == 0) {
+        return false;
+    }
+    if (res < 0) {
+        error_report("uffd_poll_events() failed: errno=%i", errno);
+        return false;
+    }
+
+    return (poll_fd.revents & POLLIN) != 0;
+}
+
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
 
@@ -3788,6 +3971,90 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
     return 0;
 }
 
+/**
+ * ram_write_tracking_start: start UFFD-WP memory tracking
+ *
+ * Returns 0 for success or negative value in case of error
+ *
+ */
+int ram_write_tracking_start(void)
+{
+    int uffd;
+    RAMState *rs = ram_state;
+    RAMBlock *bs;
+
+    /* Open UFFD file descriptor */
+    uffd = uffd_create_fd();
+    if (uffd < 0) {
+        return uffd;
+    }
+    rs->uffdio_fd = uffd;
+
+    RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+        /* Nothing to do with read-only and MMIO-writable regions */
+        if (bs->mr->readonly || bs->mr->rom_device) {
+            continue;
+        }
+
+        /* Register block memory with UFFD to track writes */
+        if (uffd_register_memory(rs->uffdio_fd, (hwaddr) bs->host,
+                bs->max_length, false, true)) {
+            goto fail;
+        }
+        /* Apply UFFD write protection to the block memory range */
+        if (uffd_protect_memory(rs->uffdio_fd, (hwaddr) bs->host,
+                bs->max_length, true)) {
+            goto fail;
+        }
+        bs->flags |= RAM_UF_WRITEPROTECT;
+
+        info_report("UFFD-WP write-tracking enabled: "
+                "block_id=%s page_size=%zu start=%p length=%lu "
+                "romd_mode=%i ram=%i readonly=%i nonvolatile=%i rom_device=%i",
+                bs->idstr, bs->page_size, bs->host, bs->max_length,
+                bs->mr->romd_mode, bs->mr->ram, bs->mr->readonly,
+                bs->mr->nonvolatile, bs->mr->rom_device);
+    }
+
+    return 0;
+
+fail:
+    uffd_close_fd(uffd);
+    rs->uffdio_fd = -1;
+    return -1;
+}
+
+/**
+ * ram_write_tracking_stop: stop UFFD-WP memory tracking and remove protection
+ */
+void ram_write_tracking_stop(void)
+{
+    RAMState *rs = ram_state;
+    RAMBlock *bs;
+    assert(rs->uffdio_fd >= 0);
+
+    RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+        if ((bs->flags & RAM_UF_WRITEPROTECT) == 0) {
+            continue;
+        }
+        info_report("UFFD-WP write-tracking disabled: "
+                "block_id=%s page_size=%zu start=%p length=%lu "
+                "romd_mode=%i ram=%i readonly=%i nonvolatile=%i rom_device=%i",
+                bs->idstr, bs->page_size, bs->host, bs->max_length,
+                bs->mr->romd_mode, bs->mr->ram, bs->mr->readonly,
+                bs->mr->nonvolatile, bs->mr->rom_device);
+        /* Cleanup flags */
+        bs->flags &= ~RAM_UF_WRITEPROTECT;
+    }
+
+    /*
+     * Close UFFD file descriptor to remove protection,
+     * release registered memory regions and flush wait queues
+     */
+    uffd_close_fd(rs->uffdio_fd);
+    rs->uffdio_fd = -1;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/ram.h b/migration/ram.h
index 011e85414e..3611cb51de 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -79,4 +79,8 @@ void colo_flush_ram_cache(void);
 void colo_release_ram_cache(void);
 void colo_incoming_start_dirty_log(void);
 
+/* Live snapshots */
+int ram_write_tracking_start(void);
+void ram_write_tracking_stop(void);
+
 #endif
-- 
2.25.1




* [PATCH 3/7] Support UFFD write fault processing in ram_save_iterate().
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 1/7] Introduce 'track-writes-ram' migration capability Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 2/7] Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 4/7] Implementation of write-tracking migration thread Andrey Gruzdev via
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

In this particular implementation the same single migration
thread is responsible for both normal linear dirty page
migration and processing UFFD page fault events.

Processing a write fault includes reading the UFFD file descriptor,
finding the respective RAM block and saving the faulting page to
the migration stream. After the page has been saved, write protection
can be removed. Since the asynchronous version of qemu_put_buffer()
is expected to be used to save pages, we also have to flush the
migration stream prior to un-protecting the saved memory range.

Write protection is removed for any previously protected
memory chunk that has hit the migration stream. That applies
to pages from the linear page scan as well as to write fault pages.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/ram.c | 124 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 115 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7f273c9996..08a1d7a252 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -314,6 +314,8 @@ struct RAMState {
     ram_addr_t last_page;
     /* last ram version we have seen */
     uint32_t last_version;
+    /* 'write-tracking' migration is enabled */
+    bool ram_wt_enabled;
     /* We are in the first round */
     bool ram_bulk_stage;
     /* The free page optimization is enabled */
@@ -574,8 +576,6 @@ static int uffd_protect_memory(int uffd, hwaddr start, hwaddr length, bool wp)
     return 0;
 }
 
-__attribute__ ((unused))
-static int uffd_read_events(int uffd, struct uffd_msg *msgs, int count);
 __attribute__ ((unused))
 static bool uffd_poll_events(int uffd, int tmo);
 
@@ -1929,6 +1929,86 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     return pages;
 }
 
+/**
+ * ram_find_block_by_host_address: find RAM block containing host page
+ *
+ * Returns true if RAM block is found and pss->block/page are
+ * pointing to the given host page, false in case of an error
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ */
+static bool ram_find_block_by_host_address(RAMState *rs, PageSearchStatus *pss,
+        hwaddr page_address)
+{
+    bool found = false;
+
+    pss->block = rs->last_seen_block;
+    do {
+        if (page_address >= (hwaddr) pss->block->host &&
+            (page_address + TARGET_PAGE_SIZE) <=
+                    ((hwaddr) pss->block->host + pss->block->used_length)) {
+            pss->page = (unsigned long)
+                    ((page_address - (hwaddr) pss->block->host) >> TARGET_PAGE_BITS);
+            found = true;
+            break;
+        }
+
+        pss->block = QLIST_NEXT_RCU(pss->block, next);
+        if (!pss->block) {
+            /* Hit the end of the list */
+            pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
+        }
+    } while (pss->block != rs->last_seen_block);
+
+    rs->last_seen_block = pss->block;
+    /*
+     * Since we are in the same loop with ram_find_and_save_block(),
+     * need to reset pss->complete_round after switching to
+     * other block/page in pss.
+     */
+    pss->complete_round = false;
+
+    return found;
+}
+
+/**
+ * get_fault_page: try to get next UFFD write fault page and, if pending fault
+ *   is found, put info about RAM block and block page into pss structure
+ *
+ * Returns true in case of UFFD write fault detected, false otherwise
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ *
+ */
+static bool get_fault_page(RAMState *rs, PageSearchStatus *pss)
+{
+    struct uffd_msg uffd_msg;
+    hwaddr page_address;
+    int res;
+
+    if (!rs->ram_wt_enabled) {
+        return false;
+    }
+
+    res = uffd_read_events(rs->uffdio_fd, &uffd_msg, 1);
+    if (res <= 0) {
+        return false;
+    }
+
+    page_address = uffd_msg.arg.pagefault.address;
+    if (!ram_find_block_by_host_address(rs, pss, page_address)) {
+        /* In case we couldn't find respective block, just unprotect faulting page */
+        uffd_protect_memory(rs->uffdio_fd, page_address, TARGET_PAGE_SIZE, false);
+        error_report("ram_find_block_by_host_address() failed: address=0x%0lx",
+                     page_address);
+        return false;
+    }
+
+    return true;
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
@@ -1955,25 +2035,50 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         return pages;
     }
 
+    if (!rs->last_seen_block) {
+        rs->last_seen_block = QLIST_FIRST_RCU(&ram_list.blocks);
+    }
     pss.block = rs->last_seen_block;
     pss.page = rs->last_page;
     pss.complete_round = false;
 
-    if (!pss.block) {
-        pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-    }
-
     do {
+        ram_addr_t page;
+        ram_addr_t page_to;
+
         again = true;
-        found = get_queued_page(rs, &pss);
-
+        /* In case of 'write-tracking' migration we first try
+         * to poll UFFD and get write page fault event */
+        found = get_fault_page(rs, &pss);
+        if (!found) {
+            /* Now try to fetch something from the priority queue */
+            found = get_queued_page(rs, &pss);
+        }
         if (!found) {
             /* priority queue empty, so just search for something dirty */
             found = find_dirty_block(rs, &pss, &again);
         }
 
         if (found) {
+            page = pss.page;
             pages = ram_save_host_page(rs, &pss, last_stage);
+            page_to = pss.page;
+
+            /* Check if page is from UFFD-managed region */
+            if (pss.block->flags & RAM_UF_WRITEPROTECT) {
+                hwaddr page_address = (hwaddr) pss.block->host +
+                        ((hwaddr) page << TARGET_PAGE_BITS);
+                hwaddr run_length = (hwaddr) (page_to - page + 1) << TARGET_PAGE_BITS;
+                int res;
+
+                /* Flush async buffers before un-protect */
+                qemu_fflush(rs->f);
+                /* Un-protect memory range */
+                res = uffd_protect_memory(rs->uffdio_fd, page_address, run_length, false);
+                if (res < 0) {
+                    break;
+                }
+            }
         }
     } while (!pages && again);
 
@@ -2086,7 +2191,8 @@ static void ram_state_reset(RAMState *rs)
     rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
-    rs->ram_bulk_stage = true;
+    rs->ram_wt_enabled = migrate_track_writes_ram();
+    rs->ram_bulk_stage = !rs->ram_wt_enabled;
     rs->fpo_enabled = false;
 }
 
-- 
2.25.1




* [PATCH 4/7] Implementation of write-tracking migration thread.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
                   ` (2 preceding siblings ...)
  2020-11-18 11:22 ` [PATCH 3/7] Support UFFD write fault processing in ram_save_iterate() Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 5/7] Implementation of vm_start() BH Andrey Gruzdev via
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c | 174 +++++++++++++++++++++++++++++++++++++++++-
 migration/migration.h |   3 +
 migration/savevm.c    |   1 -
 migration/savevm.h    |   2 +
 4 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ff0364dde0..1ffbb4ce4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2013,6 +2013,7 @@ void migrate_init(MigrationState *s)
      * locks.
      */
     s->cleanup_bh = 0;
+    s->wt_vm_start_bh = 0;
     s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->rp_state.from_dst_file = NULL;
@@ -3551,6 +3552,21 @@ static void migration_iteration_finish(MigrationState *s)
     qemu_mutex_unlock_iothread();
 }
 
+static void wt_migration_iteration_finish(MigrationState *s)
+{
+    /* TODO: implement */
+}
+
+/*
+ * Return true if continue to the next iteration directly, false
+ * otherwise.
+ */
+static MigIterateState wt_migration_iteration_run(MigrationState *s)
+{
+    /* TODO: implement */
+    return MIG_ITERATE_RESUME;
+}
+
 void migration_make_urgent_request(void)
 {
     qemu_sem_post(&migrate_get_current()->rate_limit_sem);
@@ -3698,6 +3714,154 @@ static void *migration_thread(void *opaque)
     return NULL;
 }
 
+static void wt_migration_vm_start_bh(void *opaque)
+{
+    /* TODO: implement */
+}
+
+/*
+ * Master migration thread on the source VM.
+ * This is an alternative implementation of live migration
+ * which uses the userfault_fd write protection mechanism introduced
+ * in the 5.7 kernel. Compared to the existing dirty page logging
+ * migration it produces much less traffic and smaller snapshot images,
+ * since no page duplicates can get into the stream. Another key point
+ * is that the generated vmstate stream reflects the machine state
+ * 'frozen' at the beginning of migration, whereas with the dirty page
+ * logging mechanism the saved snapshot is effectively the state at
+ * the end of the migration process.
+ */
+static void *wt_migration_thread(void *opaque)
+{
+    MigrationState *s = opaque;
+    int64_t setup_start;
+    MigThrError thr_error;
+    QEMUFile *fb;
+
+    rcu_register_thread();
+    object_ref(OBJECT(s));
+
+    qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
+
+    setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    /*
+     * We want to save vmstate for the moment when migration has been
+     * initiated but also we want to save RAM content while VM is running.
+     * The RAM content should appear first in the vmstate. So, we first
+     * stash the non-RAM part of the vmstate to the temporary buffer,
+     * then write RAM part of the vmstate to the migration stream
+     * with vCPUs running and, finally, write stashed non-RAM part of
+     * the vmstate from the buffer to the migration stream.
+     */
+    s->bioc = qio_channel_buffer_new(128 * 1024);
+    qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
+    fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
+    object_unref(OBJECT(s->bioc));
+
+    update_iteration_initial_status(s);
+
+    qemu_savevm_state_header(s->to_dst_file);
+    qemu_savevm_state_setup(s->to_dst_file);
+
+    if (qemu_savevm_state_guest_unplug_pending()) {
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_WAIT_UNPLUG);
+
+        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
+               qemu_savevm_state_guest_unplug_pending()) {
+            qemu_sem_timedwait(&s->wait_unplug_sem, 250);
+        }
+
+        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
+                          MIGRATION_STATUS_ACTIVE);
+    }
+    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
+    trace_migration_thread_setup_complete();
+    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    qemu_mutex_lock_iothread();
+
+    bool fail_early = false;
+    do {
+        if (global_state_store()) {
+            fail_early = true;
+            break;
+        }
+        /* Forcibly stop VM before saving state of vCPUs and devices */
+        if (vm_stop_force_state(RUN_STATE_PAUSED)) {
+            fail_early = true;
+            break;
+        }
+        /*
+         * Put vCPUs in sync with shadow context structures, then
+         * save their state to channel-buffer along with devices.
+         */
+        cpu_synchronize_all_states();
+        if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
+            fail_early = true;
+            break;
+        }
+        /* Now initialize UFFD context and start tracking RAM writes */
+        if (ram_write_tracking_start()) {
+            fail_early = true;
+        }
+    } while (false);
+
+    if (fail_early) {
+        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+
+        qemu_mutex_unlock_iothread();
+        goto fail;
+    }
+
+    /*
+     * Start VM from BH handler to avoid write-fault lock here.
+     * UFFD-WP protection for the whole RAM is already enabled so
+     * calling VM state change notifiers from vm_start() would initiate
+     * writes to virtio VQs memory which is in write-protected region.
+     */
+    s->wt_vm_start_bh = qemu_bh_new(wt_migration_vm_start_bh, s);
+    qemu_bh_schedule(s->wt_vm_start_bh);
+
+    qemu_mutex_unlock_iothread();
+
+    while (migration_is_active(s)) {
+        MigIterateState iter_state = wt_migration_iteration_run(s);
+        if (iter_state == MIG_ITERATE_SKIP) {
+            continue;
+        } else if (iter_state == MIG_ITERATE_BREAK) {
+            break;
+        }
+
+        /*
+         * Try to detect any kind of failures, and see whether we
+         * should stop the migration now.
+         */
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
+            break;
+        }
+
+        migration_update_counters(s, qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
+    }
+
+    trace_migration_thread_after_loop();
+
+fail:
+    wt_migration_iteration_finish(s);
+
+    qemu_fclose(fb);
+    object_unref(OBJECT(s));
+    rcu_unregister_thread();
+
+    return NULL;
+}
+
 void migrate_fd_connect(MigrationState *s, Error *error_in)
 {
     Error *local_err = NULL;
@@ -3761,8 +3925,14 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         migrate_fd_cleanup(s);
         return;
     }
-    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
-                       QEMU_THREAD_JOINABLE);
+
+    if (migrate_track_writes_ram()) {
+        qemu_thread_create(&s->thread, "wt_live_migration",
+                wt_migration_thread, s, QEMU_THREAD_JOINABLE);
+    } else {
+        qemu_thread_create(&s->thread, "live_migration",
+                migration_thread, s, QEMU_THREAD_JOINABLE);
+    }
     s->migration_thread_running = true;
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 339ae720e0..c3b4c7f2fd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -20,6 +20,7 @@
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
+#include "io/channel-buffer.h"
 #include "net/announce.h"
 #include "qom/object.h"
 
@@ -147,8 +148,10 @@ struct MigrationState {
 
     /*< public >*/
     QemuThread thread;
+    QEMUBH *wt_vm_start_bh;
     QEMUBH *cleanup_bh;
     QEMUFile *to_dst_file;
+    QIOChannelBuffer *bioc;
     /*
      * Protects to_dst_file pointer.  We need to make sure we won't
      * yield or hang during the critical section, since this lock will
diff --git a/migration/savevm.c b/migration/savevm.c
index 5f937a2762..62d5f8a869 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1352,7 +1352,6 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
     return 0;
 }
 
-static
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
diff --git a/migration/savevm.h b/migration/savevm.h
index ba64a7e271..aaee2528ed 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -64,5 +64,7 @@ int qemu_loadvm_state(QEMUFile *f);
 void qemu_loadvm_state_cleanup(void);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
+int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
+        bool in_postcopy, bool inactivate_disks);
 
 #endif
-- 
2.25.1




* [PATCH 5/7] Implementation of vm_start() BH.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
                   ` (3 preceding siblings ...)
  2020-11-18 11:22 ` [PATCH 4/7] Implementation of write-tracking migration thread Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 6/7] The rest of write tracking migration code Andrey Gruzdev via
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

To avoid saving updated versions of memory pages we need
to start tracking RAM writes before we resume operation of
the vCPUs. This sequence is especially critical for virtio device
backends whose VQs are mapped to main memory and accessed
directly, not via MMIO callbacks.

One problem is that the vm_start() routine invokes state change
notifier callbacks directly. Virtio drivers sync and flush their
VQs from these notifier routines. Since we poll UFFD and process
faults on the same thread, the thread would get stuck in vm_start()
if we tried to call it from the migration thread.

The solution is to call ram_write_tracking_start() directly from
the migration thread and then schedule a BH for vm_start().

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1ffbb4ce4a..381da4a4d3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3716,7 +3716,13 @@ static void *migration_thread(void *opaque)
 
 static void wt_migration_vm_start_bh(void *opaque)
 {
-    /* TODO: implement */
+    MigrationState *s = opaque;
+
+    qemu_bh_delete(s->wt_vm_start_bh);
+    s->wt_vm_start_bh = NULL;
+
+    vm_start();
+    s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - s->downtime_start;
 }
 
 /*
-- 
2.25.1




* [PATCH 6/7] The rest of write tracking migration code.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
                   ` (4 preceding siblings ...)
  2020-11-18 11:22 ` [PATCH 5/7] Implementation of vm_start() BH Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:22 ` [PATCH 7/7] Introduced simple linear scan rate limiting mechanism for write tracking migration Andrey Gruzdev via
  2020-11-18 11:39 ` [PATCH 0/7] UFFD write-tracking migration/snapshots no-reply
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c | 72 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 381da4a4d3..92bcb9105d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3212,6 +3212,48 @@ fail:
                       MIGRATION_STATUS_FAILED);
 }
 
+/**
+ * wt_migration_completion: Used by wt_migration_thread when after all the RAM has been saved.
+ *   The caller 'breaks' the loop when this returns.
+ *
+ * @s: Current migration state
+ */
+static void wt_migration_completion(MigrationState *s)
+{
+    int current_active_state = s->state;
+
+    /* Stop tracking RAM writes - un-protect memory, un-register UFFD memory ranges,
+     * flush kernel wait queues and wake up threads waiting for write fault to be
+     * resolved. All of this is essentially done by closing UFFD file descriptor */
+    ram_write_tracking_stop();
+
+    if (s->state == MIGRATION_STATUS_ACTIVE) {
+        /*
+         * By this moment we have RAM content saved into the migration stream.
+         * The next step is to flush the non-RAM content (device state)
+         * right after the ram content. The device state has been stored into
+         * the temporary buffer before RAM saving started.
+         */
+        qemu_put_buffer(s->to_dst_file, s->bioc->data, s->bioc->usage);
+        qemu_fflush(s->to_dst_file);
+    } else if (s->state == MIGRATION_STATUS_CANCELLING) {
+        goto fail;
+    }
+
+    if (qemu_file_get_error(s->to_dst_file)) {
+        trace_migration_completion_file_err();
+        goto fail;
+    }
+
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_COMPLETED);
+    return;
+
+fail:
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_FAILED);
+}
+
 bool migrate_colo_enabled(void)
 {
     MigrationState *s = migrate_get_current();
@@ -3554,7 +3596,26 @@ static void migration_iteration_finish(MigrationState *s)
 
 static void wt_migration_iteration_finish(MigrationState *s)
 {
-    /* TODO: implement */
+    qemu_mutex_lock_iothread();
+    switch (s->state) {
+        case MIGRATION_STATUS_COMPLETED:
+            migration_calculate_complete(s);
+            break;
+
+        case MIGRATION_STATUS_ACTIVE:
+        case MIGRATION_STATUS_FAILED:
+        case MIGRATION_STATUS_CANCELLED:
+        case MIGRATION_STATUS_CANCELLING:
+            break;
+
+        default:
+            /* Should not reach here, but if so, forgive the VM. */
+            error_report("%s: Unknown ending state %d", __func__, s->state);
+            break;
+    }
+
+    migrate_fd_cleanup_schedule(s);
+    qemu_mutex_unlock_iothread();
 }
 
 /*
@@ -3563,7 +3624,14 @@ static void wt_migration_iteration_finish(MigrationState *s)
  */
 static MigIterateState wt_migration_iteration_run(MigrationState *s)
 {
-    /* TODO: implement */
+    int res;
+
+    res = qemu_savevm_state_iterate(s->to_dst_file, false);
+    if (res) {
+        wt_migration_completion(s);
+        return MIG_ITERATE_BREAK;
+    }
+
     return MIG_ITERATE_RESUME;
 }
 
-- 
2.25.1




* [PATCH 7/7] Introduced simple linear scan rate limiting mechanism for write tracking migration.
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
                   ` (5 preceding siblings ...)
  2020-11-18 11:22 ` [PATCH 6/7] The rest of write tracking migration code Andrey Gruzdev via
@ 2020-11-18 11:22 ` Andrey Gruzdev via
  2020-11-18 11:39 ` [PATCH 0/7] UFFD write-tracking migration/snapshots no-reply
  7 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev via @ 2020-11-18 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Andrey Gruzdev

Since reading UFFD events and saving page data are performed
from the same thread, write fault latencies are sensitive to
migration stream stalls. Limiting the total page saving rate is a
way to reduce the number of noticeable fault resolution latencies.

Migration bandwidth limiting is achieved by detecting cases of
out-of-threshold write fault latencies and temporarily disabling
(strictly speaking, severely throttling) the saving of non-faulting pages.
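
To give a rough idea with the constants chosen below (a 5 ms fault latency
threshold, a 5 ms poll timeout and a skip count of 200 pages): a single write
fault that took longer than 5 ms to resolve causes each of the next 200
page-saving iterations to first wait on the UFFD descriptor for up to 5 ms,
so the linear scan is slowed down for at most about one second, while newly
arriving write faults are still picked up and served with priority.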

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/ram.c | 58 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 08a1d7a252..89fe106585 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -325,6 +325,10 @@ struct RAMState {
     /* these variables are used for bitmap sync */
     /* last time we did a full bitmap_sync */
     int64_t time_last_bitmap_sync;
+    /* last time UFFD fault occurred */
+    int64_t last_fault_ns;
+    /* linear scan throttling counter */
+    int throttle_skip_counter;
     /* bytes transferred at start_time */
     uint64_t bytes_xfer_prev;
     /* number of dirty pages since start_time */
@@ -576,9 +580,6 @@ static int uffd_protect_memory(int uffd, hwaddr start, hwaddr length, bool wp)
     return 0;
 }
 
-__attribute__ ((unused))
-static bool uffd_poll_events(int uffd, int tmo);
-
 /**
  * uffd_read_events: read pending UFFD events
  *
@@ -2006,9 +2007,51 @@ static bool get_fault_page(RAMState *rs, PageSearchStatus *pss)
         return false;
     }
 
+    rs->last_fault_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     return true;
 }
 
+#define FAULT_HIGH_LATENCY_NS   5000000     /* 5 ms */
+#define SLOW_FAULT_POLL_TMO     5           /* 5 ms */
+#define SLOW_FAULT_SKIP_PAGES   200
+
+/**
+ * limit_scan_rate: limit RAM linear scan rate in case of growing write fault
+ *  latencies, used in write-tracking migration implementation
+ *
+ * @rs: current RAM state
+ *
+ */
+static void limit_scan_rate(RAMState *rs)
+{
+    int64_t last_fault_latency_ns = 0;
+
+    if (!rs->ram_wt_enabled) {
+        return;
+    }
+
+    /* Check if last write fault time is available */
+    if (rs->last_fault_ns) {
+        last_fault_latency_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) -
+                rs->last_fault_ns;
+        rs->last_fault_ns = 0;
+    }
+
+    /* In case last fault time was available and we have
+     * latency value, check if it's not too high */
+    if (last_fault_latency_ns > FAULT_HIGH_LATENCY_NS) {
+        /* Reset counter after each slow write fault */
+        rs->throttle_skip_counter = SLOW_FAULT_SKIP_PAGES;
+    }
+    /* Delay thread execution till next write fault occures or timeout expires.
+     * Next SLOW_FAULT_SKIP_PAGES can be write fault pages only, not from pages going from
+     * linear scan logic. Thus we moderate migration stream rate to reduce latencies */
+    if (rs->throttle_skip_counter > 0) {
+        uffd_poll_events(rs->uffdio_fd, SLOW_FAULT_POLL_TMO);
+        rs->throttle_skip_counter--;
+    }
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
@@ -2078,6 +2121,9 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
                 if (res < 0) {
                     break;
                 }
+
+                /* Linear scan rate limiting */
+                limit_scan_rate(rs);
             }
         }
     } while (!pages && again);
@@ -2191,12 +2237,15 @@ static void ram_state_reset(RAMState *rs)
     rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
+    rs->last_fault_ns = 0;
+    rs->throttle_skip_counter = 0;
     rs->ram_wt_enabled = migrate_track_writes_ram();
     rs->ram_bulk_stage = !rs->ram_wt_enabled;
     rs->fpo_enabled = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
+#define WT_MAX_WAIT 1000 /* 1000 ms, need bigger limit for 'write-tracking' migration */
 
 /*
  * 'expected' is the value you expect the bitmap mostly to be full
@@ -2872,7 +2921,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
             if ((i & 63) == 0) {
                 uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) /
                               1000000;
-                if (t1 > MAX_WAIT) {
+                uint64_t max_wait = rs->ram_wt_enabled ? WT_MAX_WAIT : MAX_WAIT;
+                if (t1 > max_wait) {
                     trace_ram_save_iterate_big_wait(t1, i);
                     break;
                 }
-- 
2.25.1




* Re: [PATCH 0/7] UFFD write-tracking migration/snapshots
  2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
                   ` (6 preceding siblings ...)
  2020-11-18 11:22 ` [PATCH 7/7] Introduced simple linear scan rate limiting mechanism for write tracking migration Andrey Gruzdev via
@ 2020-11-18 11:39 ` no-reply
  7 siblings, 0 replies; 11+ messages in thread
From: no-reply @ 2020-11-18 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, quintela, armbru, qemu-devel, dgilbert, pbonzini, andrey.gruzdev

Patchew URL: https://patchew.org/QEMU/20201118112233.264530-1-andrey.gruzdev@virtuozzo.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201118112233.264530-1-andrey.gruzdev@virtuozzo.com
Subject: [PATCH 0/7] UFFD write-tracking migration/snapshots

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20201118112233.264530-1-andrey.gruzdev@virtuozzo.com -> patchew/20201118112233.264530-1-andrey.gruzdev@virtuozzo.com
Switched to a new branch 'test'
5589aed Introduced simple linear scan rate limiting mechanism for write tracking migration.
3c614ac The rest of write tracking migration code.
1aa4d6b Implementation of vm_start() BH.
39d7ce1 Implementation of write-tracking migration thread.
b33c61f Support UFFD write fault processing in ram_save_iterate().
eb9f1b9 Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory.
a32445e Introduce 'track-writes-ram' migration capability.

=== OUTPUT BEGIN ===
1/7 Checking commit a32445ea3f4d (Introduce 'track-writes-ram' migration capability.)
WARNING: line over 80 characters
#134: FILE: migration/migration.c:3881:
+    DEFINE_PROP_MIG_CAP("x-track-writes-ram", MIGRATION_CAPABILITY_TRACK_WRITES_RAM),

total: 0 errors, 1 warnings, 146 lines checked

Patch 1/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/7 Checking commit eb9f1b9226fd (Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory.)
3/7 Checking commit b33c61fdea27 (Support UFFD write fault processing in ram_save_iterate().)
WARNING: line over 80 characters
#70: FILE: migration/ram.c:1952:
+                    ((page_address - (hwaddr) pss->block->host) >> TARGET_PAGE_BITS);

WARNING: line over 80 characters
#120: FILE: migration/ram.c:2002:
+        /* In case we couldn't find respective block, just unprotect faulting page */

WARNING: line over 80 characters
#121: FILE: migration/ram.c:2003:
+        uffd_protect_memory(rs->uffdio_fd, page_address, TARGET_PAGE_SIZE, false);

WARNING: Block comments use a leading /* on a separate line
#156: FILE: migration/ram.c:2050:
+        /* In case of 'write-tracking' migration we first try

WARNING: Block comments use a trailing */ on a separate line
#157: FILE: migration/ram.c:2051:
+         * to poll UFFD and get write page fault event */

WARNING: line over 80 characters
#177: FILE: migration/ram.c:2071:
+                hwaddr run_length = (hwaddr) (page_to - page + 1) << TARGET_PAGE_BITS;

WARNING: line over 80 characters
#183: FILE: migration/ram.c:2077:
+                res = uffd_protect_memory(rs->uffdio_fd, page_address, run_length, false);

total: 0 errors, 7 warnings, 167 lines checked

Patch 3/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/7 Checking commit 39d7ce1f4949 (Implementation of write-tracking migration thread.)
ERROR: suspicious ; after while (0)
#142: FILE: migration/migration.c:3811:
+    } while (false);

total: 1 errors, 0 warnings, 229 lines checked

Patch 4/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/7 Checking commit 1aa4d6b82d4b (Implementation of vm_start() BH.)
6/7 Checking commit 3c614ac0babe (The rest of write tracking migration code.)
ERROR: line over 90 characters
#19: FILE: migration/migration.c:3216:
+ * wt_migration_completion: Used by wt_migration_thread when after all the RAM has been saved.

WARNING: line over 80 characters
#28: FILE: migration/migration.c:3225:
+    /* Stop tracking RAM writes - un-protect memory, un-register UFFD memory ranges,

WARNING: Block comments use a leading /* on a separate line
#28: FILE: migration/migration.c:3225:
+    /* Stop tracking RAM writes - un-protect memory, un-register UFFD memory ranges,

WARNING: line over 80 characters
#29: FILE: migration/migration.c:3226:
+     * flush kernel wait queues and wake up threads waiting for write fault to be

WARNING: line over 80 characters
#30: FILE: migration/migration.c:3227:
+     * resolved. All of this is essentially done by closing UFFD file descriptor */

WARNING: Block comments use a trailing */ on a separate line
#30: FILE: migration/migration.c:3227:
+     * resolved. All of this is essentially done by closing UFFD file descriptor */

ERROR: switch and case should be at the same indent
#69: FILE: migration/migration.c:3600:
+    switch (s->state) {
+        case MIGRATION_STATUS_COMPLETED:
[...]
+        case MIGRATION_STATUS_ACTIVE:
+        case MIGRATION_STATUS_FAILED:
+        case MIGRATION_STATUS_CANCELLED:
+        case MIGRATION_STATUS_CANCELLING:
[...]
+        default:

total: 2 errors, 5 warnings, 90 lines checked

Patch 6/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/7 Checking commit 5589aed36ed6 (Introduced simple linear scan rate limiting mechanism for write tracking migration.)
WARNING: Block comments use a leading /* on a separate line
#78: FILE: migration/ram.c:2040:
+    /* In case last fault time was available and we have

WARNING: Block comments use a trailing */ on a separate line
#79: FILE: migration/ram.c:2041:
+     * latency value, check if it's not too high */

WARNING: Block comments use a leading /* on a separate line
#84: FILE: migration/ram.c:2046:
+    /* Delay thread execution till next write fault occures or timeout expires.

WARNING: line over 80 characters
#85: FILE: migration/ram.c:2047:
+     * Next SLOW_FAULT_SKIP_PAGES can be write fault pages only, not from pages going from

WARNING: line over 80 characters
#86: FILE: migration/ram.c:2048:
+     * linear scan logic. Thus we moderate migration stream rate to reduce latencies */

WARNING: Block comments use a trailing */ on a separate line
#86: FILE: migration/ram.c:2048:
+     * linear scan logic. Thus we moderate migration stream rate to reduce latencies */

WARNING: line over 80 characters
#118: FILE: migration/ram.c:2248:
+#define WT_MAX_WAIT 1000 /* 1000 ms, need bigger limit for 'write-tracking' migration */

total: 0 errors, 7 warnings, 103 lines checked

Patch 7/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201118112233.264530-1-andrey.gruzdev@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 1/7] Introduce 'track-writes-ram' migration capability.
  2020-11-18 11:22 ` [PATCH 1/7] Introduce 'track-writes-ram' migration capability Andrey Gruzdev via
@ 2020-11-18 14:31   ` Eric Blake
  2020-11-19  7:14     ` Andrey Gruzdev
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Blake @ 2020-11-18 14:31 UTC (permalink / raw)
  To: Andrey Gruzdev, qemu-devel
  Cc: Den Lunev, Paolo Bonzini, Dr . David Alan Gilbert,
	Markus Armbruster, Juan Quintela

On 11/18/20 5:22 AM, Andrey Gruzdev wrote:
> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
> ---
>  migration/migration.c | 96 +++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.h |  1 +
>  qapi/migration.json   |  7 +++-
>  3 files changed, 103 insertions(+), 1 deletion(-)
> 

Reviewing just the UI:

> +++ b/qapi/migration.json
> @@ -442,6 +442,11 @@
>  # @validate-uuid: Send the UUID of the source to allow the destination
>  #                 to ensure it is the same. (since 4.2)
>  #
> +# @track-writes-ram: If enabled, the migration stream will be a snapshot
> +#                    of the VM exactly at the point when the migration
> +#                    procedure starts. The VM RAM is saved while the VM is
> +#                    running. (since 5.2)

This is a new feature, so it is too late for 5.2; this will need to be 6.0.

> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> @@ -449,7 +454,7 @@
>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'track-writes-ram'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 1/7] Introduce 'track-writes-ram' migration capability.
  2020-11-18 14:31   ` Eric Blake
@ 2020-11-19  7:14     ` Andrey Gruzdev
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Gruzdev @ 2020-11-19  7:14 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Den Lunev, Paolo Bonzini, Juan Quintela, Dr . David Alan Gilbert,
	Markus Armbruster

On 18.11.2020 17:31, Eric Blake wrote:
> On 11/18/20 5:22 AM, Andrey Gruzdev wrote:
>> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
>> ---
>>   migration/migration.c | 96 +++++++++++++++++++++++++++++++++++++++++++
>>   migration/migration.h |  1 +
>>   qapi/migration.json   |  7 +++-
>>   3 files changed, 103 insertions(+), 1 deletion(-)
>>
> 
> Reviewing just the UI:
> 
>> +++ b/qapi/migration.json
>> @@ -442,6 +442,11 @@
>>   # @validate-uuid: Send the UUID of the source to allow the destination
>>   #                 to ensure it is the same. (since 4.2)
>>   #
>> +# @track-writes-ram: If enabled, the migration stream will be a snapshot
>> +#                    of the VM exactly at the point when the migration
>> +#                    procedure starts. The VM RAM is saved while the VM is
>> +#                    running. (since 5.2)
> 
> This is a new feature, so it is too late for 5.2; this will need to be 6.0.
> 

Yes, agree.

>> +#
>>   # Since: 1.2
>>   ##
>>   { 'enum': 'MigrationCapability',
>> @@ -449,7 +454,7 @@
>>              'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>>              'block', 'return-path', 'pause-before-switchover', 'multifd',
>>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>> -           'x-ignore-shared', 'validate-uuid' ] }
>> +           'x-ignore-shared', 'validate-uuid', 'track-writes-ram'] }
>>   
>>   ##
>>   # @MigrationCapabilityStatus:
>>
> 

-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                 virtuzzo.com



Thread overview: 11+ messages
2020-11-18 11:22 [PATCH 0/7] UFFD write-tracking migration/snapshots Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 1/7] Introduce 'track-writes-ram' migration capability Andrey Gruzdev via
2020-11-18 14:31   ` Eric Blake
2020-11-19  7:14     ` Andrey Gruzdev
2020-11-18 11:22 ` [PATCH 2/7] Introduced UFFD-WP low-level interface helpers. Implemented support for the whole RAM block memory protection/un-protection. Higher level ram_write_tracking_start() and ram_write_tracking_stop() to start/stop tracking memory writes on the whole VM memory Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 3/7] Support UFFD write fault processing in ram_save_iterate() Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 4/7] Implementation of write-tracking migration thread Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 5/7] Implementation of vm_start() BH Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 6/7] The rest of write tracking migration code Andrey Gruzdev via
2020-11-18 11:22 ` [PATCH 7/7] Introduced simple linear scan rate limiting mechanism for write tracking migration Andrey Gruzdev via
2020-11-18 11:39 ` [PATCH 0/7] UFFD write-tracking migration/snapshots no-reply
