* [Qemu-devel] [PATCH RESEND V3 0/6] calculate downtime for postcopy live migration
       [not found] <CGME20170428065752eucas1p1b702ff53ba0bd96674e8cc35466f8046@eucas1p1.samsung.com>
@ 2017-04-28  6:57 ` Alexey Perevalov
       [not found]   ` <CGME20170428065752eucas1p190511b1932f61b6321c489f0eb4e816f@eucas1p1.samsung.com>
                     ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This is the third version of the patch set.
The first version was tagged as RFC; the second had no version tag.

Differences since the previous version (V2 -> V3):
    - The downtime calculation approach was changed, thanks to Peter Xu.
    - Due to the previous point there is no longer a need to keep a GTree or a bitmap
of cpus, so the glib changes aren't included in this patch set; they could be resent
in another patch set if there is a good reason for it.
    - No procfs traces in this patch set; if somebody wants them, they can be taken
from the patchwork site to track down page fault initiators.
    - UFFD_FEATURE_THREAD_ID is requested only when the kernel supports it.
    - The downtime is not sent back to the source, it is just traced.

This patch set is based on master branch of git://git.qemu-project.org/qemu.git
base commit is commit 81b2d5ceb0cfb4cdc2163492e3169ed714b0cda9
"Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170426' into staging"

It contains a patch for the kernel header, just for convenience of applying and
testing the current patch set until the kernel headers are synced.

Alexey Perevalov (6):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  migration: pass ptr to MigrationIncomingState into migration
    ufd_version_check & postcopy_ram_supported_by_host
  migration: split ufd_version_check onto receive/request features part
  migration: add postcopy downtime into MigrationIncommingState
  migration: calculate downtime on dst side
  migration: trace postcopy total downtime

 include/migration/migration.h     |  15 +++++
 include/migration/postcopy-ram.h  |   2 +-
 linux-headers/linux/userfaultfd.h |   5 ++
 migration/migration.c             | 138 +++++++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c          | 106 ++++++++++++++++++++++++++---
 migration/savevm.c                |   2 +-
 migration/trace-events            |   7 +-
 7 files changed, 261 insertions(+), 14 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_*
       [not found]   ` <CGME20170428065752eucas1p190511b1932f61b6321c489f0eb4e816f@eucas1p1.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This commit duplicates the header change from the kernel patch
"userfaultfd: provide pid in userfault msg" into QEMU's copy of the Linux headers.
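As a rough illustration (not part of this patch), a userfaultfd reader could pick
up the new field like this, assuming UFFD_FEATURE_THREAD_ID was negotiated via the
UFFDIO_API ioctl beforehand:

    #include <linux/userfaultfd.h>
    #include <unistd.h>

    /* Sketch only: read one fault event and extract the faulting thread id. */
    static void handle_one_fault(int ufd)
    {
        struct uffd_msg msg;

        if (read(ufd, &msg, sizeof(msg)) != sizeof(msg)) {
            return;                       /* error handling omitted */
        }
        if (msg.event == UFFD_EVENT_PAGEFAULT) {
            __u64 addr = msg.arg.pagefault.address;
            __u32 tid  = msg.arg.pagefault.feat.ptid;  /* field added below */
            (void)addr; (void)tid;        /* e.g. map tid to a vCPU index */
        }
    }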

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 linux-headers/linux/userfaultfd.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 2ed5dc3..e7c8898 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -77,6 +77,9 @@ struct uffd_msg {
 		struct {
 			__u64	flags;
 			__u64	address;
+			union {
+				__u32   ptid;
+			} feat;
 		} pagefault;
 
 		struct {
@@ -158,6 +161,8 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_MADVDONTNEED		(1<<3)
 #define UFFD_FEATURE_MISSING_HUGETLBFS		(1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM		(1<<5)
+#define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
+#define UFFD_FEATURE_THREAD_ID			(1<<7)
 	__u64 features;
 
 	__u64 ioctls;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 2/6] migration: pass ptr to MigrationIncomingState into migration ufd_version_check & postcopy_ram_supported_by_host
       [not found]   ` <CGME20170428065753eucas1p1639528c4df0b459db96579fd5bee281c@eucas1p1.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  2017-04-28  9:04       ` Peter Xu
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create the downtime context when the kernel supports it.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/postcopy-ram.h |  2 +-
 migration/migration.c            |  2 +-
 migration/postcopy-ram.c         | 10 +++++-----
 migration/savevm.c               |  2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..809f6db 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/migration.c b/migration/migration.c
index 353f272..569a7f6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -804,7 +804,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
          * special support.
          */
         if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-            !postcopy_ram_supported_by_host()) {
+            !postcopy_ram_supported_by_host(NULL)) {
             /* postcopy_ram_supported_by_host will have emitted a more
              * detailed message
              */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..4c859b4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -60,7 +60,7 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
@@ -113,7 +113,7 @@ static int test_range_shared(const char *block_name, void *host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     long pagesize = getpagesize();
     int ufd = -1;
@@ -136,7 +136,7 @@ bool postcopy_ram_supported_by_host(void)
     }
 
     /* Version and features check */
-    if (!ufd_version_check(ufd)) {
+    if (!ufd_version_check(ufd, mis)) {
         goto out;
     }
 
@@ -513,7 +513,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
      */
-    if (!ufd_version_check(mis->userfault_fd)) {
+    if (!ufd_version_check(mis->userfault_fd, mis)) {
         return -1;
     }
 
@@ -651,7 +651,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
     return false;
diff --git a/migration/savevm.c b/migration/savevm.c
index 03ae1bd..2aff64c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1360,7 +1360,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
         return -1;
     }
 
-    if (!postcopy_ram_supported_by_host()) {
+    if (!postcopy_ram_supported_by_host(mis)) {
         postcopy_state_set(POSTCOPY_INCOMING_NONE);
         return -1;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part
       [not found]   ` <CGME20170428065753eucas1p1524aa2bd8e469e6c94a88ee80eb54a6e@eucas1p1.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  2017-04-28  9:01       ` Peter Xu
  2017-04-28 15:55       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This modification is necessary for userfault fd features which have to
be requested from userspace.
UFFD_FEATURE_THREAD_ID is one such "on demand" feature; it will
be introduced in the next patch.

QEMU needs to use a separate userfault file descriptor, because the
userfault context has internal state: after the first UFFD_API ioctl it
changes its state to UFFD_STATE_RUNNING (in case of success), while the
kernel, when handling the UFFD_API ioctl, expects UFFD_STATE_WAIT_API. So
only one UFFD_API ioctl is possible per ufd.
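As a rough sketch (not the patch code itself), the resulting probe/request split
works like this: open a throwaway userfaultfd just to read back the supported
feature mask, then issue the single allowed UFFD_API call on the real descriptor
with the features we actually want:

    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdbool.h>

    /* Illustrative only: probe on a temporary fd, request on the real one. */
    static bool probe_then_request(int real_ufd, __u64 wanted)
    {
        struct uffdio_api api = { .api = UFFD_API };
        int probe_fd = syscall(__NR_userfaultfd, O_CLOEXEC);

        if (probe_fd < 0) {
            return false;
        }
        if (ioctl(probe_fd, UFFDIO_API, &api)) {
            close(probe_fd);
            return false;
        }
        close(probe_fd);

        wanted &= api.features;        /* only request what the kernel offers */
        api.api = UFFD_API;
        api.features = wanted;
        return ioctl(real_ufd, UFFDIO_API, &api) == 0;
    }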

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 4c859b4..21e7150 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -60,15 +60,51 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
-static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
+
+/*
+ * Check userfault fd features, to request only supported features in
+ * future.
+ * __NR_userfaultfd - should be checked before
+ * Return obtained features
+ */
+static bool receive_ufd_features(__u64 *features)
 {
-    struct uffdio_api api_struct;
-    uint64_t ioctl_mask;
+    struct uffdio_api api_struct = {0};
+    int ufd;
+    bool ret = true;
 
+    /* if we are here __NR_userfaultfd should exists */
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        return false;
+    }
+
+    /* ask features */
     api_struct.api = UFFD_API;
     api_struct.features = 0;
     if (ioctl(ufd, UFFDIO_API, &api_struct)) {
-        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+        error_report("receive_ufd_features: UFFDIO_API failed: %s",
+                strerror(errno));
+        ret = false;
+        goto release_ufd;
+    }
+
+    *features = api_struct.features;
+
+release_ufd:
+    close(ufd);
+    return ret;
+}
+
+static bool request_ufd_features(int ufd, __u64 features)
+{
+    struct uffdio_api api_struct = {0};
+    uint64_t ioctl_mask;
+
+    api_struct.api = UFFD_API;
+    api_struct.features = features;
+    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+        error_report("request_ufd_features: UFFDIO_API failed: %s",
                      strerror(errno));
         return false;
     }
@@ -81,11 +117,33 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
         return false;
     }
 
+    return true;
+}
+
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
+{
+    __u64 new_features = 0;
+
+    /* ask features */
+    __u64 supported_features;
+
+    if (!receive_ufd_features(&supported_features)) {
+        error_report("ufd_version_check failed");
+        return false;
+    }
+
+    /* request features */
+    if (new_features && !request_ufd_features(ufd, new_features)) {
+        error_report("ufd_version_check failed: features %" PRIu64,
+                (uint64_t)new_features);
+        return false;
+    }
+
     if (getpagesize() != ram_pagesize_summary()) {
         bool have_hp = false;
         /* We've got a huge page */
 #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
-        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
+        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
 #endif
         if (!have_hp) {
             error_report("Userfault on this host does not support huge pages");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
       [not found]   ` <CGME20170428065754eucas1p1f51713373ce8c2d19945a4f91c52bd5c@eucas1p1.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  2017-04-28  9:38       ` Peter Xu
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This patch adds a request to kernel space for UFFD_FEATURE_THREAD_ID,
in case this feature is provided by the kernel.

DowntimeContext is encapsulated inside migration.c.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/migration.h | 12 ++++++++++++
 migration/migration.c         | 33 +++++++++++++++++++++++++++++++++
 migration/postcopy-ram.c      |  8 ++++++++
 3 files changed, 53 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..e8fb68f 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -83,6 +83,8 @@ typedef enum {
     POSTCOPY_INCOMING_END
 } PostcopyState;
 
+struct DowntimeContext;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -123,10 +125,20 @@ struct MigrationIncomingState {
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
+
+    /*
+     * DowntimeContext to keep information for postcopy
+     * live migration, to calculate downtime
+     * */
+    struct DowntimeContext *downtime_ctx;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with downtime context
+ */
+struct DowntimeContext *downtime_context_new(void);
 
 struct MigrationState
 {
diff --git a/migration/migration.c b/migration/migration.c
index 569a7f6..ec76e5c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -77,6 +77,18 @@ static NotifierList migration_state_notifiers =
 
 static bool deferred_incoming;
 
+typedef struct DowntimeContext {
+    /* time when page fault initiated per vCPU */
+    int64_t *page_fault_vcpu_time;
+    /* page address per vCPU */
+    uint64_t *vcpu_addr;
+    int64_t total_downtime;
+    /* downtime per vCPU */
+    int64_t *vcpu_downtime;
+    /* point in time when last page fault was initiated */
+    int64_t last_begin;
+} DowntimeContext;
+
 /*
  * Current state of incoming postcopy; note this is not part of
  * MigrationIncomingState since it's state is used during cleanup
@@ -116,6 +128,23 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+struct DowntimeContext *downtime_context_new(void)
+{
+    DowntimeContext *ctx = g_new0(DowntimeContext, 1);
+    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
+    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
+    ctx->vcpu_downtime = g_new0(int64_t, smp_cpus);
+    return ctx;
+}
+
+static void destroy_downtime_context(struct DowntimeContext *ctx)
+{
+    g_free(ctx->page_fault_vcpu_time);
+    g_free(ctx->vcpu_addr);
+    g_free(ctx->vcpu_downtime);
+    g_free(ctx);
+}
+
 MigrationIncomingState *migration_incoming_get_current(void)
 {
     static bool once;
@@ -138,6 +167,10 @@ void migration_incoming_state_destroy(void)
 
     qemu_event_destroy(&mis->main_thread_load_event);
     loadvm_free_handlers(mis);
+    if (mis->downtime_ctx) {
+        destroy_downtime_context(mis->downtime_ctx);
+        mis->downtime_ctx = NULL;
+    }
 }
 
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 21e7150..f3688f5 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
         return false;
     }
 
+#ifdef UFFD_FEATURE_THREAD_ID
+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
+        /* kernel supports that feature */
+        mis->downtime_ctx = downtime_context_new();
+        new_features |= UFFD_FEATURE_THREAD_ID;
+    }
+#endif
+
     /* request features */
     if (new_features && !request_ufd_features(ufd, new_features)) {
         error_report("ufd_version_check failed: features %" PRIu64,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
       [not found]   ` <CGME20170428065755eucas1p2ff9aa17eaa294e741d8c65f8d58a71fb@eucas1p2.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  2017-04-28 10:00       ` Peter Xu
  2017-04-28 16:34       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

This patch provides downtime calculation per vCPU,
both as a per-vCPU summary and as an overlapped value across all vCPUs.

This approach was suggested by Peter Xu as an improvement over the
previous approach, where QEMU kept a tree with the faulted page address and a cpu
bitmask in it. Now QEMU keeps an array with the faulted page address as the value
and the vCPU as the index. It helps to find the proper vCPU at UFFD_COPY time. It
also keeps the accumulated downtime per vCPU (traceable via downtime_per_cpu).

For more details see the comments on the get_postcopy_total_downtime
implementation.

Downtime will not be calculated if the downtime_ctx field of
MigrationIncomingState wasn't initialized.
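As a worked example of the accounting (numbers are illustrative): with two vCPUs,
suppose vCPU0 faults at t=100ms and vCPU1 at t=120ms, and vCPU0's page is placed
at t=150ms. At that point both vcpu_addr slots are still set, so the overlapped
total grows by now - last_begin = 150 - 120 = 30ms, while vCPU0's own downtime
grows by 150 - 100 = 50ms; the interval 100..120 counts only against vCPU0, not
against the overlapped total.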

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/migration.h |   3 ++
 migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.c      |  20 +++++++-
 migration/trace-events        |   6 ++-
 4 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e8fb68f..a22f9ce 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
  * Functions to work with downtime context
  */
 struct DowntimeContext *downtime_context_new(void);
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
+void mark_postcopy_downtime_end(uint64_t addr);
+uint64_t get_postcopy_total_downtime(void);
 
 struct MigrationState
 {
diff --git a/migration/migration.c b/migration/migration.c
index ec76e5c..2c6f150 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
     return atomic_xchg(&incoming_postcopy_state, new_state);
 }
 
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeContext *dc;
+    if (!mis->downtime_ctx || cpu < 0) {
+        return;
+    }
+    dc = mis->downtime_ctx;
+    dc->vcpu_addr[cpu] = addr;
+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+            cpu);
+}
+
+void mark_postcopy_downtime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeContext *dc;
+    int i;
+    bool all_vcpu_down = true;
+    int64_t now;
+
+    if (!mis->downtime_ctx) {
+        return;
+    }
+    dc = mis->downtime_ctx;
+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* check all vCPU down,
+     * QEMU has bitmap.h, but even with bitmap_and
+     * will be a cycle */
+    for (i = 0; i < smp_cpus; i++) {
+        if (dc->vcpu_addr[i]) {
+            continue;
+        }
+        all_vcpu_down = false;
+        break;
+    }
+
+    if (all_vcpu_down) {
+        dc->total_downtime += now - dc->last_begin;
+    }
+
+    /* lookup cpu, to clear it */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_downtime;
+
+        if (dc->vcpu_addr[i] != addr) {
+            continue;
+        }
+
+        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
+
+        dc->vcpu_addr[i] = 0;
+        dc->vcpu_downtime[i] += vcpu_downtime;
+    }
+
+    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
+}
+
+/*
+ * This function just provide calculated before downtime per cpu and trace it.
+ * Total downtime is calculated in mark_postcopy_downtime_end.
+ *
+ *
+ * Assume we have 3 CPU
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
+ *            it's a part of total downtime.
+ * S1 - here is last_begin
+ * Legend of the picture is following:
+ *              * - means downtime per vCPU
+ *              x - means overlapped downtime (total downtime)
+ */
+uint64_t get_postcopy_total_downtime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (!mis->downtime_ctx) {
+        return 0;
+    }
+
+    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
+        int i;
+        for (i = 0; i < smp_cpus; i++) {
+            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
+        }
+    }
+    return mis->downtime_ctx->total_downtime;
+}
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index f3688f5..cf2b935 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,6 +23,7 @@
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include <sys/param.h>
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(pid);
+    return -1;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
+                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
         return -e;
     }
+    mark_postcopy_downtime_end((uint64_t)host);
 
     trace_postcopy_place_page(host);
     return 0;
diff --git a/migration/trace-events b/migration/trace-events
index b8f01a2..d338810 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
+mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
+downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH RESEND V3 6/6] migration: trace postcopy total downtime
       [not found]   ` <CGME20170428065755eucas1p1cdd0f278a235f176e9f63c40bc64a7a9@eucas1p1.samsung.com>
@ 2017-04-28  6:57     ` Alexey Perevalov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28  6:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, a.perevalov, i.maximets, f4bug, peterx

It's not possible to transmit it back to the source host,
because the return path (RP) protocol is not extensible.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 2 ++
 migration/trace-events   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cf2b935..35e77ba 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -388,6 +388,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     }
 
     postcopy_state_set(POSTCOPY_INCOMING_END);
+    /* here should be downtime receiving back operation */
+    trace_postcopy_ram_incoming_cleanup_downtime(get_postcopy_total_downtime());
     migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
 
     if (mis->postcopy_tmp_page) {
diff --git a/migration/trace-events b/migration/trace-events
index d338810..faa1b22 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -194,6 +194,7 @@ postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_ram_incoming_cleanup_downtime(uint64_t total) "total downtime %" PRIu64
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
@ 2017-04-28  9:01       ` Peter Xu
  2017-04-28 10:58         ` Alexey Perevalov
  2017-04-28 15:55       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-04-28  9:01 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 09:57:35AM +0300, Alexey Perevalov wrote:
> This modification is necessary for userfault fd features which are
> required to be requested from userspace.
> UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
> be introduced in the next patch.
> 
> QEMU need to use separate userfault file descriptor, due to
> userfault context has internal state, and after first call of
> ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
> success), but
> kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So
> only one ioctl with UFFD_API is possible per ufd.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 4c859b4..21e7150 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -60,15 +60,51 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> +
> +/*
> + * Check userfault fd features, to request only supported features in
> + * future.
> + * __NR_userfaultfd - should be checked before
> + * Return obtained features
> + */
> +static bool receive_ufd_features(__u64 *features)
>  {
> -    struct uffdio_api api_struct;
> -    uint64_t ioctl_mask;
> +    struct uffdio_api api_struct = {0};
> +    int ufd;
> +    bool ret = true;
>  
> +    /* if we are here __NR_userfaultfd should exists */
> +    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +    if (ufd == -1) {

This check should be <0 rather than -1?

> +        return false;
> +    }
> +
> +    /* ask features */
>      api_struct.api = UFFD_API;
>      api_struct.features = 0;
>      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> -        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> +        error_report("receive_ufd_features: UFFDIO_API failed: %s",
> +                strerror(errno));
> +        ret = false;
> +        goto release_ufd;
> +    }
> +
> +    *features = api_struct.features;
> +
> +release_ufd:
> +    close(ufd);
> +    return ret;
> +}
> +
> +static bool request_ufd_features(int ufd, __u64 features)
> +{
> +    struct uffdio_api api_struct = {0};
> +    uint64_t ioctl_mask;
> +
> +    api_struct.api = UFFD_API;
> +    api_struct.features = features;
> +    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> +        error_report("request_ufd_features: UFFDIO_API failed: %s",
>                       strerror(errno));
>          return false;
>      }
> @@ -81,11 +117,33 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>          return false;
>      }
>  
> +    return true;
> +}
> +
> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)

This is not only a check now... It enables something in the kernel. So
I'd suggest changing the function name correspondingly.

> +{
> +    __u64 new_features = 0;
> +
> +    /* ask features */
> +    __u64 supported_features;
> +
> +    if (!receive_ufd_features(&supported_features)) {
> +        error_report("ufd_version_check failed");
> +        return false;
> +    }
> +
> +    /* request features */
> +    if (new_features && !request_ufd_features(ufd, new_features)) {

Firstly, looks like new_features == 0 here always, no?

Second, I would suggest we enable features explicitly. For this series,
it's only the THREAD_ID thing; I would mask the rest. The problem
is, what if new features are introduced in the future that we don't really
want to enable for postcopy?
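Concretely, I mean something along these lines (illustrative only, untested):

    /* whitelist only the features postcopy actually uses */
    __u64 wanted = 0;
    #ifdef UFFD_FEATURE_THREAD_ID
        wanted |= UFFD_FEATURE_THREAD_ID;
    #endif
        wanted &= supported_features;   /* drop anything the kernel lacks */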

Thanks,

> +        error_report("ufd_version_check failed: features %" PRIu64,
> +                (uint64_t)new_features);
> +        return false;
> +    }
> +
>      if (getpagesize() != ram_pagesize_summary()) {
>          bool have_hp = false;
>          /* We've got a huge page */
>  #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
> -        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
> +        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
>  #endif
>          if (!have_hp) {
>              error_report("Userfault on this host does not support huge pages");
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 2/6] migration: pass ptr to MigrationIncomingState into migration ufd_version_check & postcopy_ram_supported_by_host
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 2/6] migration: pass ptr to MigrationIncomingState into migration ufd_version_check & postcopy_ram_supported_by_host Alexey Perevalov
@ 2017-04-28  9:04       ` Peter Xu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Xu @ 2017-04-28  9:04 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 09:57:34AM +0300, Alexey Perevalov wrote:
> That tiny refactoring is necessary to be able to set
> UFFD_FEATURE_THREAD_ID while requesting features, and then
> to create downtime context in case when kernel supports it.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/migration/postcopy-ram.h |  2 +-
>  migration/migration.c            |  2 +-
>  migration/postcopy-ram.c         | 10 +++++-----
>  migration/savevm.c               |  2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> index 8e036b9..809f6db 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -14,7 +14,7 @@
>  #define QEMU_POSTCOPY_RAM_H
>  
>  /* Return true if the host supports everything we need to do postcopy-ram */
> -bool postcopy_ram_supported_by_host(void);
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>  
>  /*
>   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> diff --git a/migration/migration.c b/migration/migration.c
> index 353f272..569a7f6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -804,7 +804,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>           * special support.
>           */
>          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> -            !postcopy_ram_supported_by_host()) {
> +            !postcopy_ram_supported_by_host(NULL)) {
>              /* postcopy_ram_supported_by_host will have emitted a more
>               * detailed message
>               */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 85fd8d7..4c859b4 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -60,7 +60,7 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd)
> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)

This patch is mostly passing the incoming state around. IMHO it'll be
nicer if we squash this patch into the one that really uses the state.
What do you think?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState Alexey Perevalov
@ 2017-04-28  9:38       ` Peter Xu
  2017-04-28 10:03         ` Alexey Perevalov
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-04-28  9:38 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 09:57:36AM +0300, Alexey Perevalov wrote:
> This patch add request to kernel space for UFFD_FEATURE_THREAD_ID,
> in case when this feature is provided by kernel.
> 
> DowntimeContext is incapsulated inside migration.c.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/migration/migration.h | 12 ++++++++++++
>  migration/migration.c         | 33 +++++++++++++++++++++++++++++++++
>  migration/postcopy-ram.c      |  8 ++++++++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ba1a16c..e8fb68f 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -83,6 +83,8 @@ typedef enum {
>      POSTCOPY_INCOMING_END
>  } PostcopyState;
>  
> +struct DowntimeContext;

Nit: shall we embed something like "Postcopy" (or short form) into
this struct name? Since the whole thing is really tailored for
postcopy, only.

> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> @@ -123,10 +125,20 @@ struct MigrationIncomingState {
>  
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
> +
> +    /*
> +     * DowntimeContext to keep information for postcopy
> +     * live migration, to calculate downtime
> +     * */
> +    struct DowntimeContext *downtime_ctx;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
>  void migration_incoming_state_destroy(void);
> +/*
> + * Functions to work with downtime context
> + */
> +struct DowntimeContext *downtime_context_new(void);
>  
>  struct MigrationState
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index 569a7f6..ec76e5c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -77,6 +77,18 @@ static NotifierList migration_state_notifiers =
>  
>  static bool deferred_incoming;
>  
> +typedef struct DowntimeContext {
> +    /* time when page fault initiated per vCPU */
> +    int64_t *page_fault_vcpu_time;
> +    /* page address per vCPU */
> +    uint64_t *vcpu_addr;
> +    int64_t total_downtime;
> +    /* downtime per vCPU */
> +    int64_t *vcpu_downtime;
> +    /* point in time when last page fault was initiated */
> +    int64_t last_begin;
> +} DowntimeContext;
> +
>  /*
>   * Current state of incoming postcopy; note this is not part of
>   * MigrationIncomingState since it's state is used during cleanup
> @@ -116,6 +128,23 @@ MigrationState *migrate_get_current(void)
>      return &current_migration;
>  }
>  
> +struct DowntimeContext *downtime_context_new(void)
> +{
> +    DowntimeContext *ctx = g_new0(DowntimeContext, 1);
> +    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
> +    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
> +    ctx->vcpu_downtime = g_new0(int64_t, smp_cpus);
> +    return ctx;
> +}
> +
> +static void destroy_downtime_context(struct DowntimeContext *ctx)
> +{
> +    g_free(ctx->page_fault_vcpu_time);
> +    g_free(ctx->vcpu_addr);
> +    g_free(ctx->vcpu_downtime);
> +    g_free(ctx);
> +}
> +
>  MigrationIncomingState *migration_incoming_get_current(void)
>  {
>      static bool once;
> @@ -138,6 +167,10 @@ void migration_incoming_state_destroy(void)
>  
>      qemu_event_destroy(&mis->main_thread_load_event);
>      loadvm_free_handlers(mis);
> +    if (mis->downtime_ctx) {
> +        destroy_downtime_context(mis->downtime_ctx);
> +        mis->downtime_ctx = NULL;
> +    }
>  }
>  
>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 21e7150..f3688f5 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>          return false;
>      }
>  
> +#ifdef UFFD_FEATURE_THREAD_ID
> +    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> +        /* kernel supports that feature */
> +        mis->downtime_ctx = downtime_context_new();
> +        new_features |= UFFD_FEATURE_THREAD_ID;

So here I know why in patch 2 new_features == 0... 

If I were you, I would like the series be done in below 4 patches:

1. update header
2. introduce THREAD_ID feature, and enable it conditionally
3. squash all the downtime thing (downtime context, calculation) in
   one patch here
4. introduce trace

IMHO that's clearer and easier for review. But I'm okay with current
as well as long as the maintainers (Dave/Juan) won't disagree. :)

Thanks,

> +    }
> +#endif
> +
>      /* request features */
>      if (new_features && !request_ufd_features(ufd, new_features)) {
>          error_report("ufd_version_check failed: features %" PRIu64,
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side Alexey Perevalov
@ 2017-04-28 10:00       ` Peter Xu
  2017-04-28 11:11         ` Alexey Perevalov
  2017-04-28 16:34       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-04-28 10:00 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> This patch provides downtime calculation per vCPU,
> as a summary and as a overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu, as an improvements of
> previous approch where QEMU kept tree with faulted page address and cpus bitmask
> in it. Now QEMU is keeping array with faulted page address as value and vCPU
> as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> list for downtime per vCPU (could be traced with page_fault_addr)
> 
> For more details see comments for get_postcopy_total_downtime
> implementation.
> 
> Downtime will not calculated if postcopy_downtime field of
> MigrationIncomingState wasn't initialized.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/migration/migration.h |   3 ++
>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
>  migration/postcopy-ram.c      |  20 +++++++-
>  migration/trace-events        |   6 ++-
>  4 files changed, 130 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e8fb68f..a22f9ce 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
>   * Functions to work with downtime context
>   */
>  struct DowntimeContext *downtime_context_new(void);
> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> +void mark_postcopy_downtime_end(uint64_t addr);
> +uint64_t get_postcopy_total_downtime(void);
>  
>  struct MigrationState
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index ec76e5c..2c6f150 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
>      return atomic_xchg(&incoming_postcopy_state, new_state);
>  }
>  
> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeContext *dc;
> +    if (!mis->downtime_ctx || cpu < 0) {
> +        return;
> +    }
> +    dc = mis->downtime_ctx;
> +    dc->vcpu_addr[cpu] = addr;
> +    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> +        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +            cpu);
> +}
> +
> +void mark_postcopy_downtime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeContext *dc;
> +    int i;
> +    bool all_vcpu_down = true;
> +    int64_t now;
> +
> +    if (!mis->downtime_ctx) {
> +        return;
> +    }
> +    dc = mis->downtime_ctx;
> +    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* check all vCPU down,
> +     * QEMU has bitmap.h, but even with bitmap_and
> +     * will be a cycle */
> +    for (i = 0; i < smp_cpus; i++) {
> +        if (dc->vcpu_addr[i]) {
> +            continue;
> +        }
> +        all_vcpu_down = false;
> +        break;
> +    }
> +
> +    if (all_vcpu_down) {
> +        dc->total_downtime += now - dc->last_begin;

Shall we do this accounting only if we are sure the copied page address
is one of the page faulted addresses? Can it be some other page? I
don't know. But since we have the loop below to make sure of it, why
not?

A nitpick on perf: when there are lots of vcpus, the algo might be
slow since we have several places that loop over the vcpus. But
this can be totally future work on top, and the current way is good enough
at least for me.

(for the nit: maybe add a hash, key=thread_id, value=cpu_index, then
 get_mem_fault_cpu_index() can be faster using the hash; meanwhile
 keep a counter A of page faulted vcpus, use atomic ops with it, then
 here all_vcpu_down can be checked by A == smp_vcpus)
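Purely as a sketch of the hash part (function names are made up, untested, and
the usual QEMU/glib includes are assumed):

    /* build the thread-id -> cpu_index map once, e.g. when the fault thread starts */
    static GHashTable *build_tid_map(void)
    {
        GHashTable *map = g_hash_table_new(g_direct_hash, g_direct_equal);
        CPUState *cpu;

        CPU_FOREACH(cpu) {
            g_hash_table_insert(map, GUINT_TO_POINTER(cpu->thread_id),
                                GINT_TO_POINTER(cpu->cpu_index));
        }
        return map;
    }

    /* then get_mem_fault_cpu_index() becomes an O(1) lookup */
    static int tid_to_cpu_index(GHashTable *map, uint32_t pid)
    {
        gpointer val;

        if (g_hash_table_lookup_extended(map, GUINT_TO_POINTER(pid),
                                         NULL, &val)) {
            return GPOINTER_TO_INT(val);
        }
        return -1;
    }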

Thanks,

> +    }
> +
> +    /* lookup cpu, to clear it */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_downtime;
> +
> +        if (dc->vcpu_addr[i] != addr) {
> +            continue;
> +        }
> +
> +        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
> +
> +        dc->vcpu_addr[i] = 0;
> +        dc->vcpu_downtime[i] += vcpu_downtime;
> +    }
> +
> +    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
> +}
> +
> +/*
> + * This function just provide calculated before downtime per cpu and trace it.
> + * Total downtime is calculated in mark_postcopy_downtime_end.
> + *
> + *
> + * Assume we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> + *            it's a part of total downtime.
> + * S1 - here is last_begin
> + * Legend of the picture is following:
> + *              * - means downtime per vCPU
> + *              x - means overlapped downtime (total downtime)
> + */
> +uint64_t get_postcopy_total_downtime(void)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (!mis->downtime_ctx) {
> +        return 0;
> +    }
> +
> +    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
> +        int i;
> +        for (i = 0; i < smp_cpus; i++) {
> +            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
> +        }
> +    }
> +    return mis->downtime_ctx->total_downtime;
> +}
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index f3688f5..cf2b935 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -23,6 +23,7 @@
>  #include "migration/postcopy-ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {
> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>          return -e;
>      }
> +    mark_postcopy_downtime_end((uint64_t)host);
>  
>      trace_postcopy_place_page(host);
>      return 0;
> diff --git a/migration/trace-events b/migration/trace-events
> index b8f01a2..d338810 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> +downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-28  9:38       ` Peter Xu
@ 2017-04-28 10:03         ` Alexey Perevalov
  2017-04-28 10:07           ` Peter Xu
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28 10:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On 04/28/2017 12:38 PM, Peter Xu wrote:
> On Fri, Apr 28, 2017 at 09:57:36AM +0300, Alexey Perevalov wrote:
>> This patch add request to kernel space for UFFD_FEATURE_THREAD_ID,
>> in case when this feature is provided by kernel.
>>
>> DowntimeContext is incapsulated inside migration.c.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   include/migration/migration.h | 12 ++++++++++++
>>   migration/migration.c         | 33 +++++++++++++++++++++++++++++++++
>>   migration/postcopy-ram.c      |  8 ++++++++
>>   3 files changed, 53 insertions(+)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index ba1a16c..e8fb68f 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -83,6 +83,8 @@ typedef enum {
>>       POSTCOPY_INCOMING_END
>>   } PostcopyState;
>>   
>> +struct DowntimeContext;
> Nit: shall we embed something like "Postcopy" (or short form) into
> this struct name? Since the whole thing is really tailored for
> postcopy, only.
Yes, that is a postcopy-only structure, so maybe PostcopyDowntimeContext
is more readable.

>
>> +
>>   /* State for the incoming migration */
>>   struct MigrationIncomingState {
>>       QEMUFile *from_src_file;
>> @@ -123,10 +125,20 @@ struct MigrationIncomingState {
>>   
>>       /* See savevm.c */
>>       LoadStateEntry_Head loadvm_handlers;
>> +
>> +    /*
>> +     * DowntimeContext to keep information for postcopy
>> +     * live migration, to calculate downtime
>> +     * */
>> +    struct DowntimeContext *downtime_ctx;
>>   };
>>   
>>   MigrationIncomingState *migration_incoming_get_current(void);
>>   void migration_incoming_state_destroy(void);
>> +/*
>> + * Functions to work with downtime context
>> + */
>> +struct DowntimeContext *downtime_context_new(void);
>>   
>>   struct MigrationState
>>   {
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 569a7f6..ec76e5c 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -77,6 +77,18 @@ static NotifierList migration_state_notifiers =
>>   
>>   static bool deferred_incoming;
>>   
>> +typedef struct DowntimeContext {
>> +    /* time when page fault initiated per vCPU */
>> +    int64_t *page_fault_vcpu_time;
>> +    /* page address per vCPU */
>> +    uint64_t *vcpu_addr;
>> +    int64_t total_downtime;
>> +    /* downtime per vCPU */
>> +    int64_t *vcpu_downtime;
>> +    /* point in time when last page fault was initiated */
>> +    int64_t last_begin;
>> +} DowntimeContext;
>> +
>>   /*
>>    * Current state of incoming postcopy; note this is not part of
>>    * MigrationIncomingState since it's state is used during cleanup
>> @@ -116,6 +128,23 @@ MigrationState *migrate_get_current(void)
>>       return &current_migration;
>>   }
>>   
>> +struct DowntimeContext *downtime_context_new(void)
>> +{
>> +    DowntimeContext *ctx = g_new0(DowntimeContext, 1);
>> +    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
>> +    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
>> +    ctx->vcpu_downtime = g_new0(int64_t, smp_cpus);
>> +    return ctx;
>> +}
>> +
>> +static void destroy_downtime_context(struct DowntimeContext *ctx)
>> +{
>> +    g_free(ctx->page_fault_vcpu_time);
>> +    g_free(ctx->vcpu_addr);
>> +    g_free(ctx->vcpu_downtime);
>> +    g_free(ctx);
>> +}
>> +
>>   MigrationIncomingState *migration_incoming_get_current(void)
>>   {
>>       static bool once;
>> @@ -138,6 +167,10 @@ void migration_incoming_state_destroy(void)
>>   
>>       qemu_event_destroy(&mis->main_thread_load_event);
>>       loadvm_free_handlers(mis);
>> +    if (mis->downtime_ctx) {
>> +        destroy_downtime_context(mis->downtime_ctx);
>> +        mis->downtime_ctx = NULL;
>> +    }
>>   }
>>   
>>   
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 21e7150..f3688f5 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>>           return false;
>>       }
>>   
>> +#ifdef UFFD_FEATURE_THREAD_ID
>> +    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
>> +        /* kernel supports that feature */
>> +        mis->downtime_ctx = downtime_context_new();
>> +        new_features |= UFFD_FEATURE_THREAD_ID;
> So here I know why in patch 2 new_features == 0...
>
> If I were you, I would like the series be done in below 4 patches:
>
> 1. update header
> 2. introduce THREAD_ID feature, and enable it conditionally
> 3. squash all the downtime thing (downtime context, calculation) in
>     one patch here
> 4. introduce trace
>
> IMHO that's clearer and easier for review. But I'm okay with current
> as well as long as the maintainers (Dave/Juan) won't disagree. :)
In the previous series, David asked me to split one patch into two:
[Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature 
support

 >There seem to be two parts to this:
 >  a) Adding the mis parameter to ufd_version_check
 >  b) Asking for the feature

 >Please split it into two patches.

So in the current patch set I also added the refactoring, which was missing before:
"migration: split ufd_version_check onto receive/request features part"

>
> Thanks,
>
>> +    }
>> +#endif
>> +
>>       /* request features */
>>       if (new_features && !request_ufd_features(ufd, new_features)) {
>>           error_report("ufd_version_check failed: features %" PRIu64,
>> -- 
>> 1.9.1
>>


-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-28 10:03         ` Alexey Perevalov
@ 2017-04-28 10:07           ` Peter Xu
  2017-04-28 16:22             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-04-28 10:07 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:

[...]

> >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >>index 21e7150..f3688f5 100644
> >>--- a/migration/postcopy-ram.c
> >>+++ b/migration/postcopy-ram.c
> >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> >>          return false;
> >>      }
> >>+#ifdef UFFD_FEATURE_THREAD_ID
> >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> >>+        /* kernel supports that feature */
> >>+        mis->downtime_ctx = downtime_context_new();
> >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> >So here I know why in patch 2 new_features == 0...
> >
> >If I were you, I would like the series be done in below 4 patches:
> >
> >1. update header
> >2. introduce THREAD_ID feature, and enable it conditionally
> >3. squash all the downtime thing (downtime context, calculation) in
> >    one patch here
> >4. introduce trace
> >
> >IMHO that's clearer and easier for review. But I'm okay with current
> >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> In previous series, David asked me to split one patch into 2
> [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> support
> 
> >There seem to be two parts to this:
> >  a) Adding the mis parameter to ufd_version_check
> >  b) Asking for the feature
> 
> >Please split it into two patches.
> 
> So in current patch set, I also added re-factoring, which was missed before
> "migration: split ufd_version_check onto receive/request features part"

Sure. As long as Dave agrees, I'm okay with either way.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part
  2017-04-28  9:01       ` Peter Xu
@ 2017-04-28 10:58         ` Alexey Perevalov
  2017-04-28 12:57           ` Alexey Perevalov
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28 10:58 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On 04/28/2017 12:01 PM, Peter Xu wrote:
> On Fri, Apr 28, 2017 at 09:57:35AM +0300, Alexey Perevalov wrote:
>> This modification is necessary for userfault fd features which are
>> required to be requested from userspace.
>> UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
>> be introduced in the next patch.
>>
>> QEMU need to use separate userfault file descriptor, due to
>> userfault context has internal state, and after first call of
>> ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
>> success), but
>> kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So
>> only one ioctl with UFFD_API is possible per ufd.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 63 insertions(+), 5 deletions(-)
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 4c859b4..21e7150 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -60,15 +60,51 @@ struct PostcopyDiscardState {
>>   #include <sys/eventfd.h>
>>   #include <linux/userfaultfd.h>
>>   
>> -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>> +
>> +/*
>> + * Check userfault fd features, to request only supported features in
>> + * future.
>> + * __NR_userfaultfd - should be checked before
>> + * Return obtained features
>> + */
>> +static bool receive_ufd_features(__u64 *features)
>>   {
>> -    struct uffdio_api api_struct;
>> -    uint64_t ioctl_mask;
>> +    struct uffdio_api api_struct = {0};
>> +    int ufd;
>> +    bool ret = true;
>>   
>> +    /* if we are here __NR_userfaultfd should exists */
>> +    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
>> +    if (ufd == -1) {
> This check should be <0 rather than -1?
right, kernel could return any type of error,
if (error < 0)
     return error;

>
>> +        return false;
>> +    }
>> +
>> +    /* ask features */
>>       api_struct.api = UFFD_API;
>>       api_struct.features = 0;
>>       if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>> -        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
>> +        error_report("receive_ufd_features: UFFDIO_API failed: %s",
>> +                strerror(errno));
>> +        ret = false;
>> +        goto release_ufd;
>> +    }
>> +
>> +    *features = api_struct.features;
>> +
>> +release_ufd:
>> +    close(ufd);
>> +    return ret;
>> +}
>> +
>> +static bool request_ufd_features(int ufd, __u64 features)
>> +{
>> +    struct uffdio_api api_struct = {0};
>> +    uint64_t ioctl_mask;
>> +
>> +    api_struct.api = UFFD_API;
>> +    api_struct.features = features;
>> +    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>> +        error_report("request_ufd_features: UFFDIO_API failed: %s",
>>                        strerror(errno));
>>           return false;
>>       }
>> @@ -81,11 +117,33 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>>           return false;
>>       }
>>   
>> +    return true;
>> +}
>> +
>> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> This is not only a check not... It enables something in the kernel. So
> I'll suggest change the function name correspondingly.
yes, after that small changes, the meaning of the function has changed
maybe it's ufd_assign_and_check_features
>
>> +{
>> +    __u64 new_features = 0;
>> +
>> +    /* ask features */
>> +    __u64 supported_features;
>> +
>> +    if (!receive_ufd_features(&supported_features)) {
>> +        error_report("ufd_version_check failed");
>> +        return false;
>> +    }
>> +
>> +    /* request features */
>> +    if (new_features && !request_ufd_features(ufd, new_features)) {
> Firstly, looks like new_features == 0 here always, no?
I will use it in next patch.
>
> Second, I would suggest we enable feature explicitly. For this series,
> it's only for the THREAD_ID thing. I would mask the rest. The problem
> is, what if new features introduced in the future that we don't really
> want to enable for postcopy?
right now I think to rename new_features to enabled_features
or features_to_request,
if we don't want to enable feature - don't set according bit in 
enabled_features
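
A minimal sketch of that explicit opt-in (the function and variable names here
are only illustrative, not from the posted series):

    static bool ufd_request_features_explicitly(int ufd)
    {
        __u64 features_to_request = 0;
        __u64 supported_features;

        if (!receive_ufd_features(&supported_features)) {
            error_report("%s: receive_ufd_features failed", __func__);
            return false;
        }

    #ifdef UFFD_FEATURE_THREAD_ID
        /* opt in only to the bits postcopy actually uses */
        if (supported_features & UFFD_FEATURE_THREAD_ID) {
            features_to_request |= UFFD_FEATURE_THREAD_ID;
        }
    #endif

        /* nothing to request is not an error */
        return !features_to_request ||
               request_ufd_features(ufd, features_to_request);
    }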

>
> Thanks,
>
>> +        error_report("ufd_version_check failed: features %" PRIu64,
>> +                (uint64_t)new_features);
>> +        return false;
>> +    }
>> +
>>       if (getpagesize() != ram_pagesize_summary()) {
>>           bool have_hp = false;
>>           /* We've got a huge page */
>>   #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
>> -        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
>> +        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
>>   #endif
>>           if (!have_hp) {
>>               error_report("Userfault on this host does not support huge pages");
>> -- 
>> 1.9.1
>>


-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-04-28 10:00       ` Peter Xu
@ 2017-04-28 11:11         ` Alexey Perevalov
  2017-05-08  6:29           ` Peter Xu
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28 11:11 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On 04/28/2017 01:00 PM, Peter Xu wrote:
> On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
>> This patch provides downtime calculation per vCPU,
>> as a summary and as a overlapped value for all vCPUs.
>>
>> This approach was suggested by Peter Xu, as an improvements of
>> previous approch where QEMU kept tree with faulted page address and cpus bitmask
>> in it. Now QEMU is keeping array with faulted page address as value and vCPU
>> as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
>> list for downtime per vCPU (could be traced with page_fault_addr)
>>
>> For more details see comments for get_postcopy_total_downtime
>> implementation.
>>
>> Downtime will not calculated if postcopy_downtime field of
>> MigrationIncomingState wasn't initialized.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   include/migration/migration.h |   3 ++
>>   migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
>>   migration/postcopy-ram.c      |  20 +++++++-
>>   migration/trace-events        |   6 ++-
>>   4 files changed, 130 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index e8fb68f..a22f9ce 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
>>    * Functions to work with downtime context
>>    */
>>   struct DowntimeContext *downtime_context_new(void);
>> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
>> +void mark_postcopy_downtime_end(uint64_t addr);
>> +uint64_t get_postcopy_total_downtime(void);
>>   
>>   struct MigrationState
>>   {
>> diff --git a/migration/migration.c b/migration/migration.c
>> index ec76e5c..2c6f150 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
>>       return atomic_xchg(&incoming_postcopy_state, new_state);
>>   }
>>   
>> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
>> +{
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    DowntimeContext *dc;
>> +    if (!mis->downtime_ctx || cpu < 0) {
>> +        return;
>> +    }
>> +    dc = mis->downtime_ctx;
>> +    dc->vcpu_addr[cpu] = addr;
>> +    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
>> +        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +
>> +    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
>> +            cpu);
>> +}
>> +
>> +void mark_postcopy_downtime_end(uint64_t addr)
>> +{
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    DowntimeContext *dc;
>> +    int i;
>> +    bool all_vcpu_down = true;
>> +    int64_t now;
>> +
>> +    if (!mis->downtime_ctx) {
>> +        return;
>> +    }
>> +    dc = mis->downtime_ctx;
>> +    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +
>> +    /* check all vCPU down,
>> +     * QEMU has bitmap.h, but even with bitmap_and
>> +     * will be a cycle */
>> +    for (i = 0; i < smp_cpus; i++) {
>> +        if (dc->vcpu_addr[i]) {
>> +            continue;
>> +        }
>> +        all_vcpu_down = false;
>> +        break;
>> +    }
>> +
>> +    if (all_vcpu_down) {
>> +        dc->total_downtime += now - dc->last_begin;
> Shall we do this accouting only if we are sure the copied page address
> is one of the page faulted addresses? Can it be some other page? I
> don't know. But since we have the loop below to make sure of it, why
> not?
No, the downtime is counted from the page fault until that page is copied.
Yes, other pages can be copied as well as the page-faulted ones (they arrive
due to prefetching), but that is not downtime.

> A nitpick on perf: when there are lots of vcpus, the algo might be
> slow since we have several places that loops over the smp_vcpus. But
> this can be totally future work on top, and current way is good enough
> at least for me.

> (for the nit: maybe add a hash, key=thread_id, value=cpu_index, then
>   get_mem_fault_cpu_index() can be faster using the hash; meanwhile
>   keep a counter A of page faulted vcpus, use atomic ops with it, then
>   here all_vcpu_down can be checked by A == smp_vcpus)
Just a binary search in get_mem_fault_cpu_index would be nice too :)
Also, it's a good idea to keep all_vcpu_down in PostcopyDowntimeContext.
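
A rough sketch of the hash variant, assuming the table is built once before
the fault thread starts handling messages (helper and variable names are made
up for illustration):

    /* key: vCPU thread id, value: cpu_index + 1, so NULL means "not a vCPU" */
    static GHashTable *tid_to_vcpu;

    static void build_tid_to_vcpu_hash(void)
    {
        CPUState *cpu_iter;

        tid_to_vcpu = g_hash_table_new(g_direct_hash, g_direct_equal);
        CPU_FOREACH(cpu_iter) {
            g_hash_table_insert(tid_to_vcpu,
                                GUINT_TO_POINTER(cpu_iter->thread_id),
                                GINT_TO_POINTER(cpu_iter->cpu_index + 1));
        }
    }

    static int get_mem_fault_cpu_index(uint32_t pid)
    {
        gpointer val = g_hash_table_lookup(tid_to_vcpu, GUINT_TO_POINTER(pid));

        return GPOINTER_TO_INT(val) - 1; /* -1 when the pid is not a vCPU */
    }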

>
> Thanks,
>
>> +    }
>> +
>> +    /* lookup cpu, to clear it */
>> +    for (i = 0; i < smp_cpus; i++) {
>> +        uint64_t vcpu_downtime;
>> +
>> +        if (dc->vcpu_addr[i] != addr) {
>> +            continue;
>> +        }
>> +
>> +        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
>> +
>> +        dc->vcpu_addr[i] = 0;
>> +        dc->vcpu_downtime[i] += vcpu_downtime;
>> +    }
>> +
>> +    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
>> +}
>> +
>> +/*
>> + * This function just provide calculated before downtime per cpu and trace it.
>> + * Total downtime is calculated in mark_postcopy_downtime_end.
>> + *
>> + *
>> + * Assume we have 3 CPU
>> + *
>> + *      S1        E1           S1               E1
>> + * -----***********------------xxx***************------------------------> CPU1
>> + *
>> + *             S2                E2
>> + * ------------****************xxx---------------------------------------> CPU2
>> + *
>> + *                         S3            E3
>> + * ------------------------****xxx********-------------------------------> CPU3
>> + *
>> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
>> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
>> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
>> + *            it's a part of total downtime.
>> + * S1 - here is last_begin
>> + * Legend of the picture is following:
>> + *              * - means downtime per vCPU
>> + *              x - means overlapped downtime (total downtime)
>> + */
>> +uint64_t get_postcopy_total_downtime(void)
>> +{
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    if (!mis->downtime_ctx) {
>> +        return 0;
>> +    }
>> +
>> +    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
>> +        int i;
>> +        for (i = 0; i < smp_cpus; i++) {
>> +            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
>> +        }
>> +    }
>> +    return mis->downtime_ctx->total_downtime;
>> +}
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index f3688f5..cf2b935 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -23,6 +23,7 @@
>>   #include "migration/postcopy-ram.h"
>>   #include "sysemu/sysemu.h"
>>   #include "sysemu/balloon.h"
>> +#include <sys/param.h>
>>   #include "qemu/error-report.h"
>>   #include "trace.h"
>>   
>> @@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>>       return 0;
>>   }
>>   
>> +static int get_mem_fault_cpu_index(uint32_t pid)
>> +{
>> +    CPUState *cpu_iter;
>> +
>> +    CPU_FOREACH(cpu_iter) {
>> +        if (cpu_iter->thread_id == pid) {
>> +            return cpu_iter->cpu_index;
>> +        }
>> +    }
>> +    trace_get_mem_fault_cpu_index(pid);
>> +    return -1;
>> +}
>> +
>>   /*
>>    * Handle faults detected by the USERFAULT markings
>>    */
>> @@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>>           rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>>           trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>>                                                   qemu_ram_get_idstr(rb),
>> -                                                rb_offset);
>> +                                                rb_offset,
>> +                                                msg.arg.pagefault.feat.ptid);
>>   
>> +        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
>> +                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
>>           /*
>>            * Send the request to the source - we want to request one
>>            * of our host page sizes (which is >= TPS)
>> @@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>>   
>>           return -e;
>>       }
>> +    mark_postcopy_downtime_end((uint64_t)host);
>>   
>>       trace_postcopy_place_page(host);
>>       return 0;
>> diff --git a/migration/trace-events b/migration/trace-events
>> index b8f01a2..d338810 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>>   process_incoming_migration_co_postcopy_end_main(void) ""
>>   migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>>   migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
>> +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
>> +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
>> +downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
>>   
>>   # migration/rdma.c
>>   qemu_rdma_accept_incoming_migration(void) ""
>> @@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
>>   postcopy_ram_fault_thread_entry(void) ""
>>   postcopy_ram_fault_thread_exit(void) ""
>>   postcopy_ram_fault_thread_quit(void) ""
>> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
>> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
>>   postcopy_ram_incoming_cleanup_closeuf(void) ""
>>   postcopy_ram_incoming_cleanup_entry(void) ""
>>   postcopy_ram_incoming_cleanup_exit(void) ""
>> @@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
>>   save_xbzrle_page_overflow(void) ""
>>   ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>>   ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
>> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>>   
>>   # migration/exec.c
>>   migration_exec_outgoing(const char *cmd) "cmd=%s"
>> -- 
>> 1.9.1
>>


-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part
  2017-04-28 10:58         ` Alexey Perevalov
@ 2017-04-28 12:57           ` Alexey Perevalov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey Perevalov @ 2017-04-28 12:57 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On 04/28/2017 01:58 PM, Alexey Perevalov wrote:
> On 04/28/2017 12:01 PM, Peter Xu wrote:
>> On Fri, Apr 28, 2017 at 09:57:35AM +0300, Alexey Perevalov wrote:
>>> This modification is necessary for userfault fd features which are
>>> required to be requested from userspace.
>>> UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
>>> be introduced in the next patch.
>>>
>>> QEMU need to use separate userfault file descriptor, due to
>>> userfault context has internal state, and after first call of
>>> ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
>>> success), but
>>> kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So
>>> only one ioctl with UFFD_API is possible per ufd.
>>>
>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>>> ---
>>>   migration/postcopy-ram.c | 68 
>>> ++++++++++++++++++++++++++++++++++++++++++++----
>>>   1 file changed, 63 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>> index 4c859b4..21e7150 100644
>>> --- a/migration/postcopy-ram.c
>>> +++ b/migration/postcopy-ram.c
>>> @@ -60,15 +60,51 @@ struct PostcopyDiscardState {
>>>   #include <sys/eventfd.h>
>>>   #include <linux/userfaultfd.h>
>>>   -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>>> +
>>> +/*
>>> + * Check userfault fd features, to request only supported features in
>>> + * future.
>>> + * __NR_userfaultfd - should be checked before
>>> + * Return obtained features
>>> + */
>>> +static bool receive_ufd_features(__u64 *features)
>>>   {
>>> -    struct uffdio_api api_struct;
>>> -    uint64_t ioctl_mask;
>>> +    struct uffdio_api api_struct = {0};
>>> +    int ufd;
>>> +    bool ret = true;
>>>   +    /* if we are here __NR_userfaultfd should exists */
>>> +    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
>>> +    if (ufd == -1) {
>> This check should be <0 rather than -1?
> right, kernel could return any type of error,
> if (error < 0)
>     return error;
Sorry, I was wrong: returning -1 is the general contract for a syscall, with
the error code in errno.

>
>>
>>> +        return false;
>>> +    }
>>> +
>>> +    /* ask features */
>>>       api_struct.api = UFFD_API;
>>>       api_struct.features = 0;
>>>       if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>>> -        error_report("postcopy_ram_supported_by_host: UFFDIO_API 
>>> failed: %s",
>>> +        error_report("receive_ufd_features: UFFDIO_API failed: %s",
>>> +                strerror(errno));
>>> +        ret = false;
>>> +        goto release_ufd;
>>> +    }
>>> +
>>> +    *features = api_struct.features;
>>> +
>>> +release_ufd:
>>> +    close(ufd);
>>> +    return ret;
>>> +}
>>> +
>>> +static bool request_ufd_features(int ufd, __u64 features)
>>> +{
>>> +    struct uffdio_api api_struct = {0};
>>> +    uint64_t ioctl_mask;
>>> +
>>> +    api_struct.api = UFFD_API;
>>> +    api_struct.features = features;
>>> +    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>>> +        error_report("request_ufd_features: UFFDIO_API failed: %s",
>>>                        strerror(errno));
>>>           return false;
>>>       }
>>> @@ -81,11 +117,33 @@ static bool ufd_version_check(int ufd, 
>>> MigrationIncomingState *mis)
>>>           return false;
>>>       }
>>>   +    return true;
>>> +}
>>> +
>>> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>> This is not only a check not... It enables something in the kernel. So
>> I'll suggest change the function name correspondingly.
> yes, after that small changes, the meaning of the function has changed
> maybe it's ufd_assign_and_check_features
>>
>>> +{
>>> +    __u64 new_features = 0;
>>> +
>>> +    /* ask features */
>>> +    __u64 supported_features;
>>> +
>>> +    if (!receive_ufd_features(&supported_features)) {
>>> +        error_report("ufd_version_check failed");
>>> +        return false;
>>> +    }
>>> +
>>> +    /* request features */
>>> +    if (new_features && !request_ufd_features(ufd, new_features)) {
>> Firstly, looks like new_features == 0 here always, no?
> I will use it in next patch.
>>
>> Second, I would suggest we enable feature explicitly. For this series,
>> it's only for the THREAD_ID thing. I would mask the rest. The problem
>> is, what if new features introduced in the future that we don't really
>> want to enable for postcopy?
> right now I think to rename new_features to enabled_features
> or features_to_request,
> if we don't want to enable feature - don't set according bit in 
> enabled_features
>
>>
>> Thanks,
>>
>>> +        error_report("ufd_version_check failed: features %" PRIu64,
>>> +                (uint64_t)new_features);
>>> +        return false;
>>> +    }
>>> +
>>>       if (getpagesize() != ram_pagesize_summary()) {
>>>           bool have_hp = false;
>>>           /* We've got a huge page */
>>>   #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
>>> -        have_hp = api_struct.features & 
>>> UFFD_FEATURE_MISSING_HUGETLBFS;
>>> +        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
>>>   #endif
>>>           if (!have_hp) {
>>>               error_report("Userfault on this host does not support 
>>> huge pages");
>>> -- 
>>> 1.9.1
>>>
>
>


-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
  2017-04-28  9:01       ` Peter Xu
@ 2017-04-28 15:55       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-28 15:55 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, f4bug, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This modification is necessary for userfault fd features which are
> required to be requested from userspace.
> UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
> be introduced in the next patch.
> 
> QEMU need to use separate userfault file descriptor, due to
> userfault context has internal state, and after first call of
> ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
> success), but
> kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So
> only one ioctl with UFFD_API is possible per ufd.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 4c859b4..21e7150 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -60,15 +60,51 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> +
> +/*
> + * Check userfault fd features, to request only supported features in
> + * future.
> + * __NR_userfaultfd - should be checked before
> + * Return obtained features

Well, it returns true on success I think, sets *features

> + */
> +static bool receive_ufd_features(__u64 *features)
>  {
> -    struct uffdio_api api_struct;
> -    uint64_t ioctl_mask;
> +    struct uffdio_api api_struct = {0};
> +    int ufd;
> +    bool ret = true;
>  
> +    /* if we are here __NR_userfaultfd should exists */
> +    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +    if (ufd == -1) {

error_report

> +        return false;
> +    }
> +
> +    /* ask features */
>      api_struct.api = UFFD_API;
>      api_struct.features = 0;
>      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> -        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> +        error_report("receive_ufd_features: UFFDIO_API failed: %s",
> +                strerror(errno));

I've tended to use "%s: .....", __func__   - it avoids having to rename
things later.
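
E.g., combined with the missing error_report above, the failure path could
read (illustrative only):

    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
    if (ufd == -1) {
        error_report("%s: userfaultfd unavailable: %s", __func__,
                     strerror(errno));
        return false;
    }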

> +        ret = false;
> +        goto release_ufd;
> +    }
> +
> +    *features = api_struct.features;
> +
> +release_ufd:
> +    close(ufd);
> +    return ret;
> +}
> +
> +static bool request_ufd_features(int ufd, __u64 features)
> +{
> +    struct uffdio_api api_struct = {0};
> +    uint64_t ioctl_mask;
> +
> +    api_struct.api = UFFD_API;
> +    api_struct.features = features;
> +    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> +        error_report("request_ufd_features: UFFDIO_API failed: %s",
>                       strerror(errno));
>          return false;
>      }
> @@ -81,11 +117,33 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>          return false;
>      }
>  
> +    return true;
> +}
> +
> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> +{
> +    __u64 new_features = 0;

Minor point; uint64_t in all qemu code please.

> +    /* ask features */
> +    __u64 supported_features;
> +
> +    if (!receive_ufd_features(&supported_features)) {
> +        error_report("ufd_version_check failed");

Say what failed!

> +        return false;
> +    }
> +
> +    /* request features */
> +    if (new_features && !request_ufd_features(ufd, new_features)) {
> +        error_report("ufd_version_check failed: features %" PRIu64,
> +                (uint64_t)new_features);
> +        return false;
> +    }
> +
>      if (getpagesize() != ram_pagesize_summary()) {
>          bool have_hp = false;
>          /* We've got a huge page */
>  #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
> -        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
> +        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
>  #endif
>          if (!have_hp) {
>              error_report("Userfault on this host does not support huge pages");
> -- 
> 1.9.1

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-28 10:07           ` Peter Xu
@ 2017-04-28 16:22             ` Dr. David Alan Gilbert
  2017-04-29  9:16               ` Alexey
  0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-28 16:22 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alexey Perevalov, qemu-devel, i.maximets, f4bug

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> 
> [...]
> 
> > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > >>index 21e7150..f3688f5 100644
> > >>--- a/migration/postcopy-ram.c
> > >>+++ b/migration/postcopy-ram.c
> > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > >>          return false;
> > >>      }
> > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > >>+        /* kernel supports that feature */
> > >>+        mis->downtime_ctx = downtime_context_new();
> > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > >So here I know why in patch 2 new_features == 0...
> > >
> > >If I were you, I would like the series be done in below 4 patches:
> > >
> > >1. update header
> > >2. introduce THREAD_ID feature, and enable it conditionally
> > >3. squash all the downtime thing (downtime context, calculation) in
> > >    one patch here
> > >4. introduce trace
> > >
> > >IMHO that's clearer and easier for review. But I'm okay with current
> > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > In previous series, David asked me to split one patch into 2
> > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > support
> > 
> > >There seem to be two parts to this:
> > >  a) Adding the mis parameter to ufd_version_check
> > >  b) Asking for the feature
> > 
> > >Please split it into two patches.
> > 
> > So in current patch set, I also added re-factoring, which was missed before
> > "migration: split ufd_version_check onto receive/request features part"
> 
> Sure. As long as Dave agrees, I'm okay with either way.

I'm OK with the split, it pretty much matches what I asked last time I think.

The question I still have is how is this memory-expensive feature turned
on and off by the user?
Also I think Peter had some ideas for simpler data structures, how did
that play out?

Dave


> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side Alexey Perevalov
  2017-04-28 10:00       ` Peter Xu
@ 2017-04-28 16:34       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-28 16:34 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, f4bug, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch provides downtime calculation per vCPU,
> as a summary and as a overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu, as an improvements of
> previous approch where QEMU kept tree with faulted page address and cpus bitmask
> in it. Now QEMU is keeping array with faulted page address as value and vCPU
> as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> list for downtime per vCPU (could be traced with page_fault_addr)
> 
> For more details see comments for get_postcopy_total_downtime
> implementation.
> 
> Downtime will not calculated if postcopy_downtime field of
> MigrationIncomingState wasn't initialized.

To partly answer my last email, ah I see you switched to Peter's structure.

> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/migration/migration.h |   3 ++
>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
>  migration/postcopy-ram.c      |  20 +++++++-
>  migration/trace-events        |   6 ++-
>  4 files changed, 130 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e8fb68f..a22f9ce 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
>   * Functions to work with downtime context
>   */
>  struct DowntimeContext *downtime_context_new(void);
> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> +void mark_postcopy_downtime_end(uint64_t addr);
> +uint64_t get_postcopy_total_downtime(void);
>  
>  struct MigrationState
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index ec76e5c..2c6f150 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
>      return atomic_xchg(&incoming_postcopy_state, new_state);
>  }
>  
> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeContext *dc;
> +    if (!mis->downtime_ctx || cpu < 0) {
> +        return;
> +    }
> +    dc = mis->downtime_ctx;
> +    dc->vcpu_addr[cpu] = addr;
> +    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> +        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +            cpu);
> +}
> +
> +void mark_postcopy_downtime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeContext *dc;
> +    int i;
> +    bool all_vcpu_down = true;
> +    int64_t now;
> +
> +    if (!mis->downtime_ctx) {
> +        return;
> +    }
> +    dc = mis->downtime_ctx;
> +    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* check all vCPU down,
> +     * QEMU has bitmap.h, but even with bitmap_and
> +     * will be a cycle */
> +    for (i = 0; i < smp_cpus; i++) {
> +        if (dc->vcpu_addr[i]) {
> +            continue;
> +        }
> +        all_vcpu_down = false;
> +        break;
> +    }
> +
> +    if (all_vcpu_down) {
> +        dc->total_downtime += now - dc->last_begin;
> +    }
> +
> +    /* lookup cpu, to clear it */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_downtime;
> +
> +        if (dc->vcpu_addr[i] != addr) {
> +            continue;
> +        }
> +
> +        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
> +
> +        dc->vcpu_addr[i] = 0;
> +        dc->vcpu_downtime[i] += vcpu_downtime;
> +    }
> +
> +    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
> +}

I don't think this is thread safe.
postcopy_downtime_begin is called from the fault thread.
postcopy_downtime_end is called from the listener thread; they can happen
at about the same time.
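
A minimal sketch of one way to serialise the two paths, assuming a QemuMutex
is added to the context (the field and the locking below are not part of the
posted series):

    typedef struct DowntimeContext {
        QemuMutex mutex;      /* taken by both fault and listener threads */
        int64_t *page_fault_vcpu_time;
        uint64_t *vcpu_addr;
        int64_t total_downtime;
        int64_t *vcpu_downtime;
        int64_t last_begin;
    } DowntimeContext;

    /* qemu_mutex_init(&ctx->mutex) would be called in downtime_context_new() */

    void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
    {
        MigrationIncomingState *mis = migration_incoming_get_current();
        DowntimeContext *dc = mis->downtime_ctx;

        if (!dc || cpu < 0) {
            return;
        }
        qemu_mutex_lock(&dc->mutex);
        dc->vcpu_addr[cpu] = addr;
        dc->last_begin = dc->page_fault_vcpu_time[cpu] =
            qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
        qemu_mutex_unlock(&dc->mutex);
    }

    /* mark_postcopy_downtime_end() would take the same lock around its scan of
     * vcpu_addr[] and the updates to vcpu_downtime[] and total_downtime. */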

Dave

> +/*
> + * This function just provide calculated before downtime per cpu and trace it.
> + * Total downtime is calculated in mark_postcopy_downtime_end.
> + *
> + *
> + * Assume we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> + *            it's a part of total downtime.
> + * S1 - here is last_begin
> + * Legend of the picture is following:
> + *              * - means downtime per vCPU
> + *              x - means overlapped downtime (total downtime)
> + */
> +uint64_t get_postcopy_total_downtime(void)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (!mis->downtime_ctx) {
> +        return 0;
> +    }
> +
> +    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
> +        int i;
> +        for (i = 0; i < smp_cpus; i++) {
> +            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
> +        }
> +    }
> +    return mis->downtime_ctx->total_downtime;
> +}
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index f3688f5..cf2b935 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -23,6 +23,7 @@
>  #include "migration/postcopy-ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {
> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>          return -e;
>      }
> +    mark_postcopy_downtime_end((uint64_t)host);
>  
>      trace_postcopy_place_page(host);
>      return 0;
> diff --git a/migration/trace-events b/migration/trace-events
> index b8f01a2..d338810 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> +downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.9.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-28 16:22             ` Dr. David Alan Gilbert
@ 2017-04-29  9:16               ` Alexey
  2017-04-29 15:02                 ` Eric Blake
  2017-05-02  8:51                 ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 39+ messages in thread
From: Alexey @ 2017-04-29  9:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Peter Xu, i.maximets, f4bug, qemu-devel

On Fri, Apr 28, 2017 at 05:22:05PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> > 
> > [...]
> > 
> > > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > >>index 21e7150..f3688f5 100644
> > > >>--- a/migration/postcopy-ram.c
> > > >>+++ b/migration/postcopy-ram.c
> > > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > >>          return false;
> > > >>      }
> > > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > > >>+        /* kernel supports that feature */
> > > >>+        mis->downtime_ctx = downtime_context_new();
> > > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > > >So here I know why in patch 2 new_features == 0...
> > > >
> > > >If I were you, I would like the series be done in below 4 patches:
> > > >
> > > >1. update header
> > > >2. introduce THREAD_ID feature, and enable it conditionally
> > > >3. squash all the downtime thing (downtime context, calculation) in
> > > >    one patch here
> > > >4. introduce trace
> > > >
> > > >IMHO that's clearer and easier for review. But I'm okay with current
> > > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > > In previous series, David asked me to split one patch into 2
> > > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > > support
> > > 
> > > >There seem to be two parts to this:
> > > >  a) Adding the mis parameter to ufd_version_check
> > > >  b) Asking for the feature
> > > 
> > > >Please split it into two patches.
> > > 
> > > So in current patch set, I also added re-factoring, which was missed before
> > > "migration: split ufd_version_check onto receive/request features part"
> > 
> > Sure. As long as Dave agrees, I'm okay with either way.
> 
> I'm OK with the split, it pretty much matches what I asked last time I think.
> 
> The question I still have is how is this memory-expensive feature turned
> on and off by the user?
> Also I think Peter had some ideas for simpler data structures, how did
> that play out?
Maybe introduce it as extension of MigrationParameter,
I mean { "execute": "migrate-set-parameters" , "arguments":
	{ "calculate-postcopy-downtime": 1 } }


> 
> Dave
> 
> 
> > -- 
> > Peter Xu
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-29  9:16               ` Alexey
@ 2017-04-29 15:02                 ` Eric Blake
  2017-05-02  8:51                 ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Blake @ 2017-04-29 15:02 UTC (permalink / raw)
  To: Alexey, Dr. David Alan Gilbert; +Cc: i.maximets, f4bug, Peter Xu, qemu-devel

On 04/29/2017 04:16 AM, Alexey wrote:

>>>>
>>>>> There seem to be two parts to this:
>>>>>  a) Adding the mis parameter to ufd_version_check
>>>>>  b) Asking for the feature
>>>>
>>>>> Please split it into two patches.

Also, fix the typo in the subject line: s/Incomming/Incoming/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-04-29  9:16               ` Alexey
  2017-04-29 15:02                 ` Eric Blake
@ 2017-05-02  8:51                 ` Dr. David Alan Gilbert
  2017-05-04 13:09                   ` Alexey
  1 sibling, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-05-02  8:51 UTC (permalink / raw)
  To: Alexey; +Cc: Peter Xu, i.maximets, f4bug, qemu-devel

* Alexey (a.perevalov@samsung.com) wrote:
> On Fri, Apr 28, 2017 at 05:22:05PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> > > 
> > > [...]
> > > 
> > > > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > >>index 21e7150..f3688f5 100644
> > > > >>--- a/migration/postcopy-ram.c
> > > > >>+++ b/migration/postcopy-ram.c
> > > > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > >>          return false;
> > > > >>      }
> > > > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > > > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > > > >>+        /* kernel supports that feature */
> > > > >>+        mis->downtime_ctx = downtime_context_new();
> > > > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > > > >So here I know why in patch 2 new_features == 0...
> > > > >
> > > > >If I were you, I would like the series be done in below 4 patches:
> > > > >
> > > > >1. update header
> > > > >2. introduce THREAD_ID feature, and enable it conditionally
> > > > >3. squash all the downtime thing (downtime context, calculation) in
> > > > >    one patch here
> > > > >4. introduce trace
> > > > >
> > > > >IMHO that's clearer and easier for review. But I'm okay with current
> > > > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > > > In previous series, David asked me to split one patch into 2
> > > > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > > > support
> > > > 
> > > > >There seem to be two parts to this:
> > > > >  a) Adding the mis parameter to ufd_version_check
> > > > >  b) Asking for the feature
> > > > 
> > > > >Please split it into two patches.
> > > > 
> > > > So in current patch set, I also added re-factoring, which was missed before
> > > > "migration: split ufd_version_check onto receive/request features part"
> > > 
> > > Sure. As long as Dave agrees, I'm okay with either way.
> > 
> > I'm OK with the split, it pretty much matches what I asked last time I think.
> > 
> > The question I still have is how is this memory-expensive feature turned
> > on and off by the user?
> > Also I think Peter had some ideas for simpler data structures, how did
> > that play out?
> Maybe introduce it as extension of MigrationParameter,
> I mean { "execute": "migrate-set-parameters" , "arguments":
> 	{ "calculate-postcopy-downtime": 1 } }

Use migrate-set-capabilities, they're effectively the same but just booleans.
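
E.g. (the capability name here is invented, just to show the shape of the
command):

    { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
        { "capability": "postcopy-downtime", "state": true } ] } }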

Dave

> 
> > 
> > Dave
> > 
> > 
> > > -- 
> > > Peter Xu
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-05-02  8:51                 ` Dr. David Alan Gilbert
@ 2017-05-04 13:09                   ` Alexey
  2017-05-05 14:11                     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey @ 2017-05-04 13:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, f4bug, Peter Xu, qemu-devel

On Tue, May 02, 2017 at 09:51:44AM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > On Fri, Apr 28, 2017 at 05:22:05PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > >>index 21e7150..f3688f5 100644
> > > > > >>--- a/migration/postcopy-ram.c
> > > > > >>+++ b/migration/postcopy-ram.c
> > > > > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > > >>          return false;
> > > > > >>      }
> > > > > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > > > > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > > > > >>+        /* kernel supports that feature */
> > > > > >>+        mis->downtime_ctx = downtime_context_new();
> > > > > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > > > > >So here I know why in patch 2 new_features == 0...
> > > > > >
> > > > > >If I were you, I would like the series be done in below 4 patches:
> > > > > >
> > > > > >1. update header
> > > > > >2. introduce THREAD_ID feature, and enable it conditionally
> > > > > >3. squash all the downtime thing (downtime context, calculation) in
> > > > > >    one patch here
> > > > > >4. introduce trace
> > > > > >
> > > > > >IMHO that's clearer and easier for review. But I'm okay with current
> > > > > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > > > > In previous series, David asked me to split one patch into 2
> > > > > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > > > > support
> > > > > 
> > > > > >There seem to be two parts to this:
> > > > > >  a) Adding the mis parameter to ufd_version_check
> > > > > >  b) Asking for the feature
> > > > > 
> > > > > >Please split it into two patches.
> > > > > 
> > > > > So in current patch set, I also added re-factoring, which was missed before
> > > > > "migration: split ufd_version_check onto receive/request features part"
> > > > 
> > > > Sure. As long as Dave agrees, I'm okay with either way.
> > > 
> > > I'm OK with the split, it pretty much matches what I asked last time I think.
> > > 
> > > The question I still have is how is this memory-expensive feature turned
> > > on and off by the user?
> > > Also I think Peter had some ideas for simpler data structures, how did
> > > that play out?
> > Maybe introduce it as extension of MigrationParameter,
> > I mean { "execute": "migrate-set-parameters" , "arguments":
> > 	{ "calculate-postcopy-downtime": 1 } }
> 
> Use migrate-set-capabilities, they're effectively the same but just booleans.

For me it's not clear where to set that capability, on the destination or on
the source side. The user sets the postcopy-ram capability on the source side,
so probably the user would want to set postcopy-downtime on the source side too.
If I'm not wrong, neither capabilities nor parameters are transferred from
source to destination.

I wanted to pass it in MIG_CMD_POSTCOPY_ADVISE, but that command holds only two
uint64 values, and they are already occupied.
Like with the RETURN PATH protocol, the migration commands couldn't be extended
without breaking backward compatibility: the length of a command is transmitted,
but it is compared against the predefined length from mig_cmd_args.

Maybe just increase QEMU_VM_FILE_VERSION in this case; it would then be possible
to return the downtime back to the source over the return path. To keep backward
compatibility, keep several versions of mig_cmd_args, one per QEMU_VM_FILE_VERSION.


> Dave
> 
> > 
> > > 
> > > Dave
> > > 
> > > 
> > > > -- 
> > > > Peter Xu
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > 
> > -- 
> > 
> > BR
> > Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-05-04 13:09                   ` Alexey
@ 2017-05-05 14:11                     ` Dr. David Alan Gilbert
  2017-05-05 16:25                       ` Alexey
  0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-05-05 14:11 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, f4bug, Peter Xu, qemu-devel

* Alexey (a.perevalov@samsung.com) wrote:
> On Tue, May 02, 2017 at 09:51:44AM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey (a.perevalov@samsung.com) wrote:
> > > On Fri, Apr 28, 2017 at 05:22:05PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > > >>index 21e7150..f3688f5 100644
> > > > > > >>--- a/migration/postcopy-ram.c
> > > > > > >>+++ b/migration/postcopy-ram.c
> > > > > > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > > > >>          return false;
> > > > > > >>      }
> > > > > > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > > > > > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > > > > > >>+        /* kernel supports that feature */
> > > > > > >>+        mis->downtime_ctx = downtime_context_new();
> > > > > > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > > > > > >So here I know why in patch 2 new_features == 0...
> > > > > > >
> > > > > > >If I were you, I would like the series be done in below 4 patches:
> > > > > > >
> > > > > > >1. update header
> > > > > > >2. introduce THREAD_ID feature, and enable it conditionally
> > > > > > >3. squash all the downtime thing (downtime context, calculation) in
> > > > > > >    one patch here
> > > > > > >4. introduce trace
> > > > > > >
> > > > > > >IMHO that's clearer and easier for review. But I'm okay with current
> > > > > > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > > > > > In previous series, David asked me to split one patch into 2
> > > > > > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > > > > > support
> > > > > > 
> > > > > > >There seem to be two parts to this:
> > > > > > >  a) Adding the mis parameter to ufd_version_check
> > > > > > >  b) Asking for the feature
> > > > > > 
> > > > > > >Please split it into two patches.
> > > > > > 
> > > > > > So in current patch set, I also added re-factoring, which was missed before
> > > > > > "migration: split ufd_version_check onto receive/request features part"
> > > > > 
> > > > > Sure. As long as Dave agrees, I'm okay with either way.
> > > > 
> > > > I'm OK with the split, it pretty much matches what I asked last time I think.
> > > > 
> > > > The question I still have is how is this memory-expensive feature turned
> > > > on and off by the user?
> > > > Also I think Peter had some ideas for simpler data structures, how did
> > > > that play out?
> > > Maybe introduce it as extension of MigrationParameter,
> > > I mean { "execute": "migrate-set-parameters" , "arguments":
> > > 	{ "calculate-postcopy-downtime": 1 } }
> > 
> > Use migrate-set-capabilities, they're effectively the same but just booleans.
> 
> For me it's not so clear, where to set that capability, on destination or on source
> side. User sets postcopy ram capability on source side, probably on
> source side user wants to set postcopy-downtime.
> If I'm not wrong, neither capabilities nor parameters are transferring
> from source to destination.

Use a capability on the destination specifically for this; it's OK to set capabilities
on the destination, and actually libvirt already sets some for us.
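
(For reference, setting it on the destination looks the same as on the source,
e.g. with a placeholder capability name - the actual name is up for discussion:
{ "execute": "migrate-set-capabilities" , "arguments":
	{ "capabilities": [ { "capability": "postcopy-downtime", "state": true } ] } } )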

One question: now that we're using Peter's idea and you don't have that big tree
structure, what are the costs now - is it as big a problem as it was?

> I wanted to pass in in MIG_CMD_POSTCOPY_ADVISE, but it holds only 2
> uint64, and they are already occupied.
> Like with RETURN PATH protocol, MIG couldn't be extended w/o breaking backward
> compatibility. Length for cmd is transmitted, but compared with
> predefined len from mig_cmd_args.
> 
> Maybe just increase QEMU_VM_FILE_VERSION in this case, it will be
> possible to return downtime back to source by return path.
> For supporting backward compatibility keep several versions of mig_cmd_args
> per QEMU_VM_FILE_VERSION. 

No, we'll only change file version on some massive improvement.

Dave

> 
> > Dave
> > 
> > > 
> > > > 
> > > > Dave
> > > > 
> > > > 
> > > > > -- 
> > > > > Peter Xu
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > 
> > > -- 
> > > 
> > > BR
> > > Alexey
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState
  2017-05-05 14:11                     ` Dr. David Alan Gilbert
@ 2017-05-05 16:25                       ` Alexey
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey @ 2017-05-05 16:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, f4bug, Peter Xu, qemu-devel

On Fri, May 05, 2017 at 03:11:14PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > On Tue, May 02, 2017 at 09:51:44AM +0100, Dr. David Alan Gilbert wrote:
> > > * Alexey (a.perevalov@samsung.com) wrote:
> > > > On Fri, Apr 28, 2017 at 05:22:05PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > > On Fri, Apr 28, 2017 at 01:03:45PM +0300, Alexey Perevalov wrote:
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > > > >>index 21e7150..f3688f5 100644
> > > > > > > >>--- a/migration/postcopy-ram.c
> > > > > > > >>+++ b/migration/postcopy-ram.c
> > > > > > > >>@@ -132,6 +132,14 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > > > > >>          return false;
> > > > > > > >>      }
> > > > > > > >>+#ifdef UFFD_FEATURE_THREAD_ID
> > > > > > > >>+    if (mis && UFFD_FEATURE_THREAD_ID & supported_features) {
> > > > > > > >>+        /* kernel supports that feature */
> > > > > > > >>+        mis->downtime_ctx = downtime_context_new();
> > > > > > > >>+        new_features |= UFFD_FEATURE_THREAD_ID;
> > > > > > > >So here I know why in patch 2 new_features == 0...
> > > > > > > >
> > > > > > > >If I were you, I would like the series be done in below 4 patches:
> > > > > > > >
> > > > > > > >1. update header
> > > > > > > >2. introduce THREAD_ID feature, and enable it conditionally
> > > > > > > >3. squash all the downtime thing (downtime context, calculation) in
> > > > > > > >    one patch here
> > > > > > > >4. introduce trace
> > > > > > > >
> > > > > > > >IMHO that's clearer and easier for review. But I'm okay with current
> > > > > > > >as well as long as the maintainers (Dave/Juan) won't disagree. :)
> > > > > > > In previous series, David asked me to split one patch into 2
> > > > > > > [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature
> > > > > > > support
> > > > > > > 
> > > > > > > >There seem to be two parts to this:
> > > > > > > >  a) Adding the mis parameter to ufd_version_check
> > > > > > > >  b) Asking for the feature
> > > > > > > 
> > > > > > > >Please split it into two patches.
> > > > > > > 
> > > > > > > So in current patch set, I also added re-factoring, which was missed before
> > > > > > > "migration: split ufd_version_check onto receive/request features part"
> > > > > > 
> > > > > > Sure. As long as Dave agrees, I'm okay with either way.
> > > > > 
> > > > > I'm OK with the split, it pretty much matches what I asked last time I think.
> > > > > 
> > > > > The question I still have is how is this memory-expensive feature turned
> > > > > on and off by the user?
> > > > > Also I think Peter had some ideas for simpler data structures, how did
> > > > > that play out?
> > > > Maybe introduce it as extension of MigrationParameter,
> > > > I mean { "execute": "migrate-set-parameters" , "arguments":
> > > > 	{ "calculate-postcopy-downtime": 1 } }
> > > 
> > > Use migrate-set-capabilities, they're effectively the same but just booleans.
> > 
> > For me it's not so clear, where to set that capability, on destination or on source
> > side. User sets postcopy ram capability on source side, probably on
> > source side user wants to set postcopy-downtime.
> > If I'm not wrong, neither capabilities nor parameters are transferring
> > from source to destination.
> 
> Use a capability on the destination specifically for this; it's OK to set capabilities
> on the destination, and actually libvirt already sets some for us.
> 
> One question: Now we're using Peter's idea, so you don't have that big tree
> structure, what are the costs now - is it as big a problem as it was?
It was a tree keyed by page address, so in the worst case, facing a huge
number of pages (terabytes of RAM and a 4kb page size), that structure got
big and consumed a lot of memory, although lookup wasn't so bad thanks to
the tree: every _begin and _end did a logarithmic-complexity search.
Right now _begin is O(1) to fill the necessary field, because cpu_index is
an array index (but we still need to look up cpu_index by thread_id, see
below), and _end is O(n), where n is the number of vCPUs.
The size of the PostcopyDowntime context depends only on the number of vCPUs.
It's about vCPU_number * (int64_t for page_fault_vcpu_time + uint64_t for
vcpu_addr + int64_t for vcpu_downtime) + int64_t for last_begin + int for
the number of vCPUs suspended + int64_t for total downtime.
For 2046 vCPUs it will be 49124 bytes. The algorithm doesn't depend on the
number of pages, only on the number of vCPUs, which in the common case is
obviously much smaller.

for recall:
typedef struct PostcopyDowntimeContext {
    /* time when the page fault was initiated, per vCPU */
    int64_t *page_fault_vcpu_time;
    /* page address per vCPU */
    uint64_t *vcpu_addr;
    int64_t total_downtime;
    /* downtime per vCPU */
    int64_t *vcpu_downtime;
    /* point in time when last page fault was initiated */
    int64_t last_begin;
    /* number of vCPUs currently suspended */
    int smp_cpus_down;
} PostcopyDowntimeContext;
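
Just to illustrate the footprint, a minimal allocation sketch could look like
this (only a sketch - the actual patch may differ; g_new0/smp_cpus assumed):

static PostcopyDowntimeContext *downtime_context_new(void)
{
    PostcopyDowntimeContext *ctx = g_new0(PostcopyDowntimeContext, 1);

    /* three per-vCPU arrays of 8 bytes each, so roughly 24 * smp_cpus bytes */
    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
    ctx->vcpu_downtime = g_new0(int64_t, smp_cpus);
    return ctx;
}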


We also need to remember get_mem_fault_cpu_index, where QEMU iterates over
the CPUs on every page fault to look up cpu_index by the thread id of the
faulting process. Peter suggested a hash there, but I think a tree would be
enough - servers with about 2000 CPUs are only just going on sale. I prepared
a patch set, but didn't include a lookup tree for get_mem_fault_cpu_index;
it's not so clear how to handle it - the lifetime of that lookup tree, and
whether it should be part of PostcopyDowntimeContext or generic code. BTW a
similar (linear) lookup is done in qemu_get_cpu by cpu_index, so I think it
would be useful to have macros to construct/destruct a search tree over CPUs
keyed by a unique field.
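
A rough sketch of the hash variant Peter suggested (the helper name and where
the table is built are made up here, just to show the shape):

static GHashTable *tid_to_cpu_index; /* hypothetical, built once at setup */

static void build_tid_to_cpu_index(void)
{
    CPUState *cpu_iter;

    tid_to_cpu_index = g_hash_table_new(g_direct_hash, g_direct_equal);
    CPU_FOREACH(cpu_iter) {
        g_hash_table_insert(tid_to_cpu_index,
                            GUINT_TO_POINTER(cpu_iter->thread_id),
                            GINT_TO_POINTER(cpu_iter->cpu_index));
    }
}

static int get_mem_fault_cpu_index(uint32_t pid)
{
    gpointer val;

    /* lookup_extended so a cpu_index of 0 isn't mistaken for "not found" */
    if (g_hash_table_lookup_extended(tid_to_cpu_index,
                                     GUINT_TO_POINTER(pid), NULL, &val)) {
        return GPOINTER_TO_INT(val);
    }
    trace_get_mem_fault_cpu_index(pid);
    return -1;
}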


> 
> > I wanted to pass in in MIG_CMD_POSTCOPY_ADVISE, but it holds only 2
> > uint64, and they are already occupied.
> > Like with RETURN PATH protocol, MIG couldn't be extended w/o breaking backward
> > compatibility. Length for cmd is transmitted, but compared with
> > predefined len from mig_cmd_args.
> > 
> > Maybe just increase QEMU_VM_FILE_VERSION in this case, it will be
> > possible to return downtime back to source by return path.
> > For supporting backward compatibility keep several versions of mig_cmd_args
> > per QEMU_VM_FILE_VERSION. 
> 
> No, we'll only change file version on some massive improvement.
> 
> Dave
> 
> > 
> > > Dave
> > > 
> > > > 
> > > > > 
> > > > > Dave
> > > > > 
> > > > > 
> > > > > > -- 
> > > > > > Peter Xu
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > > 
> > > > 
> > > > -- 
> > > > 
> > > > BR
> > > > Alexey
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > 
> > -- 
> > 
> > BR
> > Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-04-28 11:11         ` Alexey Perevalov
@ 2017-05-08  6:29           ` Peter Xu
  2017-05-08  9:08             ` Alexey
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-05-08  6:29 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets, f4bug

On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> On 04/28/2017 01:00 PM, Peter Xu wrote:
> >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> >>This patch provides downtime calculation per vCPU,
> >>as a summary and as a overlapped value for all vCPUs.
> >>
> >>This approach was suggested by Peter Xu, as an improvements of
> >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> >>list for downtime per vCPU (could be traced with page_fault_addr)
> >>
> >>For more details see comments for get_postcopy_total_downtime
> >>implementation.
> >>
> >>Downtime will not calculated if postcopy_downtime field of
> >>MigrationIncomingState wasn't initialized.
> >>
> >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> >>---
> >>  include/migration/migration.h |   3 ++
> >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> >>  migration/postcopy-ram.c      |  20 +++++++-
> >>  migration/trace-events        |   6 ++-
> >>  4 files changed, 130 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> >>index e8fb68f..a22f9ce 100644
> >>--- a/include/migration/migration.h
> >>+++ b/include/migration/migration.h
> >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> >>   * Functions to work with downtime context
> >>   */
> >>  struct DowntimeContext *downtime_context_new(void);
> >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> >>+void mark_postcopy_downtime_end(uint64_t addr);
> >>+uint64_t get_postcopy_total_downtime(void);
> >>  struct MigrationState
> >>  {
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index ec76e5c..2c6f150 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> >>  }
> >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> >>+{
> >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> >>+    DowntimeContext *dc;
> >>+    if (!mis->downtime_ctx || cpu < 0) {
> >>+        return;
> >>+    }
> >>+    dc = mis->downtime_ctx;
> >>+    dc->vcpu_addr[cpu] = addr;
> >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>+
> >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> >>+            cpu);
> >>+}
> >>+
> >>+void mark_postcopy_downtime_end(uint64_t addr)
> >>+{
> >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> >>+    DowntimeContext *dc;
> >>+    int i;
> >>+    bool all_vcpu_down = true;
> >>+    int64_t now;
> >>+
> >>+    if (!mis->downtime_ctx) {
> >>+        return;
> >>+    }
> >>+    dc = mis->downtime_ctx;
> >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>+
> >>+    /* check all vCPU down,
> >>+     * QEMU has bitmap.h, but even with bitmap_and
> >>+     * will be a cycle */
> >>+    for (i = 0; i < smp_cpus; i++) {
> >>+        if (dc->vcpu_addr[i]) {
> >>+            continue;
> >>+        }
> >>+        all_vcpu_down = false;
> >>+        break;
> >>+    }
> >>+
> >>+    if (all_vcpu_down) {
> >>+        dc->total_downtime += now - dc->last_begin;
> >Shall we do this accouting only if we are sure the copied page address
> >is one of the page faulted addresses? Can it be some other page? I
> >don't know. But since we have the loop below to make sure of it, why
> >not?
> no, the downtime implies since page fault till the
> page will be copied.
> Yes another pages could be copied as well as pagefaulted,
> and they are copied due to prefetching, but it's not a downtime.

Not sure I got the point... Do you mean that when we reach here, then
this page address is definitely one of the faulted addresses? I am not
100% sure of this, but if you are sure, I am okay with it.

> 
> >A nitpick on perf: when there are lots of vcpus, the algo might be
> >slow since we have several places that loops over the smp_vcpus. But
> >this can be totally future work on top, and current way is good enough
> >at least for me.
> 
> >(for the nit: maybe add a hash, key=thread_id, value=cpu_index, then
> >  get_mem_fault_cpu_index() can be faster using the hash; meanwhile
> >  keep a counter A of page faulted vcpus, use atomic ops with it, then
> >  here all_vcpu_down can be checked by A == smp_vcpus)
> just binary search in get_mem_fault_cpu_index will be nice too )
> also, it's good idea to keep all_vcpu_down in PostcopyDowntimeContext.
> 
> >
> >Thanks,
> >
> >>+    }
> >>+
> >>+    /* lookup cpu, to clear it */
> >>+    for (i = 0; i < smp_cpus; i++) {
> >>+        uint64_t vcpu_downtime;
> >>+
> >>+        if (dc->vcpu_addr[i] != addr) {
> >>+            continue;
> >>+        }
> >>+
> >>+        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
> >>+
> >>+        dc->vcpu_addr[i] = 0;
> >>+        dc->vcpu_downtime[i] += vcpu_downtime;
> >>+    }
> >>+
> >>+    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
> >>+}
> >>+
> >>+/*
> >>+ * This function just provide calculated before downtime per cpu and trace it.
> >>+ * Total downtime is calculated in mark_postcopy_downtime_end.
> >>+ *
> >>+ *
> >>+ * Assume we have 3 CPU
> >>+ *
> >>+ *      S1        E1           S1               E1
> >>+ * -----***********------------xxx***************------------------------> CPU1
> >>+ *
> >>+ *             S2                E2
> >>+ * ------------****************xxx---------------------------------------> CPU2
> >>+ *
> >>+ *                         S3            E3
> >>+ * ------------------------****xxx********-------------------------------> CPU3
> >>+ *
> >>+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> >>+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> >>+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> >>+ *            it's a part of total downtime.
> >>+ * S1 - here is last_begin
> >>+ * Legend of the picture is following:
> >>+ *              * - means downtime per vCPU
> >>+ *              x - means overlapped downtime (total downtime)
> >>+ */
> >>+uint64_t get_postcopy_total_downtime(void)
> >>+{
> >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> >>+
> >>+    if (!mis->downtime_ctx) {
> >>+        return 0;
> >>+    }
> >>+
> >>+    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
> >>+        int i;
> >>+        for (i = 0; i < smp_cpus; i++) {
> >>+            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
> >>+        }
> >>+    }
> >>+    return mis->downtime_ctx->total_downtime;
> >>+}
> >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >>index f3688f5..cf2b935 100644
> >>--- a/migration/postcopy-ram.c
> >>+++ b/migration/postcopy-ram.c
> >>@@ -23,6 +23,7 @@
> >>  #include "migration/postcopy-ram.h"
> >>  #include "sysemu/sysemu.h"
> >>  #include "sysemu/balloon.h"
> >>+#include <sys/param.h>
> >>  #include "qemu/error-report.h"
> >>  #include "trace.h"
> >>@@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> >>      return 0;
> >>  }
> >>+static int get_mem_fault_cpu_index(uint32_t pid)
> >>+{
> >>+    CPUState *cpu_iter;
> >>+
> >>+    CPU_FOREACH(cpu_iter) {
> >>+        if (cpu_iter->thread_id == pid) {
> >>+            return cpu_iter->cpu_index;
> >>+        }
> >>+    }
> >>+    trace_get_mem_fault_cpu_index(pid);
> >>+    return -1;
> >>+}
> >>+
> >>  /*
> >>   * Handle faults detected by the USERFAULT markings
> >>   */
> >>@@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
> >>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> >>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> >>                                                  qemu_ram_get_idstr(rb),
> >>-                                                rb_offset);
> >>+                                                rb_offset,
> >>+                                                msg.arg.pagefault.feat.ptid);
> >>+        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
> >>+                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
> >>          /*
> >>           * Send the request to the source - we want to request one
> >>           * of our host page sizes (which is >= TPS)
> >>@@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >>          return -e;
> >>      }
> >>+    mark_postcopy_downtime_end((uint64_t)host);
> >>      trace_postcopy_place_page(host);
> >>      return 0;
> >>diff --git a/migration/trace-events b/migration/trace-events
> >>index b8f01a2..d338810 100644
> >>--- a/migration/trace-events
> >>+++ b/migration/trace-events
> >>@@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> >>  process_incoming_migration_co_postcopy_end_main(void) ""
> >>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> >>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> >>+mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> >>+mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> >>+downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
> >>  # migration/rdma.c
> >>  qemu_rdma_accept_incoming_migration(void) ""
> >>@@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
> >>  postcopy_ram_fault_thread_entry(void) ""
> >>  postcopy_ram_fault_thread_exit(void) ""
> >>  postcopy_ram_fault_thread_quit(void) ""
> >>-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> >>+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
> >>  postcopy_ram_incoming_cleanup_closeuf(void) ""
> >>  postcopy_ram_incoming_cleanup_entry(void) ""
> >>  postcopy_ram_incoming_cleanup_exit(void) ""
> >>@@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
> >>  save_xbzrle_page_overflow(void) ""
> >>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> >>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> >>+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> >>  # migration/exec.c
> >>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> >>-- 
> >>1.9.1
> >>
> 
> 
> -- 
> Best regards,
> Alexey Perevalov

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-08  6:29           ` Peter Xu
@ 2017-05-08  9:08             ` Alexey
  2017-05-09  8:26               ` Peter Xu
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey @ 2017-05-08  9:08 UTC (permalink / raw)
  To: Peter Xu; +Cc: i.maximets, f4bug, qemu-devel, dgilbert

On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > >>This patch provides downtime calculation per vCPU,
> > >>as a summary and as a overlapped value for all vCPUs.
> > >>
> > >>This approach was suggested by Peter Xu, as an improvements of
> > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > >>
> > >>For more details see comments for get_postcopy_total_downtime
> > >>implementation.
> > >>
> > >>Downtime will not calculated if postcopy_downtime field of
> > >>MigrationIncomingState wasn't initialized.
> > >>
> > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > >>---
> > >>  include/migration/migration.h |   3 ++
> > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > >>  migration/postcopy-ram.c      |  20 +++++++-
> > >>  migration/trace-events        |   6 ++-
> > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > >>
> > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > >>index e8fb68f..a22f9ce 100644
> > >>--- a/include/migration/migration.h
> > >>+++ b/include/migration/migration.h
> > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > >>   * Functions to work with downtime context
> > >>   */
> > >>  struct DowntimeContext *downtime_context_new(void);
> > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > >>+uint64_t get_postcopy_total_downtime(void);
> > >>  struct MigrationState
> > >>  {
> > >>diff --git a/migration/migration.c b/migration/migration.c
> > >>index ec76e5c..2c6f150 100644
> > >>--- a/migration/migration.c
> > >>+++ b/migration/migration.c
> > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > >>  }
> > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > >>+{
> > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > >>+    DowntimeContext *dc;
> > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > >>+        return;
> > >>+    }
> > >>+    dc = mis->downtime_ctx;
> > >>+    dc->vcpu_addr[cpu] = addr;
> > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >>+
> > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > >>+            cpu);
> > >>+}
> > >>+
> > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > >>+{
> > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > >>+    DowntimeContext *dc;
> > >>+    int i;
> > >>+    bool all_vcpu_down = true;
> > >>+    int64_t now;
> > >>+
> > >>+    if (!mis->downtime_ctx) {
> > >>+        return;
> > >>+    }
> > >>+    dc = mis->downtime_ctx;
> > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >>+
> > >>+    /* check all vCPU down,
> > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > >>+     * will be a cycle */
> > >>+    for (i = 0; i < smp_cpus; i++) {
> > >>+        if (dc->vcpu_addr[i]) {
> > >>+            continue;
> > >>+        }
> > >>+        all_vcpu_down = false;
> > >>+        break;
> > >>+    }
> > >>+
> > >>+    if (all_vcpu_down) {
> > >>+        dc->total_downtime += now - dc->last_begin;
> > >Shall we do this accouting only if we are sure the copied page address
> > >is one of the page faulted addresses? Can it be some other page? I
> > >don't know. But since we have the loop below to make sure of it, why
> > >not?
> > no, the downtime implies since page fault till the
> > page will be copied.
> > Yes another pages could be copied as well as pagefaulted,
> > and they are copied due to prefetching, but it's not a downtime.
> 
> Not sure I got the point... Do you mean that when reach here, then
> this page address is definitely one of the faulted addresses? I am not
> 100% sure of this, but if you are sure, I am okay with it.
Let me clarify.

> > >Shall we do this accouting only if we are sure the copied page address
> > >is one of the page faulted addresses?
Yes, that's the primary condition, because there could also be other pages
which weren't faulted - they were just sent from source to destination;
I call that prefetching.

I think I see why you asked that question: in this version all_vcpu_down,
and as a result total_downtime, is calculated incorrectly - it is updated
every time any page is copied, but it should only be updated when a faulted
page is copied, so only dc->vcpu_downtime was calculated correctly.
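
Something like the following in mark_postcopy_downtime_end should fix it
(a sketch only, based on the code quoted above, not the actual next version):

void mark_postcopy_downtime_end(uint64_t addr)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    DowntimeContext *dc;
    int i;
    bool all_vcpu_down = true;
    bool page_was_faulted = false;
    int64_t now;

    if (!mis->downtime_ctx) {
        return;
    }
    dc = mis->downtime_ctx;
    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

    /* check all vCPU down, as before */
    for (i = 0; i < smp_cpus; i++) {
        if (!dc->vcpu_addr[i]) {
            all_vcpu_down = false;
            break;
        }
    }

    /* clear the matching vCPU entries and remember whether this address
     * was one of the faulted ones at all */
    for (i = 0; i < smp_cpus; i++) {
        if (dc->vcpu_addr[i] != addr) {
            continue;
        }
        page_was_faulted = true;
        dc->vcpu_addr[i] = 0;
        dc->vcpu_downtime[i] += now - dc->page_fault_vcpu_time[i];
    }

    /* account total downtime only when a faulted page was placed */
    if (all_vcpu_down && page_was_faulted) {
        dc->total_downtime += now - dc->last_begin;
    }

    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
}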

> > > Can it be some other page? I
> > >don't know. But since we have the loop below to make sure of it, why
> > >not?

> > 
> > >A nitpick on perf: when there are lots of vcpus, the algo might be
> > >slow since we have several places that loops over the smp_vcpus. But
> > >this can be totally future work on top, and current way is good enough
> > >at least for me.
> > 
> > >(for the nit: maybe add a hash, key=thread_id, value=cpu_index, then
> > >  get_mem_fault_cpu_index() can be faster using the hash; meanwhile
> > >  keep a counter A of page faulted vcpus, use atomic ops with it, then
> > >  here all_vcpu_down can be checked by A == smp_vcpus)
> > just binary search in get_mem_fault_cpu_index will be nice too )
> > also, it's good idea to keep all_vcpu_down in PostcopyDowntimeContext.
> > 
> > >
> > >Thanks,
> > >
> > >>+    }
> > >>+
> > >>+    /* lookup cpu, to clear it */
> > >>+    for (i = 0; i < smp_cpus; i++) {
> > >>+        uint64_t vcpu_downtime;
> > >>+
> > >>+        if (dc->vcpu_addr[i] != addr) {
> > >>+            continue;
> > >>+        }
> > >>+
> > >>+        vcpu_downtime = now - dc->page_fault_vcpu_time[i];
> > >>+
> > >>+        dc->vcpu_addr[i] = 0;
> > >>+        dc->vcpu_downtime[i] += vcpu_downtime;
> > >>+    }
> > >>+
> > >>+    trace_mark_postcopy_downtime_end(addr, dc, dc->total_downtime);
> > >>+}
> > >>+
> > >>+/*
> > >>+ * This function just provide calculated before downtime per cpu and trace it.
> > >>+ * Total downtime is calculated in mark_postcopy_downtime_end.
> > >>+ *
> > >>+ *
> > >>+ * Assume we have 3 CPU
> > >>+ *
> > >>+ *      S1        E1           S1               E1
> > >>+ * -----***********------------xxx***************------------------------> CPU1
> > >>+ *
> > >>+ *             S2                E2
> > >>+ * ------------****************xxx---------------------------------------> CPU2
> > >>+ *
> > >>+ *                         S3            E3
> > >>+ * ------------------------****xxx********-------------------------------> CPU3
> > >>+ *
> > >>+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> > >>+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> > >>+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> > >>+ *            it's a part of total downtime.
> > >>+ * S1 - here is last_begin
> > >>+ * Legend of the picture is following:
> > >>+ *              * - means downtime per vCPU
> > >>+ *              x - means overlapped downtime (total downtime)
> > >>+ */
> > >>+uint64_t get_postcopy_total_downtime(void)
> > >>+{
> > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > >>+
> > >>+    if (!mis->downtime_ctx) {
> > >>+        return 0;
> > >>+    }
> > >>+
> > >>+    if (trace_event_get_state(TRACE_DOWNTIME_PER_CPU)) {
> > >>+        int i;
> > >>+        for (i = 0; i < smp_cpus; i++) {
> > >>+            trace_downtime_per_cpu(i, mis->downtime_ctx->vcpu_downtime[i]);
> > >>+        }
> > >>+    }
> > >>+    return mis->downtime_ctx->total_downtime;
> > >>+}
> > >>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > >>index f3688f5..cf2b935 100644
> > >>--- a/migration/postcopy-ram.c
> > >>+++ b/migration/postcopy-ram.c
> > >>@@ -23,6 +23,7 @@
> > >>  #include "migration/postcopy-ram.h"
> > >>  #include "sysemu/sysemu.h"
> > >>  #include "sysemu/balloon.h"
> > >>+#include <sys/param.h>
> > >>  #include "qemu/error-report.h"
> > >>  #include "trace.h"
> > >>@@ -468,6 +469,19 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> > >>      return 0;
> > >>  }
> > >>+static int get_mem_fault_cpu_index(uint32_t pid)
> > >>+{
> > >>+    CPUState *cpu_iter;
> > >>+
> > >>+    CPU_FOREACH(cpu_iter) {
> > >>+        if (cpu_iter->thread_id == pid) {
> > >>+            return cpu_iter->cpu_index;
> > >>+        }
> > >>+    }
> > >>+    trace_get_mem_fault_cpu_index(pid);
> > >>+    return -1;
> > >>+}
> > >>+
> > >>  /*
> > >>   * Handle faults detected by the USERFAULT markings
> > >>   */
> > >>@@ -545,8 +559,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
> > >>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> > >>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> > >>                                                  qemu_ram_get_idstr(rb),
> > >>-                                                rb_offset);
> > >>+                                                rb_offset,
> > >>+                                                msg.arg.pagefault.feat.ptid);
> > >>+        mark_postcopy_downtime_begin((uintptr_t)(msg.arg.pagefault.address),
> > >>+                         get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
> > >>          /*
> > >>           * Send the request to the source - we want to request one
> > >>           * of our host page sizes (which is >= TPS)
> > >>@@ -641,6 +658,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > >>          return -e;
> > >>      }
> > >>+    mark_postcopy_downtime_end((uint64_t)host);
> > >>      trace_postcopy_place_page(host);
> > >>      return 0;
> > >>diff --git a/migration/trace-events b/migration/trace-events
> > >>index b8f01a2..d338810 100644
> > >>--- a/migration/trace-events
> > >>+++ b/migration/trace-events
> > >>@@ -110,6 +110,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > >>  process_incoming_migration_co_postcopy_end_main(void) ""
> > >>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> > >>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> > >>+mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> > >>+mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> > >>+downtime_per_cpu(int cpu_index, int64_t downtime) "downtime cpu[%d]=%" PRId64
> > >>  # migration/rdma.c
> > >>  qemu_rdma_accept_incoming_migration(void) ""
> > >>@@ -186,7 +189,7 @@ postcopy_ram_enable_notify(void) ""
> > >>  postcopy_ram_fault_thread_entry(void) ""
> > >>  postcopy_ram_fault_thread_exit(void) ""
> > >>  postcopy_ram_fault_thread_quit(void) ""
> > >>-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> > >>+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
> > >>  postcopy_ram_incoming_cleanup_closeuf(void) ""
> > >>  postcopy_ram_incoming_cleanup_entry(void) ""
> > >>  postcopy_ram_incoming_cleanup_exit(void) ""
> > >>@@ -195,6 +198,7 @@ save_xbzrle_page_skipping(void) ""
> > >>  save_xbzrle_page_overflow(void) ""
> > >>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> > >>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> > >>+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> > >>  # migration/exec.c
> > >>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> > >>-- 
> > >>1.9.1
> > >>
> > 
> > 
> > -- 
> > Best regards,
> > Alexey Perevalov
> 
> -- 
> Peter Xu
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-08  9:08             ` Alexey
@ 2017-05-09  8:26               ` Peter Xu
  2017-05-09  9:40                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-05-09  8:26 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, f4bug, qemu-devel, dgilbert

On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > >>This patch provides downtime calculation per vCPU,
> > > >>as a summary and as a overlapped value for all vCPUs.
> > > >>
> > > >>This approach was suggested by Peter Xu, as an improvements of
> > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > >>
> > > >>For more details see comments for get_postcopy_total_downtime
> > > >>implementation.
> > > >>
> > > >>Downtime will not calculated if postcopy_downtime field of
> > > >>MigrationIncomingState wasn't initialized.
> > > >>
> > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > >>---
> > > >>  include/migration/migration.h |   3 ++
> > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > >>  migration/trace-events        |   6 ++-
> > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > >>
> > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > >>index e8fb68f..a22f9ce 100644
> > > >>--- a/include/migration/migration.h
> > > >>+++ b/include/migration/migration.h
> > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > >>   * Functions to work with downtime context
> > > >>   */
> > > >>  struct DowntimeContext *downtime_context_new(void);
> > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > >>+uint64_t get_postcopy_total_downtime(void);
> > > >>  struct MigrationState
> > > >>  {
> > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > >>index ec76e5c..2c6f150 100644
> > > >>--- a/migration/migration.c
> > > >>+++ b/migration/migration.c
> > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > >>  }
> > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > >>+{
> > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > >>+    DowntimeContext *dc;
> > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > >>+        return;
> > > >>+    }
> > > >>+    dc = mis->downtime_ctx;
> > > >>+    dc->vcpu_addr[cpu] = addr;
> > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > >>+
> > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > >>+            cpu);
> > > >>+}
> > > >>+
> > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > >>+{
> > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > >>+    DowntimeContext *dc;
> > > >>+    int i;
> > > >>+    bool all_vcpu_down = true;
> > > >>+    int64_t now;
> > > >>+
> > > >>+    if (!mis->downtime_ctx) {
> > > >>+        return;
> > > >>+    }
> > > >>+    dc = mis->downtime_ctx;
> > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > >>+
> > > >>+    /* check all vCPU down,
> > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > >>+     * will be a cycle */
> > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > >>+        if (dc->vcpu_addr[i]) {
> > > >>+            continue;
> > > >>+        }
> > > >>+        all_vcpu_down = false;
> > > >>+        break;
> > > >>+    }
> > > >>+
> > > >>+    if (all_vcpu_down) {
> > > >>+        dc->total_downtime += now - dc->last_begin;
> > > >Shall we do this accouting only if we are sure the copied page address
> > > >is one of the page faulted addresses? Can it be some other page? I
> > > >don't know. But since we have the loop below to make sure of it, why
> > > >not?
> > > no, the downtime implies since page fault till the
> > > page will be copied.
> > > Yes another pages could be copied as well as pagefaulted,
> > > and they are copied due to prefetching, but it's not a downtime.
> > 
> > Not sure I got the point... Do you mean that when reach here, then
> > this page address is definitely one of the faulted addresses? I am not
> > 100% sure of this, but if you are sure, I am okay with it.
> Let me clarify.
> 
> > > >Shall we do this accouting only if we are sure the copied page address
> > > >is one of the page faulted addresses?
> Yes it's primary condition, due to there are could be another pages,
> which weren't faulted, they just was sent from source to destination,
> I called it prefetching.
> 
> I think I got why did you ask that question, because in this version
> all_vcpu_down and as a result total_downtime calculated incorrectly,
> it calculates every time when any page is copied, but it should
> be calculated only when faulted page copied, so only dc->vcpu_downtime
> was correctly calculated.

Exactly. I am afraid if we have such "prefetching" stuff then
total_downtime will be more than its real value.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09  8:26               ` Peter Xu
@ 2017-05-09  9:40                 ` Dr. David Alan Gilbert
  2017-05-09  9:44                   ` Daniel P. Berrange
  2017-05-09 15:19                   ` Alexey
  0 siblings, 2 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-05-09  9:40 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alexey, i.maximets, f4bug, qemu-devel

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > >>This patch provides downtime calculation per vCPU,
> > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > >>
> > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > >>
> > > > >>For more details see comments for get_postcopy_total_downtime
> > > > >>implementation.
> > > > >>
> > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > >>MigrationIncomingState wasn't initialized.
> > > > >>
> > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > >>---
> > > > >>  include/migration/migration.h |   3 ++
> > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > >>  migration/trace-events        |   6 ++-
> > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > >>
> > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > >>index e8fb68f..a22f9ce 100644
> > > > >>--- a/include/migration/migration.h
> > > > >>+++ b/include/migration/migration.h
> > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > >>   * Functions to work with downtime context
> > > > >>   */
> > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > >>  struct MigrationState
> > > > >>  {
> > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > >>index ec76e5c..2c6f150 100644
> > > > >>--- a/migration/migration.c
> > > > >>+++ b/migration/migration.c
> > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > >>  }
> > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > >>+{
> > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > >>+    DowntimeContext *dc;
> > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > >>+        return;
> > > > >>+    }
> > > > >>+    dc = mis->downtime_ctx;
> > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > >>+
> > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > >>+            cpu);
> > > > >>+}
> > > > >>+
> > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > >>+{
> > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > >>+    DowntimeContext *dc;
> > > > >>+    int i;
> > > > >>+    bool all_vcpu_down = true;
> > > > >>+    int64_t now;
> > > > >>+
> > > > >>+    if (!mis->downtime_ctx) {
> > > > >>+        return;
> > > > >>+    }
> > > > >>+    dc = mis->downtime_ctx;
> > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > >>+
> > > > >>+    /* check all vCPU down,
> > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > >>+     * will be a cycle */
> > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > >>+        if (dc->vcpu_addr[i]) {
> > > > >>+            continue;
> > > > >>+        }
> > > > >>+        all_vcpu_down = false;
> > > > >>+        break;
> > > > >>+    }
> > > > >>+
> > > > >>+    if (all_vcpu_down) {
> > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > >Shall we do this accouting only if we are sure the copied page address
> > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > >don't know. But since we have the loop below to make sure of it, why
> > > > >not?
> > > > no, the downtime implies since page fault till the
> > > > page will be copied.
> > > > Yes another pages could be copied as well as pagefaulted,
> > > > and they are copied due to prefetching, but it's not a downtime.
> > > 
> > > Not sure I got the point... Do you mean that when reach here, then
> > > this page address is definitely one of the faulted addresses? I am not
> > > 100% sure of this, but if you are sure, I am okay with it.
> > Let me clarify.
> > 
> > > > >Shall we do this accouting only if we are sure the copied page address
> > > > >is one of the page faulted addresses?
> > Yes it's primary condition, due to there are could be another pages,
> > which weren't faulted, they just was sent from source to destination,
> > I called it prefetching.
> > 
> > I think I got why did you ask that question, because in this version
> > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > it calculates every time when any page is copied, but it should
> > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > was correctly calculated.
> 
> Exactly. I am afraid if we have such "prefetching" stuff then
> total_downtime will be more than its real value.

It should be OK as long as we measure the time between
  userfault reporting a page miss for an address 
  and
  place_page for *that same address*

Any place_page calls for other pages are irrelevant.

(I still worry that this definition of 'downtime' is possibly
arbitrary - since if all but one of the vCPUs are down we
don't count it but it's obviously still a big impact).

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09  9:40                 ` Dr. David Alan Gilbert
@ 2017-05-09  9:44                   ` Daniel P. Berrange
  2017-05-10 15:46                     ` Alexey
  2017-05-09 15:19                   ` Alexey
  1 sibling, 1 reply; 39+ messages in thread
From: Daniel P. Berrange @ 2017-05-09  9:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Peter Xu, i.maximets, qemu-devel, Alexey, f4bug

On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > >>This patch provides downtime calculation per vCPU,
> > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > >>
> > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > >>
> > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > >>implementation.
> > > > > >>
> > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > >>MigrationIncomingState wasn't initialized.
> > > > > >>
> > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > >>---
> > > > > >>  include/migration/migration.h |   3 ++
> > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > >>  migration/trace-events        |   6 ++-
> > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > >>
> > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > >>index e8fb68f..a22f9ce 100644
> > > > > >>--- a/include/migration/migration.h
> > > > > >>+++ b/include/migration/migration.h
> > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > >>   * Functions to work with downtime context
> > > > > >>   */
> > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > >>  struct MigrationState
> > > > > >>  {
> > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > >>index ec76e5c..2c6f150 100644
> > > > > >>--- a/migration/migration.c
> > > > > >>+++ b/migration/migration.c
> > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > >>  }
> > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > >>+{
> > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > >>+    DowntimeContext *dc;
> > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > >>+        return;
> > > > > >>+    }
> > > > > >>+    dc = mis->downtime_ctx;
> > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > >>+
> > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > >>+            cpu);
> > > > > >>+}
> > > > > >>+
> > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > >>+{
> > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > >>+    DowntimeContext *dc;
> > > > > >>+    int i;
> > > > > >>+    bool all_vcpu_down = true;
> > > > > >>+    int64_t now;
> > > > > >>+
> > > > > >>+    if (!mis->downtime_ctx) {
> > > > > >>+        return;
> > > > > >>+    }
> > > > > >>+    dc = mis->downtime_ctx;
> > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > >>+
> > > > > >>+    /* check all vCPU down,
> > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > >>+     * will be a cycle */
> > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > >>+            continue;
> > > > > >>+        }
> > > > > >>+        all_vcpu_down = false;
> > > > > >>+        break;
> > > > > >>+    }
> > > > > >>+
> > > > > >>+    if (all_vcpu_down) {
> > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > >not?
> > > > > no, the downtime implies since page fault till the
> > > > > page will be copied.
> > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > 
> > > > Not sure I got the point... Do you mean that when reach here, then
> > > > this page address is definitely one of the faulted addresses? I am not
> > > > 100% sure of this, but if you are sure, I am okay with it.
> > > Let me clarify.
> > > 
> > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > >is one of the page faulted addresses?
> > > Yes it's primary condition, due to there are could be another pages,
> > > which weren't faulted, they just was sent from source to destination,
> > > I called it prefetching.
> > > 
> > > I think I got why did you ask that question, because in this version
> > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > it calculates every time when any page is copied, but it should
> > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > was correctly calculated.
> > 
> > Exactly. I am afraid if we have such "prefetching" stuff then
> > total_downtime will be more than its real value.
> 
> It should be OK as long as we measure the time between
>   userfault reporting a page miss for an address 
>   and
>   place_page for *that same address*
> 
> any places for other pages are irrelevant.
> 
> (I still worry that this definition of 'downtime' is possibly
> arbitrary - since if all but one of the vCPUs are down we
> don't count it but it's obviously still a big impact).

Can we also *not* call it "downtime", as it is measuring a very different
thing than the "downtime" we have measured today on the source during
pre-copy migration.

Call it "pagewait" or "delaytime" or something like that to indicate it
is counting delays to CPUs for page fetching.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09  9:40                 ` Dr. David Alan Gilbert
  2017-05-09  9:44                   ` Daniel P. Berrange
@ 2017-05-09 15:19                   ` Alexey
  2017-05-09 19:01                     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 39+ messages in thread
From: Alexey @ 2017-05-09 15:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Peter Xu, i.maximets, qemu-devel, f4bug

On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > >>This patch provides downtime calculation per vCPU,
> > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > >>
> > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > >>
> > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > >>implementation.
> > > > > >>
> > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > >>MigrationIncomingState wasn't initialized.
> > > > > >>
> > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > >>---
> > > > > >>  include/migration/migration.h |   3 ++
> > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > >>  migration/trace-events        |   6 ++-
> > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > >>
> > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > >>index e8fb68f..a22f9ce 100644
> > > > > >>--- a/include/migration/migration.h
> > > > > >>+++ b/include/migration/migration.h
> > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > >>   * Functions to work with downtime context
> > > > > >>   */
> > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > >>  struct MigrationState
> > > > > >>  {
> > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > >>index ec76e5c..2c6f150 100644
> > > > > >>--- a/migration/migration.c
> > > > > >>+++ b/migration/migration.c
> > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > >>  }
> > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > >>+{
> > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > >>+    DowntimeContext *dc;
> > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > >>+        return;
> > > > > >>+    }
> > > > > >>+    dc = mis->downtime_ctx;
> > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > >>+
> > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > >>+            cpu);
> > > > > >>+}
> > > > > >>+
> > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > >>+{
> > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > >>+    DowntimeContext *dc;
> > > > > >>+    int i;
> > > > > >>+    bool all_vcpu_down = true;
> > > > > >>+    int64_t now;
> > > > > >>+
> > > > > >>+    if (!mis->downtime_ctx) {
> > > > > >>+        return;
> > > > > >>+    }
> > > > > >>+    dc = mis->downtime_ctx;
> > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > >>+
> > > > > >>+    /* check all vCPU down,
> > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > >>+     * will be a cycle */
> > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > >>+            continue;
> > > > > >>+        }
> > > > > >>+        all_vcpu_down = false;
> > > > > >>+        break;
> > > > > >>+    }
> > > > > >>+
> > > > > >>+    if (all_vcpu_down) {
> > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > >not?
> > > > > no, the downtime implies since page fault till the
> > > > > page will be copied.
> > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > 
> > > > Not sure I got the point... Do you mean that when reach here, then
> > > > this page address is definitely one of the faulted addresses? I am not
> > > > 100% sure of this, but if you are sure, I am okay with it.
> > > Let me clarify.
> > > 
> > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > >is one of the page faulted addresses?
> > > Yes it's primary condition, due to there are could be another pages,
> > > which weren't faulted, they just was sent from source to destination,
> > > I called it prefetching.
> > > 
> > > I think I got why did you ask that question, because in this version
> > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > it calculates every time when any page is copied, but it should
> > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > was correctly calculated.
> > 
> > Exactly. I am afraid if we have such "prefetching" stuff then
> > total_downtime will be more than its real value.
> 
> It should be OK as long as we measure the time between
>   userfault reporting a page miss for an address 
>   and
>   place_page for *that same address*
> 
> any places for other pages are irrelevant.
> 
> (I still worry that this definition of 'downtime' is possibly
> arbitrary - since if all but one of the vCPUs are down we
> don't count it but it's obviously still a big impact).
Technically we count downtime per vCPU and store it in the
vcpu_downtime field of PostcopyDowntimeContext (in this version
still named DowntimeContext). I traced downtime per vCPU in the previous
version, but in the current version it is only traced as total_downtime.

Also, total_downtime is currently not retrievable on the destination,
because query-migrate reports MigrationState but not MigrationIncomingState,
so I think we need to extend it to cover MigrationIncomingState too.
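
For illustration only, a rough sketch (not part of the posted series) of how
mark_postcopy_downtime_end could account time solely for pages that some vCPU
actually faulted on, using the DowntimeContext fields from this patch plus the
per-vCPU vcpu_downtime accumulator mentioned above:

void mark_postcopy_downtime_end(uint64_t addr)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    DowntimeContext *dc;
    int i;
    bool all_vcpu_down = true;
    int64_t now;

    if (!mis->downtime_ctx) {
        return;
    }
    dc = mis->downtime_ctx;
    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

    /* all vCPUs are "down" only if every one of them is waiting on a page */
    for (i = 0; i < smp_cpus; i++) {
        if (!dc->vcpu_addr[i]) {
            all_vcpu_down = false;
            break;
        }
    }

    /* end an interval only for vCPUs blocked on exactly this address */
    for (i = 0; i < smp_cpus; i++) {
        if (dc->vcpu_addr[i] != addr) {
            continue;
        }
        dc->vcpu_downtime[i] += now - dc->page_fault_vcpu_time[i];
        dc->vcpu_addr[i] = 0;
        if (all_vcpu_down) {
            dc->total_downtime += now - dc->last_begin;
            all_vcpu_down = false; /* count the overlapped interval only once */
        }
    }
}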

> 
> Dave
> 
> > -- 
> > Peter Xu
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09 15:19                   ` Alexey
@ 2017-05-09 19:01                     ` Dr. David Alan Gilbert
  2017-05-11  6:32                       ` Alexey
  0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-05-09 19:01 UTC (permalink / raw)
  To: Alexey; +Cc: Peter Xu, i.maximets, qemu-devel, f4bug

* Alexey (a.perevalov@samsung.com) wrote:
> On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > >>
> > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > >>
> > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > >>implementation.
> > > > > > >>
> > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > >>
> > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > >>---
> > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > >>
> > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > >>--- a/include/migration/migration.h
> > > > > > >>+++ b/include/migration/migration.h
> > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > >>   * Functions to work with downtime context
> > > > > > >>   */
> > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > >>  struct MigrationState
> > > > > > >>  {
> > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > >>--- a/migration/migration.c
> > > > > > >>+++ b/migration/migration.c
> > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > >>  }
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > >>+            cpu);
> > > > > > >>+}
> > > > > > >>+
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    int i;
> > > > > > >>+    bool all_vcpu_down = true;
> > > > > > >>+    int64_t now;
> > > > > > >>+
> > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    /* check all vCPU down,
> > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > >>+     * will be a cycle */
> > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > >>+            continue;
> > > > > > >>+        }
> > > > > > >>+        all_vcpu_down = false;
> > > > > > >>+        break;
> > > > > > >>+    }
> > > > > > >>+
> > > > > > >>+    if (all_vcpu_down) {
> > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > >not?
> > > > > > no, the downtime implies since page fault till the
> > > > > > page will be copied.
> > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > 
> > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > Let me clarify.
> > > > 
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses?
> > > > Yes it's primary condition, due to there are could be another pages,
> > > > which weren't faulted, they just was sent from source to destination,
> > > > I called it prefetching.
> > > > 
> > > > I think I got why did you ask that question, because in this version
> > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > it calculates every time when any page is copied, but it should
> > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > was correctly calculated.
> > > 
> > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > total_downtime will be more than its real value.
> > 
> > It should be OK as long as we measure the time between
> >   userfault reporting a page miss for an address 
> >   and
> >   place_page for *that same address*
> > 
> > any places for other pages are irrelevant.
> > 
> > (I still worry that this definition of 'downtime' is possibly
> > arbitrary - since if all but one of the vCPUs are down we
> > don't count it but it's obviously still a big impact).
> Technically we count downtime per vCPU  and storing it in
>     vcpu_downtime field of PostcopyDowntimeContext (in this version
>     still DowntimeContext). I traced downtime per vCPU in previous version.
> But it just traced as total_downtime in current version.
> 
> Also total_downtime is not possible to get on destination, due to
> query-migrate is about MigrationState, but not MigrationIncomingState,
> so I think need to extend it to MigrationIncomingState too.

I don't think that's too problematic; just add it to qmp_query_migrate;
the only thing to be careful of is what happens if the incoming
migration finishes while 'info migrate' is reading the
MigrationIncomingState.
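
As a rough sketch of what that could look like on the destination side (the
QAPI field names below are hypothetical and would need a schema addition, so
take this as an assumption rather than the final interface):

    /* hypothetical fragment inside qmp_query_migrate() */
    MigrationIncomingState *mis = migration_incoming_get_current();

    if (mis->downtime_ctx) {
        info->has_postcopy_blocktime = true;           /* hypothetical field */
        info->postcopy_blocktime = get_postcopy_total_downtime();
    }

The race to think about is downtime_ctx being torn down by
migration_incoming_state_destroy() while the fragment above is reading it.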

Dave

> > 
> > Dave
> > 
> > > -- 
> > > Peter Xu
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09  9:44                   ` Daniel P. Berrange
@ 2017-05-10 15:46                     ` Alexey
  2017-05-10 15:58                       ` Daniel P. Berrange
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey @ 2017-05-10 15:46 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Dr. David Alan Gilbert, i.maximets, f4bug, qemu-devel, Peter Xu

On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > >>
> > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > >>
> > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > >>implementation.
> > > > > > >>
> > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > >>
> > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > >>---
> > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > >>
> > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > >>--- a/include/migration/migration.h
> > > > > > >>+++ b/include/migration/migration.h
> > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > >>   * Functions to work with downtime context
> > > > > > >>   */
> > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > >>  struct MigrationState
> > > > > > >>  {
> > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > >>--- a/migration/migration.c
> > > > > > >>+++ b/migration/migration.c
> > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > >>  }
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > >>+            cpu);
> > > > > > >>+}
> > > > > > >>+
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    int i;
> > > > > > >>+    bool all_vcpu_down = true;
> > > > > > >>+    int64_t now;
> > > > > > >>+
> > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    /* check all vCPU down,
> > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > >>+     * will be a cycle */
> > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > >>+            continue;
> > > > > > >>+        }
> > > > > > >>+        all_vcpu_down = false;
> > > > > > >>+        break;
> > > > > > >>+    }
> > > > > > >>+
> > > > > > >>+    if (all_vcpu_down) {
> > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > >not?
> > > > > > no, the downtime implies since page fault till the
> > > > > > page will be copied.
> > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > 
> > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > Let me clarify.
> > > > 
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses?
> > > > Yes it's primary condition, due to there are could be another pages,
> > > > which weren't faulted, they just was sent from source to destination,
> > > > I called it prefetching.
> > > > 
> > > > I think I got why did you ask that question, because in this version
> > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > it calculates every time when any page is copied, but it should
> > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > was correctly calculated.
> > > 
> > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > total_downtime will be more than its real value.
> > 
> > It should be OK as long as we measure the time between
> >   userfault reporting a page miss for an address 
> >   and
> >   place_page for *that same address*
> > 
> > any places for other pages are irrelevant.
> > 
> > (I still worry that this definition of 'downtime' is possibly
> > arbitrary - since if all but one of the vCPUs are down we
> > don't count it but it's obviously still a big impact).
> 
> Can we also *not* call it "downtime", as it is measuring a very different
> thing than the "downtime" we have measured today on the source during
> pre-copy migration.

As I understand it, downtime in pre-copy migration is the time from when
all vCPUs are disabled on the source until those vCPUs are enabled on the
destination, i.e. one continuous interval. So the meaning of both downtimes
is the same: it is the time when all vCPUs are down.

But as David mentioned - downtime per vCPU is also important in the case of
postcopy live migration, and I'll include it in the final result of
query-migrate too.

> 
> Call it "pagewait" or "delaytime" or something like that to indicate it
> is counting delays to CPUs for page fetching.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-10 15:46                     ` Alexey
@ 2017-05-10 15:58                       ` Daniel P. Berrange
  2017-05-11  4:56                         ` Peter Xu
       [not found]                         ` <CGME20170511064629eucas1p114c72db6d922a6a05a4ec4a4d3003b55@eucas1p1.samsung.com>
  0 siblings, 2 replies; 39+ messages in thread
From: Daniel P. Berrange @ 2017-05-10 15:58 UTC (permalink / raw)
  To: Alexey; +Cc: Dr. David Alan Gilbert, i.maximets, f4bug, qemu-devel, Peter Xu

On Wed, May 10, 2017 at 06:46:50PM +0300, Alexey wrote:
> On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > >>
> > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > >>
> > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > >>implementation.
> > > > > > > >>
> > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > >>
> > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > >>---
> > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > >>
> > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > >>--- a/include/migration/migration.h
> > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > >>   * Functions to work with downtime context
> > > > > > > >>   */
> > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > >>  struct MigrationState
> > > > > > > >>  {
> > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > >>--- a/migration/migration.c
> > > > > > > >>+++ b/migration/migration.c
> > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > >>  }
> > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > >>+{
> > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > >>+    DowntimeContext *dc;
> > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > >>+        return;
> > > > > > > >>+    }
> > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > >>+
> > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > >>+            cpu);
> > > > > > > >>+}
> > > > > > > >>+
> > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > >>+{
> > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > >>+    DowntimeContext *dc;
> > > > > > > >>+    int i;
> > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > >>+    int64_t now;
> > > > > > > >>+
> > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > >>+        return;
> > > > > > > >>+    }
> > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > >>+
> > > > > > > >>+    /* check all vCPU down,
> > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > >>+     * will be a cycle */
> > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > >>+            continue;
> > > > > > > >>+        }
> > > > > > > >>+        all_vcpu_down = false;
> > > > > > > >>+        break;
> > > > > > > >>+    }
> > > > > > > >>+
> > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > >not?
> > > > > > > no, the downtime implies since page fault till the
> > > > > > > page will be copied.
> > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > 
> > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > Let me clarify.
> > > > > 
> > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > >is one of the page faulted addresses?
> > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > which weren't faulted, they just was sent from source to destination,
> > > > > I called it prefetching.
> > > > > 
> > > > > I think I got why did you ask that question, because in this version
> > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > it calculates every time when any page is copied, but it should
> > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > was correctly calculated.
> > > > 
> > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > total_downtime will be more than its real value.
> > > 
> > > It should be OK as long as we measure the time between
> > >   userfault reporting a page miss for an address 
> > >   and
> > >   place_page for *that same address*
> > > 
> > > any places for other pages are irrelevant.
> > > 
> > > (I still worry that this definition of 'downtime' is possibly
> > > arbitrary - since if all but one of the vCPUs are down we
> > > don't count it but it's obviously still a big impact).
> > 
> > Can we also *not* call it "downtime", as it is measuring a very different
> > thing than the "downtime" we have measured today on the source during
> > pre-copy migration.
> 
> As I know downtime in pre-copy migration it's a time since all vCPU
> were disabled on source till enabling these vCPU on destination,
> but it's continuous time. So the meaning of both downtime is the same.
> It's time when all vCPU are down.

It is really not the same. In pre-copy, all the CPUs are in an explicit
"stopped" state for a continuous period of time. In post-copy the CPUs
are all in the "running" state, but repeatedly get blocked with arbitrary
time intervals to fetch pages from the source. This is totally different
information being reported, so should have a different name.

Calling it downtime will also lead to confusion wrt the "max downtime"
tunable parameter, which is not at all relevant here.

> But as David mentioned - downtime per vCPU is also important in case of
> postcopy live migration, and I'll include it in final result of
> query-migrate too.

That we need to report different data per-vCPU is another reason why
we should not call it "downtime".

> > Call it "pagewait" or "delaytime" or something like that to indicate it
> > is counting delays to CPUs for page fetching.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-10 15:58                       ` Daniel P. Berrange
@ 2017-05-11  4:56                         ` Peter Xu
       [not found]                           ` <CGME20170511070940eucas1p2ca3e44c15c84eef00e33d755a11c0ea1@eucas1p2.samsung.com>
       [not found]                         ` <CGME20170511064629eucas1p114c72db6d922a6a05a4ec4a4d3003b55@eucas1p1.samsung.com>
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Xu @ 2017-05-11  4:56 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Alexey, Dr. David Alan Gilbert, i.maximets, f4bug, qemu-devel

On Wed, May 10, 2017 at 04:58:59PM +0100, Daniel P. Berrange wrote:
> On Wed, May 10, 2017 at 06:46:50PM +0300, Alexey wrote:
> > On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> > > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > > >>
> > > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > > >>
> > > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > > >>implementation.
> > > > > > > > >>
> > > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > > >>
> > > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > > >>---
> > > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > > >>
> > > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > > >>--- a/include/migration/migration.h
> > > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > > >>   * Functions to work with downtime context
> > > > > > > > >>   */
> > > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > > >>  struct MigrationState
> > > > > > > > >>  {
> > > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > > >>--- a/migration/migration.c
> > > > > > > > >>+++ b/migration/migration.c
> > > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > > >>  }
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > > >>+            cpu);
> > > > > > > > >>+}
> > > > > > > > >>+
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    int i;
> > > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > > >>+    int64_t now;
> > > > > > > > >>+
> > > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    /* check all vCPU down,
> > > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > > >>+     * will be a cycle */
> > > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > > >>+            continue;
> > > > > > > > >>+        }
> > > > > > > > >>+        all_vcpu_down = false;
> > > > > > > > >>+        break;
> > > > > > > > >>+    }
> > > > > > > > >>+
> > > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > > >not?
> > > > > > > > no, the downtime implies since page fault till the
> > > > > > > > page will be copied.
> > > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > > 
> > > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > > Let me clarify.
> > > > > > 
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses?
> > > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > > which weren't faulted, they just was sent from source to destination,
> > > > > > I called it prefetching.
> > > > > > 
> > > > > > I think I got why did you ask that question, because in this version
> > > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > > it calculates every time when any page is copied, but it should
> > > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > > was correctly calculated.
> > > > > 
> > > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > > total_downtime will be more than its real value.
> > > > 
> > > > It should be OK as long as we measure the time between
> > > >   userfault reporting a page miss for an address 
> > > >   and
> > > >   place_page for *that same address*
> > > > 
> > > > any places for other pages are irrelevant.
> > > > 
> > > > (I still worry that this definition of 'downtime' is possibly
> > > > arbitrary - since if all but one of the vCPUs are down we
> > > > don't count it but it's obviously still a big impact).
> > > 
> > > Can we also *not* call it "downtime", as it is measuring a very different
> > > thing than the "downtime" we have measured today on the source during
> > > pre-copy migration.
> > 
> > As I know downtime in pre-copy migration it's a time since all vCPU
> > were disabled on source till enabling these vCPU on destination,
> > but it's continuous time. So the meaning of both downtime is the same.
> > It's time when all vCPU are down.
> 
> It is really not the same. In pre-copy, all the CPUs are in an explicit
> "stopped" state for a continuous period of time. In post-copy the CPUs
> are all in the "running" state, but repeatedly get blocked with arbitrary
> time intervals to fetch pages from the source. This is totally different
> information being reported, so should have a different name.
> 
> Calling it downtime will also lead to confusion wrt to the "max downtime"
> tunable parameter, which is not at all relevant here.

Indeed.

A big way in which this "postcopy downtime" differs from the original
downtime is that this new "downtime" may not even be noticed by the
guest user. It is much like how an operating system allocates time slices -
the "downtime" during the postcopy phase is intermittent, and I believe the
guest user can hardly differentiate it from ordinary process
switching on some platforms.

While we are at it, Alexey, could I ask in what cases we really care
about this summation of "postcopy downtime"? Put differently, where does
this requirement come from?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-09 19:01                     ` Dr. David Alan Gilbert
@ 2017-05-11  6:32                       ` Alexey
  2017-05-11  8:25                         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey @ 2017-05-11  6:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, qemu-devel, Peter Xu, f4bug

On Tue, May 09, 2017 at 08:01:01PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > >>
> > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > >>
> > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > >>implementation.
> > > > > > > >>
> > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > >>
> > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > >>---
> > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > >>
> > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > >>--- a/include/migration/migration.h
> > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > >>   * Functions to work with downtime context
> > > > > > > >>   */
> > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > >>  struct MigrationState
> > > > > > > >>  {
> > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > >>--- a/migration/migration.c
> > > > > > > >>+++ b/migration/migration.c
> > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > >>  }
> > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > >>+{
> > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > >>+    DowntimeContext *dc;
> > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > >>+        return;
> > > > > > > >>+    }
> > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > >>+
> > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > >>+            cpu);
> > > > > > > >>+}
> > > > > > > >>+
> > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > >>+{
> > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > >>+    DowntimeContext *dc;
> > > > > > > >>+    int i;
> > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > >>+    int64_t now;
> > > > > > > >>+
> > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > >>+        return;
> > > > > > > >>+    }
> > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > >>+
> > > > > > > >>+    /* check all vCPU down,
> > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > >>+     * will be a cycle */
> > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > >>+            continue;
> > > > > > > >>+        }
> > > > > > > >>+        all_vcpu_down = false;
> > > > > > > >>+        break;
> > > > > > > >>+    }
> > > > > > > >>+
> > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > >not?
> > > > > > > no, the downtime implies since page fault till the
> > > > > > > page will be copied.
> > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > 
> > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > Let me clarify.
> > > > > 
> > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > >is one of the page faulted addresses?
> > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > which weren't faulted, they just was sent from source to destination,
> > > > > I called it prefetching.
> > > > > 
> > > > > I think I got why did you ask that question, because in this version
> > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > it calculates every time when any page is copied, but it should
> > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > was correctly calculated.
> > > > 
> > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > total_downtime will be more than its real value.
> > > 
> > > It should be OK as long as we measure the time between
> > >   userfault reporting a page miss for an address 
> > >   and
> > >   place_page for *that same address*
> > > 
> > > any places for other pages are irrelevant.
> > > 
> > > (I still worry that this definition of 'downtime' is possibly
> > > arbitrary - since if all but one of the vCPUs are down we
> > > don't count it but it's obviously still a big impact).
> > Technically we count downtime per vCPU  and storing it in
> >     vcpu_downtime field of PostcopyDowntimeContext (in this version
> >     still DowntimeContext). I traced downtime per vCPU in previous version.
> > But it just traced as total_downtime in current version.
> > 
> > Also total_downtime is not possible to get on destination, due to
> > query-migrate is about MigrationState, but not MigrationIncomingState,
> > so I think need to extend it to MigrationIncomingState too.
> 
> I don't think that's too problematic; just add it to qmp_query_migrate;
> the only thing to be careful of is what happens if the incoming
> migration finishes during the info migrate is reading the
> MigrationIncomingState.
Do you mean a non-atomic read of s->state in qmp_query_migrate? If so,
it's also a pre-copy problem. I saw your atomic operations like
postcopy_state_get/set.
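
(For reference, the accessor pair I mean follows roughly this pattern - a
sketch reconstructed from the quoted patch context, not copied from the tree:

static PostcopyState incoming_postcopy_state;

PostcopyState postcopy_state_get(void)
{
    return atomic_mb_read(&incoming_postcopy_state);
}

PostcopyState postcopy_state_set(PostcopyState new_state)
{
    return atomic_xchg(&incoming_postcopy_state, new_state);
}

so a similar accessor could guard whatever destination-side counters
qmp_query_migrate ends up reading.)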

> Dave
> 
> > > 
> > > Dave
> > > 
> > > > -- 
> > > > Peter Xu
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > 
> > -- 
> > 
> > BR
> > Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
       [not found]                         ` <CGME20170511064629eucas1p114c72db6d922a6a05a4ec4a4d3003b55@eucas1p1.samsung.com>
@ 2017-05-11  6:46                           ` Alexey
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey @ 2017-05-11  6:46 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: i.maximets, qemu-devel, Dr. David Alan Gilbert, Peter Xu, f4bug

On Wed, May 10, 2017 at 04:58:59PM +0100, Daniel P. Berrange wrote:
> On Wed, May 10, 2017 at 06:46:50PM +0300, Alexey wrote:
> > On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> > > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > > >>
> > > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > > >>
> > > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > > >>implementation.
> > > > > > > > >>
> > > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > > >>
> > > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > > >>---
> > > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > > >>
> > > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > > >>--- a/include/migration/migration.h
> > > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > > >>   * Functions to work with downtime context
> > > > > > > > >>   */
> > > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > > >>  struct MigrationState
> > > > > > > > >>  {
> > > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > > >>--- a/migration/migration.c
> > > > > > > > >>+++ b/migration/migration.c
> > > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > > >>  }
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > > >>+            cpu);
> > > > > > > > >>+}
> > > > > > > > >>+
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    int i;
> > > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > > >>+    int64_t now;
> > > > > > > > >>+
> > > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    /* check all vCPU down,
> > > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > > >>+     * will be a cycle */
> > > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > > >>+            continue;
> > > > > > > > >>+        }
> > > > > > > > >>+        all_vcpu_down = false;
> > > > > > > > >>+        break;
> > > > > > > > >>+    }
> > > > > > > > >>+
> > > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > > >not?
> > > > > > > > no, the downtime implies since page fault till the
> > > > > > > > page will be copied.
> > > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > > 
> > > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > > Let me clarify.
> > > > > > 
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses?
> > > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > > which weren't faulted, they just was sent from source to destination,
> > > > > > I called it prefetching.
> > > > > > 
> > > > > > I think I got why did you ask that question, because in this version
> > > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > > it calculates every time when any page is copied, but it should
> > > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > > was correctly calculated.
> > > > > 
> > > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > > total_downtime will be more than its real value.
> > > > 
> > > > It should be OK as long as we measure the time between
> > > >   userfault reporting a page miss for an address 
> > > >   and
> > > >   place_page for *that same address*
> > > > 
> > > > any places for other pages are irrelevant.
> > > > 
> > > > (I still worry that this definition of 'downtime' is possibly
> > > > arbitrary - since if all but one of the vCPUs are down we
> > > > don't count it but it's obviously still a big impact).
> > > 
> > > Can we also *not* call it "downtime", as it is measuring a very different
> > > thing than the "downtime" we have measured today on the source during
> > > pre-copy migration.
> > 
> > As I know downtime in pre-copy migration it's a time since all vCPU
> > were disabled on source till enabling these vCPU on destination,
> > but it's continuous time. So the meaning of both downtime is the same.
> > It's time when all vCPU are down.
> 
> It is really not the same. In pre-copy, all the CPUs are in an explicit
> "stopped" state for a continuous period of time. In post-copy the CPUs
> are all in the "running" state, but repeatedly get blocked with arbitrary
> time intervals to fetch pages from the source. This is totally different
> information being reported, so should have a different name.
Frankly speaking, in the RFC I asked that question: why doesn't QEMU change
the vCPU state while it is waiting for a page?

> 
> Calling it downtime will also lead to confusion wrt to the "max downtime"
> tunable parameter, which is not at all relevant here.
> 
> > But as David mentioned - downtime per vCPU is also important in case of
> > postcopy live migration, and I'll include it in final result of
> > query-migrate too.
> 
> That we need to report different data per-vCPU, is another reason why
> we should not call it "downtime".
Here I totally agree.
So logically we should call that list vcpu_blocktime, w/o page/fault/... in the
name, maybe with a postcopy prefix, and the value for the time when all vCPUs
are blocked vcpu_total_blocktime.
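Roughly something like the sketch below (purely illustrative, using the
proposed naming, not the actual patch; it assumes QEMU's smp_cpus global and
qemu_clock_get_ms(), and the PostcopyBlocktimeContext fields simply mirror the
DowntimeContext ones from this series). The point is that total blocktime is
only charged when the copied page resolves a recorded fault, so prefetched
pages are not accounted:

/* Hypothetical naming sketch, not the actual patch. */
typedef struct PostcopyBlocktimeContext {
    uint64_t *vcpu_addr;            /* faulted address per vCPU, 0 = not blocked */
    int64_t *page_fault_vcpu_time;  /* fault timestamp per vCPU */
    int64_t *vcpu_blocktime;        /* accumulated blocktime per vCPU */
    int64_t last_begin;             /* timestamp of the most recent fault */
    int64_t vcpu_total_blocktime;   /* time while *all* vCPUs were blocked */
} PostcopyBlocktimeContext;

static void mark_postcopy_blocktime_end(PostcopyBlocktimeContext *ctx,
                                        uint64_t addr)
{
    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
    bool all_vcpu_blocked = true;
    bool copied_page_was_faulted = false;
    int i;

    /* A vCPU whose recorded address is 0 is not currently blocked. */
    for (i = 0; i < smp_cpus; i++) {
        if (!ctx->vcpu_addr[i]) {
            all_vcpu_blocked = false;
            break;
        }
    }

    for (i = 0; i < smp_cpus; i++) {
        if (ctx->vcpu_addr[i] != addr) {
            continue;
        }
        copied_page_was_faulted = true;
        /* per-vCPU blocktime: fault report -> page placed for that address */
        ctx->vcpu_blocktime[i] += now - ctx->page_fault_vcpu_time[i];
        ctx->vcpu_addr[i] = 0;
    }

    /* Charge total blocktime only when every vCPU was blocked and this copy
     * actually resolved a faulted address, so prefetched pages don't count. */
    if (all_vcpu_blocked && copied_page_was_faulted) {
        ctx->vcpu_total_blocktime += now - ctx->last_begin;
    }
}

Clearing vcpu_addr[i] for every vCPU that faulted on the same address also
covers the case where a single UFFD_COPY wakes several vCPUs at once.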

> 
> > > Call it "pagewait" or "delaytime" or something like that to indicate it
> > > is counting delays to CPUs for page fetching.
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
       [not found]                           ` <CGME20170511070940eucas1p2ca3e44c15c84eef00e33d755a11c0ea1@eucas1p2.samsung.com>
@ 2017-05-11  7:09                             ` Alexey
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey @ 2017-05-11  7:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrange, i.maximets, qemu-devel, f4bug,
	Dr. David Alan Gilbert

On Thu, May 11, 2017 at 12:56:55PM +0800, Peter Xu wrote:
> On Wed, May 10, 2017 at 04:58:59PM +0100, Daniel P. Berrange wrote:
> > On Wed, May 10, 2017 at 06:46:50PM +0300, Alexey wrote:
> > > On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> > > > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > > > >>
> > > > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > > > >>
> > > > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > > > >>implementation.
> > > > > > > > > >>
> > > > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > > > >>
> > > > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > > > >>---
> > > > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > > > >>
> > > > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > > > >>--- a/include/migration/migration.h
> > > > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > > > >>   * Functions to work with downtime context
> > > > > > > > > >>   */
> > > > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > > > >>  struct MigrationState
> > > > > > > > > >>  {
> > > > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > > > >>--- a/migration/migration.c
> > > > > > > > > >>+++ b/migration/migration.c
> > > > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > > > >>  }
> > > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > > > >>+{
> > > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > > > >>+        return;
> > > > > > > > > >>+    }
> > > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > > >>+
> > > > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > > > >>+            cpu);
> > > > > > > > > >>+}
> > > > > > > > > >>+
> > > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > > > >>+{
> > > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > > >>+    int i;
> > > > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > > > >>+    int64_t now;
> > > > > > > > > >>+
> > > > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > > > >>+        return;
> > > > > > > > > >>+    }
> > > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > > >>+
> > > > > > > > > >>+    /* check all vCPU down,
> > > > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > > > >>+     * will be a cycle */
> > > > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > > > >>+            continue;
> > > > > > > > > >>+        }
> > > > > > > > > >>+        all_vcpu_down = false;
> > > > > > > > > >>+        break;
> > > > > > > > > >>+    }
> > > > > > > > > >>+
> > > > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > > > >not?
> > > > > > > > > no, the downtime implies since page fault till the
> > > > > > > > > page will be copied.
> > > > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > > > 
> > > > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > > > Let me clarify.
> > > > > > > 
> > > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > > >is one of the page faulted addresses?
> > > > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > > > which weren't faulted, they just was sent from source to destination,
> > > > > > > I called it prefetching.
> > > > > > > 
> > > > > > > I think I got why did you ask that question, because in this version
> > > > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > > > it calculates every time when any page is copied, but it should
> > > > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > > > was correctly calculated.
> > > > > > 
> > > > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > > > total_downtime will be more than its real value.
> > > > > 
> > > > > It should be OK as long as we measure the time between
> > > > >   userfault reporting a page miss for an address 
> > > > >   and
> > > > >   place_page for *that same address*
> > > > > 
> > > > > any places for other pages are irrelevant.
> > > > > 
> > > > > (I still worry that this definition of 'downtime' is possibly
> > > > > arbitrary - since if all but one of the vCPUs are down we
> > > > > don't count it but it's obviously still a big impact).
> > > > 
> > > > Can we also *not* call it "downtime", as it is measuring a very different
> > > > thing than the "downtime" we have measured today on the source during
> > > > pre-copy migration.
> > > 
> > > As I know downtime in pre-copy migration it's a time since all vCPU
> > > were disabled on source till enabling these vCPU on destination,
> > > but it's continuous time. So the meaning of both downtime is the same.
> > > It's time when all vCPU are down.
> > 
> > It is really not the same. In pre-copy, all the CPUs are in an explicit
> > "stopped" state for a continuous period of time. In post-copy the CPUs
> > are all in the "running" state, but repeatedly get blocked with arbitrary
> > time intervals to fetch pages from the source. This is totally different
> > information being reported, so should have a different name.
> > 
> > Calling it downtime will also lead to confusion wrt to the "max downtime"
> > tunable parameter, which is not at all relevant here.
> 
> Indeed.
> 
> A big difference that this "postcopy downtime" differs from original
> downtime is that, this new "downtime" may even not be detected by
> guest user. It just like how operating system allocate time slides -
> the "downtime" during postcopy phase is intermittent and I believe the
> guest user can hardly differenciate that from general UI process
> switching on some platforms.
> 
> Since we are at this, Alex, could I ask in what case do we really care
> about this summation of "postcopy downtime"? Or say, how this
> requirement come from?
Suspension during postcopy migration leads to network packet drops in
network-specific software, e.g. software based on DPDK or another framework.
It's interesting to know the amount of time when all vCPUs were down, as well
as the time per vCPU; the "downtime" reported on the source side doesn't give
us real information about what the guest actually experiences.
Looks like I forgot to add this rationale in this version.

Another way of measuring it is using ping, but the precision of that method
isn't good. Or use something inside the guest.

In the previous series V2 I also included a patch to trace the process
state/stack from procfs, and I found that it is not only the vCPUs that get
suspended due to page faults.

> 
> Thanks,
> 
> -- 
> Peter Xu
> 

-- 

BR
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
  2017-05-11  6:32                       ` Alexey
@ 2017-05-11  8:25                         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2017-05-11  8:25 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, qemu-devel, Peter Xu, f4bug

* Alexey (a.perevalov@samsung.com) wrote:
> On Tue, May 09, 2017 at 08:01:01PM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey (a.perevalov@samsung.com) wrote:
> > > On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > > > >>
> > > > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > > > >>
> > > > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > > > >>implementation.
> > > > > > > > >>
> > > > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > > > >>
> > > > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > > > >>---
> > > > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > > > >>
> > > > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > > > >>--- a/include/migration/migration.h
> > > > > > > > >>+++ b/include/migration/migration.h
> > > > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > > > >>   * Functions to work with downtime context
> > > > > > > > >>   */
> > > > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > > > >>  struct MigrationState
> > > > > > > > >>  {
> > > > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > > > >>--- a/migration/migration.c
> > > > > > > > >>+++ b/migration/migration.c
> > > > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > > > >>  }
> > > > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > > > >>+            cpu);
> > > > > > > > >>+}
> > > > > > > > >>+
> > > > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > > > >>+{
> > > > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > > > >>+    DowntimeContext *dc;
> > > > > > > > >>+    int i;
> > > > > > > > >>+    bool all_vcpu_down = true;
> > > > > > > > >>+    int64_t now;
> > > > > > > > >>+
> > > > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > > > >>+        return;
> > > > > > > > >>+    }
> > > > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > > > >>+
> > > > > > > > >>+    /* check all vCPU down,
> > > > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > > > >>+     * will be a cycle */
> > > > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > > > >>+            continue;
> > > > > > > > >>+        }
> > > > > > > > >>+        all_vcpu_down = false;
> > > > > > > > >>+        break;
> > > > > > > > >>+    }
> > > > > > > > >>+
> > > > > > > > >>+    if (all_vcpu_down) {
> > > > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > > > >not?
> > > > > > > > no, the downtime implies since page fault till the
> > > > > > > > page will be copied.
> > > > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > > > 
> > > > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > > > Let me clarify.
> > > > > > 
> > > > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > > > >is one of the page faulted addresses?
> > > > > > Yes it's primary condition, due to there are could be another pages,
> > > > > > which weren't faulted, they just was sent from source to destination,
> > > > > > I called it prefetching.
> > > > > > 
> > > > > > I think I got why did you ask that question, because in this version
> > > > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > > > it calculates every time when any page is copied, but it should
> > > > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > > > was correctly calculated.
> > > > > 
> > > > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > > > total_downtime will be more than its real value.
> > > > 
> > > > It should be OK as long as we measure the time between
> > > >   userfault reporting a page miss for an address 
> > > >   and
> > > >   place_page for *that same address*
> > > > 
> > > > any places for other pages are irrelevant.
> > > > 
> > > > (I still worry that this definition of 'downtime' is possibly
> > > > arbitrary - since if all but one of the vCPUs are down we
> > > > don't count it but it's obviously still a big impact).
> > > Technically we count downtime per vCPU  and storing it in
> > >     vcpu_downtime field of PostcopyDowntimeContext (in this version
> > >     still DowntimeContext). I traced downtime per vCPU in previous version.
> > > But it just traced as total_downtime in current version.
> > > 
> > > Also total_downtime is not possible to get on destination, due to
> > > query-migrate is about MigrationState, but not MigrationIncomingState,
> > > so I think need to extend it to MigrationIncomingState too.
> > 
> > I don't think that's too problematic; just add it to qmp_query_migrate;
> > the only thing to be careful of is what happens if the incoming
> > migration finishes during the info migrate is reading the
> > MigrationIncomingState.
> Do you mean none atomic read of s->state in qmp_query_migrate, if so
> it's also a pre-copy problem. I saw you atomic operations like
> postcopy_state_get/set.

Yes, you're right - I'd remembered that MigrationState was statically
allocated but had forgotten that MigrationIncomingState was also statically
allocated - so we don't have to worry about it getting freed.
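For the query-migrate side, a rough sketch could look like the following
(purely illustrative: the has_postcopy_blocktime/postcopy_blocktime members
of MigrationInfo are hypothetical QAPI additions, not part of this series):

MigrationInfo *qmp_query_migrate(Error **errp)
{
    MigrationInfo *info = g_malloc0(sizeof(*info));
    MigrationIncomingState *mis = migration_incoming_get_current();

    /* ... existing reporting of the source-side MigrationState ... */

    if (mis->downtime_ctx) {
        /*
         * MigrationIncomingState is statically allocated, so reading it here
         * is safe even if the incoming migration finishes concurrently; the
         * reported value is a plain 64-bit counter.
         */
        info->has_postcopy_blocktime = true;
        info->postcopy_blocktime = get_postcopy_total_downtime();
    }

    return info;
}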

Dave

> > Dave
> > 
> > > > 
> > > > Dave
> > > > 
> > > > > -- 
> > > > > Peter Xu
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > 
> > > -- 
> > > 
> > > BR
> > > Alexey
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-05-11  8:25 UTC | newest]

Thread overview: 39+ messages
     [not found] <CGME20170428065752eucas1p1b702ff53ba0bd96674e8cc35466f8046@eucas1p1.samsung.com>
2017-04-28  6:57 ` [Qemu-devel] [PATCH RESEND V3 0/6] calculate downtime for postcopy live migration Alexey Perevalov
     [not found]   ` <CGME20170428065752eucas1p190511b1932f61b6321c489f0eb4e816f@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
     [not found]   ` <CGME20170428065753eucas1p1639528c4df0b459db96579fd5bee281c@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 2/6] migration: pass ptr to MigrationIncomingState into migration ufd_version_check & postcopy_ram_supported_by_host Alexey Perevalov
2017-04-28  9:04       ` Peter Xu
     [not found]   ` <CGME20170428065753eucas1p1524aa2bd8e469e6c94a88ee80eb54a6e@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
2017-04-28  9:01       ` Peter Xu
2017-04-28 10:58         ` Alexey Perevalov
2017-04-28 12:57           ` Alexey Perevalov
2017-04-28 15:55       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170428065754eucas1p1f51713373ce8c2d19945a4f91c52bd5c@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState Alexey Perevalov
2017-04-28  9:38       ` Peter Xu
2017-04-28 10:03         ` Alexey Perevalov
2017-04-28 10:07           ` Peter Xu
2017-04-28 16:22             ` Dr. David Alan Gilbert
2017-04-29  9:16               ` Alexey
2017-04-29 15:02                 ` Eric Blake
2017-05-02  8:51                 ` Dr. David Alan Gilbert
2017-05-04 13:09                   ` Alexey
2017-05-05 14:11                     ` Dr. David Alan Gilbert
2017-05-05 16:25                       ` Alexey
     [not found]   ` <CGME20170428065755eucas1p2ff9aa17eaa294e741d8c65f8d58a71fb@eucas1p2.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side Alexey Perevalov
2017-04-28 10:00       ` Peter Xu
2017-04-28 11:11         ` Alexey Perevalov
2017-05-08  6:29           ` Peter Xu
2017-05-08  9:08             ` Alexey
2017-05-09  8:26               ` Peter Xu
2017-05-09  9:40                 ` Dr. David Alan Gilbert
2017-05-09  9:44                   ` Daniel P. Berrange
2017-05-10 15:46                     ` Alexey
2017-05-10 15:58                       ` Daniel P. Berrange
2017-05-11  4:56                         ` Peter Xu
     [not found]                           ` <CGME20170511070940eucas1p2ca3e44c15c84eef00e33d755a11c0ea1@eucas1p2.samsung.com>
2017-05-11  7:09                             ` Alexey
     [not found]                         ` <CGME20170511064629eucas1p114c72db6d922a6a05a4ec4a4d3003b55@eucas1p1.samsung.com>
2017-05-11  6:46                           ` Alexey
2017-05-09 15:19                   ` Alexey
2017-05-09 19:01                     ` Dr. David Alan Gilbert
2017-05-11  6:32                       ` Alexey
2017-05-11  8:25                         ` Dr. David Alan Gilbert
2017-04-28 16:34       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170428065755eucas1p1cdd0f278a235f176e9f63c40bc64a7a9@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 6/6] migration: trace postcopy total downtime Alexey Perevalov
