* [Qemu-devel] [PATCH v11 0/6] calculate blocktime for postcopy live migration
       [not found] <CGME20171005111622eucas1p2093fedb3a69ca522d6b8260377b75419@eucas1p2.samsung.com>
@ 2017-10-05 11:16 ` Alexey Perevalov
       [not found]   ` <CGME20171005111622eucas1p1dd4545fb4b45add67b222d91355aa208@eucas1p1.samsung.com>
                     ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

This is the 11th version.

The rationale for this work is the following:
a vCPU can be suspended during postcopy live migration until the faulted
page has been copied into the kernel. On the source side, downtime is the
time interval from when the source turns the vCPUs off until the destination
starts running them. That value is proper for precopy migration, where it really
shows how long the vCPUs are down, but not for postcopy migration, because
several vCPU threads can be suspended after the vCPUs have been started. That is
important for estimating packet drop in SDN software.

(V10 -> V11)
    - rebase
    - update documentation (comment from David)
    - postcopy_notifier was removed from PostcopyBlocktimeContext (comment from
David)
    - fix "since 2.10" for postcopy-vcpu-blocktime (comment from Eric)
    - fix order in mark_postcopy_blocktime_begin/end (comment from David),
but I think it still has a slim race condition
    - remove error_report from fill_destination_postcopy_migration_info (comment
from David)

(V9 -> V10)
    - rebase
    - patch "update kernel header for UFFD_FEATURE_*" has changed,
and was generated by scripts/update-linux-headers.sh as David suggested.


(V8 -> V9)
    - rebase
    - traces

(V7 -> V8)
    - just one comma in
"migration: fix hardcoded function name in error report"
It was indeed missing, but was fixed in a further patch.

(V6 -> V7)
    - the copied bitmap was placed into RAMBlock like the other
migration-related bitmaps.
    - the ordering of the mark_postcopy_blocktime_end call and the ordering
of the copied-bitmap check were changed.
    - fixed line-wrap style defects
    - new patch "postcopy_place_page factoring out"
    - postcopy_ram_supported_by_host accepts
MigrationIncomingState in qmp_migrate_set_capabilities
    - minor documentation fixes,
    and the large description of get_postcopy_total_blocktime was
moved (David's comment).

(V5 -> V6)
    - blocktime was added to the hmp command. Comment from David.
    - a bitmap for copied pages was added, as well as checks in the *_begin/_end
functions. The patch uses the just-introduced RAMBLOCK_FOREACH. Comment from David.
    - descriptions of receive_ufd_features/request_ufd_features. Comment from David.
    - commit message headers/@since references were modified. Comment from Eric.
    - also typos in documentation. Comment from Eric.
    - style and description of a field in MigrationInfo. Comment from Eric.
    - ufd_check_and_apply (the former ufd_version_check) is called twice,
so my previous patch contained a double allocation of the blocktime context and,
as a result, a memory leak. That is fixed in this patch series.

(V4 -> V5)
    - the fill_destination_postcopy_migration_info empty stub was missing for
non-Linux builds

(V3 -> V4)
    - got rid of Downtime as a name for vCPU waiting time during postcopy migration
    - PostcopyBlocktimeContext renamed (it was just BlocktimeContext)
    - atomic operations are used for dealing with the fields of
PostcopyBlocktimeContext touched by both threads.
    - hardcoded function names in error_report were replaced with %s and __line__
    - this patch set includes the postcopy-downtime capability, but it is used on
the destination; coupled with the impossibility of returning the calculated
downtime back to the source to show it in query-migrate, that looks like a big
trade-off
    - UFFD_API has to be sent regardless of whether we need to ask the kernel
for a feature, since the kernel expects it in any case (see patch comment)
    - postcopy_downtime is included in the query-migrate output
    - this patch set also includes the trivial fix
"migration: fix hardcoded function name in error report";
maybe that is a candidate for the qemu-trivial mailing list, but I already
sent "migration: Fixed code style" and it went unclaimed.

(V2 -> V3)
    - the downtime calculation approach was changed, thanks to Peter Xu
    - due to the previous point there is no more need to keep a GTree or a
bitmap of cpus, so the glib changes aren't included in this patch set; they
could be resent in another patch set if there is a good reason for it.
    - no procfs traces in this patch set; if somebody wants them, they can be
obtained from the patchwork site to track down page fault initiators.
    - UFFD_FEATURE_THREAD_ID is requested only when the kernel supports it
    - it doesn't send the downtime back, it just traces it

This patch set is based on commit
[PATCH v10 0/3] Add bitmap for received pages in postcopy migration

Both patch sets were rebased on 
commit d147f7e815f97cb477e223586bcb80c316ae10ea

Alexey Perevalov (6):
  migration: introduce postcopy-blocktime capability
  migration: add postcopy blocktime ctx into MigrationIncomingState
  migration: calculate vCPU blocktime on dst side
  migration: postcopy_blocktime documentation
  migration: add blocktime calculation into postcopy-test
  migration: add postcopy total blocktime into query-migrate

 docs/devel/migration.txt |  13 +++
 hmp.c                    |  15 +++
 migration/migration.c    |  51 +++++++++-
 migration/migration.h    |  13 +++
 migration/postcopy-ram.c | 257 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   6 +-
 qapi/migration.json      |  16 ++-
 tests/postcopy-test.c    |  63 +++++++++---
 8 files changed, 411 insertions(+), 23 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 1/6] migration: introduce postcopy-blocktime capability
       [not found]   ` <CGME20171005111622eucas1p1dd4545fb4b45add67b222d91355aa208@eucas1p1.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

Right now it can be used on the destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime is the time from when the vCPU thread was put into
interruptible sleep until the memory page was copied and the thread woken.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/migration.c | 9 +++++++++
 migration/migration.h | 1 +
 qapi/migration.json   | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 98429dc..713f070 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1467,6 +1467,15 @@ bool migrate_zero_blocks(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
 }
 
+bool migrate_postcopy_blocktime(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME];
+}
+
 bool migrate_use_compression(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index b83ccea..c12ceba 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -193,6 +193,7 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
+bool migrate_postcopy_blocktime(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index f8b365e..0f2af26 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -343,12 +343,15 @@
 #
 # @x-multifd: Use more than one fd for migration (since 2.11)
 #
+# @postcopy-blocktime: Calculate downtime for postcopy live migration
+#                     (since 2.11)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-           'block', 'return-path', 'x-multifd' ] }
+           'block', 'return-path', 'x-multifd', 'postcopy-blocktime' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 2/6] migration: add postcopy blocktime ctx into MigrationIncomingState
       [not found]   ` <CGME20171005111623eucas1p272597c60842087cac3ade92b88212eff@eucas1p2.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  2017-10-18 11:21       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

This patch adds a request to kernel space for UFFD_FEATURE_THREAD_ID,
in case this feature is provided by the kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
since it is a postcopy-only feature.
It also defines the lifetime of the PostcopyBlocktimeContext instance.
Information from the PostcopyBlocktimeContext instance will be provided
long after the postcopy migration ends; the instance of
PostcopyBlocktimeContext will live until QEMU exits, but the parts of it
(vcpu_addr, page_fault_vcpu_time) used only during calculation will be
released when postcopy ends or fails.

To enable postcopy blocktime calculation on the destination, the proper
capability needs to be requested (a documentation patch is at the tail of the
patch set).

As an example, the following command enables that capability, assuming QEMU
was started with the
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it:
[root@host]# printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/migration.h    |  8 +++++++
 migration/postcopy-ram.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index c12ceba..2bae992 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -22,6 +22,8 @@
 #include "hw/qdev.h"
 #include "io/channel.h"
 
+struct PostcopyBlocktimeContext;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -59,6 +61,12 @@ struct MigrationIncomingState {
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
     QemuSemaphore colo_incoming_sem;
+
+    /*
+     * PostcopyBlocktimeContext to keep information for postcopy
+     * live migration, to calculate vCPU block time
+     */
+    struct PostcopyBlocktimeContext *blocktime_ctx;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index bec6c2c..c18ec5a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -61,6 +61,52 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
+typedef struct PostcopyBlocktimeContext {
+    /* time when page fault initiated per vCPU */
+    int64_t *page_fault_vcpu_time;
+    /* page address per vCPU */
+    uint64_t *vcpu_addr;
+    int64_t total_blocktime;
+    /* blocktime per vCPU */
+    int64_t *vcpu_blocktime;
+    /* point in time when last page fault was initiated */
+    int64_t last_begin;
+    /* number of vCPUs currently suspended */
+    int smp_cpus_down;
+
+    /*
+     * Handler for exit event, necessary for
+     * releasing whole blocktime_ctx
+     */
+    Notifier exit_notifier;
+} PostcopyBlocktimeContext;
+
+static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
+{
+    g_free(ctx->page_fault_vcpu_time);
+    g_free(ctx->vcpu_addr);
+    g_free(ctx->vcpu_blocktime);
+    g_free(ctx);
+}
+
+static void migration_exit_cb(Notifier *n, void *data)
+{
+    PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
+                                                 exit_notifier);
+    destroy_blocktime_context(ctx);
+}
+
+static struct PostcopyBlocktimeContext *blocktime_context_new(void)
+{
+    PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
+    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
+    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
+    ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
+
+    ctx->exit_notifier.notify = migration_exit_cb;
+    qemu_add_exit_notifier(&ctx->exit_notifier);
+    return ctx;
+}
 
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
@@ -153,6 +199,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
         }
     }
 
+#ifdef UFFD_FEATURE_THREAD_ID
+    if (migrate_postcopy_blocktime() && mis &&
+        UFFD_FEATURE_THREAD_ID & supported_features) {
+        /* kernel supports that feature */
+        /* don't create blocktime_context if it exists */
+        if (!mis->blocktime_ctx) {
+            mis->blocktime_ctx = blocktime_context_new();
+        }
+
+        asked_features |= UFFD_FEATURE_THREAD_ID;
+    }
+#endif
+
     /*
      * request features, even if asked_features is 0, due to
      * kernel expects UFFD_API before UFFDIO_REGISTER, per
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side
       [not found]   ` <CGME20171005111624eucas1p294c2c03421f17915b82cbde4cf4b9fa3@eucas1p2.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  2017-10-18 18:59       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

This patch provides blocktime calculation per vCPU,
both as a sum and as an overlapped value across all vCPUs.

This approach was suggested by Peter Xu as an improvement over the
previous approach, where QEMU kept a tree with the faulted page address and a
cpus bitmask in it. Now QEMU keeps an array with the faulted page address as
the value and the vCPU as the index. It helps to find the proper vCPU at
UFFD_COPY time. It also keeps the blocktime per vCPU (which can be traced
with page_fault_addr).

Blocktime will not be calculated if the blocktime_ctx field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c18ec5a..2e10870 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -553,6 +553,141 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(-1, pid);
+    return -1;
+}
+
+/*
+ * This function is being called when pagefault occurs. It
+ * tracks down vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu, already_received;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+
+    /* check it here, not at the beginning of the function,
+     * because the check could occur earlier than bitmap_set in
+     * qemu_ufd_copy_ioctl */
+    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
+    if (already_received) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_sub(&dc->smp_cpus_down, 1);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu, already_received);
+}
+
+/*
+ *  This function just provides the calculated blocktime per cpu and traces
+ *  it. Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPUs
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
+ *            it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is following:
+ *              * - means blocktime per vCPU
+ *              x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* look up the cpu in order to clear it;
+     * this algorithm looks straightforward, but it's not
+     * optimal; a more optimal algorithm would keep a tree or hash
+     * where the key is an address and the value is a list of  */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0) == 0) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms -
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        affected_cpu += 1;
+        /* we need to know whether mark_postcopy_blocktime_end was due to a
+         * faulted page; another possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the loop, since one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
+                                      affected_cpu);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -630,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -721,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
+        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
+
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 6f29fcc..b0c8708 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -115,6 +115,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -191,7 +193,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx pid=%u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -200,6 +202,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 4/6] migration: postcopy_blocktime documentation
       [not found]   ` <CGME20171005111624eucas1p193bfeb0e428c8eee6180a1f7b96c0713@eucas1p1.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 docs/devel/migration.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/devel/migration.txt b/docs/devel/migration.txt
index 4030703..cebfe7a 100644
--- a/docs/devel/migration.txt
+++ b/docs/devel/migration.txt
@@ -402,6 +402,19 @@ will now cause the transition from precopy to postcopy.
 It can be issued immediately after migration is started or any
 time later on.  Issuing it after the end of a migration is harmless.
 
+Blocktime is a postcopy live migration metric, intended to show
+how long the vCPU was in the state of interruptible sleep due to a pagefault.
+The metric is calculated both for all vCPUs as an overlapped value and
+separately for each vCPU. These values are calculated on the destination side.
+To enable postcopy blocktime calculation, enter the following command on the
+destination monitor:
+
+migrate_set_capability postcopy-blocktime on
+
+Postcopy blocktime can be retrieved by the query-migrate qmp command. The
+postcopy-blocktime value will show the overlapped blocking time for all
+vCPUs; postcopy-vcpu-blocktime will show a list of blocking times per vCPU.
+
 Note: During the postcopy phase, the bandwidth limits set using
 migrate_set_speed is ignored (to avoid delaying requested pages that
 the destination is waiting for).
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 5/6] migration: add blocktime calculation into postcopy-test
       [not found]   ` <CGME20171005111625eucas1p28ad35b246e6f964ca7d642cfa60df10d@eucas1p2.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  2017-10-18 19:09       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

This patch just requests blocktime calculation,
and checks it in case the UFFD_FEATURE_THREAD_ID feature is set
on the host.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 tests/postcopy-test.c | 63 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
index 8142f2a..4231cce 100644
--- a/tests/postcopy-test.c
+++ b/tests/postcopy-test.c
@@ -24,7 +24,8 @@
 
 const unsigned start_address = 1024 * 1024;
 const unsigned end_address = 100 * 1024 * 1024;
-bool got_stop;
+static bool got_stop;
+static bool uffd_feature_thread_id;
 
 #if defined(__linux__)
 #include <sys/syscall.h>
@@ -54,6 +55,7 @@ static bool ufd_version_check(void)
         g_test_message("Skipping test: UFFDIO_API failed");
         return false;
     }
+    uffd_feature_thread_id = api_struct.features & UFFD_FEATURE_THREAD_ID;
 
     ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
                  (__u64)1 << _UFFDIO_UNREGISTER;
@@ -265,22 +267,48 @@ static uint64_t get_migration_pass(void)
     return result;
 }
 
-static void wait_for_migration_complete(void)
+static bool get_src_status(void)
 {
     QDict *rsp, *rsp_return;
+    const char *status;
+    bool result;
+
+    rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
+    rsp_return = qdict_get_qdict(rsp, "return");
+    status = qdict_get_str(rsp_return, "status");
+    g_assert_cmpstr(status, !=,  "failed");
+    result = strcmp(status, "completed") == 0;
+    QDECREF(rsp);
+    return result;
+}
+
+static void read_blocktime(void)
+{
+    QDict *rsp, *rsp_return;
+
+    rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
+    rsp_return = qdict_get_qdict(rsp, "return");
+    g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
+    QDECREF(rsp);
+}
+
+static void wait_for_migration_complete(QTestState *from, QTestState *to)
+{
     bool completed;
 
     do {
-        const char *status;
-
-        rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
-        rsp_return = qdict_get_qdict(rsp, "return");
-        status = qdict_get_str(rsp_return, "status");
-        completed = strcmp(status, "completed") == 0;
-        g_assert_cmpstr(status, !=,  "failed");
-        QDECREF(rsp);
+
+        /* test src state */
+        global_qtest = from;
+        completed = get_src_status();
+
         usleep(1000 * 100);
     } while (!completed);
+
+    if (uffd_feature_thread_id) {
+        global_qtest = to;
+        read_blocktime();
+    }
 }
 
 static void wait_for_migration_pass(void)
@@ -364,8 +392,6 @@ static void test_migrate(void)
     char *bootpath = g_strdup_printf("%s/bootsect", tmpfs);
     const char *arch = qtest_get_arch();
 
-    got_stop = false;
-
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         init_bootfile_x86(bootpath);
         cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 150M"
@@ -425,6 +451,15 @@ static void test_migrate(void)
     g_assert(qdict_haskey(rsp, "return"));
     QDECREF(rsp);
 
+    global_qtest = to;
+    rsp = qmp("{ 'execute': 'migrate-set-capabilities',"
+                  "'arguments': { "
+                      "'capabilities': [ {"
+                          "'capability': 'postcopy-blocktime',"
+                          "'state': true } ] } }");
+    g_assert(qdict_haskey(rsp, "return"));
+    QDECREF(rsp);
+
     /* We want to pick a speed slow enough that the test completes
      * quickly, but that it doesn't complete precopy even on a slow
      * machine, so also set the downtime.
@@ -441,7 +476,6 @@ static void test_migrate(void)
     g_assert(qdict_haskey(rsp, "return"));
     QDECREF(rsp);
 
-
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
@@ -467,8 +501,7 @@ static void test_migrate(void)
     qmp_eventwait("RESUME");
 
     wait_for_serial("dest_serial");
-    global_qtest = from;
-    wait_for_migration_complete();
+    wait_for_migration_complete(from, to);
 
     qtest_quit(from);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v11 6/6] migration: add postcopy total blocktime into query-migrate
       [not found]   ` <CGME20171005111626eucas1p2ea023000ede617c0b8509f11c99fc10a@eucas1p2.samsung.com>
@ 2017-10-05 11:16     ` Alexey Perevalov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-05 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, quintela, dgilbert, peterx, i.maximets, heetae82.ahn

Postcopy total blocktime is available on the destination side only,
but query-migrate was possible only for the source. This patch
adds the ability to call query-migrate on the destination.
To be able to see postcopy blocktime, the postcopy-blocktime
capability needs to be requested.

The query-migrate command will show the following sample result:
{"return": {
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy_vcpu_blocktime contains a list, where the first item corresponds to
the first vCPU in QEMU.

This patch has a drawback: it combines the states of incoming and
outgoing migration. The ongoing migration state will overwrite the incoming
state. It looks better to separate query-migrate for incoming and
outgoing migration, or to add a parameter indicating the type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 hmp.c                    | 15 +++++++++++++
 migration/migration.c    | 42 ++++++++++++++++++++++++++++++++----
 migration/migration.h    |  4 ++++
 migration/postcopy-ram.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events   |  1 +
 qapi/migration.json      | 11 +++++++++-
 6 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/hmp.c b/hmp.c
index ace729d..1939c02 100644
--- a/hmp.c
+++ b/hmp.c
@@ -264,6 +264,21 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->cpu_throttle_percentage);
     }
 
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "postcopy blocktime: %" PRId64 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        Visitor *v;
+        char *str;
+        v = string_output_visitor_new(false, &str);
+        visit_type_int64List(v, NULL, &info->postcopy_vcpu_blocktime, NULL);
+        visit_complete(v, &str);
+        monitor_printf(mon, "postcopy vcpu blocktime: %s\n", str);
+        g_free(str);
+        visit_free(v);
+    }
     qapi_free_MigrationInfo(info);
     qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/migration/migration.c b/migration/migration.c
index 713f070..91fe885 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -584,14 +584,15 @@ static void populate_disk_info(MigrationInfo *info)
     }
 }
 
-MigrationInfo *qmp_query_migrate(Error **errp)
+static void fill_source_migration_info(MigrationInfo *info)
 {
-    MigrationInfo *info = g_malloc0(sizeof(*info));
     MigrationState *s = migrate_get_current();
 
     switch (s->state) {
     case MIGRATION_STATUS_NONE:
         /* no migration has happened ever */
+        /* do not overwrite destination migration status */
+        return;
         break;
     case MIGRATION_STATUS_SETUP:
         info->has_status = true;
@@ -640,8 +641,6 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         break;
     }
     info->status = s->state;
-
-    return info;
 }
 
 /**
@@ -705,6 +704,41 @@ static bool migrate_caps_check(bool *cap_list,
     return true;
 }
 
+static void fill_destination_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    switch (mis->state) {
+    case MIGRATION_STATUS_NONE:
+        return;
+        break;
+    case MIGRATION_STATUS_SETUP:
+    case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_CANCELLED:
+    case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_FAILED:
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        break;
+    case MIGRATION_STATUS_COMPLETED:
+        info->has_status = true;
+        fill_destination_postcopy_migration_info(info);
+        break;
+    }
+    info->status = mis->state;
+}
+
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+    MigrationInfo *info = g_malloc0(sizeof(*info));
+
+    fill_destination_migration_info(info);
+    fill_source_migration_info(info);
+
+    return info;
+}
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
                                   Error **errp)
 {
diff --git a/migration/migration.h b/migration/migration.h
index 2bae992..cb68768 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -71,6 +71,10 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with blocktime context
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info);
 
 #define TYPE_MIGRATION "migration"
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e10870..a203bae 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -108,6 +108,55 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
+static int64List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
+{
+    int64List *list = NULL, *entry = NULL;
+    int i;
+
+    for (i = smp_cpus - 1; i >= 0; i--) {
+        entry = g_new0(int64List, 1);
+        entry->value = ctx->vcpu_blocktime[i];
+        entry->next = list;
+        list = entry;
+    }
+
+    return list;
+}
+
+/*
+ * This function just populates MigrationInfo from postcopy's
+ * blocktime context. It will not populate MigrationInfo,
+ * unless postcopy-blocktime capability was set.
+ *
+ * @info: pointer to MigrationInfo to populate
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return;
+    }
+
+    info->has_postcopy_blocktime = true;
+    info->postcopy_blocktime = bc->total_blocktime;
+    info->has_postcopy_vcpu_blocktime = true;
+    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+}
+
+static uint64_t get_postcopy_total_blocktime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return 0;
+    }
+
+    return bc->total_blocktime;
+}
+
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
  * features in the future.
@@ -482,6 +531,9 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         munmap(mis->postcopy_tmp_zero_page, mis->largest_page_size);
         mis->postcopy_tmp_zero_page = NULL;
     }
+    trace_postcopy_ram_incoming_cleanup_blocktime(
+            get_postcopy_total_blocktime());
+
     trace_postcopy_ram_incoming_cleanup_exit();
     return 0;
 }
@@ -958,6 +1010,10 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+}
+
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
diff --git a/migration/trace-events b/migration/trace-events
index b0c8708..f667981 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,7 @@ postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu64
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
diff --git a/qapi/migration.json b/qapi/migration.json
index 0f2af26..bf19984 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -150,6 +150,13 @@
 #              @status is 'failed'. Clients should not attempt to parse the
 #              error strings. (Since 2.7)
 #
+# @postcopy-blocktime: total time when all vCPU were blocked during postcopy
+#           live migration (Since 2.11)
+#
+# @postcopy-vcpu-blocktime: list of the postcopy blocktime per vCPU (Since 2.11)
+#
+
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationInfo',
@@ -161,7 +168,9 @@
            '*downtime': 'int',
            '*setup-time': 'int',
            '*cpu-throttle-percentage': 'int',
-           '*error-desc': 'str'} }
+           '*error-desc': 'str',
+           '*postcopy-blocktime' : 'int64',
+           '*postcopy-vcpu-blocktime': ['int64']} }
 
 ##
 # @query-migrate:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v11 2/6] migration: add postcopy blocktime ctx into MigrationIncomingState
  2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 2/6] migration: add postcopy blocktime ctx into MigrationIncomingState Alexey Perevalov
@ 2017-10-18 11:21       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-18 11:21 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, quintela, peterx, i.maximets, heetae82.ahn

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID,
> in case when this feature is provided by kernel.
> 
> PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
> since it is a postcopy-only feature. This also defines the lifetime of
> the PostcopyBlocktimeContext instance: information from it will be
> provided well after the postcopy migration ends, so the instance lives
> until QEMU exits, but the parts of it used only during the calculation
> (vcpu_addr, page_fault_vcpu_time) are released when postcopy ends or
> fails.
> 
> To enable postcopy blocktime calculation on the destination, the
> proper capability must be requested (the documentation patch is at the
> tail of the patch set).
> 
> As an example, the following command enables that capability, assuming
> QEMU was started with the
> -chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
> option to control it:
> 
> [root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
> {\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
> \"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
> true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock
> 
> Or just with HMP
> (qemu) migrate_set_capability postcopy-blocktime on
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h    |  8 +++++++
>  migration/postcopy-ram.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index c12ceba..2bae992 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -22,6 +22,8 @@
>  #include "hw/qdev.h"
>  #include "io/channel.h"
>  
> +struct PostcopyBlocktimeContext;
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> @@ -59,6 +61,12 @@ struct MigrationIncomingState {
>      /* The coroutine we should enter (back) after failover */
>      Coroutine *migration_incoming_co;
>      QemuSemaphore colo_incoming_sem;
> +
> +    /*
> +     * PostcopyBlocktimeContext to keep information for postcopy
> +     * live migration, to calculate vCPU block time
> +     * */
> +    struct PostcopyBlocktimeContext *blocktime_ctx;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index bec6c2c..c18ec5a 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -61,6 +61,52 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> +typedef struct PostcopyBlocktimeContext {
> +    /* time when page fault initiated per vCPU */
> +    int64_t *page_fault_vcpu_time;
> +    /* page address per vCPU */
> +    uint64_t *vcpu_addr;
> +    int64_t total_blocktime;
> +    /* blocktime per vCPU */
> +    int64_t *vcpu_blocktime;
> +    /* point in time when last page fault was initiated */
> +    int64_t last_begin;
> +    /* number of vCPU are suspended */
> +    int smp_cpus_down;
> +
> +    /*
> +     * Handler for exit event, necessary for
> +     * releasing whole blocktime_ctx
> +     */
> +    Notifier exit_notifier;
> +} PostcopyBlocktimeContext;
> +
> +static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
> +{
> +    g_free(ctx->page_fault_vcpu_time);
> +    g_free(ctx->vcpu_addr);
> +    g_free(ctx->vcpu_blocktime);
> +    g_free(ctx);
> +}
> +
> +static void migration_exit_cb(Notifier *n, void *data)
> +{
> +    PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
> +                                                 exit_notifier);
> +    destroy_blocktime_context(ctx);
> +}
> +
> +static struct PostcopyBlocktimeContext *blocktime_context_new(void)
> +{
> +    PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
> +    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
> +    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
> +    ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
> +
> +    ctx->exit_notifier.notify = migration_exit_cb;
> +    qemu_add_exit_notifier(&ctx->exit_notifier);
> +    return ctx;
> +}
>  
>  /**
>   * receive_ufd_features: check userfault fd features, to request only supported
> @@ -153,6 +199,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
>          }
>      }
>  
> +#ifdef UFFD_FEATURE_THREAD_ID
> +    if (migrate_postcopy_blocktime() && mis &&
> +        UFFD_FEATURE_THREAD_ID & supported_features) {
> +        /* kernel supports that feature */
> +        /* don't create blocktime_context if it exists */
> +        if (!mis->blocktime_ctx) {
> +            mis->blocktime_ctx = blocktime_context_new();
> +        }
> +
> +        asked_features |= UFFD_FEATURE_THREAD_ID;
> +    }
> +#endif
> +
>      /*
>       * request features, even if asked_features is 0, due to
>       * kernel expects UFFD_API before UFFDIO_REGISTER, per
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side
  2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side Alexey Perevalov
@ 2017-10-18 18:59       ` Dr. David Alan Gilbert
  2017-10-19  8:48         ` Alexey Perevalov
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-18 18:59 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, quintela, peterx, i.maximets, heetae82.ahn

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch provides blocktime calculation per vCPU,
> as a summary and as an overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu as an improvement over the
> previous approach, in which QEMU kept a tree with the faulted page
> address and a cpu bitmask in it. Now QEMU keeps an array with the
> faulted page address as the value and the vCPU as the index. This
> helps to find the proper vCPU at UFFD_COPY time. It also keeps a list
> of blocktime per vCPU (which can be traced with page_fault_addr).
> 
> Blocktime will not be calculated if the postcopy_blocktime field of
> MigrationIncomingState wasn't initialized.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |   5 +-
>  2 files changed, 145 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index c18ec5a..2e10870 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -553,6 +553,141 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {
> +            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(-1, pid);
> +    return -1;
> +}
> +
> +/*
> + * This function is being called when pagefault occurs. It
> + * tracks down vCPU blocking time.
> + *
> + * @addr: faulted host virtual address
> + * @ptid: faulted process thread id
> + * @rb: ramblock appropriate to addr
> + */
> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> +                                          RAMBlock *rb)
> +{
> +    int cpu, already_received;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int64_t now_ms;
> +
> +    if (!dc || ptid == 0) {
> +        return;
> +    }
> +    cpu = get_mem_fault_cpu_index(ptid);
> +    if (cpu < 0) {
> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    if (dc->vcpu_addr[cpu] == 0) {
> +        atomic_inc(&dc->smp_cpus_down);
> +    }
> +
> +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> +
> +    /* check it here, not at the begining of the function,
> +     * due to, check could accur early than bitmap_set in
> +     * qemu_ufd_copy_ioctl */
> +    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
> +    if (already_received) {
> +        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
> +        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
> +        atomic_sub(&dc->smp_cpus_down, 1);

Minor; but you could use atomic_dec to go with the atomic_inc

> +    }
> +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +                                        cpu, already_received);
> +}
> +
> +/*
> + *  This function just provide calculated blocktime per cpu and trace it.
> + *  Total blocktime is calculated in mark_postcopy_blocktime_end.
> + *
> + *
> + * Assume we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> + *            it's a part of total blocktime.
> + * S1 - here is last_begin
> + * Legend of the picture is following:
> + *              * - means blocktime per vCPU
> + *              x - means overlapped blocktime (total blocktime)
> + *
> + * @addr: host virtual address
> + */
> +static void mark_postcopy_blocktime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int i, affected_cpu = 0;
> +    int64_t now_ms;
> +    bool vcpu_total_blocktime = false;
> +
> +    if (!dc) {
> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* lookup cpu, to clear it,
> +     * that algorithm looks straighforward, but it's not
> +     * optimal, more optimal algorithm is keeping tree or hash
> +     * where key is address value is a list of  */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_blocktime = 0;
> +
> +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0) == 0) {
> +            continue;
> +        }
> +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
> +        vcpu_blocktime = now_ms -
> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);

This is almost there, but you mustn't read vcpu_times twice; the other
thread could have gone down the 'received' path during the time between
the reads (OK, unlikely but still); so I think you need:
    read_addr = atomic_fetch_add(&dc->vcpu_addr[i], 0);
    read_vcpu_time = atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
    if (read_addr != addr || read_vcpu_time == 0) {
            continue;
    }
    atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
    vcpu_blocktime = now_ms - read_vcpu_time;

Dave

> +        affected_cpu += 1;
> +        /* we need to know is that mark_postcopy_end was due to
> +         * faulted page, another possible case it's prefetched
> +         * page and in that case we shouldn't be here */
> +        if (!vcpu_total_blocktime &&
> +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
> +            vcpu_total_blocktime = true;
> +        }
> +        /* continue cycle, due to one page could affect several vCPUs */
> +        dc->vcpu_blocktime[i] += vcpu_blocktime;
> +    }
> +
> +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
> +    if (vcpu_total_blocktime) {
> +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
> +    }
> +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
> +                                      affected_cpu);
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -630,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                                      msg.arg.pagefault.feat.ptid, rb);
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -721,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>      if (!ret) {
>          ramblock_recv_bitmap_set_range(rb, host_addr,
>                                         pagesize / qemu_target_page_size());
> +        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
> +
>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 6f29fcc..b0c8708 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -115,6 +115,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -191,7 +193,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx pid=%u"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -200,6 +202,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v11 5/6] migration: add blocktime calculation into postcopy-test
  2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 5/6] migration: add blocktime calculation into postcopy-test Alexey Perevalov
@ 2017-10-18 19:09       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-18 19:09 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, quintela, peterx, i.maximets, heetae82.ahn

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch just requests blocktime calculation,
> and check it in case when UFFD_FEATURE_THREAD_ID feature is set
> on the host.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(I preferred the bool got_stop and explicitly initialising it, because
if we add a 2nd test then it gets reset OK).

Dave
> ---
>  tests/postcopy-test.c | 63 +++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index 8142f2a..4231cce 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -24,7 +24,8 @@
>  
>  const unsigned start_address = 1024 * 1024;
>  const unsigned end_address = 100 * 1024 * 1024;
> -bool got_stop;
> +static bool got_stop;
> +static bool uffd_feature_thread_id;
>  
>  #if defined(__linux__)
>  #include <sys/syscall.h>
> @@ -54,6 +55,7 @@ static bool ufd_version_check(void)
>          g_test_message("Skipping test: UFFDIO_API failed");
>          return false;
>      }
> +    uffd_feature_thread_id = api_struct.features & UFFD_FEATURE_THREAD_ID;
>  
>      ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
>                   (__u64)1 << _UFFDIO_UNREGISTER;
> @@ -265,22 +267,48 @@ static uint64_t get_migration_pass(void)
>      return result;
>  }
>  
> -static void wait_for_migration_complete(void)
> +static bool get_src_status(void)
>  {
>      QDict *rsp, *rsp_return;
> +    const char *status;
> +    bool result;
> +
> +    rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
> +    rsp_return = qdict_get_qdict(rsp, "return");
> +    status = qdict_get_str(rsp_return, "status");
> +    g_assert_cmpstr(status, !=,  "failed");
> +    result = strcmp(status, "completed") == 0;
> +    QDECREF(rsp);
> +    return result;
> +}
> +
> +static void read_blocktime(void)
> +{
> +    QDict *rsp, *rsp_return;
> +
> +    rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
> +    rsp_return = qdict_get_qdict(rsp, "return");
> +    g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
> +    QDECREF(rsp);
> +}
> +
> +static void wait_for_migration_complete(QTestState *from, QTestState *to)
> +{
>      bool completed;
>  
>      do {
> -        const char *status;
> -
> -        rsp = return_or_event(qmp("{ 'execute': 'query-migrate' }"));
> -        rsp_return = qdict_get_qdict(rsp, "return");
> -        status = qdict_get_str(rsp_return, "status");
> -        completed = strcmp(status, "completed") == 0;
> -        g_assert_cmpstr(status, !=,  "failed");
> -        QDECREF(rsp);
> +
> +        /* test src state */
> +        global_qtest = from;
> +        completed = get_src_status();
> +
>          usleep(1000 * 100);
>      } while (!completed);
> +
> +    if (uffd_feature_thread_id) {
> +        global_qtest = to;
> +        read_blocktime();
> +    }
>  }
>  
>  static void wait_for_migration_pass(void)
> @@ -364,8 +392,6 @@ static void test_migrate(void)
>      char *bootpath = g_strdup_printf("%s/bootsect", tmpfs);
>      const char *arch = qtest_get_arch();
>  
> -    got_stop = false;
> -
>      if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>          init_bootfile_x86(bootpath);
>          cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 150M"
> @@ -425,6 +451,15 @@ static void test_migrate(void)
>      g_assert(qdict_haskey(rsp, "return"));
>      QDECREF(rsp);
>  
> +    global_qtest = to;
> +    rsp = qmp("{ 'execute': 'migrate-set-capabilities',"
> +                  "'arguments': { "
> +                      "'capabilities': [ {"
> +                          "'capability': 'postcopy-blocktime',"
> +                          "'state': true } ] } }");
> +    g_assert(qdict_haskey(rsp, "return"));
> +    QDECREF(rsp);
> +
>      /* We want to pick a speed slow enough that the test completes
>       * quickly, but that it doesn't complete precopy even on a slow
>       * machine, so also set the downtime.
> @@ -441,7 +476,6 @@ static void test_migrate(void)
>      g_assert(qdict_haskey(rsp, "return"));
>      QDECREF(rsp);
>  
> -
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
>  
> @@ -467,8 +501,7 @@ static void test_migrate(void)
>      qmp_eventwait("RESUME");
>  
>      wait_for_serial("dest_serial");
> -    global_qtest = from;
> -    wait_for_migration_complete();
> +    wait_for_migration_complete(from, to);
>  
>      qtest_quit(from);
>  
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side
  2017-10-18 18:59       ` Dr. David Alan Gilbert
@ 2017-10-19  8:48         ` Alexey Perevalov
  2017-10-19  8:58           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Perevalov @ 2017-10-19  8:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, quintela, peterx, i.maximets, heetae82.ahn

On 10/18/2017 09:59 PM, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
>> This patch provides blocktime calculation per vCPU,
>> as a summary and as an overlapped value for all vCPUs.
>>
>> This approach was suggested by Peter Xu as an improvement over the
>> previous approach, in which QEMU kept a tree with the faulted page
>> address and a cpu bitmask in it. Now QEMU keeps an array with the
>> faulted page address as the value and the vCPU as the index. This
>> helps to find the proper vCPU at UFFD_COPY time. It also keeps a list
>> of blocktime per vCPU (which can be traced with page_fault_addr).
>>
>> Blocktime will not be calculated if the postcopy_blocktime field of
>> MigrationIncomingState wasn't initialized.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   migration/postcopy-ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
>>   migration/trace-events   |   5 +-
>>   2 files changed, 145 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index c18ec5a..2e10870 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -553,6 +553,141 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>>       return 0;
>>   }
>>   
>> +static int get_mem_fault_cpu_index(uint32_t pid)
>> +{
>> +    CPUState *cpu_iter;
>> +
>> +    CPU_FOREACH(cpu_iter) {
>> +        if (cpu_iter->thread_id == pid) {
>> +            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
>> +            return cpu_iter->cpu_index;
>> +        }
>> +    }
>> +    trace_get_mem_fault_cpu_index(-1, pid);
>> +    return -1;
>> +}
>> +
>> +/*
>> + * This function is being called when pagefault occurs. It
>> + * tracks down vCPU blocking time.
>> + *
>> + * @addr: faulted host virtual address
>> + * @ptid: faulted process thread id
>> + * @rb: ramblock appropriate to addr
>> + */
>> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
>> +                                          RAMBlock *rb)
>> +{
>> +    int cpu, already_received;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
>> +    int64_t now_ms;
>> +
>> +    if (!dc || ptid == 0) {
>> +        return;
>> +    }
>> +    cpu = get_mem_fault_cpu_index(ptid);
>> +    if (cpu < 0) {
>> +        return;
>> +    }
>> +
>> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    if (dc->vcpu_addr[cpu] == 0) {
>> +        atomic_inc(&dc->smp_cpus_down);
>> +    }
>> +
>> +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
>> +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
>> +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
>> +
>> +    /* check it here, not at the begining of the function,
>> +     * due to, check could accur early than bitmap_set in
>> +     * qemu_ufd_copy_ioctl */
>> +    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
>> +    if (already_received) {
>> +        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
>> +        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
>> +        atomic_sub(&dc->smp_cpus_down, 1);
> Minor; but you could use atomic_dec to go with the atomic_inc
>
>> +    }
>> +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
>> +                                        cpu, already_received);
>> +}
>> +
>> +/*
>> + *  This function just provides the calculated blocktime per vCPU and traces it.
>> + *  Total blocktime is calculated in mark_postcopy_blocktime_end.
>> + *
>> + *
>> + * Assume we have 3 CPUs:
>> + *
>> + *      S1        E1           S1               E1
>> + * -----***********------------xxx***************------------------------> CPU1
>> + *
>> + *             S2                E2
>> + * ------------****************xxx---------------------------------------> CPU2
>> + *
>> + *                         S3            E3
>> + * ------------------------****xxx********-------------------------------> CPU3
>> + *
>> + * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
>> + * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1 doesn't include CPU3
>> + * S3,S1,E2 - the sequence includes all CPUs; in this case the overlap is S1,E2 -
>> + *            it's part of the total blocktime.
>> + * S1 - here is last_begin
>> + * Legend of the picture:
>> + *              * - means blocktime per vCPU
>> + *              x - means overlapped blocktime (total blocktime)
>> + *
>> + * @addr: host virtual address
>> + */
>> +static void mark_postcopy_blocktime_end(uint64_t addr)
>> +{
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
>> +    int i, affected_cpu = 0;
>> +    int64_t now_ms;
>> +    bool vcpu_total_blocktime = false;
>> +
>> +    if (!dc) {
>> +        return;
>> +    }
>> +
>> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +
>> +    /* Look up the vCPU to clear it. This algorithm looks
>> +     * straightforward, but it's not optimal; a better algorithm
>> +     * would keep a tree or hash where the key is the address and
>> +     * the value is a list of vCPUs */
>> +    for (i = 0; i < smp_cpus; i++) {
>> +        uint64_t vcpu_blocktime = 0;
>> +
>> +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
>> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0) == 0) {
>> +            continue;
>> +        }
>> +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
>> +        vcpu_blocktime = now_ms -
>> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> This is almost there, but you mustn't read vcpu_times twice; the other
> thread could have gone down the 'received' path during the time between
> the reads (OK, unlikely but still); so I think you need:
>      read_addr = atomic_fetch_add(&dc->vcpu_addr[i], 0);
>      read_vcpu_time = atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
I agree with read_vcpu_time; done.
But not the clearing of &dc->vcpu_addr[i]: we don't need to clear the
copy of the value, but the original value.
>      if (read_addr != addr || read_vcpu_time == 0) {
>              continue;
>      }
>      atomic_ vcpu_addr = 0;
>      vcpu_blocktime = now_ms - read_vcpu_time;
>
> Dave
>
>> +        affected_cpu += 1;
>> +        /* We need to know whether mark_postcopy_blocktime_end was
>> +         * called due to a faulted page; another possible case is a
>> +         * prefetched page, and in that case we shouldn't be here */
>> +        if (!vcpu_total_blocktime &&
>> +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
>> +            vcpu_total_blocktime = true;
>> +        }
>> +        /* continue the loop, since one page could affect several vCPUs */
>> +        dc->vcpu_blocktime[i] += vcpu_blocktime;
>> +    }
>> +
>> +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
>> +    if (vcpu_total_blocktime) {
>> +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
>> +    }
>> +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
>> +                                      affected_cpu);
>> +}
>> +
>>   /*
>>    * Handle faults detected by the USERFAULT markings
>>    */
>> @@ -630,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>>           rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>>           trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>>                                                   qemu_ram_get_idstr(rb),
>> -                                                rb_offset);
>> +                                                rb_offset,
>> +                                                msg.arg.pagefault.feat.ptid);
>>   
>> +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
>> +                                      msg.arg.pagefault.feat.ptid, rb);
>>           /*
>>            * Send the request to the source - we want to request one
>>            * of our host page sizes (which is >= TPS)
>> @@ -721,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>       if (!ret) {
>>           ramblock_recv_bitmap_set_range(rb, host_addr,
>>                                          pagesize / qemu_target_page_size());
>> +        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
>> +
>>       }
>>       return ret;
>>   }
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 6f29fcc..b0c8708 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -115,6 +115,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>>   process_incoming_migration_co_postcopy_end_main(void) ""
>>   migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>>   migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
>> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
>> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
>>   
>>   # migration/rdma.c
>>   qemu_rdma_accept_incoming_migration(void) ""
>> @@ -191,7 +193,7 @@ postcopy_ram_enable_notify(void) ""
>>   postcopy_ram_fault_thread_entry(void) ""
>>   postcopy_ram_fault_thread_exit(void) ""
>>   postcopy_ram_fault_thread_quit(void) ""
>> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
>> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx pid=%u"
>>   postcopy_ram_incoming_cleanup_closeuf(void) ""
>>   postcopy_ram_incoming_cleanup_entry(void) ""
>>   postcopy_ram_incoming_cleanup_exit(void) ""
>> @@ -200,6 +202,7 @@ save_xbzrle_page_skipping(void) ""
>>   save_xbzrle_page_overflow(void) ""
>>   ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>>   ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
>> +get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
>>   
>>   # migration/exec.c
>>   migration_exec_outgoing(const char *cmd) "cmd=%s"
>> -- 
>> 2.7.4
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
>

-- 
Best regards,
Alexey Perevalov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side
  2017-10-19  8:48         ` Alexey Perevalov
@ 2017-10-19  8:58           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-19  8:58 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, quintela, peterx, i.maximets, heetae82.ahn

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> On 10/18/2017 09:59 PM, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > This patch provides blocktime calculation per vCPU,
> > > as a per-vCPU summary and as an overlapped value for all vCPUs.
> > > 
> > > This approach was suggested by Peter Xu, as an improvement of the
> > > previous approach where QEMU kept a tree with the faulted page address and a
> > > cpus bitmask in it. Now QEMU keeps an array with the faulted page address as
> > > value and the vCPU as index. It helps to find the proper vCPU at UFFD_COPY
> > > time. It also keeps a list of blocktimes per vCPU (which could be traced
> > > with page_fault_addr)
> > > 
> > > Blocktime will not be calculated if the postcopy_blocktime field of
> > > MigrationIncomingState wasn't initialized.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > ---
> > >   migration/postcopy-ram.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
> > >   migration/trace-events   |   5 +-
> > >   2 files changed, 145 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index c18ec5a..2e10870 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -553,6 +553,141 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> > >       return 0;
> > >   }
> > > +static int get_mem_fault_cpu_index(uint32_t pid)
> > > +{
> > > +    CPUState *cpu_iter;
> > > +
> > > +    CPU_FOREACH(cpu_iter) {
> > > +        if (cpu_iter->thread_id == pid) {
> > > +            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
> > > +            return cpu_iter->cpu_index;
> > > +        }
> > > +    }
> > > +    trace_get_mem_fault_cpu_index(-1, pid);
> > > +    return -1;
> > > +}
> > > +
> > > +/*
> > > + * This function is called when a page fault occurs. It
> > > + * tracks the vCPU blocking time.
> > > + *
> > > + * @addr: faulted host virtual address
> > > + * @ptid: faulted process thread id
> > > + * @rb: ramblock appropriate to addr
> > > + */
> > > +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> > > +                                          RAMBlock *rb)
> > > +{
> > > +    int cpu, already_received;
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > > +    int64_t now_ms;
> > > +
> > > +    if (!dc || ptid == 0) {
> > > +        return;
> > > +    }
> > > +    cpu = get_mem_fault_cpu_index(ptid);
> > > +    if (cpu < 0) {
> > > +        return;
> > > +    }
> > > +
> > > +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +    if (dc->vcpu_addr[cpu] == 0) {
> > > +        atomic_inc(&dc->smp_cpus_down);
> > > +    }
> > > +
> > > +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> > > +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> > > +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> > > +
> > > +    /* Check it here, not at the beginning of the function,
> > > +     * because the check could occur earlier than bitmap_set in
> > > +     * qemu_ufd_copy_ioctl */
> > > +    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
> > > +    if (already_received) {
> > > +        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
> > > +        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
> > > +        atomic_sub(&dc->smp_cpus_down, 1);
> > Minor; but you could use atomic_dec to go with the atomic_inc
> > 
> > > +    }
> > > +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > +                                        cpu, already_received);
> > > +}
> > > +
> > > +/*
> > > + *  This function just provides the calculated blocktime per vCPU and traces it.
> > > + *  Total blocktime is calculated in mark_postcopy_blocktime_end.
> > > + *
> > > + *
> > > + * Assume we have 3 CPUs:
> > > + *
> > > + *      S1        E1           S1               E1
> > > + * -----***********------------xxx***************------------------------> CPU1
> > > + *
> > > + *             S2                E2
> > > + * ------------****************xxx---------------------------------------> CPU2
> > > + *
> > > + *                         S3            E3
> > > + * ------------------------****xxx********-------------------------------> CPU3
> > > + *
> > > + * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
> > > + * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1 doesn't include CPU3
> > > + * S3,S1,E2 - the sequence includes all CPUs; in this case the overlap is S1,E2 -
> > > + *            it's part of the total blocktime.
> > > + * S1 - here is last_begin
> > > + * Legend of the picture:
> > > + *              * - means blocktime per vCPU
> > > + *              x - means overlapped blocktime (total blocktime)
> > > + *
> > > + * @addr: host virtual address
> > > + */
> > > +static void mark_postcopy_blocktime_end(uint64_t addr)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > > +    int i, affected_cpu = 0;
> > > +    int64_t now_ms;
> > > +    bool vcpu_total_blocktime = false;
> > > +
> > > +    if (!dc) {
> > > +        return;
> > > +    }
> > > +
> > > +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +
> > > +    /* Look up the vCPU to clear it. This algorithm looks
> > > +     * straightforward, but it's not optimal; a better algorithm
> > > +     * would keep a tree or hash where the key is the address and
> > > +     * the value is a list of vCPUs */
> > > +    for (i = 0; i < smp_cpus; i++) {
> > > +        uint64_t vcpu_blocktime = 0;
> > > +
> > > +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
> > > +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0) == 0) {
> > > +            continue;
> > > +        }
> > > +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
> > > +        vcpu_blocktime = now_ms -
> > > +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> > This is almost there, but you mustn't read vcpu_times twice; the other
> > thread could have gone down the 'received' path during the time between
> > the reads (OK, unlikely but still); so I think you need:
> >      read_addr = atomic_fetch_add(&dc->vcpu_addr[i], 0);
> >      read_vcpu_time = atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> I agree with read_vcpu_time; done.
> But not the clearing of &dc->vcpu_addr[i]: we don't need to clear the
> copy of the value, but the original value.

Yes, agreed.

Dave

> >      if (read_addr != addr || read_vcpu_time == 0) {
> >              continue;
> >      }
> >      atomic_ vcpu_addr = 0;
> >      vcpu_blocktime = now_ms - read_vcpu_time;
> > 
> > Dave
> > 
> > > +        affected_cpu += 1;
> > > +        /* We need to know whether mark_postcopy_blocktime_end was
> > > +         * called due to a faulted page; another possible case is a
> > > +         * prefetched page, and in that case we shouldn't be here */
> > > +        if (!vcpu_total_blocktime &&
> > > +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
> > > +            vcpu_total_blocktime = true;
> > > +        }
> > > +        /* continue the loop, since one page could affect several vCPUs */
> > > +        dc->vcpu_blocktime[i] += vcpu_blocktime;
> > > +    }
> > > +
> > > +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
> > > +    if (vcpu_total_blocktime) {
> > > +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
> > > +    }
> > > +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
> > > +                                      affected_cpu);
> > > +}
> > > +
> > >   /*
> > >    * Handle faults detected by the USERFAULT markings
> > >    */
> > > @@ -630,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
> > >           rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> > >           trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> > >                                                   qemu_ram_get_idstr(rb),
> > > -                                                rb_offset);
> > > +                                                rb_offset,
> > > +                                                msg.arg.pagefault.feat.ptid);
> > > +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
> > > +                                      msg.arg.pagefault.feat.ptid, rb);
> > >           /*
> > >            * Send the request to the source - we want to request one
> > >            * of our host page sizes (which is >= TPS)
> > > @@ -721,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> > >       if (!ret) {
> > >           ramblock_recv_bitmap_set_range(rb, host_addr,
> > >                                          pagesize / qemu_target_page_size());
> > > +        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
> > > +
> > >       }
> > >       return ret;
> > >   }
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index 6f29fcc..b0c8708 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -115,6 +115,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > >   process_incoming_migration_co_postcopy_end_main(void) ""
> > >   migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> > >   migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> > > +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
> > > +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
> > >   # migration/rdma.c
> > >   qemu_rdma_accept_incoming_migration(void) ""
> > > @@ -191,7 +193,7 @@ postcopy_ram_enable_notify(void) ""
> > >   postcopy_ram_fault_thread_entry(void) ""
> > >   postcopy_ram_fault_thread_exit(void) ""
> > >   postcopy_ram_fault_thread_quit(void) ""
> > > -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
> > > +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx pid=%u"
> > >   postcopy_ram_incoming_cleanup_closeuf(void) ""
> > >   postcopy_ram_incoming_cleanup_entry(void) ""
> > >   postcopy_ram_incoming_cleanup_exit(void) ""
> > > @@ -200,6 +202,7 @@ save_xbzrle_page_skipping(void) ""
> > >   save_xbzrle_page_overflow(void) ""
> > >   ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> > >   ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> > > +get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
> > >   # migration/exec.c
> > >   migration_exec_outgoing(const char *cmd) "cmd=%s"
> > > -- 
> > > 2.7.4
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > 
> > 
> 
> -- 
> Best regards,
> Alexey Perevalov
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-19  8:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20171005111622eucas1p2093fedb3a69ca522d6b8260377b75419@eucas1p2.samsung.com>
2017-10-05 11:16 ` [Qemu-devel] [PATCH v11 0/6] calculate blocktime for postcopy live migration Alexey Perevalov
     [not found]   ` <CGME20171005111622eucas1p1dd4545fb4b45add67b222d91355aa208@eucas1p1.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 1/6] migration: introduce postcopy-blocktime capability Alexey Perevalov
     [not found]   ` <CGME20171005111623eucas1p272597c60842087cac3ade92b88212eff@eucas1p2.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 2/6] migration: add postcopy blocktime ctx into MigrationIncomingState Alexey Perevalov
2017-10-18 11:21       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20171005111624eucas1p294c2c03421f17915b82cbde4cf4b9fa3@eucas1p2.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 3/6] migration: calculate vCPU blocktime on dst side Alexey Perevalov
2017-10-18 18:59       ` Dr. David Alan Gilbert
2017-10-19  8:48         ` Alexey Perevalov
2017-10-19  8:58           ` Dr. David Alan Gilbert
     [not found]   ` <CGME20171005111624eucas1p193bfeb0e428c8eee6180a1f7b96c0713@eucas1p1.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 4/6] migration: postcopy_blocktime documentation Alexey Perevalov
     [not found]   ` <CGME20171005111625eucas1p28ad35b246e6f964ca7d642cfa60df10d@eucas1p2.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 5/6] migration: add blocktime calculation into postcopy-test Alexey Perevalov
2017-10-18 19:09       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20171005111626eucas1p2ea023000ede617c0b8509f11c99fc10a@eucas1p2.samsung.com>
2017-10-05 11:16     ` [Qemu-devel] [PATCH v11 6/6] migration: add postcopy total blocktime into query-migrate Alexey Perevalov
