* [Qemu-devel] [PATCH v8 00/11] calculate blocktime for postcopy live migration
       [not found] <CGME20170607094720eucas1p24650bb7bb139ae209fc0ea8c5c57534b@eucas1p2.samsung.com>
@ 2017-06-07  9:46 ` Alexey Perevalov
       [not found]   ` <CGME20170607094726eucas1p146abfbdb92413f43fa395a5004d2541a@eucas1p1.samsung.com>
                     ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This is the 8th version.

The rationale for the idea is as follows:
a vCPU can be suspended during postcopy live migration until the faulted
page has been copied in. Downtime on the source side is the time interval
from when the source turns the vCPUs off until the destination starts
running them. That value is appropriate for precopy migration, where it
really shows the amount of time the vCPUs are down. It is not appropriate
for postcopy migration, because several vCPU threads can be suspended after
the vCPUs have been started. That matters when estimating packet drop for
SDN software.


(V7 -> V8)
    1. added just one comma in
"migration: fix hardcoded function name in error report"
It really was missing, but was fixed only in a later patch.

(V6 -> V7)
    1. the copied bitmap was placed into RAMBlock, like the other
migration-related bitmaps.
    2. The ordering of the mark_postcopy_blocktime_end call and of the
copied-bitmap check was changed.
    3. line-wrap style defects were fixed
    4. new patch "postcopy_place_page factoring out"
    5. postcopy_ram_supported_by_host accepts
MigrationIncomingState in qmp_migrate_set_capabilities
    6. minor documentation fixes,
    and the huge description of get_postcopy_total_blocktime was
moved. David's comment.

(V5 -> V6)
    - blocktime was added to the hmp command. Comment from David.
    - a bitmap for copied pages was added, as well as a check in the *_begin/_end
functions. The patch uses the just-introduced RAMBLOCK_FOREACH. Comment from David.
    - description of receive_ufd_features/request_ufd_features. Comment from David.
    - commit message headers/@since references were modified. Comment from Eric.
    - also typos in documentation. Comment from Eric.
    - style and description of field in MigrationInfo. Comment from Eric.
    - ufd_check_and_apply (formerly ufd_version_check) is called twice,
so my previous patch contained a double allocation of the blocktime context
and, as a result, a memory leak. This is fixed in this patch series.

(V4 -> V5)
    - the empty fill_destination_postcopy_migration_info stub was missing for
non-Linux builds

(V3 -> V4)
    - got rid of Downtime as a name for vCPU waiting time during postcopy migration
    - PostcopyBlocktimeContext renamed (it was just BlocktimeContext)
    - atomic operations are used for the fields of PostcopyBlocktimeContext
that are touched from both threads.
    - hardcoded function names in error_report were replaced with %s and __line__
    - this patch set includes the postcopy-downtime capability, but it is used on
the destination; coupled with the impossibility of returning the calculated
downtime to the source to show it in query-migrate, this looks like a big trade-off
    - UFFD_API has to be sent regardless of whether a feature needs to be requested
from the kernel, because the kernel expects it in any case (see patch comment)
    - postcopy_downtime included in query-migrate output
    - this patch set also includes the trivial fix
migration: fix hardcoded function name in error report
maybe that is a candidate for the qemu-trivial mailing list, but I already
sent "migration: Fixed code style" and it went unclaimed.

(V2 -> V3)
    - The downtime calculation approach was changed, thanks to Peter Xu
    - Due to the previous point there is no longer any need to keep a GTree or a
bitmap of cpus. So the glib changes aren't included in this patch set; they could
be resent in another patch set if there is a good reason for it.
    - No procfs traces in this patch set; if somebody wants them, they can be taken
from the patchwork site to track down page fault initiators.
    - UFFD_FEATURE_THREAD_ID is requested only when the kernel supports it
    - It doesn't send back the downtime, it just traces it

This patch set is based on commit
a0d4aac7467dd02e5657b79e867f067330266a24
of git://git.qemu-project.org/qemu.git

Alexey Perevalov (11):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  migration: pass MigrationIncomingState* into migration check functions
  migration: fix hardcoded function name in error report
  migration: split ufd_version_check onto receive/request features part
  migration: introduce postcopy-blocktime capability
  migration: add postcopy blocktime ctx into MigrationIncomingState
  migration: add bitmap for copied page
  migration: postcopy_place_page factoring out
  migration: calculate vCPU blocktime on dst side
  migration: add postcopy total blocktime into query-migrate
  migration: postcopy_blocktime documentation

 docs/migration.txt                |  10 +
 hmp.c                             |  15 ++
 include/exec/ram_addr.h           |   2 +
 include/migration/migration.h     |  13 ++
 linux-headers/linux/userfaultfd.h |   4 +
 migration/migration.c             |  52 +++++-
 migration/postcopy-ram.c          | 374 ++++++++++++++++++++++++++++++++++++--
 migration/postcopy-ram.h          |   6 +-
 migration/ram.c                   |  40 +++-
 migration/ram.h                   |   4 +
 migration/savevm.c                |   2 +-
 migration/trace-events            |   6 +-
 qapi-schema.json                  |  14 +-
 13 files changed, 514 insertions(+), 28 deletions(-)

-- 
1.9.1


* [Qemu-devel] [PATCH v8 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_*
       [not found]   ` <CGME20170607094726eucas1p146abfbdb92413f43fa395a5004d2541a@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-12 12:27       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This commit duplicates the header changes of the Linux kernel patch
"userfaultfd: provide pid in userfault msg".

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
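A minimal sketch of how a reader of the message might consume the new
field, assuming the thread id is only meaningful once
UFFD_FEATURE_THREAD_ID has been negotiated. struct demo_pagefault and
demo_fault_tid() are hypothetical stand-ins for illustration, not part of
the kernel header or of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit value this patch adds to linux-headers/linux/userfaultfd.h. */
#define DEMO_UFFD_FEATURE_THREAD_ID (1 << 7)

/* Hypothetical stand-in for the pagefault arm of struct uffd_msg. */
struct demo_pagefault {
    uint64_t flags;
    uint64_t address;
    union {
        uint32_t ptid;
    } feat;
};

/*
 * Return the faulting thread id, or -1 when the feature was not
 * negotiated and feat.ptid therefore carries no meaning.
 */
int64_t demo_fault_tid(const struct demo_pagefault *pf,
                       uint64_t negotiated_features)
{
    assert(pf != NULL);
    if (!(negotiated_features & DEMO_UFFD_FEATURE_THREAD_ID)) {
        return -1;
    }
    return (int64_t)pf->feat.ptid;
}
```

Passing the negotiated feature mask explicitly keeps the helper from
trusting a field the kernel may not have filled in.
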
 linux-headers/linux/userfaultfd.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 9701772..eda028c 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -78,6 +78,9 @@ struct uffd_msg {
 		struct {
 			__u64	flags;
 			__u64	address;
+			union {
+				__u32   ptid;
+			} feat;
 		} pagefault;
 
 		struct {
@@ -161,6 +164,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MISSING_HUGETLBFS		(1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM		(1<<5)
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
+#define UFFD_FEATURE_THREAD_ID			(1<<7)
 	__u64 features;
 
 	__u64 ioctls;
-- 
1.9.1


* [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions
       [not found]   ` <CGME20170607094727eucas1p13b2228fead9fc5a49d953985c777b719@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-09  4:10       ` Peter Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create the downtime context when the kernel supports it.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/migration.c    |  3 ++-
 migration/postcopy-ram.c | 10 +++++-----
 migration/postcopy-ram.h |  2 +-
 migration/savevm.c       |  2 +-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 48c94c9..2a77636 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
                                   Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
     MigrationCapabilityStatusList *cap;
     bool old_postcopy_cap = migrate_postcopy_ram();
 
@@ -772,7 +773,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
          * special support.
          */
         if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-            !postcopy_ram_supported_by_host()) {
+            !postcopy_ram_supported_by_host(mis)) {
             /* postcopy_ram_supported_by_host will have emitted a more
              * detailed message
              */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9c41887..10d39a0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,7 +63,7 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
@@ -126,7 +126,7 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     long pagesize = getpagesize();
     int ufd = -1;
@@ -149,7 +149,7 @@ bool postcopy_ram_supported_by_host(void)
     }
 
     /* Version and features check */
-    if (!ufd_version_check(ufd)) {
+    if (!ufd_version_check(ufd, mis)) {
         goto out;
     }
 
@@ -525,7 +525,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
      */
-    if (!ufd_version_check(mis->userfault_fd)) {
+    if (!ufd_version_check(mis->userfault_fd, mis)) {
         return -1;
     }
 
@@ -663,7 +663,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
     return false;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 52d51e8..587a8b8 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/savevm.c b/migration/savevm.c
index 9c320f5..8b7bab8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1380,7 +1380,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
         return -1;
     }
 
-    if (!postcopy_ram_supported_by_host()) {
+    if (!postcopy_ram_supported_by_host(mis)) {
         postcopy_state_set(POSTCOPY_INCOMING_NONE);
         return -1;
     }
-- 
1.9.1


* [Qemu-devel] [PATCH v8 03/11] migration: fix hardcoded function name in error report
       [not found]   ` <CGME20170607094727eucas1p2d1063171fa2850fc1d590b286cd5d880@eucas1p2.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:31       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 10d39a0..8838901 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -71,7 +71,7 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
     api_struct.api = UFFD_API;
     api_struct.features = 0;
     if (ioctl(ufd, UFFDIO_API, &api_struct)) {
-        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+        error_report("%s: UFFDIO_API failed: %s", __func__,
                      strerror(errno));
         return false;
     }
-- 
1.9.1


* [Qemu-devel] [PATCH v8 04/11] migration: split ufd_version_check onto receive/request features part
       [not found]   ` <CGME20170607094728eucas1p1984b365dd09f3222b758075e651a5b5d@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-12  9:52       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This modification is necessary for userfault fd features that have to be
requested from userspace.
UFFD_FEATURE_THREAD_ID is one such "on demand" feature, which will
be introduced in the next patch.

QEMU has to use a separate userfault file descriptor, because the
userfault context has internal state: after the first ioctl with
UFFD_API it changes its state to UFFD_STATE_RUNNING (on success),
but the kernel expects UFFD_STATE_WAIT_API while handling that ioctl.
So only one ioctl with UFFD_API is possible per ufd.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
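The constraint described above can be modelled in a few lines. This is a
sketch under stated assumptions, not kernel or QEMU code: struct demo_ufd,
demo_uffdio_api() and demo_check_and_apply() are hypothetical names, and
the "kernel" is reduced to a two-state machine that accepts the API
handshake once per descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one userfault fd; not a kernel structure. */
enum demo_state { DEMO_WAIT_API, DEMO_RUNNING };

struct demo_ufd {
    enum demo_state state;
    uint64_t kernel_features;   /* what this pretend kernel supports */
};

/*
 * Model of the API handshake: it succeeds once per fd, and only if
 * every requested bit is supported. Returns 0 or -EINVAL.
 */
int demo_uffdio_api(struct demo_ufd *ufd, uint64_t requested,
                    uint64_t *granted)
{
    assert(ufd != NULL && granted != NULL);
    if (ufd->state != DEMO_WAIT_API) {
        return -EINVAL;         /* handshake already done on this fd */
    }
    if (requested & ~ufd->kernel_features) {
        return -EINVAL;         /* asked for an unsupported feature */
    }
    *granted = ufd->kernel_features;
    ufd->state = DEMO_RUNNING;
    return 0;
}

/*
 * Probe on a throwaway fd, then request only supported bits on the
 * real fd, mirroring the receive/request split of this patch.
 */
bool demo_check_and_apply(struct demo_ufd *probe, struct demo_ufd *real,
                          uint64_t wanted, uint64_t *asked)
{
    uint64_t supported = 0, granted = 0;

    assert(asked != NULL);
    if (demo_uffdio_api(probe, 0, &supported) != 0) {
        return false;
    }
    *asked = wanted & supported;
    return demo_uffdio_api(real, *asked, &granted) == 0;
}
```

In the model, a second handshake on either descriptor fails, which is why
the probe consumes its own descriptor and the real one is left untouched
until the feature mask is known.
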
 migration/postcopy-ram.c | 94 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 6 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8838901..cbe8f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,16 +63,67 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
-static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
+
+/**
+ * receive_ufd_features: check userfault fd features, to request only supported
+ * features in the future.
+ *
+ * Returns: true on success
+ *
+ * __NR_userfaultfd - should be checked before
+ *  @features: out parameter will contain uffdio_api.features provided by kernel
+ *              in case of success
+ */
+static bool receive_ufd_features(uint64_t *features)
 {
-    struct uffdio_api api_struct;
-    uint64_t ioctl_mask;
+    struct uffdio_api api_struct = {0};
+    int ufd;
+    bool ret = true;
+
+    /* if we are here __NR_userfaultfd should exist */
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
+                     strerror(errno));
+        return false;
+    }
 
+    /* ask features */
     api_struct.api = UFFD_API;
     api_struct.features = 0;
     if (ioctl(ufd, UFFDIO_API, &api_struct)) {
         error_report("%s: UFFDIO_API failed: %s", __func__,
                      strerror(errno));
+        ret = false;
+        goto release_ufd;
+    }
+
+    *features = api_struct.features;
+
+release_ufd:
+    close(ufd);
+    return ret;
+}
+
+/**
+ * request_ufd_features: this function should be called only once on a newly
+ * opened ufd, subsequent calls will lead to error.
+ *
+ * Returns: true on success
+ *
+ * @ufd: fd obtained from userfaultfd syscall
+ * @features: bit mask see UFFD_API_FEATURES
+ */
+static bool request_ufd_features(int ufd, uint64_t features)
+{
+    struct uffdio_api api_struct = {0};
+    uint64_t ioctl_mask;
+
+    api_struct.api = UFFD_API;
+    api_struct.features = features;
+    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+        error_report("%s failed: UFFDIO_API failed: %s", __func__,
+                     strerror(errno));
         return false;
     }
 
@@ -84,11 +135,42 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
         return false;
     }
 
+    return true;
+}
+
+static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
+{
+    uint64_t asked_features = 0;
+    static uint64_t supported_features;
+
+    /*
+     * it's not possible to
+     * request UFFD_API twice per one fd
+     * userfault fd features is persistent
+     */
+    if (!supported_features) {
+        if (!receive_ufd_features(&supported_features)) {
+            error_report("%s failed", __func__);
+            return false;
+        }
+    }
+
+    /*
+     * request features, even if asked_features is 0, due to
+     * kernel expects UFFD_API before UFFDIO_REGISTER, per
+     * userfault file descriptor
+     */
+    if (!request_ufd_features(ufd, asked_features)) {
+        error_report("%s failed: features %" PRIu64, __func__,
+                     asked_features);
+        return false;
+    }
+
     if (getpagesize() != ram_pagesize_summary()) {
         bool have_hp = false;
         /* We've got a huge page */
 #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
-        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
+        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
 #endif
         if (!have_hp) {
             error_report("Userfault on this host does not support huge pages");
@@ -149,7 +231,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
     }
 
     /* Version and features check */
-    if (!ufd_version_check(ufd, mis)) {
+    if (!ufd_check_and_apply(ufd, mis)) {
         goto out;
     }
 
@@ -525,7 +607,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
      */
-    if (!ufd_version_check(mis->userfault_fd, mis)) {
+    if (!ufd_check_and_apply(mis->userfault_fd, mis)) {
         return -1;
     }
 
-- 
1.9.1


* [Qemu-devel] [PATCH v8 05/11] migration: introduce postcopy-blocktime capability
       [not found]   ` <CGME20170607094728eucas1p228f096ea7eebf7e791392c9193cefec0@eucas1p2.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:34       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

Right now it can be used on the destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime is the time from when a vCPU thread is put into
interruptible sleep until the memory page has been copied and the
thread is awake again.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
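The definition of vCPU blocktime above can be illustrated with a small
sketch. It is not the calculation implemented later in this series; the
struct, the function names and the millisecond clock are assumptions made
for the example, and a begin timestamp of 0 is used as a "not blocked"
marker.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_VCPUS 8

/* Hypothetical per-vCPU bookkeeping; field names are illustrative. */
struct demo_blocktime {
    int64_t fault_begin_ms[DEMO_MAX_VCPUS]; /* 0 means "not blocked" */
    int64_t blocked_ms[DEMO_MAX_VCPUS];     /* accumulated blocktime */
};

/* The vCPU thread faulted and went to sleep at now_ms. */
void demo_blocktime_begin(struct demo_blocktime *bt, int cpu,
                          int64_t now_ms)
{
    assert(cpu >= 0 && cpu < DEMO_MAX_VCPUS);
    bt->fault_begin_ms[cpu] = now_ms;
}

/* The page was copied at now_ms and the thread can wake up. */
void demo_blocktime_end(struct demo_blocktime *bt, int cpu,
                        int64_t now_ms)
{
    assert(cpu >= 0 && cpu < DEMO_MAX_VCPUS);
    if (bt->fault_begin_ms[cpu] == 0) {
        return; /* this vCPU was not waiting for a page */
    }
    bt->blocked_ms[cpu] += now_ms - bt->fault_begin_ms[cpu];
    bt->fault_begin_ms[cpu] = 0;
}
```

Each begin/end pair adds one sleep interval to the vCPU's total, so a vCPU
that faults several times accumulates the sum of its waits.
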
 include/migration/migration.h | 1 +
 migration/migration.c         | 9 +++++++++
 qapi-schema.json              | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 79b5484..2e61df5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -189,6 +189,7 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
+bool migrate_postcopy_blocktime(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_message(MigrationIncomingState *mis,
diff --git a/migration/migration.c b/migration/migration.c
index 2a77636..d1cc34f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1371,6 +1371,15 @@ bool migrate_zero_blocks(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
 }
 
+bool migrate_postcopy_blocktime(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME];
+}
+
 bool migrate_use_compression(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 4b50b65..e906953 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -900,12 +900,15 @@
 #          offers more flexibility.
 #          (Since 2.10)
 #
+# @postcopy-blocktime: Calculate downtime for postcopy live migration
+#                     (since 2.10)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-           'block' ] }
+           'block', 'postcopy-blocktime'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
1.9.1


* [Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState
       [not found]   ` <CGME20170607094729eucas1p119b3d77f7d869eb06c16c9e91215e8cd@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:43       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This patch adds a request to kernel space for UFFD_FEATURE_THREAD_ID
when this feature is provided by the kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
because it is a postcopy-only feature.
This also defines the lifetime of the PostcopyBlocktimeContext instance.
Information from the PostcopyBlocktimeContext instance will be provided
long after the postcopy migration ends, so the instance lives until
QEMU exits, but the part of it used only during calculation (vcpu_addr,
page_fault_vcpu_time) will be released when postcopy ends or fails.

To enable postcopy blocktime calculation on the destination, the proper
capability has to be requested (the documentation patch is at the tail of
the patch set).

As an example, the following command enables that capability, assuming
QEMU was started with the
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
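The two-stage lifetime described in the commit message can be sketched as
follows. This is an illustration only: struct demo_ctx keeps just the
three arrays named in the patch, the helper names are hypothetical, and
plain calloc/free stand in for the glib allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced context; mirrors three fields of the patch. */
struct demo_ctx {
    int64_t *page_fault_vcpu_time;  /* needed only while calculating */
    uint64_t *vcpu_addr;            /* needed only while calculating */
    int64_t *vcpu_blocktime;        /* result, kept until exit */
};

struct demo_ctx *demo_ctx_new(size_t ncpus)
{
    struct demo_ctx *ctx;

    assert(ncpus > 0);
    ctx = calloc(1, sizeof(*ctx));
    assert(ctx != NULL);            /* sketch: no error path */
    ctx->page_fault_vcpu_time = calloc(ncpus, sizeof(int64_t));
    ctx->vcpu_addr = calloc(ncpus, sizeof(uint64_t));
    ctx->vcpu_blocktime = calloc(ncpus, sizeof(int64_t));
    assert(ctx->page_fault_vcpu_time && ctx->vcpu_addr &&
           ctx->vcpu_blocktime);
    return ctx;
}

/* Postcopy ended or failed: drop the scratch arrays, keep results. */
void demo_ctx_release_scratch(struct demo_ctx *ctx)
{
    free(ctx->page_fault_vcpu_time);
    ctx->page_fault_vcpu_time = NULL;
    free(ctx->vcpu_addr);
    ctx->vcpu_addr = NULL;
}

/* Process exit: free whatever is left; free(NULL) is a no-op. */
void demo_ctx_destroy(struct demo_ctx *ctx)
{
    demo_ctx_release_scratch(ctx);
    free(ctx->vcpu_blocktime);
    free(ctx);
}
```

Setting the released pointers to NULL is what makes the final destroy safe
to call whether or not the scratch arrays were already dropped.
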
 include/migration/migration.h |  8 ++++++
 migration/postcopy-ram.c      | 65 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2e61df5..766e802 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,8 @@ enum mig_rp_message_type {
     MIG_RP_MSG_MAX
 };
 
+struct PostcopyBlocktimeContext;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -86,6 +88,12 @@ struct MigrationIncomingState {
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
     QemuSemaphore colo_incoming_sem;
+
+    /*
+     * PostcopyBlocktimeContext to keep information for postcopy
+     * live migration, to calculate vCPU block time
+     * */
+    struct PostcopyBlocktimeContext *blocktime_ctx;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cbe8f9f..ade7f1c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,6 +63,58 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
+typedef struct PostcopyBlocktimeContext {
+    /* time when page fault initiated per vCPU */
+    int64_t *page_fault_vcpu_time;
+    /* page address per vCPU */
+    uint64_t *vcpu_addr;
+    int64_t total_blocktime;
+    /* blocktime per vCPU */
+    int64_t *vcpu_blocktime;
+    /* point in time when last page fault was initiated */
+    int64_t last_begin;
+    /* number of vCPU are suspended */
+    int smp_cpus_down;
+
+    /*
+     * Handler for exit event, necessary for
+     * releasing whole blocktime_ctx
+     */
+    Notifier exit_notifier;
+    /*
+     * Handler for postcopy event, necessary for
+     * releasing unnecessary part of blocktime_ctx
+     */
+    Notifier postcopy_notifier;
+} PostcopyBlocktimeContext;
+
+static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
+{
+    g_free(ctx->page_fault_vcpu_time);
+    g_free(ctx->vcpu_addr);
+    g_free(ctx->vcpu_blocktime);
+    g_free(ctx);
+}
+
+static void migration_exit_cb(Notifier *n, void *data)
+{
+    PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
+                                                 exit_notifier);
+    destroy_blocktime_context(ctx);
+}
+
+static struct PostcopyBlocktimeContext *blocktime_context_new(void)
+{
+    PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
+    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
+    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
+    ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
+
+    ctx->exit_notifier.notify = migration_exit_cb;
+    qemu_add_exit_notifier(&ctx->exit_notifier);
+    add_migration_state_change_notifier(&ctx->postcopy_notifier);
+    return ctx;
+}
 
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
@@ -155,6 +207,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
         }
     }
 
+#ifdef UFFD_FEATURE_THREAD_ID
+    if (migrate_postcopy_blocktime() && mis &&
+        UFFD_FEATURE_THREAD_ID & supported_features) {
+        /* kernel supports that feature */
+        /* don't create blocktime_context if it exists */
+        if (!mis->blocktime_ctx) {
+            mis->blocktime_ctx = blocktime_context_new();
+        }
+
+        asked_features |= UFFD_FEATURE_THREAD_ID;
+    }
+#endif
+
     /*
      * request features, even if asked_features is 0, due to
      * kernel expects UFFD_API before UFFDIO_REGISTER, per
-- 
1.9.1


* [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
       [not found]   ` <CGME20170607094729eucas1p15097f154039365d5e135f92b72aad1bf@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:56       ` Juan Quintela
                         ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This patch adds the ability to track already copied
pages. It is necessary for calculating vCPU block time in the
postcopy migration feature, and maybe for recovery after a
postcopy migration failure.
It is also necessary for solving the shared memory issue in
postcopy live migration. Information about copied pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already copied pages. The fallocate syscall is required for
remapped shared memory, because remapping itself blocks
ioctl(UFFDIO_COPY): the ioctl in this case fails with EEXIST
(the struct page exists after remap).

The bitmap is placed into RAMBlock like the other postcopy/precopy
related bitmaps. The helpers are in migration/ram.c, because
that file is allowed to work with RAMBlock.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
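A minimal sketch of the address-to-bit mapping behind such a bitmap,
assuming a power-of-two page size. struct demo_block and the demo_*
helpers are hypothetical and non-atomic, and the shift is derived with the
GCC/Clang builtin __builtin_ctzll rather than with find_first_bit(). The
size argument passed to find_first_bit() in the helpers below is worth
double-checking, since that function takes its size in bits, whereas
sizeof() yields bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the RAMBlock fields used here. */
struct demo_block {
    uint64_t host;            /* start address of the block */
    uint64_t page_size;       /* must be a power of two */
    unsigned long *copiedmap; /* one bit per page */
};

/* Page index of addr inside the block. */
size_t demo_page_index(const struct demo_block *rb, uint64_t addr)
{
    int page_shift;

    assert(rb->page_size != 0 &&
           (rb->page_size & (rb->page_size - 1)) == 0);
    assert(addr >= rb->host);
    /* log2 of a power of two is its count of trailing zero bits */
    page_shift = __builtin_ctzll(rb->page_size);
    return (size_t)((addr - rb->host) >> page_shift);
}

/* Mark the page containing addr as copied (not atomic in this sketch). */
void demo_set_copied(struct demo_block *rb, uint64_t addr)
{
    size_t bit = demo_page_index(rb, addr);

    rb->copiedmap[bit / DEMO_BITS_PER_LONG] |=
        1UL << (bit % DEMO_BITS_PER_LONG);
}

/* Report whether the page containing addr was marked as copied. */
bool demo_test_copied(const struct demo_block *rb, uint64_t addr)
{
    size_t bit = demo_page_index(rb, addr);

    return (rb->copiedmap[bit / DEMO_BITS_PER_LONG] >>
            (bit % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Any address inside a page maps to that page's bit, so callers do not need
to align the address first.
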
 include/exec/ram_addr.h |  2 ++
 migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
 migration/ram.h         |  4 ++++
 3 files changed, 42 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 140efa8..6a3780b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
      * of the postcopy phase
      */
     unsigned long *unsentmap;
+    /* bitmap of already copied pages in postcopy */
+    unsigned long *copiedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/migration/ram.c b/migration/ram.c
index f387e9c..a7c0db4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -149,6 +149,25 @@ out:
     return ret;
 }
 
+static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
+{
+    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
+    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
+                                    sizeof(rb->page_size));
+
+    return addr_offset >> page_shift;
+}
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
         block->bmap = NULL;
         g_free(block->unsentmap);
         block->unsentmap = NULL;
+        g_free(block->copiedmap);
+        block->copiedmap = NULL;
     }
 
     XBZRLE_cache_lock();
@@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
     return ret;
 }
 
+static unsigned long get_copiedmap_size(RAMBlock *rb)
+{
+    unsigned long pages;
+    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
+                                             sizeof(rb->page_size));
+    return pages;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
@@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     rcu_read_lock();
 
     if (postcopy_running) {
+        RAMBlock *rb;
+        RAMBLOCK_FOREACH(rb) {
+            /* needed on the destination; bitmap_new calls
+             * g_try_malloc0, which attempts to allocate
+             * @n_bytes initialized to zeros */
+            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
+        }
         ret = ram_load_postcopy(f);
     }
 
diff --git a/migration/ram.h b/migration/ram.h
index c9563d1..1f32824 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+
 #endif
-- 
1.9.1


* [Qemu-devel] [PATCH v8 08/11] migration: postcopy_place_page factoring out
       [not found]   ` <CGME20170607094730eucas1p2126d9850427e7b4af92898b64b7b805a@eucas1p2.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:58       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

The page needs to be marked as copied as close as possible to the place
where it is tracked. That will be necessary in a further patch.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 13 ++++++++-----
 migration/postcopy-ram.h |  4 ++--
 migration/ram.c          |  4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ade7f1c..62a272a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -713,9 +713,10 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-                        size_t pagesize)
+                        RAMBlock *rb)
 {
     struct uffdio_copy copy_struct;
+    size_t pagesize = qemu_ram_pagesize(rb);
 
     copy_struct.dst = (uint64_t)(uintptr_t)host;
     copy_struct.src = (uint64_t)(uintptr_t)from;
@@ -744,10 +745,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-                             size_t pagesize)
+                             RAMBlock *rb)
 {
+    size_t pagesize;
     trace_postcopy_place_page_zero(host);
 
+    pagesize = qemu_ram_pagesize(rb);
     if (pagesize == getpagesize()) {
         struct uffdio_zeropage zero_struct;
         zero_struct.range.start = (uint64_t)(uintptr_t)host;
@@ -778,7 +781,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
             memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
         }
         return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
-                                   pagesize);
+                                   rb);
     }
 
     return 0;
@@ -841,14 +844,14 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 }
 
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-                        size_t pagesize)
+                        RAMBlock *rb)
 {
     assert(0);
     return -1;
 }
 
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-                        size_t pagesize)
+                        RAMBlock *rb)
 {
     assert(0);
     return -1;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 587a8b8..77ea0fd 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -72,14 +72,14 @@ void postcopy_discard_send_finish(MigrationState *ms,
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-                        size_t pagesize);
+                        RAMBlock *rb);
 
 /*
  * Place a zero page at (host) atomically
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-                             size_t pagesize);
+                             RAMBlock *rb);
 
 /* The current postcopy state is read/set by postcopy_state_get/set
  * which update it atomically.
diff --git a/migration/ram.c b/migration/ram.c
index a7c0db4..a791d40 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2524,10 +2524,10 @@ static int ram_load_postcopy(QEMUFile *f)
 
             if (all_zero) {
                 ret = postcopy_place_page_zero(mis, place_dest,
-                                               block->page_size);
+                                               block);
             } else {
                 ret = postcopy_place_page(mis, place_dest,
-                                          place_source, block->page_size);
+                                          place_source, block);
             }
         }
         if (!ret) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side
       [not found]   ` <CGME20170607094730eucas1p29b692c0f813d5368d70d999ca8a1f186@eucas1p2.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 13:11       ` Juan Quintela
  2017-06-12 11:34       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

This patch provides blocktime calculation per vCPU,
as a summary and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu, as an improvement on the
previous approach where QEMU kept a tree with the faulted page address and a
cpus bitmask in it. Now QEMU keeps an array with the faulted page address as
value and the vCPU as index. It helps to find the proper vCPU at UFFD_COPY
time. It also keeps a list of blocktime per vCPU (could be traced with
page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 62a272a..0ad9f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,7 @@
 #include "ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include <sys/param.h>
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(pid);
+    return -1;
+}
+
+/*
+ * This function is being called when pagefault occurs. It
+ * tracks down vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+    if (test_copiedmap_by_addr(addr, rb)) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_sub(&dc->smp_cpus_down, 1);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu);
+}
+
+/*
+ *  This function just provides calculated blocktime per cpu and traces it.
+ *  Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPU
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
+ *            it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is following:
+ *              * - means blocktime per vCPU
+ *              x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* lookup cpu, to clear it,
+     * that algorithm looks straightforward, but it's not
+     * optimal, more optimal algorithm is keeping tree or hash
+     * where key is address value is a list of  */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms -
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        affected_cpu += 1;
+        /* we need to know whether mark_postcopy_blocktime_end was called
+         * due to a faulted page; the other possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the loop, since one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -638,8 +766,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -723,6 +854,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
     copy_struct.len = pagesize;
     copy_struct.mode = 0;
 
+    /* the copied page isn't a feature of blocktime calculation,
+     * it's a more general entity, so keep it here,
+     * but the gap between the two following operations could be high,
+     * and in this case blocktime for such a small interval will be lost */
+    set_copiedmap_by_addr((uint64_t)(uintptr_t)host, rb);
+    mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
     /* copy also acks to the kernel waking the stalled thread up
      * TODO: We can inhibit that ack and only do it if it was requested
      * which would be slightly cheaper, but we'd have to be careful
diff --git a/migration/trace-events b/migration/trace-events
index 5b8ccf3..7bdadbb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -188,7 +190,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -197,6 +199,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
1.9.1
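
The begin/end bookkeeping in this patch can be illustrated with a
simplified, single-threaded sketch. It is not the QEMU code: it drops the
atomics, the copied bitmap check and the thread-id lookup, and every name
in it (BlocktimeSketch, sketch_begin, sketch_end, SKETCH_MAX_CPUS) is
hypothetical. It only shows how per-vCPU time and the overlapped total
could be accumulated from fault and copy events.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CPUS 8   /* hypothetical fixed bound, for the sketch only */

typedef struct {
    int      ncpus;                         /* number of vCPUs tracked */
    int      cpus_down;                     /* vCPUs currently blocked */
    int64_t  last_begin;                    /* time of the latest fault */
    uint64_t fault_addr[SKETCH_MAX_CPUS];   /* 0 means "not blocked" */
    int64_t  fault_time[SKETCH_MAX_CPUS];   /* when each vCPU faulted */
    int64_t  vcpu_blocktime[SKETCH_MAX_CPUS];
    int64_t  total_blocktime;               /* time with all vCPUs blocked */
} BlocktimeSketch;

void sketch_init(BlocktimeSketch *s, int ncpus)
{
    assert(ncpus > 0 && ncpus <= SKETCH_MAX_CPUS);
    memset(s, 0, sizeof(*s));
    s->ncpus = ncpus;
}

/* A vCPU faulted on addr at time now_ms. */
void sketch_begin(BlocktimeSketch *s, int cpu, uint64_t addr, int64_t now_ms)
{
    assert(cpu >= 0 && cpu < s->ncpus && addr != 0);
    if (s->fault_addr[cpu] == 0) {
        s->cpus_down++;
    }
    s->fault_addr[cpu] = addr;
    s->fault_time[cpu] = now_ms;
    s->last_begin = now_ms;
}

/* The page at addr was copied at time now_ms; wake every vCPU waiting on it. */
void sketch_end(BlocktimeSketch *s, uint64_t addr, int64_t now_ms)
{
    int i, affected = 0;
    /* Overlap only counts if every vCPU was blocked when the copy arrived. */
    int all_down = (s->cpus_down == s->ncpus);

    assert(addr != 0);
    for (i = 0; i < s->ncpus; i++) {
        if (s->fault_addr[i] != addr) {
            continue;
        }
        s->fault_addr[i] = 0;
        s->vcpu_blocktime[i] += now_ms - s->fault_time[i];
        affected++;
    }
    s->cpus_down -= affected;
    if (affected > 0 && all_down) {
        s->total_blocktime += now_ms - s->last_begin;
    }
}
```

Feeding it the S1,S2,E1,S3,S1,E2,E3,E1 sequence from the comment above
adds only the interval between the second S1 and E2 to the total, since
that is the only span in which all three vCPUs are blocked.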

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v8 10/11] migration: add postcopy total blocktime into query-migrate
       [not found]   ` <CGME20170607094731eucas1p2cbbf439e841b1d72edb374d35d53bea3@eucas1p2.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

Postcopy total blocktime is available on destination side only.
But query-migrate was possible only for source. This patch
adds the ability to call query-migrate on destination.
To see postcopy blocktime, the postcopy-blocktime capability
must be requested.

The query-migrate command will show the following sample result:
{"return": {
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy-vcpu-blocktime contains a list, where the first item is the first
vCPU in QEMU.

This patch has a drawback: it combines the states of incoming and
outgoing migration. Outgoing migration state will overwrite incoming
state. It looks better to separate query-migrate for incoming and
outgoing migration, or to add a parameter to indicate the type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 hmp.c                         | 15 ++++++++++++
 include/migration/migration.h |  4 +++
 migration/migration.c         | 40 +++++++++++++++++++++++++++---
 migration/postcopy-ram.c      | 57 +++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events        |  1 +
 qapi-schema.json              |  9 ++++++-
 6 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index 8c72c58..e0c4fdf 100644
--- a/hmp.c
+++ b/hmp.c
@@ -262,6 +262,21 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->cpu_throttle_percentage);
     }
 
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "postcopy blocktime: %" PRId64 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        Visitor *v;
+        char *str;
+        v = string_output_visitor_new(false, &str);
+        visit_type_int64List(v, NULL, &info->postcopy_vcpu_blocktime, NULL);
+        visit_complete(v, &str);
+        monitor_printf(mon, "postcopy vcpu blocktime: %s\n", str);
+        g_free(str);
+        visit_free(v);
+    }
     qapi_free_MigrationInfo(info);
     qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 766e802..7d20470 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -98,6 +98,10 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with blocktime context
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info);
 
 struct MigrationState
 {
diff --git a/migration/migration.c b/migration/migration.c
index d1cc34f..b80d5b5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,14 +625,15 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     }
 }
 
-MigrationInfo *qmp_query_migrate(Error **errp)
+static void fill_source_migration_info(MigrationInfo *info)
 {
-    MigrationInfo *info = g_malloc0(sizeof(*info));
     MigrationState *s = migrate_get_current();
 
     switch (s->state) {
     case MIGRATION_STATUS_NONE:
         /* no migration has happened ever */
+        /* do not overwrite destination migration status */
+        return;
         break;
     case MIGRATION_STATUS_SETUP:
         info->has_status = true;
@@ -718,10 +719,43 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         break;
     }
     info->status = s->state;
+}
 
-    return info;
+static void fill_destination_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    switch (mis->state) {
+    case MIGRATION_STATUS_NONE:
+        return;
+        break;
+    case MIGRATION_STATUS_SETUP:
+    case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_CANCELLED:
+    case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_FAILED:
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        break;
+    case MIGRATION_STATUS_COMPLETED:
+        info->has_status = true;
+        fill_destination_postcopy_migration_info(info);
+        break;
+    }
+    info->status = mis->state;
 }
 
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+    MigrationInfo *info = g_malloc0(sizeof(*info));
+
+    fill_destination_migration_info(info);
+    fill_source_migration_info(info);
+
+    return info;
+}
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
                                   Error **errp)
 {
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0ad9f9f..7f5b402 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -117,6 +117,55 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
+static int64List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
+{
+    int64List *list = NULL, *entry = NULL;
+    int i;
+
+    for (i = smp_cpus - 1; i >= 0; i--) {
+        entry = g_new0(int64List, 1);
+        entry->value = ctx->vcpu_blocktime[i];
+        entry->next = list;
+        list = entry;
+    }
+
+    return list;
+}
+
+/*
+ * This function just populates MigrationInfo from postcopy's
+ * blocktime context. It will not populate MigrationInfo,
+ * unless postcopy-blocktime capability was set.
+ *
+ * @info: pointer to MigrationInfo to populate
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return;
+    }
+
+    info->has_postcopy_blocktime = true;
+    info->postcopy_blocktime = bc->total_blocktime;
+    info->has_postcopy_vcpu_blocktime = true;
+    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+}
+
+static uint64_t get_postcopy_total_blocktime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return 0;
+    }
+
+    return bc->total_blocktime;
+}
+
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
  * features in the future.
@@ -491,6 +540,9 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         munmap(mis->postcopy_tmp_zero_page, mis->largest_page_size);
         mis->postcopy_tmp_zero_page = NULL;
     }
+    trace_postcopy_ram_incoming_cleanup_blocktime(
+            get_postcopy_total_blocktime());
+
     trace_postcopy_ram_incoming_cleanup_exit();
     return 0;
 }
@@ -950,6 +1002,11 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+    error_report("%s: No OS support", __func__);
+}
+
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
diff --git a/migration/trace-events b/migration/trace-events
index 7bdadbb..55a3b6e 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -195,6 +195,7 @@ postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu64
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
diff --git a/qapi-schema.json b/qapi-schema.json
index e906953..9229bbc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -712,6 +712,11 @@
 #              @status is 'failed'. Clients should not attempt to parse the
 #              error strings. (Since 2.7)
 #
+# @postcopy-blocktime: total time when all vCPU were blocked during postcopy
+#           live migration (Since 2.10)
+#
+# @postcopy-vcpu-blocktime: list of the postcopy blocktime per vCPU (Since 2.10)
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationInfo',
@@ -723,7 +728,9 @@
            '*downtime': 'int',
            '*setup-time': 'int',
            '*cpu-throttle-percentage': 'int',
-           '*error-desc': 'str'} }
+           '*error-desc': 'str',
+           '*postcopy-blocktime' : 'int64',
+           '*postcopy-vcpu-blocktime': ['int64']} }
 
 ##
 # @query-migrate:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation
       [not found]   ` <CGME20170607094732eucas1p199fb11b4189929a105515f6079415ebe@eucas1p1.samsung.com>
@ 2017-06-07  9:46     ` Alexey Perevalov
  2017-06-07 12:52       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07  9:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Perevalov, dgilbert, i.maximets, peterx

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 docs/migration.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 1b940a8..4b625ca 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -402,6 +402,16 @@ will now cause the transition from precopy to postcopy.
 It can be issued immediately after migration is started or any
 time later on.  Issuing it after the end of a migration is harmless.
 
+Blocktime is a postcopy live migration metric, intended to show
+how long the vCPU was in a state of interruptible sleep due to a page fault.
+This value is calculated on the destination side.
+To enable postcopy blocktime calculation, enter the following command on the
+destination monitor:
+
+migrate_set_capability postcopy-blocktime on
+
+Postcopy blocktime can be retrieved with the query-migrate QMP command.
+
 Note: During the postcopy phase, the bandwidth limits set using
 migrate_set_speed is ignored (to avoid delaying requested pages that
 the destination is waiting for).
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v8 03/11] migration: fix hardcoded function name in error report
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 03/11] migration: fix hardcoded function name in error report Alexey Perevalov
@ 2017-06-07 12:31       ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:31 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v8 05/11] migration: introduce postcopy-blocktime capability
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 05/11] migration: introduce postcopy-blocktime capability Alexey Perevalov
@ 2017-06-07 12:34       ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:34 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> Right now it could be used on destination side to
> enable vCPU blocktime calculation for postcopy live migration.
> vCPU blocktime - it's time since vCPU thread was put into
> interruptible sleep, till memory page was copied and thread awake.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState Alexey Perevalov
@ 2017-06-07 12:43       ` Juan Quintela
  2017-06-07 12:53         ` Alexey Perevalov
  0 siblings, 1 reply; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:43 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID,
> in case when this feature is provided by kernel.
>

I think this function is wrong

> +static void migration_exit_cb(Notifier *n, void *data)
> +{
> +    PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
> +                                                 exit_notifier);
> +    destroy_blocktime_context(ctx);
> +}
> +
> +static struct PostcopyBlocktimeContext *blocktime_context_new(void)
> +{
> +    PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
> +    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
> +    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
> +    ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
> +
> +    ctx->exit_notifier.notify = migration_exit_cb;
> +    qemu_add_exit_notifier(&ctx->exit_notifier);
> +    add_migration_state_change_notifier(&ctx->postcopy_notifier);

Or you don't want to call it this way.

This will destroy the context at every migration state change.

Or am I missing something here?  Look at ui/spice-core.c to see how to
use it only for some states (I guess you will need to do it for
error/cleanup/completion changes only).

Later, Juan.

> +    return ctx;
> +}
>  
>  /**
>   * receive_ufd_features: check userfault fd features, to request only supported
> @@ -155,6 +207,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
>          }
>      }
>  
> +#ifdef UFFD_FEATURE_THREAD_ID
> +    if (migrate_postcopy_blocktime() && mis &&
> +        UFFD_FEATURE_THREAD_ID & supported_features) {
> +        /* kernel supports that feature */
> +        /* don't create blocktime_context if it exists */
> +        if (!mis->blocktime_ctx) {
> +            mis->blocktime_ctx = blocktime_context_new();
> +        }
> +
> +        asked_features |= UFFD_FEATURE_THREAD_ID;
> +    }
> +#endif
> +
>      /*
>       * request features, even if asked_features is 0, due to
>       * kernel expects UFFD_API before UFFDIO_REGISTER, per

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation Alexey Perevalov
@ 2017-06-07 12:52       ` Juan Quintela
  2017-06-07 13:08         ` Alexey Perevalov
  0 siblings, 1 reply; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:52 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  docs/migration.txt | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/docs/migration.txt b/docs/migration.txt
> index 1b940a8..4b625ca 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -402,6 +402,16 @@ will now cause the transition from precopy to postcopy.
>  It can be issued immediately after migration is started or any
>  time later on.  Issuing it after the end of a migration is harmless.
>  
> +Blocktime is a postcopy live migration metric, intended to show
> +how long the vCPU was in a state of interruptible sleep due to a page fault.
> +This value is calculated on the destination side.
> +To enable postcopy blocktime calculation, enter the following command on the
> +destination monitor:
> +
> +migrate_set_capability postcopy-blocktime on
> +
> +Postcopy blocktime can be retrieved with the query-migrate QMP command.
> +
>  Note: During the postcopy phase, the bandwidth limits set using
>  migrate_set_speed is ignored (to avoid delaying requested pages that
>  the destination is waiting for).

Reviewed-by: Juan Quintela <quintela@redhat.com>

If you have to respin, I think that adding the units would be a good idea.
You could even put the units in the patch where you define the value.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState
  2017-06-07 12:43       ` Juan Quintela
@ 2017-06-07 12:53         ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07 12:53 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel, i.maximets, peterx, dgilbert

On 06/07/2017 03:43 PM, Juan Quintela wrote:
> Alexey Perevalov <a.perevalov@samsung.com> wrote:
>> This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID,
>> in case when this feature is provided by kernel.
>>
> I think this function is wrong

migration_exit_cb will be called at QEMU exit, but see later

>
>> +static void migration_exit_cb(Notifier *n, void *data)
>> +{
>> +    PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
>> +                                                 exit_notifier);
>> +    destroy_blocktime_context(ctx);
>> +}
>> +
>> +static struct PostcopyBlocktimeContext *blocktime_context_new(void)
>> +{
>> +    PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
>> +    ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
>> +    ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
>> +    ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
>> +
>> +    ctx->exit_notifier.notify = migration_exit_cb;
>> +    qemu_add_exit_notifier(&ctx->exit_notifier);
>> +    add_migration_state_change_notifier(&ctx->postcopy_notifier);
> Or you don't want to call it this way.
>
> This will destroy the context at every migration state change.
>
> Or I am missing something here?  Look at ui/spice-core.c to see how to
> use it only for some states (I guess you will need to do it for
> error/cleanup/completion changes only).
I forgot to remove

add_migration_state_change_notifier(&ctx->postcopy_notifier);

In the previous version, there was a callback here with a migration state check.

>
> Later, Juan.
>
>> +    return ctx;
>> +}
>>   
>>   /**
>>    * receive_ufd_features: check userfault fd features, to request only supported
>> @@ -155,6 +207,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
>>           }
>>       }
>>   
>> +#ifdef UFFD_FEATURE_THREAD_ID
>> +    if (migrate_postcopy_blocktime() && mis &&
>> +        UFFD_FEATURE_THREAD_ID & supported_features) {
>> +        /* kernel supports that feature */
>> +        /* don't create blocktime_context if it exists */
>> +        if (!mis->blocktime_ctx) {
>> +            mis->blocktime_ctx = blocktime_context_new();
>> +        }
>> +
>> +        asked_features |= UFFD_FEATURE_THREAD_ID;
>> +    }
>> +#endif
>> +
>>       /*
>>        * request features, even if asked_features is 0, due to
>>        * kernel expects UFFD_API before UFFDIO_REGISTER, per
>
>

-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page Alexey Perevalov
@ 2017-06-07 12:56       ` Juan Quintela
  2017-06-07 14:46         ` Alexey Perevalov
  2017-06-07 14:13       ` Alexey Perevalov
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:56 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:

> +static unsigned long get_copiedmap_size(RAMBlock *rb)
> +{
> +    unsigned long pages;
> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
> +                                             sizeof(rb->page_size));
> +    return pages;

Are you sure that you want this and not:

pages = rb->max_length >> TARGET_PAGE_BITS?

Otherwise, in some architectures/configurations you can end up with a
bitmap size that differs from the migration bitmap size.
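
To make the size difference concrete, here is a small self-contained sketch
(not QEMU code; demo_page_shift and demo_copiedmap_bits are hypothetical
names). It derives the shift from a power-of-two page size by counting
shifts, which avoids relying on how find_first_bit() interprets its length
argument, and shows that the number of bits depends on which page size is
used for indexing.

```c
#include <stddef.h>
#include <stdint.h>

/* log2 of a power-of-two page size (assumption: page_size is a power of 2) */
static unsigned demo_page_shift(size_t page_size)
{
    unsigned shift = 0;

    while (page_size > 1) {
        page_size >>= 1;
        shift++;
    }
    return shift;
}

/* Number of bitmap bits needed to cover max_length bytes, rounding up. */
static unsigned long demo_copiedmap_bits(uint64_t max_length, size_t page_size)
{
    unsigned shift = demo_page_shift(page_size);

    return (unsigned long)((max_length + (page_size - 1)) >> shift);
}
```

For a 1 GiB block this gives 262144 bits with 4 KiB pages but only 512 bits
with 2 MiB pages, so a bitmap sized by the block's own page size cannot be
compared bit for bit with one sized by target pages.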


* Re: [Qemu-devel] [PATCH v8 08/11] migration: postcopy_place_page factoring out
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 08/11] migration: postcopy_place_page factoring out Alexey Perevalov
@ 2017-06-07 12:58       ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 12:58 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> Need to mark a page as copied as close as possible to the place where
> the copy is tracked. That will be necessary in a further patch.
>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation
  2017-06-07 12:52       ` Juan Quintela
@ 2017-06-07 13:08         ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07 13:08 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel, i.maximets, peterx, dgilbert

On 06/07/2017 03:52 PM, Juan Quintela wrote:
> Alexey Perevalov <a.perevalov@samsung.com> wrote:
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   docs/migration.txt | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/docs/migration.txt b/docs/migration.txt
>> index 1b940a8..4b625ca 100644
>> --- a/docs/migration.txt
>> +++ b/docs/migration.txt
>> @@ -402,6 +402,16 @@ will now cause the transition from precopy to postcopy.
>>   It can be issued immediately after migration is started or any
>>   time later on.  Issuing it after the end of a migration is harmless.
>>   
>> +Blocktime is a postcopy live migration metric, intended to show
>> +how long the vCPU was in state of interruptable sleep due to pagefault.
>> +This value is calculated on destination side.
>> +To enable postcopy blocktime calculation, enter following command on destination
>> +monitor:
>> +
>> +migrate_set_capability postcopy-blocktime on
>> +
>> +Postcopy blocktime can be retrieved by query-migrate qmp command.
>> +
>>   Note: During the postcopy phase, the bandwidth limits set using
>>   migrate_set_speed is ignored (to avoid delaying requested pages that
>>   the destination is waiting for).
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> If you have to respin, I think that adding the units would be a good idea.
> You can even put the units in the patch where you define the value.
Do you mean to extend tests/postcopy-test.c?
>
>
>
>

-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side Alexey Perevalov
@ 2017-06-07 13:11       ` Juan Quintela
  2017-06-12 11:34       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2017-06-07 13:11 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx, dgilbert

Alexey Perevalov <a.perevalov@samsung.com> wrote:
> This patch provides blocktime calculation per vCPU,
> as a summary and as an overlapped value for all vCPUs.
>
> This approach was suggested by Peter Xu, as an improvement on the
> previous approach, where QEMU kept a tree with the faulted page address and
> a cpus bitmask in it. Now QEMU keeps an array with the faulted page address
> as value and the vCPU as index. It helps to find the proper vCPU at
> UFFD_COPY time. It also keeps a list of blocktime per vCPU (could be
> traced with page_fault_addr).
>
> Blocktime will not be calculated if the postcopy_blocktime field of
> MigrationIncomingState wasn't initialized.
>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |   5 +-
>  2 files changed, 142 insertions(+), 2 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 62a272a..0ad9f9f 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -27,6 +27,7 @@
>  #include "ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {

Could we get a trace with the cpu for this pid, just for completeness?

> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
> +/*
> + * This function is being called when pagefault occurs. It
> + * tracks down vCPU blocking time.
> + *
> + * @addr: faulted host virtual address
> + * @ptid: faulted process thread id
> + * @rb: ramblock appropriate to addr
> + */
> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> +                                          RAMBlock *rb)
> +{
> +    int cpu;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int64_t now_ms;
> +
> +    if (!dc || ptid == 0) {
> +        return;
> +    }
> +    cpu = get_mem_fault_cpu_index(ptid);
> +    if (cpu < 0) {

Add one error message?

> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    if (dc->vcpu_addr[cpu] == 0) {
> +        atomic_inc(&dc->smp_cpus_down);
> +    }
> +
> +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> +
> +    if (test_copiedmap_by_addr(addr, rb)) {
> +        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
> +        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
> +        atomic_sub(&dc->smp_cpus_down, 1);
> +    }
> +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +                                        cpu);
> +}
> +
> +/*
> + *  This function just provide calculated blocktime per cpu and trace it.
> + *  Total blocktime is calculated in mark_postcopy_blocktime_end.
> + *
> + *
> + * Assume we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> + *            it's a part of total blocktime.
> + * S1 - here is last_begin
> + * Legend of the picture is following:
> + *              * - means blocktime per vCPU
> + *              x - means overlapped blocktime (total blocktime)
> + *
> + * @addr: host virtual address
> + */
> +static void mark_postcopy_blocktime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int i, affected_cpu = 0;
> +    int64_t now_ms;
> +    bool vcpu_total_blocktime = false;
> +
> +    if (!dc) {
> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* lookup cpu, to clear it,
> +     * that algorithm looks straightforward, but it's not
> +     * optimal, more optimal algorithm is keeping tree or hash
> +     * where key is address value is a list of  */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_blocktime = 0;
> +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
> +            continue;
> +        }
> +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
> +        vcpu_blocktime = now_ms -
> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> +        affected_cpu += 1;
> +        /* we need to know is that mark_postcopy_end was due to
> +         * faulted page, another possible case it's prefetched
> +         * page and in that case we shouldn't be here */
> +        if (!vcpu_total_blocktime &&
> +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
> +            vcpu_total_blocktime = true;
> +        }
> +        /* continue cycle, due to one page could affect several vCPUs */
> +        dc->vcpu_blocktime[i] += vcpu_blocktime;
> +    }
> +
> +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
> +    if (vcpu_total_blocktime) {
> +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
> +    }
> +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -638,8 +766,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                                      msg.arg.pagefault.feat.ptid, rb);
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -723,6 +854,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>      copy_struct.len = pagesize;
>      copy_struct.mode = 0;
>  
> +    /* copied page isn't feature of blocktime calculation,
> +     * it's more general entity, so keep it here,
> +     * but gap between two following operations could be high,
> +     * and in this case blocktime for such small interval will be lost */
> +    set_copiedmap_by_addr((uint64_t)(uintptr_t)host, rb);
> +    mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
>      /* copy also acks to the kernel waking the stalled thread up
>       * TODO: We can inhibit that ack and only do it if it was requested
>       * which would be slightly cheaper, but we'd have to be careful
> diff --git a/migration/trace-events b/migration/trace-events
> index 5b8ccf3..7bdadbb 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -188,7 +190,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"

Add "pid" in the string format?

>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -197,6 +199,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
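
The accounting described in the comment above mark_postcopy_blocktime_end()
can be modelled in a few lines. The sketch below is a single-threaded
simplification with explicit timestamps and a fixed vCPU count; it drops the
atomics and the copied-bitmap check from the patch, and the DemoBlocktime
type and demo_* names are hypothetical, not part of the series.

```c
#include <stdint.h>

#define DEMO_NCPUS 3

typedef struct {
    uint64_t vcpu_addr[DEMO_NCPUS];      /* faulted address, 0 = not blocked */
    int64_t  fault_time[DEMO_NCPUS];     /* when the current fault started */
    int64_t  vcpu_blocktime[DEMO_NCPUS]; /* accumulated per-vCPU blocktime */
    int64_t  last_begin;                 /* start of the most recent fault */
    int64_t  total_blocktime;            /* time with all vCPUs blocked */
    int      cpus_down;                  /* number of blocked vCPUs */
} DemoBlocktime;

/* A vCPU faults on addr at time now. */
static void demo_blocktime_begin(DemoBlocktime *m, int cpu, uint64_t addr,
                                 int64_t now)
{
    if (m->vcpu_addr[cpu] == 0) {
        m->cpus_down++;
    }
    m->vcpu_addr[cpu] = addr;
    m->fault_time[cpu] = now;
    m->last_begin = now;
}

/* The page at addr has been copied at time now. */
static void demo_blocktime_end(DemoBlocktime *m, uint64_t addr, int64_t now)
{
    int all_down = (m->cpus_down == DEMO_NCPUS);
    int affected = 0;
    int i;

    for (i = 0; i < DEMO_NCPUS; i++) {
        if (m->vcpu_addr[i] != addr) {
            continue;
        }
        m->vcpu_addr[i] = 0;
        m->vcpu_blocktime[i] += now - m->fault_time[i];
        affected++;            /* one page can unblock several vCPUs */
    }
    m->cpus_down -= affected;
    if (affected && all_down) {
        /* every vCPU was blocked since last_begin: count the overlap */
        m->total_blocktime += now - m->last_begin;
    }
}
```

Replaying the sequence S1,S2,E1,S3,S1,E2,E3,E1 from the diagram, only the
E2 event finds all three vCPUs blocked, so only the interval between the
second S1 and E2 is added to the total.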


* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page Alexey Perevalov
  2017-06-07 12:56       ` Juan Quintela
@ 2017-06-07 14:13       ` Alexey Perevalov
  2017-06-09  6:06         ` Peter Xu
  2017-06-12 11:11       ` Dr. David Alan Gilbert
  2017-06-13  5:59       ` Peter Xu
  3 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, i.maximets, peterx

On 06/07/2017 12:46 PM, Alexey Perevalov wrote:
> This patch adds the ability to track already copied
> pages. It's necessary for calculating vCPU block time in the
> postcopy migration feature, and maybe for restore after
> postcopy migration failure.
> It's also necessary to solve the shared memory issue in
> postcopy live migration. Information about copied pages
> will be transferred to the software virtual bridge
> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> already copied pages. The fallocate syscall is required for
> remapped shared memory, because remapping itself blocks
> ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
> error (the struct page exists after remap).
>
> The bitmap is placed into RAMBlock like the other postcopy/precopy
> related bitmaps. The helpers are in migration/ram.c, because
> that file is allowed to work with RAMBlock.
>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>   include/exec/ram_addr.h |  2 ++
>   migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
>   migration/ram.h         |  4 ++++
>   3 files changed, 42 insertions(+)
>
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 140efa8..6a3780b 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -47,6 +47,8 @@ struct RAMBlock {
>        * of the postcopy phase
>        */
>       unsigned long *unsentmap;
> +    /* bitmap of already copied pages in postcopy */
> +    unsigned long *copiedmap;
>   };
>   
>   static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> diff --git a/migration/ram.c b/migration/ram.c
> index f387e9c..a7c0db4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -149,6 +149,25 @@ out:
>       return ret;
>   }
>   
> +static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
> +{
> +    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
> +    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
> +                                    sizeof(rb->page_size));
> +
> +    return addr_offset >> page_shift;
> +}
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}
> +
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}
> +
>   /*
>    * An outstanding page request, on the source, having been received
>    * and queued
> @@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
>           block->bmap = NULL;
>           g_free(block->unsentmap);
>           block->unsentmap = NULL;
This looks like the wrong place, because copiedmap lives
on the destination side, so maybe it belongs in qemu_ram_free.
> +        g_free(block->copiedmap);
> +        block->copiedmap = NULL;
>       }
>   
>       XBZRLE_cache_lock();
> @@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
>       return ret;
>   }
>   
> +static unsigned long get_copiedmap_size(RAMBlock *rb)
> +{
> +    unsigned long pages;
> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
> +                                             sizeof(rb->page_size));
> +    return pages;
> +}
> +
>   static int ram_load(QEMUFile *f, void *opaque, int version_id)
>   {
>       int flags = 0, ret = 0;
> @@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       rcu_read_lock();
>   
>       if (postcopy_running) {
> +        RAMBlock *rb;
> +        RAMBLOCK_FOREACH(rb) {
> +            /* need for destination, bitmap_new calls
> +             * g_try_malloc0 and this function
> +             * Attempts to allocate @n_bytes, initialized to 0'sh */
> +            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
> +        }
>           ret = ram_load_postcopy(f);
>       }
>   
> diff --git a/migration/ram.h b/migration/ram.h
> index c9563d1..1f32824 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>   int ram_postcopy_incoming_init(MigrationIncomingState *mis);
>   
>   void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +
>   #endif
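
As a rough picture of what the helpers in this patch do, the sketch below
maps a host address to a bit index and sets or tests that bit. It is a
stand-alone model, not the patch itself: DemoBlock and the demo_* functions
are hypothetical, the page shift is passed in rather than derived from
rb->page_size, and the set operation is not atomic (the patch uses
set_bit_atomic).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef struct {
    uintptr_t host;            /* start of the block in host memory */
    unsigned page_shift;       /* log2 of the block's page size */
    unsigned long *copiedmap;  /* one bit per page, 1 = already copied */
} DemoBlock;

/* Allocate a zeroed bitmap covering npages pages; 0 on success. */
static int demo_block_init(DemoBlock *b, uintptr_t host, unsigned page_shift,
                           size_t npages)
{
    size_t nlongs = (npages + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

    b->host = host;
    b->page_shift = page_shift;
    b->copiedmap = calloc(nlongs, sizeof(unsigned long));
    return b->copiedmap ? 0 : -1;
}

/* Bit index of the page containing addr (addr must lie inside the block). */
static size_t demo_copied_bit(const DemoBlock *b, uintptr_t addr)
{
    return (addr - b->host) >> b->page_shift;
}

static void demo_set_copied(DemoBlock *b, uintptr_t addr)
{
    size_t bit = demo_copied_bit(b, addr);

    b->copiedmap[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

static int demo_test_copied(const DemoBlock *b, uintptr_t addr)
{
    size_t bit = demo_copied_bit(b, addr);

    return (b->copiedmap[bit / DEMO_BITS_PER_LONG] >>
            (bit % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Any address inside a page maps to the same bit, so a fault at an unaligned
address can still be matched against a page that was placed earlier.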


-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07 12:56       ` Juan Quintela
@ 2017-06-07 14:46         ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-07 14:46 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel, i.maximets, peterx, dgilbert

On 06/07/2017 03:56 PM, Juan Quintela wrote:
> Alexey Perevalov <a.perevalov@samsung.com> wrote:
>
>> +static unsigned long get_copiedmap_size(RAMBlock *rb)
>> +{
>> +    unsigned long pages;
>> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
>> +                                             sizeof(rb->page_size));
>> +    return pages;
> Are you sure that you want this and not:
>
> pages = rb->max_length >> TARGET_PAGE_BITS?
I just wanted to optimize the size of the bitmap.
>
> Otherwise, in some architectures/configurations you can end with a
> bitmap size that is different of the migration bitmap size.
>
Yes, it looks like that solution works for little-endian only, so I think
a conversion to little-endian is missing here.

>
>

-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions Alexey Perevalov
@ 2017-06-09  4:10       ` Peter Xu
  2017-06-09  6:21         ` Alexey Perevalov
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Xu @ 2017-06-09  4:10 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets

On Wed, Jun 07, 2017 at 12:46:29PM +0300, Alexey Perevalov wrote:
> That tiny refactoring is necessary to be able to set
> UFFD_FEATURE_THREAD_ID while requesting features, and then
> to create downtime context in case when kernel supports it.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/migration.c    |  3 ++-
>  migration/postcopy-ram.c | 10 +++++-----
>  migration/postcopy-ram.h |  2 +-
>  migration/savevm.c       |  2 +-
>  4 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 48c94c9..2a77636 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>                                    Error **errp)
>  {
>      MigrationState *s = migrate_get_current();
> +    MigrationIncomingState *mis = migration_incoming_get_current();

If this patch is only servicing patch 6, I'd prefer in patch 6 we call
migration_incoming_get_current() (rather than here), then this patch
may be dropped?...

Thanks,

>      MigrationCapabilityStatusList *cap;
>      bool old_postcopy_cap = migrate_postcopy_ram();
>  
> @@ -772,7 +773,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>           * special support.
>           */
>          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> -            !postcopy_ram_supported_by_host()) {
> +            !postcopy_ram_supported_by_host(mis)) {
>              /* postcopy_ram_supported_by_host will have emitted a more
>               * detailed message
>               */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 9c41887..10d39a0 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -63,7 +63,7 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd)
> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>  {
>      struct uffdio_api api_struct;
>      uint64_t ioctl_mask;
> @@ -126,7 +126,7 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
>   * normally fine since if the postcopy succeeds it gets turned back on at the
>   * end.
>   */
> -bool postcopy_ram_supported_by_host(void)
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>  {
>      long pagesize = getpagesize();
>      int ufd = -1;
> @@ -149,7 +149,7 @@ bool postcopy_ram_supported_by_host(void)
>      }
>  
>      /* Version and features check */
> -    if (!ufd_version_check(ufd)) {
> +    if (!ufd_version_check(ufd, mis)) {
>          goto out;
>      }
>  
> @@ -525,7 +525,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>       * Although the host check already tested the API, we need to
>       * do the check again as an ABI handshake on the new fd.
>       */
> -    if (!ufd_version_check(mis->userfault_fd)) {
> +    if (!ufd_version_check(mis->userfault_fd, mis)) {
>          return -1;
>      }
>  
> @@ -663,7 +663,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
>  
>  #else
>  /* No target OS support, stubs just fail */
> -bool postcopy_ram_supported_by_host(void)
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>  {
>      error_report("%s: No OS support", __func__);
>      return false;
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 52d51e8..587a8b8 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -14,7 +14,7 @@
>  #define QEMU_POSTCOPY_RAM_H
>  
>  /* Return true if the host supports everything we need to do postcopy-ram */
> -bool postcopy_ram_supported_by_host(void);
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>  
>  /*
>   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 9c320f5..8b7bab8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1380,7 +1380,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    if (!postcopy_ram_supported_by_host()) {
> +    if (!postcopy_ram_supported_by_host(mis)) {
>          postcopy_state_set(POSTCOPY_INCOMING_NONE);
>          return -1;
>      }
> -- 
> 1.9.1
> 

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07 14:13       ` Alexey Perevalov
@ 2017-06-09  6:06         ` Peter Xu
  2017-06-09  7:16           ` Alexey Perevalov
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Xu @ 2017-06-09  6:06 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets

On Wed, Jun 07, 2017 at 05:13:00PM +0300, Alexey Perevalov wrote:
> On 06/07/2017 12:46 PM, Alexey Perevalov wrote:
> >This patch adds the ability to track already copied
> >pages. It's necessary for calculating vCPU block time in the
> >postcopy migration feature, and maybe for restore after
> >postcopy migration failure.
> >It's also necessary to solve the shared memory issue in
> >postcopy live migration. Information about copied pages
> >will be transferred to the software virtual bridge
> >(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> >already copied pages. The fallocate syscall is required for
> >remapped shared memory, because remapping itself blocks
> >ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
> >error (the struct page exists after remap).
> >
> >The bitmap is placed into RAMBlock like the other postcopy/precopy
> >related bitmaps. The helpers are in migration/ram.c, because
> >that file is allowed to work with RAMBlock.
> >
> >Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> >---
> >  include/exec/ram_addr.h |  2 ++
> >  migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
> >  migration/ram.h         |  4 ++++
> >  3 files changed, 42 insertions(+)
> >
> >diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> >index 140efa8..6a3780b 100644
> >--- a/include/exec/ram_addr.h
> >+++ b/include/exec/ram_addr.h
> >@@ -47,6 +47,8 @@ struct RAMBlock {
> >       * of the postcopy phase
> >       */
> >      unsigned long *unsentmap;
> >+    /* bitmap of already copied pages in postcopy */
> >+    unsigned long *copiedmap;
> >  };
> >  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> >diff --git a/migration/ram.c b/migration/ram.c
> >index f387e9c..a7c0db4 100644
> >--- a/migration/ram.c
> >+++ b/migration/ram.c
> >@@ -149,6 +149,25 @@ out:
> >      return ret;
> >  }
> >+static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
> >+{
> >+    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
> >+    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
> >+                                    sizeof(rb->page_size));
> >+
> >+    return addr_offset >> page_shift;
> >+}
> >+
> >+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> >+{
> >+    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
> >+}
> >+
> >+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> >+{
> >+    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
> >+}
> >+
> >  /*
> >   * An outstanding page request, on the source, having been received
> >   * and queued
> >@@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
> >          block->bmap = NULL;
> >          g_free(block->unsentmap);
> >          block->unsentmap = NULL;
> This looks like the wrong place, because copiedmap lives
> on the destination side, so maybe it belongs in qemu_ram_free.

Yes, and...

> >+        g_free(block->copiedmap);
> >+        block->copiedmap = NULL;
> >      }
> >      XBZRLE_cache_lock();
> >@@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
> >      return ret;
> >  }
> >+static unsigned long get_copiedmap_size(RAMBlock *rb)
> >+{
> >+    unsigned long pages;
> >+    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
> >+                                             sizeof(rb->page_size));
> >+    return pages;
> >+}
> >+
> >  static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >  {
> >      int flags = 0, ret = 0;
> >@@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      rcu_read_lock();
> >      if (postcopy_running) {
> >+        RAMBlock *rb;
> >+        RAMBLOCK_FOREACH(rb) {
> >+            /* need for destination, bitmap_new calls
> >+             * g_try_malloc0 and this function
> >+             * Attempts to allocate @n_bytes, initialized to 0'sh */
> >+            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));

... I'm not sure whether this is the right place to init the bitmap,
since iiuc ram_load() can be entered multiple times?

Also, I think we need the bitmap even before the first page we send
during precopy, right?

I would think loadvm_postcopy_handle_advise() is a more proper place: that
is before the first page is sent, and when we are there it means the
source does want to do postcopy eventually.
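
Independent of where the call ends up, the re-entry concern can be shown
with a small stand-alone sketch (not QEMU code; DemoRamBlock and the
demo_copiedmap_* helpers are hypothetical). An unconditional allocation on
every entry would leak the old bitmap and lose the bits already set; an
"allocate once" helper does not.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned long *copiedmap;  /* NULL until the bitmap is set up */
    size_t nlongs;
} DemoRamBlock;

/* Allocate the bitmap only if it does not exist yet; 0 on success. */
static int demo_copiedmap_ensure(DemoRamBlock *rb, size_t nlongs)
{
    if (rb->copiedmap) {
        return 0;              /* already initialised, keep existing bits */
    }
    rb->copiedmap = calloc(nlongs, sizeof(unsigned long));
    if (!rb->copiedmap) {
        return -1;
    }
    rb->nlongs = nlongs;
    return 0;
}

/* Free the bitmap and reset the block so a later migration can start over. */
static void demo_copiedmap_release(DemoRamBlock *rb)
{
    free(rb->copiedmap);
    rb->copiedmap = NULL;
    rb->nlongs = 0;
}
```

Calling demo_copiedmap_ensure() from a path that runs several times is then
harmless, although doing the setup once at a single well-defined point, as
suggested above, is still the simpler design.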

Thanks,

> >+        }
> >          ret = ram_load_postcopy(f);
> >      }
> >diff --git a/migration/ram.h b/migration/ram.h
> >index c9563d1..1f32824 100644
> >--- a/migration/ram.h
> >+++ b/migration/ram.h
> >@@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
> >  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
> >  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> >+
> >+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> >+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> >+
> >  #endif
> 
> 
> -- 
> Best regards,
> Alexey Perevalov

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions
  2017-06-09  4:10       ` Peter Xu
@ 2017-06-09  6:21         ` Alexey Perevalov
  2017-06-09  7:14           ` Peter Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-09  6:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets

On 06/09/2017 07:10 AM, Peter Xu wrote:
> On Wed, Jun 07, 2017 at 12:46:29PM +0300, Alexey Perevalov wrote:
>> That tiny refactoring is necessary to be able to set
>> UFFD_FEATURE_THREAD_ID while requesting features, and then
>> to create downtime context in case when kernel supports it.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>>   migration/migration.c    |  3 ++-
>>   migration/postcopy-ram.c | 10 +++++-----
>>   migration/postcopy-ram.h |  2 +-
>>   migration/savevm.c       |  2 +-
>>   4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 48c94c9..2a77636 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>>                                     Error **errp)
>>   {
>>       MigrationState *s = migrate_get_current();
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
> If this patch is only servicing patch 6, I'd prefer in patch 6 we call
> migration_incoming_get_current() (rather than here), then this patch
> may be dropped?...
I planned this patch as preparation; I usually separate refactoring from
the main change, to make merging easier while rebasing.
mis is necessary here to keep the same behaviour as before.

>
> Thanks,
>
>>       MigrationCapabilityStatusList *cap;
>>       bool old_postcopy_cap = migrate_postcopy_ram();
>>   
>> @@ -772,7 +773,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>>            * special support.
>>            */
>>           if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
>> -            !postcopy_ram_supported_by_host()) {
>> +            !postcopy_ram_supported_by_host(mis)) {
>>               /* postcopy_ram_supported_by_host will have emitted a more
>>                * detailed message
>>                */
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 9c41887..10d39a0 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -63,7 +63,7 @@ struct PostcopyDiscardState {
>>   #include <sys/eventfd.h>
>>   #include <linux/userfaultfd.h>
>>   
>> -static bool ufd_version_check(int ufd)
>> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>>   {
>>       struct uffdio_api api_struct;
>>       uint64_t ioctl_mask;
>> @@ -126,7 +126,7 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
>>    * normally fine since if the postcopy succeeds it gets turned back on at the
>>    * end.
>>    */
>> -bool postcopy_ram_supported_by_host(void)
>> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>>   {
>>       long pagesize = getpagesize();
>>       int ufd = -1;
>> @@ -149,7 +149,7 @@ bool postcopy_ram_supported_by_host(void)
>>       }
>>   
>>       /* Version and features check */
>> -    if (!ufd_version_check(ufd)) {
>> +    if (!ufd_version_check(ufd, mis)) {
>>           goto out;
>>       }
>>   
>> @@ -525,7 +525,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>        * Although the host check already tested the API, we need to
>>        * do the check again as an ABI handshake on the new fd.
>>        */
>> -    if (!ufd_version_check(mis->userfault_fd)) {
>> +    if (!ufd_version_check(mis->userfault_fd, mis)) {
>>           return -1;
>>       }
>>   
>> @@ -663,7 +663,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
>>   
>>   #else
>>   /* No target OS support, stubs just fail */
>> -bool postcopy_ram_supported_by_host(void)
>> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>>   {
>>       error_report("%s: No OS support", __func__);
>>       return false;
>> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
>> index 52d51e8..587a8b8 100644
>> --- a/migration/postcopy-ram.h
>> +++ b/migration/postcopy-ram.h
>> @@ -14,7 +14,7 @@
>>   #define QEMU_POSTCOPY_RAM_H
>>   
>>   /* Return true if the host supports everything we need to do postcopy-ram */
>> -bool postcopy_ram_supported_by_host(void);
>> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>>   
>>   /*
>>    * Make all of RAM sensitive to accesses to areas that haven't yet been written
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 9c320f5..8b7bab8 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1380,7 +1380,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>>           return -1;
>>       }
>>   
>> -    if (!postcopy_ram_supported_by_host()) {
>> +    if (!postcopy_ram_supported_by_host(mis)) {
>>           postcopy_state_set(POSTCOPY_INCOMING_NONE);
>>           return -1;
>>       }
>> -- 
>> 1.9.1
>>

-- 
Best regards,
Alexey Perevalov

* Re: [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions
  2017-06-09  6:21         ` Alexey Perevalov
@ 2017-06-09  7:14           ` Peter Xu
  2017-06-09  7:25             ` Alexey Perevalov
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Xu @ 2017-06-09  7:14 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets

On Fri, Jun 09, 2017 at 09:21:38AM +0300, Alexey Perevalov wrote:
> On 06/09/2017 07:10 AM, Peter Xu wrote:
> >On Wed, Jun 07, 2017 at 12:46:29PM +0300, Alexey Perevalov wrote:
> >>That tiny refactoring is necessary to be able to set
> >>UFFD_FEATURE_THREAD_ID while requesting features, and then
> >>to create downtime context in case when kernel supports it.
> >>
> >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> >>---
> >>  migration/migration.c    |  3 ++-
> >>  migration/postcopy-ram.c | 10 +++++-----
> >>  migration/postcopy-ram.h |  2 +-
> >>  migration/savevm.c       |  2 +-
> >>  4 files changed, 9 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index 48c94c9..2a77636 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> >>                                    Error **errp)
> >>  {
> >>      MigrationState *s = migrate_get_current();
> >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> >If this patch is only servicing patch 6, I'd prefer in patch 6 we call
> >migration_incoming_get_current() (rather than here), then this patch
> >may be dropped?...
> I planed this patch as preparation, I used to separate refactoring from main
> change, for
> easy merging while rebasing.
> mis - is necessary here to have the same behaviour as before.

Could I ask what's the "same behavior" you mentioned?

I thought this patch is only used by patch 6 when creating the
blocktime struct (but not really a clean-up), no?

-- 
Peter Xu

* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-09  6:06         ` Peter Xu
@ 2017-06-09  7:16           ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-09  7:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets

On 06/09/2017 09:06 AM, Peter Xu wrote:
> On Wed, Jun 07, 2017 at 05:13:00PM +0300, Alexey Perevalov wrote:
>> On 06/07/2017 12:46 PM, Alexey Perevalov wrote:
>>> This patch adds the ability to track already copied
>>> pages. It is necessary for calculating vCPU block time in
>>> the postcopy migration feature, and maybe for restore after
>>> postcopy migration failure.
>>> It is also necessary to solve the shared memory issue in
>>> postcopy live migration. Information about copied pages
>>> will be transferred to the software virtual bridge
>>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
>>> already copied pages. The fallocate syscall is required for
>>> remapped shared memory, because remapping itself blocks
>>> ioctl(UFFDIO_COPY); the ioctl in this case will end with an
>>> EEXIST error (struct page exists after remap).
>>>
>>> The bitmap is placed into RAMBlock like the other postcopy/precopy
>>> related bitmaps. Helpers are in migration/ram.c, because
>>> that file is allowed to work with RAMBlock.
>>>
>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>>> ---
>>>   include/exec/ram_addr.h |  2 ++
>>>   migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
>>>   migration/ram.h         |  4 ++++
>>>   3 files changed, 42 insertions(+)
>>>
>>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>>> index 140efa8..6a3780b 100644
>>> --- a/include/exec/ram_addr.h
>>> +++ b/include/exec/ram_addr.h
>>> @@ -47,6 +47,8 @@ struct RAMBlock {
>>>        * of the postcopy phase
>>>        */
>>>       unsigned long *unsentmap;
>>> +    /* bitmap of already copied pages in postcopy */
>>> +    unsigned long *copiedmap;
>>>   };
>>>   static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index f387e9c..a7c0db4 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -149,6 +149,25 @@ out:
>>>       return ret;
>>>   }
>>> +static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
>>> +{
>>> +    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
>>> +    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
>>> +                                    sizeof(rb->page_size));
>>> +
>>> +    return addr_offset >> page_shift;
>>> +}
>>> +
>>> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
>>> +{
>>> +    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
>>> +}
>>> +
>>> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
>>> +{
>>> +    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
>>> +}
>>> +
>>>   /*
>>>    * An outstanding page request, on the source, having been received
>>>    * and queued
>>> @@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
>>>           block->bmap = NULL;
>>>           g_free(block->unsentmap);
>>>           block->unsentmap = NULL;
>> looks like it's wrong place, because copiedmap is living
>> on destination side, so maybe in qemu_ram_free
> Yes, and...
>
>>> +        g_free(block->copiedmap);
>>> +        block->copiedmap = NULL;
>>>       }
>>>       XBZRLE_cache_lock();
>>> @@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
>>>       return ret;
>>>   }
>>> +static unsigned long get_copiedmap_size(RAMBlock *rb)
>>> +{
>>> +    unsigned long pages;
The size should be in bits, but I passed bytes; as I remember, that was
already mentioned.
>>> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
>>> +                                             sizeof(rb->page_size));

>>> +    return pages;
>>> +}
>>> +
>>>   static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>   {
>>>       int flags = 0, ret = 0;
>>> @@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>       rcu_read_lock();
>>>       if (postcopy_running) {
>>> +        RAMBlock *rb;
>>> +        RAMBLOCK_FOREACH(rb) {
>>> +            /* need for destination, bitmap_new calls
>>> +             * g_try_malloc0 and this function
>>> +             * Attempts to allocate @n_bytes, initialized to 0'sh */
>>> +            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
> ... I'm not sure whether this is the right place to init the bitmap,
> since iiuc ram_load() can be entered multiple times?
Yes, you are right: it is entered every time qemu_loadvm_section_part_end is
called, and qemu_loadvm_section_part_start too. I didn't take that into account.
>
> Also, I think we need the bitmap even before the first page we send
> during precopy, right?
>
> I would think loadvm_postcopy_handle_advise() somewhere proper: that
> is before the first page is sent, and also when we are there it means
> source wants to do postcopy finally.
I think you are right again :)

loadvm_postcopy_handle_advise is called before ram_discard_range (so page
faults will come after that) and before postcopy_place_page.
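
A minimal sketch of the allocate-once idea, assuming a simplified stand-in
for RAMBlock. The names DemoBlock and copiedmap_init_once, and the use of
calloc, are hypothetical and not taken from QEMU. The bitmap is created only
if it does not exist yet, so an init path that is entered more than once
neither leaks nor wipes an already populated bitmap:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified stand-in for QEMU's RAMBlock. */
struct DemoBlock {
    size_t npages;             /* number of pages tracked by the bitmap */
    unsigned long *copiedmap;  /* one bit per page, NULL until initialised */
};

/*
 * Allocate the zeroed bitmap only on the first call.  Returns 0 on
 * success (including "already initialised"), -1 on allocation failure.
 */
static int copiedmap_init_once(struct DemoBlock *b)
{
    size_t nlongs;

    assert(b != NULL);
    if (b->copiedmap) {
        return 0;   /* keep the existing bitmap and its contents */
    }
    /* Round the page count up to a whole number of longs. */
    nlongs = (b->npages + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
    b->copiedmap = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
    return b->copiedmap ? 0 : -1;
}
```

Calling the initialiser from a single, earlier place would make the NULL
check redundant; the guard only matters when the call site can be reached
repeatedly.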


>
> Thanks,
>
>>> +        }
>>>           ret = ram_load_postcopy(f);
>>>       }
>>> diff --git a/migration/ram.h b/migration/ram.h
>>> index c9563d1..1f32824 100644
>>> --- a/migration/ram.h
>>> +++ b/migration/ram.h
>>> @@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>>>   int ram_postcopy_incoming_init(MigrationIncomingState *mis);
>>>   void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>>> +
>>> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
>>> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
>>> +
>>>   #endif
>>
>> -- 
>> Best regards,
>> Alexey Perevalov


-- 
Best regards,
Alexey Perevalov

* Re: [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions
  2017-06-09  7:14           ` Peter Xu
@ 2017-06-09  7:25             ` Alexey Perevalov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-09  7:25 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets

On 06/09/2017 10:14 AM, Peter Xu wrote:
> On Fri, Jun 09, 2017 at 09:21:38AM +0300, Alexey Perevalov wrote:
>> On 06/09/2017 07:10 AM, Peter Xu wrote:
>>> On Wed, Jun 07, 2017 at 12:46:29PM +0300, Alexey Perevalov wrote:
>>>> That tiny refactoring is necessary to be able to set
>>>> UFFD_FEATURE_THREAD_ID while requesting features, and then
>>>> to create downtime context in case when kernel supports it.
>>>>
>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>>>> ---
>>>>   migration/migration.c    |  3 ++-
>>>>   migration/postcopy-ram.c | 10 +++++-----
>>>>   migration/postcopy-ram.h |  2 +-
>>>>   migration/savevm.c       |  2 +-
>>>>   4 files changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 48c94c9..2a77636 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>>>>                                     Error **errp)
>>>>   {
>>>>       MigrationState *s = migrate_get_current();
>>>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>>> If this patch is only servicing patch 6, I'd prefer in patch 6 we call
>>> migration_incoming_get_current() (rather than here), then this patch
>>> may be dropped?...
>> I planed this patch as preparation, I used to separate refactoring from main
>> change, for
>> easy merging while rebasing.
>> mis - is necessary here to have the same behaviour as before.
> Could I ask what's the "same behavior" you mentioned?
I meant no changes in functionality.
>
> I thought this patch is only used by patch 6 when creating the
> blocktime struct (but not really a clean-up), no?
>

-- 
Best regards,
Alexey Perevalov

* Re: [Qemu-devel] [PATCH v8 04/11] migration: split ufd_version_check onto receive/request features part
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 04/11] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
@ 2017-06-12  9:52       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-12  9:52 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This modification is necessary for userfault fd features which are
> required to be requested from userspace.
> UFFD_FEATURE_THREAD_ID is one such "on demand" feature, which will
> be introduced in the next patch.
> 
> QEMU has to use a separate userfault file descriptor, because the
> userfault context has internal state: after the first successful
> ioctl UFFDIO_API call it changes its state to UFFD_STATE_RUNNING, but
> the kernel expects UFFD_STATE_WAIT_API while handling ioctl UFFDIO_API.
> So only one ioctl with UFFDIO_API is possible per ufd.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
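
The one-UFFDIO_API-per-descriptor constraint described in the commit message
can be sketched as follows. This is a Linux-only illustration and not the
patch itself; probe_uffd_features and open_uffd_with_features are
hypothetical names, and both return -1 where userfaultfd is unavailable or
not permitted for the calling process:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/*
 * Ask the kernel which features it supports, using a throwaway fd,
 * because the UFFDIO_API handshake can be done only once per fd.
 * Returns 0 and fills *features on success, -1 on failure.
 */
static int probe_uffd_features(uint64_t *features)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int ufd, rc;

    assert(features != NULL);
    ufd = (int)syscall(__NR_userfaultfd, O_CLOEXEC);
    if (ufd == -1) {
        return -1;
    }
    rc = ioctl(ufd, UFFDIO_API, &api);
    if (rc == 0) {
        *features = api.features;
    }
    close(ufd);   /* this fd has used up its handshake, drop it */
    return rc == 0 ? 0 : -1;
}

/*
 * Open a fresh fd and do its single handshake, requesting only those
 * wanted features that the probe reported as supported.
 * Returns the fd on success, -1 on failure.
 */
static int open_uffd_with_features(uint64_t wanted)
{
    struct uffdio_api api = { .api = UFFD_API };
    uint64_t supported = 0;
    int ufd;

    if (probe_uffd_features(&supported) != 0) {
        return -1;
    }
    ufd = (int)syscall(__NR_userfaultfd, O_CLOEXEC);
    if (ufd == -1) {
        return -1;
    }
    api.features = wanted & supported;
    if (ioctl(ufd, UFFDIO_API, &api) != 0) {
        close(ufd);
        return -1;
    }
    return ufd;
}
```

Masking the request with the probed set is one possible policy; a caller that
strictly requires a feature would instead check the probed mask and fail early.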

> ---
>  migration/postcopy-ram.c | 94 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 88 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 8838901..cbe8f9f 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -63,16 +63,67 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> +
> +/**
> + * receive_ufd_features: check userfault fd features, to request only supported
> + * features in the future.
> + *
> + * Returns: true on success
> + *
> + * __NR_userfaultfd - should be checked before
> + *  @features: out parameter will contain uffdio_api.features provided by kernel
> + *              in case of success
> + */
> +static bool receive_ufd_features(uint64_t *features)
>  {
> -    struct uffdio_api api_struct;
> -    uint64_t ioctl_mask;
> +    struct uffdio_api api_struct = {0};
> +    int ufd;
> +    bool ret = true;
> +
> +    /* if we are here __NR_userfaultfd should exists */
> +    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +    if (ufd == -1) {
> +        error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
> +                     strerror(errno));
> +        return false;
> +    }
>  
> +    /* ask features */
>      api_struct.api = UFFD_API;
>      api_struct.features = 0;
>      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>          error_report("%s: UFFDIO_API failed: %s", __func__,
>                       strerror(errno));
> +        ret = false;
> +        goto release_ufd;
> +    }
> +
> +    *features = api_struct.features;
> +
> +release_ufd:
> +    close(ufd);
> +    return ret;
> +}
> +
> +/**
> + * request_ufd_features: this function should be called only once on a newly
> + * opened ufd, subsequent calls will lead to error.
> + *
> + * Returns: true on succes
> + *
> + * @ufd: fd obtained from userfaultfd syscall
> + * @features: bit mask see UFFD_API_FEATURES
> + */
> +static bool request_ufd_features(int ufd, uint64_t features)
> +{
> +    struct uffdio_api api_struct = {0};
> +    uint64_t ioctl_mask;
> +
> +    api_struct.api = UFFD_API;
> +    api_struct.features = features;
> +    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> +        error_report("%s failed: UFFDIO_API failed: %s", __func__,
> +                     strerror(errno));
>          return false;
>      }
>  
> @@ -84,11 +135,42 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>          return false;
>      }
>  
> +    return true;
> +}
> +
> +static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
> +{
> +    uint64_t asked_features = 0;
> +    static uint64_t supported_features;
> +
> +    /*
> +     * it's not possible to
> +     * request UFFD_API twice per one fd
> +     * userfault fd features is persistent
> +     */
> +    if (!supported_features) {
> +        if (!receive_ufd_features(&supported_features)) {
> +            error_report("%s failed", __func__);
> +            return false;
> +        }
> +    }
> +
> +    /*
> +     * request features, even if asked_features is 0, due to
> +     * kernel expects UFFD_API before UFFDIO_REGISTER, per
> +     * userfault file descriptor
> +     */
> +    if (!request_ufd_features(ufd, asked_features)) {
> +        error_report("%s failed: features %" PRIu64, __func__,
> +                     asked_features);
> +        return false;
> +    }
> +
>      if (getpagesize() != ram_pagesize_summary()) {
>          bool have_hp = false;
>          /* We've got a huge page */
>  #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
> -        have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
> +        have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
>  #endif
>          if (!have_hp) {
>              error_report("Userfault on this host does not support huge pages");
> @@ -149,7 +231,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>      }
>  
>      /* Version and features check */
> -    if (!ufd_version_check(ufd, mis)) {
> +    if (!ufd_check_and_apply(ufd, mis)) {
>          goto out;
>      }
>  
> @@ -525,7 +607,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>       * Although the host check already tested the API, we need to
>       * do the check again as an ABI handshake on the new fd.
>       */
> -    if (!ufd_version_check(mis->userfault_fd, mis)) {
> +    if (!ufd_check_and_apply(mis->userfault_fd, mis)) {
>          return -1;
>      }
>  
> -- 
> 1.9.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page Alexey Perevalov
  2017-06-07 12:56       ` Juan Quintela
  2017-06-07 14:13       ` Alexey Perevalov
@ 2017-06-12 11:11       ` Dr. David Alan Gilbert
  2017-06-13  5:59       ` Peter Xu
  3 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-12 11:11 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch adds the ability to track already copied
> pages. It is necessary for calculating vCPU block time in
> the postcopy migration feature, and maybe for restore after
> postcopy migration failure.
> It is also necessary to solve the shared memory issue in
> postcopy live migration. Information about copied pages
> will be transferred to the software virtual bridge
> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> already copied pages. The fallocate syscall is required for
> remapped shared memory, because remapping itself blocks
> ioctl(UFFDIO_COPY); the ioctl in this case will end with an
> EEXIST error (struct page exists after remap).
> 
> The bitmap is placed into RAMBlock like the other postcopy/precopy
> related bitmaps. Helpers are in migration/ram.c, because
> that file is allowed to work with RAMBlock.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/exec/ram_addr.h |  2 ++
>  migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
>  migration/ram.h         |  4 ++++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 140efa8..6a3780b 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -47,6 +47,8 @@ struct RAMBlock {
>       * of the postcopy phase
>       */
>      unsigned long *unsentmap;
> +    /* bitmap of already copied pages in postcopy */
> +    unsigned long *copiedmap;
>  };
>  
>  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> diff --git a/migration/ram.c b/migration/ram.c
> index f387e9c..a7c0db4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -149,6 +149,25 @@ out:
>      return ret;
>  }
>  
> +static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
> +{
> +    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
> +    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
> +                                    sizeof(rb->page_size));
> +
> +    return addr_offset >> page_shift;
> +}
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}
> +
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}

Hi,
  Can you please make the 'uint64_t addr' you pass in here be
'void *host_addr'; it's just that since we have so many types of
addresses it gets a bit confusing.
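
A sketch of what a host-pointer based helper could look like, under the
assumption of a simplified stand-in for RAMBlock. HostRange and
copied_bit_offset are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the RAMBlock fields used here. */
struct HostRange {
    void *host;        /* start of the block in host virtual memory */
    size_t page_size;  /* page size of the block, a power of two */
};

/* Index of the page containing host_addr, i.e. its bit in the bitmap. */
static size_t copied_bit_offset(const struct HostRange *r,
                                const void *host_addr)
{
    uintptr_t offset;

    assert(r != NULL);
    /* The page size must be a non-zero power of two. */
    assert(r->page_size != 0 && (r->page_size & (r->page_size - 1)) == 0);
    assert((uintptr_t)host_addr >= (uintptr_t)r->host);

    offset = (uintptr_t)host_addr - (uintptr_t)r->host;
    /* Dividing by a power-of-two page size is equivalent to a shift. */
    return offset / r->page_size;
}
```

Taking a pointer makes it explicit at every call site that the argument is a
host virtual address rather than a guest physical address or a RAM offset.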

>  /*
>   * An outstanding page request, on the source, having been received
>   * and queued
> @@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
>          block->bmap = NULL;
>          g_free(block->unsentmap);
>          block->unsentmap = NULL;
> +        g_free(block->copiedmap);
> +        block->copiedmap = NULL;
>      }
>  
>      XBZRLE_cache_lock();
> @@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
>      return ret;
>  }
>  
> +static unsigned long get_copiedmap_size(RAMBlock *rb)
> +{
> +    unsigned long pages;
> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
> +                                             sizeof(rb->page_size));
> +    return pages;
> +}

I think the bitmap size should be the same size for all bitmaps; so you
shouldn't need a copiedmap specific function?
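
For illustration, a generic size helper shared by all per-block bitmaps might
look like the sketch below. demo_page_shift and demo_bitmap_bits are
hypothetical names, and the loop is only one way to derive the shift from a
power-of-two page size. The result is a count of bits (one per page), which
is the unit a bitmap allocator needs:

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two page size, e.g. 4096 gives 12. */
static unsigned demo_page_shift(uint64_t page_size)
{
    unsigned shift = 0;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    while ((page_size >> shift) != 1) {
        shift++;
    }
    return shift;
}

/* Number of bits (one per page) needed to cover length bytes. */
static uint64_t demo_bitmap_bits(uint64_t length, uint64_t page_size)
{
    return length >> demo_page_shift(page_size);
}
```

Because nothing here is specific to copied pages, the same value could size
any bitmap that tracks one bit per page of the block.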

>  static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
> @@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      rcu_read_lock();
>  
>      if (postcopy_running) {
> +        RAMBlock *rb;
> +        RAMBLOCK_FOREACH(rb) {
> +            /* need for destination, bitmap_new calls
> +             * g_try_malloc0 and this function
> +             * Attempts to allocate @n_bytes, initialized to 0'sh */
> +            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
> +        }

Do you need to record the pages that have been received prior to
postcopy starting (and discard entries when 'discard' messages are
received)?

Dave

>          ret = ram_load_postcopy(f);
>      }
>  
> diff --git a/migration/ram.h b/migration/ram.h
> index c9563d1..1f32824 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +
>  #endif
> -- 
> 1.9.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side Alexey Perevalov
  2017-06-07 13:11       ` Juan Quintela
@ 2017-06-12 11:34       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-12 11:34 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch provides blocktime calculation per vCPU,
> as a summary and as an overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu, as an improvement on the
> previous approach, where QEMU kept a tree with the faulted page address and a
> cpus bitmask in it. Now QEMU keeps an array with the faulted page address as
> value and the vCPU as index. It helps to find the proper vCPU at UFFDIO_COPY
> time. It also keeps a list of blocktime per vCPU (could be traced with
> page_fault_addr).
> 
> Blocktime will not be calculated if the postcopy_blocktime field of
> MigrationIncomingState wasn't initialized.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

I think this is mostly ok now, minor comments below;

> ---
>  migration/postcopy-ram.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |   5 +-
>  2 files changed, 142 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 62a272a..0ad9f9f 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -27,6 +27,7 @@
>  #include "ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {
> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
> +/*
> + * This function is being called when pagefault occurs. It
> + * tracks down vCPU blocking time.
> + *
> + * @addr: faulted host virtual address
> + * @ptid: faulted process thread id
> + * @rb: ramblock appropriate to addr
> + */
> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> +                                          RAMBlock *rb)
> +{
> +    int cpu;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int64_t now_ms;
> +
> +    if (!dc || ptid == 0) {
> +        return;
> +    }
> +    cpu = get_mem_fault_cpu_index(ptid);
> +    if (cpu < 0) {
> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    if (dc->vcpu_addr[cpu] == 0) {
> +        atomic_inc(&dc->smp_cpus_down);
> +    }
> +
> +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> +
> +    if (test_copiedmap_by_addr(addr, rb)) {
> +        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
> +        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
> +        atomic_sub(&dc->smp_cpus_down, 1);
> +    }
> +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +                                        cpu);

You could add a flag to the trace to help let you know if you hit the
'copiedmap' case.

> +}
> +
> +/*
> + *  This function just provide calculated blocktime per cpu and trace it.
> + *  Total blocktime is calculated in mark_postcopy_blocktime_end.
> + *
> + *
> + * Assume we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
> + *            it's a part of total blocktime.
> + * S1 - here is last_begin
> + * Legend of the picture is following:
> + *              * - means blocktime per vCPU
> + *              x - means overlapped blocktime (total blocktime)
> + *
> + * @addr: host virtual address
> + */
> +static void mark_postcopy_blocktime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int i, affected_cpu = 0;
> +    int64_t now_ms;
> +    bool vcpu_total_blocktime = false;
> +
> +    if (!dc) {
> +        return;
> +    }
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* lookup cpu, to clear it,
> +     * that algorithm looks straighforward, but it's not
> +     * optimal, more optimal algorithm is keeping tree or hash
> +     * where key is address value is a list of  */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_blocktime = 0;
> +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
> +            continue;
> +        }
> +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
> +        vcpu_blocktime = now_ms -
> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> +        affected_cpu += 1;
> +        /* we need to know whether mark_postcopy_end was due to a
> +         * faulted page; the other possible case is a prefetched
> +         * page, and in that case we shouldn't be here */
> +        if (!vcpu_total_blocktime &&
> +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
> +            vcpu_total_blocktime = true;
> +        }
> +        /* continue the loop, because one page could affect several vCPUs */
> +        dc->vcpu_blocktime[i] += vcpu_blocktime;
> +    }
> +
> +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
> +    if (vcpu_total_blocktime) {
> +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
> +    }
> +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);

you could add affected_cpu to the trace.
> +}
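
To make the diagram above concrete, here is a minimal single-threaded sketch of the accounting, replaying the sequence S1,S2,E1,S3,S1,E2,E3,E1. It is not the patch code: the struct, the sim_* names, the timestamps and the addresses are all made up for illustration, atomics are left out, and sim_begin() only assumes what the begin side does (record the address and time, update last_begin, bump the down counter).

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 3

/* Hypothetical, single-threaded stand-in for PostcopyBlocktimeContext. */
struct blocktime_sim {
    uint64_t vcpu_addr[NR_CPUS];      /* faulted address, 0 = not waiting */
    int64_t fault_time[NR_CPUS];      /* when the vCPU started waiting */
    int64_t vcpu_blocktime[NR_CPUS];  /* accumulated per-vCPU blocktime */
    int64_t last_begin;               /* time of the most recent begin */
    int64_t total_blocktime;          /* time when all vCPUs were blocked */
    int cpus_down;                    /* number of vCPUs currently waiting */
};

/* Assumed behaviour of the begin side: one vCPU starts waiting on addr. */
static void sim_begin(struct blocktime_sim *s, int cpu, uint64_t addr,
                      int64_t now)
{
    assert(cpu >= 0 && cpu < NR_CPUS && addr != 0);
    s->vcpu_addr[cpu] = addr;
    s->fault_time[cpu] = now;
    s->last_begin = now;
    s->cpus_down++;
}

/* Mirrors the loop in mark_postcopy_blocktime_end(), without atomics. */
static void sim_end(struct blocktime_sim *s, uint64_t addr, int64_t now)
{
    int affected = 0;
    int all_down = 0;

    for (int i = 0; i < NR_CPUS; i++) {
        if (s->vcpu_addr[i] != addr) {
            continue;
        }
        s->vcpu_addr[i] = 0;
        s->vcpu_blocktime[i] += now - s->fault_time[i];
        affected++;
        /* every vCPU was waiting right before this page arrived */
        if (s->cpus_down == NR_CPUS) {
            all_down = 1;
        }
    }
    s->cpus_down -= affected;
    if (all_down) {
        s->total_blocktime += now - s->last_begin;
    }
}

/* Replays S1,S2,E1,S3,S1,E2,E3,E1 with made-up times in ms. */
static struct blocktime_sim run_example(void)
{
    struct blocktime_sim s = {0};

    sim_begin(&s, 0, 0xA000, 5);   /* S1 */
    sim_begin(&s, 1, 0xB000, 12);  /* S2 */
    sim_end(&s, 0xA000, 16);       /* E1 */
    sim_begin(&s, 2, 0xC000, 24);  /* S3 */
    sim_begin(&s, 0, 0xD000, 28);  /* S1, becomes last_begin */
    sim_end(&s, 0xB000, 31);       /* E2, all three were down since 28 */
    sim_end(&s, 0xC000, 38);       /* E3 */
    sim_end(&s, 0xD000, 46);       /* E1 */
    return s;
}
```

With these times only the S1..E2 window (28 to 31) counts towards the total, so run_example() ends with total_blocktime of 3 and per-vCPU values of 29, 19 and 14.
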
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -638,8 +766,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                                      msg.arg.pagefault.feat.ptid, rb);
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -723,6 +854,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>      copy_struct.len = pagesize;
>      copy_struct.mode = 0;
>  
> +    /* copied page isn't a feature of blocktime calculation,
> +     * it's a more general entity, so keep it here,
> +     * but gup between the two following operations could be high,
              ^---gap ?

Dave

> +     * and in this case blocktime for such small interval will be lost */
> +    set_copiedmap_by_addr((uint64_t)(uintptr_t)host, rb);
> +    mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
>      /* copy also acks to the kernel waking the stalled thread up
>       * TODO: We can inhibit that ack and only do it if it was requested
>       * which would be slightly cheaper, but we'd have to be careful
> diff --git a/migration/trace-events b/migration/trace-events
> index 5b8ccf3..7bdadbb 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -188,7 +190,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -197,6 +199,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.9.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
@ 2017-06-12 12:27       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-12 12:27 UTC (permalink / raw)
  To: Alexey Perevalov, aarcange; +Cc: qemu-devel, i.maximets, peterx

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This commit duplicates header of "userfaultfd: provide pid in userfault msg"
> into linux kernel.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

OK, so this isn't yet merged into Linus' tree from what I can tell;
we need to wait until it gets merged, and then run
scripts/update-linux-headers.sh

Dave

> ---
>  linux-headers/linux/userfaultfd.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
> index 9701772..eda028c 100644
> --- a/linux-headers/linux/userfaultfd.h
> +++ b/linux-headers/linux/userfaultfd.h
> @@ -78,6 +78,9 @@ struct uffd_msg {
>  		struct {
>  			__u64	flags;
>  			__u64	address;
> +			union {
> +				__u32   ptid;
> +			} feat;
>  		} pagefault;
>  
>  		struct {
> @@ -161,6 +164,7 @@ struct uffdio_api {
>  #define UFFD_FEATURE_MISSING_HUGETLBFS		(1<<4)
>  #define UFFD_FEATURE_MISSING_SHMEM		(1<<5)
>  #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
> +#define UFFD_FEATURE_THREAD_ID			(1<<7)
>  	__u64 features;
>  
>  	__u64 ioctls;
> -- 
> 1.9.1
> 
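
For readers following along, a rough sketch of how a consumer might use the new field: only trust ptid when the thread-id feature was negotiated, then map it to a vCPU index. The demo_* types and helpers below are hypothetical local mirrors of the layout in the quoted hunk, not the real linux/userfaultfd.h definitions, and the bit value is simply the one from this patch, so it may differ from what is finally merged.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value as in the quoted patch; an assumption, not the merged ABI. */
#define DEMO_UFFD_FEATURE_THREAD_ID (1ULL << 7)

/* Hypothetical mirror of the pagefault part of struct uffd_msg. */
struct demo_pagefault {
    uint64_t flags;
    uint64_t address;
    union {
        uint32_t ptid;
    } feat;
};

/*
 * Stores the faulting thread id in *ptid and returns 1 if the feature
 * was negotiated, otherwise returns 0 and leaves *ptid untouched.
 */
static int demo_fault_ptid(const struct demo_pagefault *pf,
                           uint64_t negotiated_features, uint32_t *ptid)
{
    assert(pf && ptid);
    if (!(negotiated_features & DEMO_UFFD_FEATURE_THREAD_ID)) {
        return 0;
    }
    *ptid = pf->feat.ptid;
    return 1;
}

/* Linear lookup of a thread id in a table of vCPU thread ids, -1 if absent. */
static int demo_cpu_index(const uint32_t *vcpu_tids, int nr_cpus,
                          uint32_t ptid)
{
    for (int i = 0; i < nr_cpus; i++) {
        if (vcpu_tids[i] == ptid) {
            return i;
        }
    }
    return -1; /* the fault came from a thread that is not a vCPU */
}
```

The -1 case corresponds to the "pid %u is not vCPU" trace point added later in the series.
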
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page Alexey Perevalov
                         ` (2 preceding siblings ...)
  2017-06-12 11:11       ` Dr. David Alan Gilbert
@ 2017-06-13  5:59       ` Peter Xu
  2017-06-13  6:10         ` Alexey Perevalov
  3 siblings, 1 reply; 36+ messages in thread
From: Peter Xu @ 2017-06-13  5:59 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets

On Wed, Jun 07, 2017 at 12:46:34PM +0300, Alexey Perevalov wrote:
> This patch adds the ability to track already copied
> pages. It's necessary for calculating vCPU block time in
> the postcopy migration feature, and maybe for restoring after
> a postcopy migration failure.
> It's also necessary to solve the shared memory issue in
> postcopy live migration. Information about copied pages
> will be transferred to the software virtual bridge
> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> already copied pages. The fallocate syscall is required for
> remapped shared memory, because remapping itself blocks
> ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
> error (the struct page exists after remap).
> 
> The bitmap is placed into RAMBlock like the other postcopy/precopy
> related bitmaps. Helpers are in migration/ram.c, because
> this file is allowed to work with RAMBlock.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Hi, Alexey,

Besides all the existing comments, I would suggest you do all the
copied_map things in this single patch, so that it'll be easier for
others to build on your work. E.g., move the bit_set() operations
here as well (currently they are in follow-up patches, and it looks like
that's not enough, since we need to capture copied_map even for the
precopy phase); then this single patch can ideally be separated from
the whole series (and then I can work on top of it :-).

Or, please just let me know if you want me to do this for you. I can
post this as a standalone patch, with your s-o-b if you allow.

Thanks,

> ---
>  include/exec/ram_addr.h |  2 ++
>  migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
>  migration/ram.h         |  4 ++++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 140efa8..6a3780b 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -47,6 +47,8 @@ struct RAMBlock {
>       * of the postcopy phase
>       */
>      unsigned long *unsentmap;
> +    /* bitmap of already copied pages in postcopy */
> +    unsigned long *copiedmap;
>  };
>  
>  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> diff --git a/migration/ram.c b/migration/ram.c
> index f387e9c..a7c0db4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -149,6 +149,25 @@ out:
>      return ret;
>  }
>  
> +static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
> +{
> +    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
> +    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
> +                                    sizeof(rb->page_size));
> +
> +    return addr_offset >> page_shift;
> +}
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}
> +
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
> +{
> +    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
> +}
> +
>  /*
>   * An outstanding page request, on the source, having been received
>   * and queued
> @@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
>          block->bmap = NULL;
>          g_free(block->unsentmap);
>          block->unsentmap = NULL;
> +        g_free(block->copiedmap);
> +        block->copiedmap = NULL;
>      }
>  
>      XBZRLE_cache_lock();
> @@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
>      return ret;
>  }
>  
> +static unsigned long get_copiedmap_size(RAMBlock *rb)
> +{
> +    unsigned long pages;
> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
> +                                             sizeof(rb->page_size));
> +    return pages;
> +}
> +
>  static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
> @@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      rcu_read_lock();
>  
>      if (postcopy_running) {
> +        RAMBlock *rb;
> +        RAMBLOCK_FOREACH(rb) {
> +            /* needed on the destination; bitmap_new calls
> +             * g_try_malloc0, which attempts to allocate
> +             * @n_bytes, initialized to 0's */
> +            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
> +        }
>          ret = ram_load_postcopy(f);
>      }
>  
> diff --git a/migration/ram.h b/migration/ram.h
> index c9563d1..1f32824 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +
> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
> +
>  #endif
> -- 
> 1.9.1
> 
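
As a side note on the helpers above, a small standalone sketch of the address-to-bit mapping. It assumes page sizes are powers of two and derives the shift with a count-trailing-zeros builtin instead of find_first_bit(); if find_first_bit() takes its size argument in bits (worth checking against include/qemu/bitops.h), passing sizeof(rb->page_size) would only search the first 8 bits. All demo_* names are hypothetical, __builtin_ctzll is a GCC/Clang builtin, and demo_set_bit() is not atomic, unlike set_bit_atomic() in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* log2 of a power-of-two page size, e.g. 4096 -> 12. */
static unsigned demo_page_shift(size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (unsigned)__builtin_ctzll((unsigned long long)page_size);
}

/* Index of the page containing addr, relative to the block's host base. */
static unsigned long demo_copied_bit(uint64_t addr, uint64_t block_host,
                                     size_t page_size)
{
    assert(addr >= block_host);
    return (unsigned long)((addr - block_host) >> demo_page_shift(page_size));
}

/* Plain (non-atomic) bit set, for illustration only. */
static void demo_set_bit(unsigned long *map, unsigned long nr)
{
    map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static int demo_test_bit(const unsigned long *map, unsigned long nr)
{
    return (int)((map[nr / DEMO_BITS_PER_LONG] >>
                  (nr % DEMO_BITS_PER_LONG)) & 1UL);
}
```

For a 4 KiB page size the shift is 12, so an address three pages into the block maps to bit 3 of the bitmap.
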

-- 
Peter Xu

* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-13  5:59       ` Peter Xu
@ 2017-06-13  6:10         ` Alexey Perevalov
  2017-06-13  6:23           ` Peter Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Alexey Perevalov @ 2017-06-13  6:10 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, dgilbert, i.maximets

On 06/13/2017 08:59 AM, Peter Xu wrote:
> On Wed, Jun 07, 2017 at 12:46:34PM +0300, Alexey Perevalov wrote:
>> This patch adds the ability to track already copied
>> pages. It's necessary for calculating vCPU block time in
>> the postcopy migration feature, and maybe for restoring after
>> a postcopy migration failure.
>> It's also necessary to solve the shared memory issue in
>> postcopy live migration. Information about copied pages
>> will be transferred to the software virtual bridge
>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
>> already copied pages. The fallocate syscall is required for
>> remapped shared memory, because remapping itself blocks
>> ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
>> error (the struct page exists after remap).
>>
>> The bitmap is placed into RAMBlock like the other postcopy/precopy
>> related bitmaps. Helpers are in migration/ram.c, because
>> this file is allowed to work with RAMBlock.
>>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> Hi, Alexey,
>
> Besides all the existing comments, I would suggest you do all the
> copied_map things in this single patch, so that it'll be easier for
> others to work upon your work. E.g., move the bit_set() operations
> here as well (currently it was in followup patches, and looks like
> that's not enough since we need to capture copied_map even for precopy
> phase), then this single patch can ideally be separated from the whole
> series (and then I can work upon it :-).
>
> Or, please just let me know if you want me to do this for you. I can
> post this as a standalone patch, with your s-o-b if you allow.

Hello Peter,
I'm using this patch in another patch series too
(it's about QEMU's shared memory and OVS-VSWITCHD,
the vhost-user use case).
So if you need it, I could resend this patch as a separate patch.
Then it will be convenient to base both my patch set and your patches
on top of it.
>
> Thanks,
>
>> ---
>>   include/exec/ram_addr.h |  2 ++
>>   migration/ram.c         | 36 ++++++++++++++++++++++++++++++++++++
>>   migration/ram.h         |  4 ++++
>>   3 files changed, 42 insertions(+)
>>
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index 140efa8..6a3780b 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -47,6 +47,8 @@ struct RAMBlock {
>>        * of the postcopy phase
>>        */
>>       unsigned long *unsentmap;
>> +    /* bitmap of already copied pages in postcopy */
>> +    unsigned long *copiedmap;
>>   };
>>   
>>   static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
>> diff --git a/migration/ram.c b/migration/ram.c
>> index f387e9c..a7c0db4 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -149,6 +149,25 @@ out:
>>       return ret;
>>   }
>>   
>> +static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
>> +{
>> +    uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
>> +    int page_shift = find_first_bit((unsigned long *)&rb->page_size,
>> +                                    sizeof(rb->page_size));
>> +
>> +    return addr_offset >> page_shift;
>> +}
>> +
>> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
>> +{
>> +    return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
>> +}
>> +
>> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
>> +{
>> +    set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
>> +}
>> +
>>   /*
>>    * An outstanding page request, on the source, having been received
>>    * and queued
>> @@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
>>           block->bmap = NULL;
>>           g_free(block->unsentmap);
>>           block->unsentmap = NULL;
>> +        g_free(block->copiedmap);
>> +        block->copiedmap = NULL;
>>       }
>>   
>>       XBZRLE_cache_lock();
>> @@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
>>       return ret;
>>   }
>>   
>> +static unsigned long get_copiedmap_size(RAMBlock *rb)
>> +{
>> +    unsigned long pages;
>> +    pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
>> +                                             sizeof(rb->page_size));
>> +    return pages;
>> +}
>> +
>>   static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>>       int flags = 0, ret = 0;
>> @@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       rcu_read_lock();
>>   
>>       if (postcopy_running) {
>> +        RAMBlock *rb;
>> +        RAMBLOCK_FOREACH(rb) {
>> +            /* needed on the destination; bitmap_new calls
>> +             * g_try_malloc0, which attempts to allocate
>> +             * @n_bytes, initialized to 0's */
>> +            rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
>> +        }
>>           ret = ram_load_postcopy(f);
>>       }
>>   
>> diff --git a/migration/ram.h b/migration/ram.h
>> index c9563d1..1f32824 100644
>> --- a/migration/ram.h
>> +++ b/migration/ram.h
>> @@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>>   int ram_postcopy_incoming_init(MigrationIncomingState *mis);
>>   
>>   void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>> +
>> +int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
>> +void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
>> +
>>   #endif
>> -- 
>> 1.9.1
>>

-- 
Best regards,
Alexey Perevalov

* Re: [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page
  2017-06-13  6:10         ` Alexey Perevalov
@ 2017-06-13  6:23           ` Peter Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Xu @ 2017-06-13  6:23 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, dgilbert, i.maximets

On Tue, Jun 13, 2017 at 09:10:46AM +0300, Alexey Perevalov wrote:
> On 06/13/2017 08:59 AM, Peter Xu wrote:
> >On Wed, Jun 07, 2017 at 12:46:34PM +0300, Alexey Perevalov wrote:
> >>This patch adds the ability to track already copied
> >>pages. It's necessary for calculating vCPU block time in
> >>the postcopy migration feature, and maybe for restoring after
> >>a postcopy migration failure.
> >>It's also necessary to solve the shared memory issue in
> >>postcopy live migration. Information about copied pages
> >>will be transferred to the software virtual bridge
> >>(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
> >>already copied pages. The fallocate syscall is required for
> >>remapped shared memory, because remapping itself blocks
> >>ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
> >>error (the struct page exists after remap).
> >>
> >>The bitmap is placed into RAMBlock like the other postcopy/precopy
> >>related bitmaps. Helpers are in migration/ram.c, because
> >>this file is allowed to work with RAMBlock.
> >>
> >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> >Hi, Alexey,
> >
> >Besides all the existing comments, I would suggest you do all the
> >copied_map things in this single patch, so that it'll be easier for
> >others to work upon your work. E.g., move the bit_set() operations
> >here as well (currently it was in followup patches, and looks like
> >that's not enough since we need to capture copied_map even for precopy
> >phase), then this single patch can ideally be separated from the whole
> >series (and then I can work upon it :-).
> >
> >Or, please just let me know if you want me to do this for you. I can
> >post this as a standalone patch, with your s-o-b if you allow.
> 
> Hello Peter,
> I'm working with this patch in another patch series too.
> (it's about QEMU's shared memory and OVS-VSWITCHD,
> vhost-user use case).
> So if you need it, I could resend this patch as a separate patch.
> Then it will be convenient to base both my patch set and your patches
> on top of it.

That'll be great!  Then please post this as standalone patch.

Thanks,

-- 
Peter Xu

Thread overview: 36+ messages
     [not found] <CGME20170607094720eucas1p24650bb7bb139ae209fc0ea8c5c57534b@eucas1p2.samsung.com>
2017-06-07  9:46 ` [Qemu-devel] [PATCH v8 00/11] calculate blocktime for postcopy live migration Alexey Perevalov
     [not found]   ` <CGME20170607094726eucas1p146abfbdb92413f43fa395a5004d2541a@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
2017-06-12 12:27       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170607094727eucas1p13b2228fead9fc5a49d953985c777b719@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions Alexey Perevalov
2017-06-09  4:10       ` Peter Xu
2017-06-09  6:21         ` Alexey Perevalov
2017-06-09  7:14           ` Peter Xu
2017-06-09  7:25             ` Alexey Perevalov
     [not found]   ` <CGME20170607094727eucas1p2d1063171fa2850fc1d590b286cd5d880@eucas1p2.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 03/11] migration: fix hardcoded function name in error report Alexey Perevalov
2017-06-07 12:31       ` Juan Quintela
     [not found]   ` <CGME20170607094728eucas1p1984b365dd09f3222b758075e651a5b5d@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 04/11] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
2017-06-12  9:52       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170607094728eucas1p228f096ea7eebf7e791392c9193cefec0@eucas1p2.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 05/11] migration: introduce postcopy-blocktime capability Alexey Perevalov
2017-06-07 12:34       ` Juan Quintela
     [not found]   ` <CGME20170607094729eucas1p119b3d77f7d869eb06c16c9e91215e8cd@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState Alexey Perevalov
2017-06-07 12:43       ` Juan Quintela
2017-06-07 12:53         ` Alexey Perevalov
     [not found]   ` <CGME20170607094729eucas1p15097f154039365d5e135f92b72aad1bf@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page Alexey Perevalov
2017-06-07 12:56       ` Juan Quintela
2017-06-07 14:46         ` Alexey Perevalov
2017-06-07 14:13       ` Alexey Perevalov
2017-06-09  6:06         ` Peter Xu
2017-06-09  7:16           ` Alexey Perevalov
2017-06-12 11:11       ` Dr. David Alan Gilbert
2017-06-13  5:59       ` Peter Xu
2017-06-13  6:10         ` Alexey Perevalov
2017-06-13  6:23           ` Peter Xu
     [not found]   ` <CGME20170607094730eucas1p2126d9850427e7b4af92898b64b7b805a@eucas1p2.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 08/11] migration: postcopy_place_page factoring out Alexey Perevalov
2017-06-07 12:58       ` Juan Quintela
     [not found]   ` <CGME20170607094730eucas1p29b692c0f813d5368d70d999ca8a1f186@eucas1p2.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side Alexey Perevalov
2017-06-07 13:11       ` Juan Quintela
2017-06-12 11:34       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170607094731eucas1p2cbbf439e841b1d72edb374d35d53bea3@eucas1p2.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 10/11] migration: add postcopy total blocktime into query-migrate Alexey Perevalov
     [not found]   ` <CGME20170607094732eucas1p199fb11b4189929a105515f6079415ebe@eucas1p1.samsung.com>
2017-06-07  9:46     ` [Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation Alexey Perevalov
2017-06-07 12:52       ` Juan Quintela
2017-06-07 13:08         ` Alexey Perevalov
