kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PULL 00/19] Migration patches
@ 2019-07-12 14:31 Juan Quintela
  2019-07-12 14:31 ` [PULL 01/19] migration: fix multifd_recv event typo Juan Quintela
                   ` (19 more replies)
  0 siblings, 20 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth

The following changes since commit a2a9d4adabe340617a24eb73a8b2a116d28a6b38:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190712' into staging (2019-07-12 11:06:48 +0100)

are available in the Git repository at:

  https://github.com/juanquintela/qemu.git tags/migration-pull-request

for you to fetch changes up to a48ad5602f496236b4e1955d9e2e8228a7d0ad56:

  migration: allow private destination ram with x-ignore-shared (2019-07-12 16:25:59 +0200)

----------------------------------------------------------------
Migration pull request

Fix the issues with the previous pull request and 32 bits.

Please apply.

----------------------------------------------------------------

Juan Quintela (3):
  migration: fix multifd_recv event typo
  migration-test: rename parameter to parameter_int
  migration-test: Add migration multifd test

Peng Tao (1):
  migration: allow private destination ram with x-ignore-shared

Peter Xu (10):
  migration: No need to take rcu during sync_dirty_bitmap
  memory: Don't set migration bitmap when without migration
  bitmap: Add bitmap_copy_with_{src|dst}_offset()
  memory: Pass mr into snapshot_and_clear_dirty
  memory: Introduce memory listener hook log_clear()
  kvm: Update comments for sync_dirty_bitmap
  kvm: Persistent per kvmslot dirty bitmap
  kvm: Introduce slots lock for memory listener
  kvm: Support KVM_CLEAR_DIRTY_LOG
  migration: Split log_clear() into smaller chunks

Wei Yang (5):
  migration/multifd: call multifd_send_sync_main when sending
    RAM_SAVE_FLAG_EOS
  migration/xbzrle: update cache and current_data in one place
  cutils: remove one unnecessary pointer operation
  migration/multifd: sync packet_num after all thread are done
  migration/ram.c: reset complete_round when we gets a queued page

 accel/kvm/kvm-all.c      | 260 ++++++++++++++++++++++++++++++++++++---
 accel/kvm/trace-events   |   1 +
 exec.c                   |  15 ++-
 include/exec/memory.h    |  19 +++
 include/exec/ram_addr.h  |  92 +++++++++++++-
 include/qemu/bitmap.h    |   9 ++
 include/sysemu/kvm_int.h |   4 +
 memory.c                 |  56 ++++++++-
 migration/migration.c    |   4 +
 migration/migration.h    |  27 ++++
 migration/ram.c          |  93 ++++++++++----
 migration/trace-events   |   3 +-
 tests/Makefile.include   |   2 +
 tests/migration-test.c   | 103 ++++++++++++----
 tests/test-bitmap.c      |  72 +++++++++++
 util/bitmap.c            |  85 +++++++++++++
 util/cutils.c            |   8 +-
 17 files changed, 770 insertions(+), 83 deletions(-)
 create mode 100644 tests/test-bitmap.c

-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PULL 01/19] migration: fix multifd_recv event typo
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 02/19] migration-test: rename parameter to parameter_int Juan Quintela
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

It uses num in multifd_send().  Make it coherent.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/trace-events b/migration/trace-events
index de2e136e57..cd50a1e659 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -80,7 +80,7 @@ get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_throttle(void) ""
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet number %" PRIu64 " pages %d flags 0x%x next packet size %d"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %d"
 multifd_recv_sync_main_wait(uint8_t id) "channel %d"
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 02/19] migration-test: rename parameter to parameter_int
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
  2019-07-12 14:31 ` [PULL 01/19] migration: fix multifd_recv event typo Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 03/19] migration-test: Add migration multifd test Juan Quintela
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

We would need _str ones on the next patch.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/migration-test.c | 55 +++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index b6434628e1..a4feb9545d 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -398,7 +398,8 @@ static char *migrate_get_socket_address(QTestState *who, const char *parameter)
     return result;
 }
 
-static long long migrate_get_parameter(QTestState *who, const char *parameter)
+static long long migrate_get_parameter_int(QTestState *who,
+                                           const char *parameter)
 {
     QDict *rsp;
     long long result;
@@ -409,17 +410,17 @@ static long long migrate_get_parameter(QTestState *who, const char *parameter)
     return result;
 }
 
-static void migrate_check_parameter(QTestState *who, const char *parameter,
-                                    long long value)
+static void migrate_check_parameter_int(QTestState *who, const char *parameter,
+                                        long long value)
 {
     long long result;
 
-    result = migrate_get_parameter(who, parameter);
+    result = migrate_get_parameter_int(who, parameter);
     g_assert_cmpint(result, ==, value);
 }
 
-static void migrate_set_parameter(QTestState *who, const char *parameter,
-                                  long long value)
+static void migrate_set_parameter_int(QTestState *who, const char *parameter,
+                                      long long value)
 {
     QDict *rsp;
 
@@ -429,7 +430,7 @@ static void migrate_set_parameter(QTestState *who, const char *parameter,
                     parameter, value);
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
-    migrate_check_parameter(who, parameter, value);
+    migrate_check_parameter_int(who, parameter, value);
 }
 
 static void migrate_pause(QTestState *who)
@@ -681,7 +682,7 @@ static void deprecated_set_downtime(QTestState *who, const double value)
                     " 'arguments': { 'value': %f } }", value);
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
-    migrate_check_parameter(who, "downtime-limit", value * 1000);
+    migrate_check_parameter_int(who, "downtime-limit", value * 1000);
 }
 
 static void deprecated_set_speed(QTestState *who, long long value)
@@ -692,7 +693,7 @@ static void deprecated_set_speed(QTestState *who, long long value)
                           "'arguments': { 'value': %lld } }", value);
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
-    migrate_check_parameter(who, "max-bandwidth", value);
+    migrate_check_parameter_int(who, "max-bandwidth", value);
 }
 
 static void deprecated_set_cache_size(QTestState *who, long long value)
@@ -703,7 +704,7 @@ static void deprecated_set_cache_size(QTestState *who, long long value)
                          "'arguments': { 'value': %lld } }", value);
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
-    migrate_check_parameter(who, "xbzrle-cache-size", value);
+    migrate_check_parameter_int(who, "xbzrle-cache-size", value);
 }
 
 static void test_deprecated(void)
@@ -738,8 +739,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
      * quickly, but that it doesn't complete precopy even on a slow
      * machine, so also set the downtime.
      */
-    migrate_set_parameter(from, "max-bandwidth", 100000000);
-    migrate_set_parameter(from, "downtime-limit", 1);
+    migrate_set_parameter_int(from, "max-bandwidth", 100000000);
+    migrate_set_parameter_int(from, "downtime-limit", 1);
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -790,7 +791,7 @@ static void test_postcopy_recovery(void)
     }
 
     /* Turn postcopy speed down, 4K/s is slow enough on any machines */
-    migrate_set_parameter(from, "max-postcopy-bandwidth", 4096);
+    migrate_set_parameter_int(from, "max-postcopy-bandwidth", 4096);
 
     /* Now we start the postcopy */
     migrate_postcopy_start(from, to);
@@ -831,7 +832,7 @@ static void test_postcopy_recovery(void)
     g_free(uri);
 
     /* Restore the postcopy bandwidth to unlimited */
-    migrate_set_parameter(from, "max-postcopy-bandwidth", 0);
+    migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
     migrate_postcopy_complete(from, to);
 }
@@ -877,9 +878,9 @@ static void test_precopy_unix(void)
      * machine, so also set the downtime.
      */
     /* 1 ms should make it not converge*/
-    migrate_set_parameter(from, "downtime-limit", 1);
+    migrate_set_parameter_int(from, "downtime-limit", 1);
     /* 1GB/s */
-    migrate_set_parameter(from, "max-bandwidth", 1000000000);
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -889,7 +890,7 @@ static void test_precopy_unix(void)
     wait_for_migration_pass(from);
 
     /* 300 ms should converge */
-    migrate_set_parameter(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", 300);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -956,11 +957,11 @@ static void test_xbzrle(const char *uri)
      * machine, so also set the downtime.
      */
     /* 1 ms should make it not converge*/
-    migrate_set_parameter(from, "downtime-limit", 1);
+    migrate_set_parameter_int(from, "downtime-limit", 1);
     /* 1GB/s */
-    migrate_set_parameter(from, "max-bandwidth", 1000000000);
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
 
-    migrate_set_parameter(from, "xbzrle-cache-size", 33554432);
+    migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432);
 
     migrate_set_capability(from, "xbzrle", "true");
     migrate_set_capability(to, "xbzrle", "true");
@@ -972,7 +973,7 @@ static void test_xbzrle(const char *uri)
     wait_for_migration_pass(from);
 
     /* 300ms should converge */
-    migrate_set_parameter(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", 300);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -1008,9 +1009,9 @@ static void test_precopy_tcp(void)
      * machine, so also set the downtime.
      */
     /* 1 ms should make it not converge*/
-    migrate_set_parameter(from, "downtime-limit", 1);
+    migrate_set_parameter_int(from, "downtime-limit", 1);
     /* 1GB/s */
-    migrate_set_parameter(from, "max-bandwidth", 1000000000);
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -1022,7 +1023,7 @@ static void test_precopy_tcp(void)
     wait_for_migration_pass(from);
 
     /* 300ms should converge */
-    migrate_set_parameter(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", 300);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -1054,9 +1055,9 @@ static void test_migrate_fd_proto(void)
      * machine, so also set the downtime.
      */
     /* 1 ms should make it not converge */
-    migrate_set_parameter(from, "downtime-limit", 1);
+    migrate_set_parameter_int(from, "downtime-limit", 1);
     /* 1GB/s */
-    migrate_set_parameter(from, "max-bandwidth", 1000000000);
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -1090,7 +1091,7 @@ static void test_migrate_fd_proto(void)
     wait_for_migration_pass(from);
 
     /* 300ms should converge */
-    migrate_set_parameter(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", 300);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 03/19] migration-test: Add migration multifd test
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
  2019-07-12 14:31 ` [PULL 01/19] migration: fix multifd_recv event typo Juan Quintela
  2019-07-12 14:31 ` [PULL 02/19] migration-test: rename parameter to parameter_int Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 04/19] migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS Juan Quintela
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

We set multifd-channels.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/migration-test.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index a4feb9545d..0dcaad86b5 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -1121,6 +1121,53 @@ static void test_migrate_fd_proto(void)
     test_migrate_end(from, to, true);
 }
 
+static void test_multifd_tcp(void)
+{
+    char *uri;
+    QTestState *from, *to;
+
+    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", false, false)) {
+        return;
+    }
+
+    /*
+     * We want to pick a speed slow enough that the test completes
+     * quickly, but that it doesn't complete precopy even on a slow
+     * machine, so also set the downtime.
+     */
+    /* 1 ms should make it not converge*/
+    migrate_set_parameter_int(from, "downtime-limit", 1);
+    /* 1GB/s */
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
+
+    migrate_set_parameter_int(from, "multifd-channels", 2);
+    migrate_set_parameter_int(to, "multifd-channels", 2);
+
+    migrate_set_capability(from, "multifd", "true");
+    migrate_set_capability(to, "multifd", "true");
+    /* Wait for the first serial output from the source */
+    wait_for_serial("src_serial");
+
+    uri = migrate_get_socket_address(to, "socket-address");
+
+    migrate(from, uri, "{}");
+
+    wait_for_migration_pass(from);
+
+    /* 300ms it should converge */
+    migrate_set_parameter_int(from, "downtime-limit", 600);
+
+    if (!got_stop) {
+        qtest_qmp_eventwait(from, "STOP");
+    }
+    qtest_qmp_eventwait(to, "RESUME");
+
+    wait_for_serial("dest_serial");
+    wait_for_migration_complete(from);
+    test_migrate_end(from, to, true);
+    free(uri);
+}
+
 int main(int argc, char **argv)
 {
     char template[] = "/tmp/migration-test-XXXXXX";
@@ -1176,6 +1223,7 @@ int main(int argc, char **argv)
     /* qtest_add_func("/migration/ignore_shared", test_ignore_shared); */
     qtest_add_func("/migration/xbzrle/unix", test_xbzrle_unix);
     qtest_add_func("/migration/fd_proto", test_migrate_fd_proto);
+    qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
 
     ret = g_test_run();
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 04/19] migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (2 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 03/19] migration-test: Add migration multifd test Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 05/19] migration/xbzrle: update cache and current_data in one place Juan Quintela
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

From: Wei Yang <richardw.yang@linux.intel.com>

On receiving RAM_SAVE_FLAG_EOS, multifd_recv_sync_main() is called to
synchronize receive threads. Current synchronization mechanism is to wait
for each channel's sem_sync semaphore. This semaphore is triggered by a
packet with MULTIFD_FLAG_SYNC flag. While in current implementation, we
don't do multifd_send_sync_main() to send such packet when
blk_mig_bulk_active() is true.

This will leads to the receive threads won't notify
multifd_recv_sync_main() by sem_sync. And multifd_recv_sync_main() will
always wait there.

[Note]: normal migration test works, while didn't test the
blk_mig_bulk_active() case. Since not sure how to produce this
situation.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190612014337.11255-1-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 908517fc2b..74c9306c78 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3466,8 +3466,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
      */
     ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
-    multifd_send_sync_main();
 out:
+    multifd_send_sync_main();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
     ram_counters.transferred += 8;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 05/19] migration/xbzrle: update cache and current_data in one place
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (3 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 04/19] migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 06/19] cutils: remove one unnecessary pointer operation Juan Quintela
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

From: Wei Yang <richardw.yang@linux.intel.com>

When we are not in the last_stage, we need to update the cache if page
is not the same.

Currently this procedure is scattered in two places and mixed with
encoding status check.

This patch extract this general step out to make the code a little bit
easy to read.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190610004159.20966-1-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 74c9306c78..d3d72b6f4f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1585,25 +1585,30 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
     encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
                                        TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
                                        TARGET_PAGE_SIZE);
+
+    /*
+     * Update the cache contents, so that it corresponds to the data
+     * sent, in all cases except where we skip the page.
+     */
+    if (!last_stage && encoded_len != 0) {
+        memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
+        /*
+         * In the case where we couldn't compress, ensure that the caller
+         * sends the data from the cache, since the guest might have
+         * changed the RAM since we copied it.
+         */
+        *current_data = prev_cached_page;
+    }
+
     if (encoded_len == 0) {
         trace_save_xbzrle_page_skipping();
         return 0;
     } else if (encoded_len == -1) {
         trace_save_xbzrle_page_overflow();
         xbzrle_counters.overflow++;
-        /* update data in the cache */
-        if (!last_stage) {
-            memcpy(prev_cached_page, *current_data, TARGET_PAGE_SIZE);
-            *current_data = prev_cached_page;
-        }
         return -1;
     }
 
-    /* we need to update the data in the cache, in order to get the same data */
-    if (!last_stage) {
-        memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
-    }
-
     /* Send XBZRLE based compressed page */
     bytes_xbzrle = save_page_header(rs, rs->f, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 06/19] cutils: remove one unnecessary pointer operation
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (4 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 05/19] migration/xbzrle: update cache and current_data in one place Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 07/19] migration/multifd: sync packet_num after all thread are done Juan Quintela
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

From: Wei Yang <richardw.yang@linux.intel.com>

Since we will not operate on the next address pointed by out, it is not
necessary to do addition on it.

After removing the operation, the function size reduced 16/18 bytes.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190610030852.16039-2-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 util/cutils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index dfc605f1ef..fd591cadf0 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -756,11 +756,11 @@ int uleb128_encode_small(uint8_t *out, uint32_t n)
 {
     g_assert(n <= 0x3fff);
     if (n < 0x80) {
-        *out++ = n;
+        *out = n;
         return 1;
     } else {
         *out++ = (n & 0x7f) | 0x80;
-        *out++ = n >> 7;
+        *out = n >> 7;
         return 2;
     }
 }
@@ -768,7 +768,7 @@ int uleb128_encode_small(uint8_t *out, uint32_t n)
 int uleb128_decode_small(const uint8_t *in, uint32_t *n)
 {
     if (!(*in & 0x80)) {
-        *n = *in++;
+        *n = *in;
         return 1;
     } else {
         *n = *in++ & 0x7f;
@@ -776,7 +776,7 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n)
         if (*in & 0x80) {
             return -1;
         }
-        *n |= *in++ << 7;
+        *n |= *in << 7;
         return 2;
     }
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 07/19] migration/multifd: sync packet_num after all thread are done
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (5 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 06/19] cutils: remove one unnecessary pointer operation Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 08/19] migration/ram.c: reset complete_round when we gets a queued page Juan Quintela
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang,
	Peter Xu

From: Wei Yang <richardw.yang@linux.intel.com>

Notification from recv thread is not ordered, which means we may be
notified by one MultiFDRecvParams but adjust packet_num for another.

Move the adjustment after we are sure each recv thread are sync-ed.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190604023540.26532-1-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d3d72b6f4f..96c84f770a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1291,15 +1291,15 @@ static void multifd_recv_sync_main(void)
 
         trace_multifd_recv_sync_main_wait(p->id);
         qemu_sem_wait(&multifd_recv_state->sem_sync);
+    }
+    for (i = 0; i < migrate_multifd_channels(); i++) {
+        MultiFDRecvParams *p = &multifd_recv_state->params[i];
+
         qemu_mutex_lock(&p->mutex);
         if (multifd_recv_state->packet_num < p->packet_num) {
             multifd_recv_state->packet_num = p->packet_num;
         }
         qemu_mutex_unlock(&p->mutex);
-    }
-    for (i = 0; i < migrate_multifd_channels(); i++) {
-        MultiFDRecvParams *p = &multifd_recv_state->params[i];
-
         trace_multifd_recv_sync_main_signal(p->id);
         qemu_sem_post(&p->sem_sync);
     }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 08/19] migration/ram.c: reset complete_round when we gets a queued page
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (6 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 07/19] migration/multifd: sync packet_num after all thread are done Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 09/19] migration: No need to take rcu during sync_dirty_bitmap Juan Quintela
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Wei Yang

From: Wei Yang <richardw.yang@linux.intel.com>

In case we gets a queued page, the order of block is interrupted. We may
not rely on the complete_round flag to say we have already searched the
whole blocks on the list.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190605010828.6969-1-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 96c84f770a..89eec7ee9d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2286,6 +2286,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          */
         pss->block = block;
         pss->page = offset >> TARGET_PAGE_BITS;
+
+        /*
+         * This unqueued page would break the "one round" check, even is
+         * really rare.
+         */
+        pss->complete_round = false;
     }
 
     return !!block;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 09/19] migration: No need to take rcu during sync_dirty_bitmap
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (7 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 08/19] migration/ram.c: reset complete_round when we gets a queued page Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 10/19] memory: Don't set migration bitmap when without migration Juan Quintela
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

cpu_physical_memory_sync_dirty_bitmap() has one RAMBlock* as
parameter, which means that it must be with RCU read lock held
already.  Taking it again inside seems redundant.  Removing it.
Instead comment on the functions about the RCU read lock.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-2-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/ram_addr.h | 5 +----
 migration/ram.c         | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f96777bb99..44dcc98de6 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -409,6 +409,7 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 }
 
 
+/* Called with RCU critical section */
 static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
@@ -432,8 +433,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                         DIRTY_MEMORY_BLOCK_SIZE);
         unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
 
-        rcu_read_lock();
-
         src = atomic_rcu_read(
                 &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
 
@@ -453,8 +452,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                 idx++;
             }
         }
-
-        rcu_read_unlock();
     } else {
         ram_addr_t offset = rb->offset;
 
diff --git a/migration/ram.c b/migration/ram.c
index 89eec7ee9d..48969db84b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1674,6 +1674,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
+/* Called with RCU critical section */
 static void migration_bitmap_sync_range(RAMState *rs, RAMBlock *rb,
                                         ram_addr_t length)
 {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 10/19] memory: Don't set migration bitmap when without migration
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (8 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 09/19] migration: No need to take rcu during sync_dirty_bitmap Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:31 ` [PULL 11/19] bitmap: Add bitmap_copy_with_{src|dst}_offset() Juan Quintela
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Similar to 9460dee4b2 ("memory: do not touch code dirty bitmap unless
TCG is enabled", 2015-06-05) but for the migration bitmap - we can
skip the MIGRATION bitmap update if migration not enabled.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-4-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h | 12 +++++++++++-
 memory.c                |  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2c5cdffa31..70d6f7e451 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -46,6 +46,8 @@
         OBJECT_GET_CLASS(IOMMUMemoryRegionClass, (obj), \
                          TYPE_IOMMU_MEMORY_REGION)
 
+extern bool global_dirty_log;
+
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 44dcc98de6..0a532c3963 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -349,8 +349,13 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
             if (bitmap[k]) {
                 unsigned long temp = leul_to_cpu(bitmap[k]);
 
-                atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset], temp);
                 atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
+
+                if (global_dirty_log) {
+                    atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset],
+                              temp);
+                }
+
                 if (tcg_enabled()) {
                     atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
                 }
@@ -367,6 +372,11 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
         xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
     } else {
         uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
+
+        if (!global_dirty_log) {
+            clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
+        }
+
         /*
          * bitmap-traveling is faster than memory-traveling (for addr...)
          * especially when most of the memory is not dirty.
diff --git a/memory.c b/memory.c
index 480f3d989b..93486a71d7 100644
--- a/memory.c
+++ b/memory.c
@@ -38,7 +38,7 @@
 static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
-static bool global_dirty_log = false;
+bool global_dirty_log;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
     = QTAILQ_HEAD_INITIALIZER(memory_listeners);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 11/19] bitmap: Add bitmap_copy_with_{src|dst}_offset()
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (9 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 10/19] memory: Don't set migration bitmap when without migration Juan Quintela
@ 2019-07-12 14:31 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 12/19] memory: Pass mr into snapshot_and_clear_dirty Juan Quintela
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

These helpers copy the source bitmap to destination bitmap with a
shift either on the src or dst bitmap.

Meanwhile, we never have bitmap tests but we should.

This patch also introduces the initial test cases for utils/bitmap.c
but it only tests the newly introduced functions.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-5-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---

Bitmap test used sizeof(unsigned long) instead of BITS_PER_LONG.
---
 include/qemu/bitmap.h  |  9 +++++
 tests/Makefile.include |  2 +
 tests/test-bitmap.c    | 72 +++++++++++++++++++++++++++++++++++
 util/bitmap.c          | 85 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 168 insertions(+)
 create mode 100644 tests/test-bitmap.c

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 5c313346b9..82a1d2f41f 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -41,6 +41,10 @@
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
  * bitmap_to_le(dst, src, nbits)      Convert bitmap to little endian
  * bitmap_from_le(dst, src, nbits)    Convert bitmap from little endian
+ * bitmap_copy_with_src_offset(dst, src, offset, nbits)
+ *                                    *dst = *src (with an offset into src)
+ * bitmap_copy_with_dst_offset(dst, src, offset, nbits)
+ *                                    *dst = *src (with an offset into dst)
  */
 
 /*
@@ -271,4 +275,9 @@ void bitmap_to_le(unsigned long *dst, const unsigned long *src,
 void bitmap_from_le(unsigned long *dst, const unsigned long *src,
                     long nbits);
 
+void bitmap_copy_with_src_offset(unsigned long *dst, const unsigned long *src,
+                                 unsigned long offset, unsigned long nbits);
+void bitmap_copy_with_dst_offset(unsigned long *dst, const unsigned long *src,
+                                 unsigned long shift, unsigned long nbits);
+
 #endif /* BITMAP_H */
diff --git a/tests/Makefile.include b/tests/Makefile.include
index a983dd32da..fd7fdb8658 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -65,6 +65,7 @@ check-unit-y += tests/test-opts-visitor$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-coroutine$(EXESUF)
 check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
+check-unit-y += tests/test-bitmap$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-aio$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-aio-multithread$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-throttle$(EXESUF)
@@ -538,6 +539,7 @@ tests/test-image-locking$(EXESUF): tests/test-image-locking.o $(test-block-obj-y
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
+tests/test-bitmap$(EXESUF): tests/test-bitmap.o $(test-util-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
 tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o $(test-util-obj-y)
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o $(test-util-obj-y)
diff --git a/tests/test-bitmap.c b/tests/test-bitmap.c
new file mode 100644
index 0000000000..cb7c5e462d
--- /dev/null
+++ b/tests/test-bitmap.c
@@ -0,0 +1,72 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Bitmap.c unit-tests.
+ *
+ * Copyright (C) 2019, Red Hat, Inc.
+ *
+ * Author: Peter Xu <peterx@redhat.com>
+ */
+
+#include <stdlib.h>
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+
+#define BMAP_SIZE  1024
+
+static void check_bitmap_copy_with_offset(void)
+{
+    unsigned long *bmap1, *bmap2, *bmap3, total;
+
+    bmap1 = bitmap_new(BMAP_SIZE);
+    bmap2 = bitmap_new(BMAP_SIZE);
+    bmap3 = bitmap_new(BMAP_SIZE);
+
+    bmap1[0] = random();
+    bmap1[1] = random();
+    bmap1[2] = random();
+    bmap1[3] = random();
+    total = BITS_PER_LONG * 4;
+
+    /* Shift 115 bits into bmap2 */
+    bitmap_copy_with_dst_offset(bmap2, bmap1, 115, total);
+    /* Shift another 85 bits into bmap3 */
+    bitmap_copy_with_dst_offset(bmap3, bmap2, 85, total + 115);
+    /* Shift back 200 bits back */
+    bitmap_copy_with_src_offset(bmap2, bmap3, 200, total);
+
+    g_assert_cmpmem(bmap1, total / BITS_PER_LONG,
+                    bmap2, total / BITS_PER_LONG);
+
+    bitmap_clear(bmap1, 0, BMAP_SIZE);
+    /* Set bits in bmap1 are 100-245 */
+    bitmap_set(bmap1, 100, 145);
+
+    /* Set bits in bmap2 are 60-205 */
+    bitmap_copy_with_src_offset(bmap2, bmap1, 40, 250);
+    g_assert_cmpint(find_first_bit(bmap2, 60), ==, 60);
+    g_assert_cmpint(find_next_zero_bit(bmap2, 205, 60), ==, 205);
+    g_assert(test_bit(205, bmap2) == 0);
+
+    /* Set bits in bmap3 are 135-280 */
+    bitmap_copy_with_dst_offset(bmap3, bmap1, 35, 250);
+    g_assert_cmpint(find_first_bit(bmap3, 135), ==, 135);
+    g_assert_cmpint(find_next_zero_bit(bmap3, 280, 135), ==, 280);
+    g_assert(test_bit(280, bmap3) == 0);
+
+    g_free(bmap1);
+    g_free(bmap2);
+    g_free(bmap3);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/bitmap/bitmap_copy_with_offset",
+                    check_bitmap_copy_with_offset);
+
+    g_test_run();
+
+    return 0;
+}
diff --git a/util/bitmap.c b/util/bitmap.c
index cb618c65a5..1753ff7f5b 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -402,3 +402,88 @@ void bitmap_to_le(unsigned long *dst, const unsigned long *src,
 {
     bitmap_to_from_le(dst, src, nbits);
 }
+
+/*
+ * Copy "src" bitmap with a positive offset and put it into the "dst"
+ * bitmap.  The caller needs to make sure the bitmap size of "src"
+ * is bigger than (shift + nbits).
+ */
+void bitmap_copy_with_src_offset(unsigned long *dst, const unsigned long *src,
+                                 unsigned long shift, unsigned long nbits)
+{
+    unsigned long left_mask, right_mask, last_mask;
+
+    /* Proper shift src pointer to the first word to copy from */
+    src += BIT_WORD(shift);
+    shift %= BITS_PER_LONG;
+
+    if (!shift) {
+        /* Fast path */
+        bitmap_copy(dst, src, nbits);
+        return;
+    }
+
+    right_mask = (1ul << shift) - 1;
+    left_mask = ~right_mask;
+
+    while (nbits >= BITS_PER_LONG) {
+        *dst = (*src & left_mask) >> shift;
+        *dst |= (src[1] & right_mask) << (BITS_PER_LONG - shift);
+        dst++;
+        src++;
+        nbits -= BITS_PER_LONG;
+    }
+
+    if (nbits > BITS_PER_LONG - shift) {
+        *dst = (*src & left_mask) >> shift;
+        nbits -= BITS_PER_LONG - shift;
+        last_mask = (1ul << nbits) - 1;
+        *dst |= (src[1] & last_mask) << (BITS_PER_LONG - shift);
+    } else if (nbits) {
+        last_mask = (1ul << nbits) - 1;
+        *dst = (*src >> shift) & last_mask;
+    }
+}
+
+/*
+ * Copy "src" bitmap into the "dst" bitmap with an offset in the
+ * "dst".  The caller needs to make sure the bitmap size of "dst" is
+ * bigger than (shift + nbits).
+ */
+void bitmap_copy_with_dst_offset(unsigned long *dst, const unsigned long *src,
+                                 unsigned long shift, unsigned long nbits)
+{
+    unsigned long left_mask, right_mask, last_mask;
+
+    /* Proper shift dst pointer to the first word to copy from */
+    dst += BIT_WORD(shift);
+    shift %= BITS_PER_LONG;
+
+    if (!shift) {
+        /* Fast path */
+        bitmap_copy(dst, src, nbits);
+        return;
+    }
+
+    right_mask = (1ul << (BITS_PER_LONG - shift)) - 1;
+    left_mask = ~right_mask;
+
+    *dst &= (1ul << shift) - 1;
+    while (nbits >= BITS_PER_LONG) {
+        *dst |= (*src & right_mask) << shift;
+        dst[1] = (*src & left_mask) >> (BITS_PER_LONG - shift);
+        dst++;
+        src++;
+        nbits -= BITS_PER_LONG;
+    }
+
+    if (nbits > BITS_PER_LONG - shift) {
+        *dst |= (*src & right_mask) << shift;
+        nbits -= BITS_PER_LONG - shift;
+        last_mask = ((1ul << nbits) - 1) << (BITS_PER_LONG - shift);
+        dst[1] = (*src & last_mask) >> (BITS_PER_LONG - shift);
+    } else if (nbits) {
+        last_mask = (1ul << nbits) - 1;
+        *dst |= (*src & last_mask) << shift;
+    }
+}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 12/19] memory: Pass mr into snapshot_and_clear_dirty
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (10 preceding siblings ...)
  2019-07-12 14:31 ` [PULL 11/19] bitmap: Add bitmap_copy_with_{src|dst}_offset() Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 13/19] memory: Introduce memory listener hook log_clear() Juan Quintela
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Also we change the 2nd parameter of it to be the relative offset
within the memory region. This is to be used in follow up patches.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 exec.c                  | 3 ++-
 include/exec/ram_addr.h | 2 +-
 memory.c                | 3 +--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 50ea9c5aaa..3a00698cc0 100644
--- a/exec.c
+++ b/exec.c
@@ -1390,9 +1390,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 }
 
 DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
-     (ram_addr_t start, ram_addr_t length, unsigned client)
+    (MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned client)
 {
     DirtyMemoryBlocks *blocks;
+    ram_addr_t start = memory_region_get_ram_addr(mr) + offset;
     unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
     ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
     ram_addr_t last  = QEMU_ALIGN_UP(start + length, align);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 0a532c3963..1843b6f2d3 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -404,7 +404,7 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               unsigned client);
 
 DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
-    (ram_addr_t start, ram_addr_t length, unsigned client);
+    (MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned client);
 
 bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
                                             ram_addr_t start,
diff --git a/memory.c b/memory.c
index 93486a71d7..71fcaf2d00 100644
--- a/memory.c
+++ b/memory.c
@@ -2071,8 +2071,7 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
 {
     assert(mr->ram_block);
     memory_region_sync_dirty_bitmap(mr);
-    return cpu_physical_memory_snapshot_and_clear_dirty(
-                memory_region_get_ram_addr(mr) + addr, size, client);
+    return cpu_physical_memory_snapshot_and_clear_dirty(mr, addr, size, client);
 }
 
 bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 13/19] memory: Introduce memory listener hook log_clear()
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (11 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 12/19] memory: Pass mr into snapshot_and_clear_dirty Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 14/19] kvm: Update comments for sync_dirty_bitmap Juan Quintela
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Introduce a new memory region listener hook log_clear() to allow the
listeners to hook onto the points where the dirty bitmap is cleared by
the bitmap users.

Previously log_sync() contains two operations:

  - dirty bitmap collection, and,
  - dirty bitmap clear on remote site.

Let's take KVM as example - log_sync() for KVM will first copy the
kernel dirty bitmap to userspace, and at the same time we'll clear the
dirty bitmap there along with re-protecting all the guest pages again.

We add this new log_clear() interface only to split the old log_sync()
into two separated procedures:

  - use log_sync() to collect the collection only, and,
  - use log_clear() to clear the remote dirty bitmap.

With the new interface, the memory listener users will still be able
to decide how to implement the log synchronization procedure, e.g.,
they can still only provide log_sync() method only and put all the two
procedures within log_sync() (that's how the old KVM works before
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is introduced).  However with this
new interface the memory listener users will start to have a chance to
postpone the log clear operation explicitly if the module supports.
That can really benefit users like KVM at least for host kernels that
support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2.

There are three places that can clear dirty bits in any one of the
dirty bitmap in the ram_list.dirty_memory[3] array:

        cpu_physical_memory_snapshot_and_clear_dirty
        cpu_physical_memory_test_and_clear_dirty
        cpu_physical_memory_sync_dirty_bitmap

Currently we hook directly into each of the functions to notify about
the log_clear().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 exec.c                  | 12 ++++++++++
 include/exec/memory.h   | 17 ++++++++++++++
 include/exec/ram_addr.h |  3 +++
 memory.c                | 51 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+)

diff --git a/exec.c b/exec.c
index 3a00698cc0..3e78de3b8f 100644
--- a/exec.c
+++ b/exec.c
@@ -1358,6 +1358,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
     DirtyMemoryBlocks *blocks;
     unsigned long end, page;
     bool dirty = false;
+    RAMBlock *ramblock;
+    uint64_t mr_offset, mr_size;
 
     if (length == 0) {
         return false;
@@ -1369,6 +1371,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
     rcu_read_lock();
 
     blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+    ramblock = qemu_get_ram_block(start);
+    /* Range sanity check on the ramblock */
+    assert(start >= ramblock->offset &&
+           start + length <= ramblock->offset + ramblock->used_length);
 
     while (page < end) {
         unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
@@ -1380,6 +1386,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
         page += num;
     }
 
+    mr_offset = (ram_addr_t)(page << TARGET_PAGE_BITS) - ramblock->offset;
+    mr_size = (end - page) << TARGET_PAGE_BITS;
+    memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
+
     rcu_read_unlock();
 
     if (dirty && tcg_enabled()) {
@@ -1435,6 +1445,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
         tlb_reset_dirty_range_all(start, length);
     }
 
+    memory_region_clear_dirty_bitmap(mr, offset, length);
+
     return snap;
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 70d6f7e451..bb0961ddb9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -416,6 +416,7 @@ struct MemoryListener {
     void (*log_stop)(MemoryListener *listener, MemoryRegionSection *section,
                      int old, int new);
     void (*log_sync)(MemoryListener *listener, MemoryRegionSection *section);
+    void (*log_clear)(MemoryListener *listener, MemoryRegionSection *section);
     void (*log_global_start)(MemoryListener *listener);
     void (*log_global_stop)(MemoryListener *listener);
     void (*eventfd_add)(MemoryListener *listener, MemoryRegionSection *section,
@@ -1269,6 +1270,22 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
 void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
                              hwaddr size);
 
+/**
+ * memory_region_clear_dirty_bitmap - clear dirty bitmap for memory range
+ *
+ * This function is called when the caller wants to clear the remote
+ * dirty bitmap of a memory range within the memory region.  This can
+ * be used by e.g. KVM to manually clear dirty log when
+ * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is declared support by the host
+ * kernel.
+ *
+ * @mr:     the memory region to clear the dirty log upon
+ * @start:  start address offset within the memory region
+ * @len:    length of the memory region to clear dirty bitmap
+ */
+void memory_region_clear_dirty_bitmap(MemoryRegion *mr, hwaddr start,
+                                      hwaddr len);
+
 /**
  * memory_region_snapshot_and_clear_dirty: Get a snapshot of the dirty
  *                                         bitmap and clear it.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 1843b6f2d3..222b4338fb 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -462,6 +462,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                 idx++;
             }
         }
+
+        /* TODO: split the huge bitmap into smaller chunks */
+        memory_region_clear_dirty_bitmap(rb->mr, start, length);
     } else {
         ram_addr_t offset = rb->offset;
 
diff --git a/memory.c b/memory.c
index 71fcaf2d00..beac26e173 100644
--- a/memory.c
+++ b/memory.c
@@ -2064,6 +2064,57 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
     }
 }
 
+void memory_region_clear_dirty_bitmap(MemoryRegion *mr, hwaddr start,
+                                      hwaddr len)
+{
+    MemoryRegionSection mrs;
+    MemoryListener *listener;
+    AddressSpace *as;
+    FlatView *view;
+    FlatRange *fr;
+    hwaddr sec_start, sec_end, sec_size;
+
+    QTAILQ_FOREACH(listener, &memory_listeners, link) {
+        if (!listener->log_clear) {
+            continue;
+        }
+        as = listener->address_space;
+        view = address_space_get_flatview(as);
+        FOR_EACH_FLAT_RANGE(fr, view) {
+            if (!fr->dirty_log_mask || fr->mr != mr) {
+                /*
+                 * Clear dirty bitmap operation only applies to those
+                 * regions whose dirty logging is at least enabled
+                 */
+                continue;
+            }
+
+            mrs = section_from_flat_range(fr, view);
+
+            sec_start = MAX(mrs.offset_within_region, start);
+            sec_end = mrs.offset_within_region + int128_get64(mrs.size);
+            sec_end = MIN(sec_end, start + len);
+
+            if (sec_start >= sec_end) {
+                /*
+                 * If this memory region section has no intersection
+                 * with the requested range, skip.
+                 */
+                continue;
+            }
+
+            /* Valid case; shrink the section if needed */
+            mrs.offset_within_address_space +=
+                sec_start - mrs.offset_within_region;
+            mrs.offset_within_region = sec_start;
+            sec_size = sec_end - sec_start;
+            mrs.size = int128_make64(sec_size);
+            listener->log_clear(listener, &mrs);
+        }
+        flatview_unref(view);
+    }
+}
+
 DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
                                                             hwaddr addr,
                                                             hwaddr size,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 14/19] kvm: Update comments for sync_dirty_bitmap
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (12 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 13/19] memory: Introduce memory listener hook log_clear() Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 15/19] kvm: Persistent per kvmslot dirty bitmap Juan Quintela
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

It's obviously obsolete.  Do some update.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 accel/kvm/kvm-all.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3d86ae5052..a3df19da56 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -478,13 +478,13 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /**
- * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
- * This function updates qemu's dirty bitmap using
- * memory_region_set_dirty().  This means all bits are set
- * to dirty.
+ * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
  *
- * @start_add: start of logged region.
- * @end_addr: end of logged region.
+ * This function will first try to fetch dirty bitmap from the kernel,
+ * and then updates qemu's dirty bitmap.
+ *
+ * @kml: the KVM memory listener object
+ * @section: the memory section to sync the dirty bitmap with
  */
 static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
                                           MemoryRegionSection *section)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 15/19] kvm: Persistent per kvmslot dirty bitmap
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (13 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 14/19] kvm: Update comments for sync_dirty_bitmap Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 16/19] kvm: Introduce slots lock for memory listener Juan Quintela
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

When synchronizing dirty bitmap from kernel KVM we do it in a
per-kvmslot fashion and we allocate the userspace bitmap for each of
the ioctl.  This patch instead make the bitmap cache be persistent
then we don't need to g_malloc0() every time.

More importantly, the cached per-kvmslot dirty bitmap will be further
used when we want to add support for the KVM_CLEAR_DIRTY_LOG and this
cached bitmap will be used to guarantee we won't clear any unknown
dirty bits otherwise that can be a severe data loss issue for
migration code.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-9-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 accel/kvm/kvm-all.c      | 10 +++++++---
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a3df19da56..23ace52b9e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -516,17 +516,19 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
          */
         size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
                      /*HOST_LONG_BITS*/ 64) / 8;
-        d.dirty_bitmap = g_malloc0(size);
+        if (!mem->dirty_bmap) {
+            /* Allocate on the first log_sync, once and for all */
+            mem->dirty_bmap = g_malloc0(size);
+        }
 
+        d.dirty_bitmap = mem->dirty_bmap;
         d.slot = mem->slot | (kml->as_id << 16);
         if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
             DPRINTF("ioctl failed %d\n", errno);
-            g_free(d.dirty_bitmap);
             return -1;
         }
 
         kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
-        g_free(d.dirty_bitmap);
     }
 
     return 0;
@@ -801,6 +803,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         }
 
         /* unregister the slot */
+        g_free(mem->dirty_bmap);
+        mem->dirty_bmap = NULL;
         mem->memory_size = 0;
         mem->flags = 0;
         err = kvm_set_user_memory_region(kml, mem, false);
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index f838412491..687a2ee423 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -21,6 +21,8 @@ typedef struct KVMSlot
     int slot;
     int flags;
     int old_flags;
+    /* Dirty bitmap cache for the slot */
+    unsigned long *dirty_bmap;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 16/19] kvm: Introduce slots lock for memory listener
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (14 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 15/19] kvm: Persistent per kvmslot dirty bitmap Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 17/19] kvm: Support KVM_CLEAR_DIRTY_LOG Juan Quintela
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Introduce KVMMemoryListener.slots_lock to protect the slots inside the
kvm memory listener.  Currently it is close to useless because all the
KVM code path now is always protected by the BQL.  But it'll start to
make sense in follow up patches where we might do remote dirty bitmap
clear and also we'll update the per-slot cached dirty bitmap even
without the BQL.  So let's prepare for it.

We can also use per-slot lock for above reason but it seems to be an
overkill.  Let's just use this bigger one (which covers all the slots
of a single address space) but anyway this lock is still much smaller
than the BQL.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-10-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 accel/kvm/kvm-all.c      | 58 +++++++++++++++++++++++++++++++---------
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 23ace52b9e..621c9a43ab 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -138,6 +138,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_LAST_INFO
 };
 
+#define kvm_slots_lock(kml)      qemu_mutex_lock(&(kml)->slots_lock)
+#define kvm_slots_unlock(kml)    qemu_mutex_unlock(&(kml)->slots_lock)
+
 int kvm_get_max_memslots(void)
 {
     KVMState *s = KVM_STATE(current_machine->accelerator);
@@ -165,6 +168,7 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
     return 1;
 }
 
+/* Called with KVMMemoryListener.slots_lock held */
 static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 {
     KVMState *s = kvm_state;
@@ -182,10 +186,17 @@ static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 bool kvm_has_free_slot(MachineState *ms)
 {
     KVMState *s = KVM_STATE(ms->accelerator);
+    bool result;
+    KVMMemoryListener *kml = &s->memory_listener;
 
-    return kvm_get_free_slot(&s->memory_listener);
+    kvm_slots_lock(kml);
+    result = !!kvm_get_free_slot(kml);
+    kvm_slots_unlock(kml);
+
+    return result;
 }
 
+/* Called with KVMMemoryListener.slots_lock held */
 static KVMSlot *kvm_alloc_slot(KVMMemoryListener *kml)
 {
     KVMSlot *slot = kvm_get_free_slot(kml);
@@ -244,18 +255,21 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
                                        hwaddr *phys_addr)
 {
     KVMMemoryListener *kml = &s->memory_listener;
-    int i;
+    int i, ret = 0;
 
+    kvm_slots_lock(kml);
     for (i = 0; i < s->nr_slots; i++) {
         KVMSlot *mem = &kml->slots[i];
 
         if (ram >= mem->ram && ram < mem->ram + mem->memory_size) {
             *phys_addr = mem->start_addr + (ram - mem->ram);
-            return 1;
+            ret = 1;
+            break;
         }
     }
+    kvm_slots_unlock(kml);
 
-    return 0;
+    return ret;
 }
 
 static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
@@ -396,6 +410,7 @@ static int kvm_mem_flags(MemoryRegion *mr)
     return flags;
 }
 
+/* Called with KVMMemoryListener.slots_lock held */
 static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
                                  MemoryRegion *mr)
 {
@@ -414,19 +429,26 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
 {
     hwaddr start_addr, size;
     KVMSlot *mem;
+    int ret = 0;
 
     size = kvm_align_section(section, &start_addr);
     if (!size) {
         return 0;
     }
 
+    kvm_slots_lock(kml);
+
     mem = kvm_lookup_matching_slot(kml, start_addr, size);
     if (!mem) {
         /* We don't have a slot if we want to trap every access. */
-        return 0;
+        goto out;
     }
 
-    return kvm_slot_update_flags(kml, mem, section->mr);
+    ret = kvm_slot_update_flags(kml, mem, section->mr);
+
+out:
+    kvm_slots_unlock(kml);
+    return ret;
 }
 
 static void kvm_log_start(MemoryListener *listener,
@@ -483,6 +505,8 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
  * This function will first try to fetch dirty bitmap from the kernel,
  * and then updates qemu's dirty bitmap.
  *
+ * NOTE: caller must be with kml->slots_lock held.
+ *
  * @kml: the KVM memory listener object
  * @section: the memory section to sync the dirty bitmap with
  */
@@ -493,13 +517,14 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
     struct kvm_dirty_log d = {};
     KVMSlot *mem;
     hwaddr start_addr, size;
+    int ret = 0;
 
     size = kvm_align_section(section, &start_addr);
     if (size) {
         mem = kvm_lookup_matching_slot(kml, start_addr, size);
         if (!mem) {
             /* We don't have a slot if we want to trap every access. */
-            return 0;
+            goto out;
         }
 
         /* XXX bad kernel interface alert
@@ -525,13 +550,14 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
         d.slot = mem->slot | (kml->as_id << 16);
         if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
             DPRINTF("ioctl failed %d\n", errno);
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
     }
-
-    return 0;
+out:
+    return ret;
 }
 
 static void kvm_coalesce_mmio_region(MemoryListener *listener,
@@ -793,10 +819,12 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     ram = memory_region_get_ram_ptr(mr) + section->offset_within_region +
           (start_addr - section->offset_within_address_space);
 
+    kvm_slots_lock(kml);
+
     if (!add) {
         mem = kvm_lookup_matching_slot(kml, start_addr, size);
         if (!mem) {
-            return;
+            goto out;
         }
         if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
             kvm_physical_sync_dirty_bitmap(kml, section);
@@ -813,7 +841,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                     __func__, strerror(-err));
             abort();
         }
-        return;
+        goto out;
     }
 
     /* register the new slot */
@@ -829,6 +857,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                 strerror(-err));
         abort();
     }
+
+out:
+    kvm_slots_unlock(kml);
 }
 
 static void kvm_region_add(MemoryListener *listener,
@@ -855,7 +886,9 @@ static void kvm_log_sync(MemoryListener *listener,
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
     int r;
 
+    kvm_slots_lock(kml);
     r = kvm_physical_sync_dirty_bitmap(kml, section);
+    kvm_slots_unlock(kml);
     if (r < 0) {
         abort();
     }
@@ -939,6 +972,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
 {
     int i;
 
+    qemu_mutex_init(&kml->slots_lock);
     kml->slots = g_malloc0(s->nr_slots * sizeof(KVMSlot));
     kml->as_id = as_id;
 
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 687a2ee423..31df465fdc 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -27,6 +27,8 @@ typedef struct KVMSlot
 
 typedef struct KVMMemoryListener {
     MemoryListener listener;
+    /* Protects the slots and all inside them */
+    QemuMutex slots_lock;
     KVMSlot *slots;
     int as_id;
 } KVMMemoryListener;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 17/19] kvm: Support KVM_CLEAR_DIRTY_LOG
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (15 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 16/19] kvm: Introduce slots lock for memory listener Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 18/19] migration: Split log_clear() into smaller chunks Juan Quintela
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Firstly detect the interface using KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
and mark it.  When failed to enable the new feature we'll fall back to
the old sync.

Provide the log_clear() hook for the memory listeners for both address
spaces of KVM (normal system memory, and SMM) and deliever the clear
message to kernel.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-11-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 accel/kvm/kvm-all.c    | 182 +++++++++++++++++++++++++++++++++++++++++
 accel/kvm/trace-events |   1 +
 2 files changed, 183 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 621c9a43ab..35ea3cb624 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -91,6 +91,7 @@ struct KVMState
     int many_ioeventfds;
     int intx_set_mask;
     bool sync_mmu;
+    bool manual_dirty_log_protect;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
      * unsigned, and treating them as signed here can break things */
@@ -560,6 +561,159 @@ out:
     return ret;
 }
 
+/* Alignment requirement for KVM_CLEAR_DIRTY_LOG - 64 pages */
+#define KVM_CLEAR_LOG_SHIFT  6
+#define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
+#define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
+
+/**
+ * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
+ *
+ * NOTE: this will be a no-op if we haven't enabled manual dirty log
+ * protection in the host kernel because in that case this operation
+ * will be done within log_sync().
+ *
+ * @kml:     the kvm memory listener
+ * @section: the memory range to clear dirty bitmap
+ */
+static int kvm_physical_log_clear(KVMMemoryListener *kml,
+                                  MemoryRegionSection *section)
+{
+    KVMState *s = kvm_state;
+    struct kvm_clear_dirty_log d;
+    uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
+    unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
+    KVMSlot *mem = NULL;
+    int ret, i;
+
+    if (!s->manual_dirty_log_protect) {
+        /* No need to do explicit clear */
+        return 0;
+    }
+
+    start = section->offset_within_address_space;
+    size = int128_get64(section->size);
+
+    if (!size) {
+        /* Nothing more we can do... */
+        return 0;
+    }
+
+    kvm_slots_lock(kml);
+
+    /* Find any possible slot that covers the section */
+    for (i = 0; i < s->nr_slots; i++) {
+        mem = &kml->slots[i];
+        if (mem->start_addr <= start &&
+            start + size <= mem->start_addr + mem->memory_size) {
+            break;
+        }
+    }
+
+    /*
+     * We should always find one memslot until this point, otherwise
+     * there could be something wrong from the upper layer
+     */
+    assert(mem && i != s->nr_slots);
+
+    /*
+     * We need to extend either the start or the size or both to
+     * satisfy the KVM interface requirement.  Firstly, do the start
+     * page alignment on 64 host pages
+     */
+    bmap_start = (start - mem->start_addr) & KVM_CLEAR_LOG_MASK;
+    start_delta = start - mem->start_addr - bmap_start;
+    bmap_start /= psize;
+
+    /*
+     * The kernel interface has restriction on the size too, that either:
+     *
+     * (1) the size is 64 host pages aligned (just like the start), or
+     * (2) the size fills up until the end of the KVM memslot.
+     */
+    bmap_npages = DIV_ROUND_UP(size + start_delta, KVM_CLEAR_LOG_ALIGN)
+        << KVM_CLEAR_LOG_SHIFT;
+    end = mem->memory_size / psize;
+    if (bmap_npages > end - bmap_start) {
+        bmap_npages = end - bmap_start;
+    }
+    start_delta /= psize;
+
+    /*
+     * Prepare the bitmap to clear dirty bits.  Here we must guarantee
+     * that we won't clear any unknown dirty bits otherwise we might
+     * accidentally clear some set bits which are not yet synced from
+     * the kernel into QEMU's bitmap, then we'll lose track of the
+     * guest modifications upon those pages (which can directly lead
+     * to guest data loss or panic after migration).
+     *
+     * Layout of the KVMSlot.dirty_bmap:
+     *
+     *                   |<-------- bmap_npages -----------..>|
+     *                                                     [1]
+     *                     start_delta         size
+     *  |----------------|-------------|------------------|------------|
+     *  ^                ^             ^                               ^
+     *  |                |             |                               |
+     * start          bmap_start     (start)                         end
+     * of memslot                                             of memslot
+     *
+     * [1] bmap_npages can be aligned to either 64 pages or the end of slot
+     */
+
+    assert(bmap_start % BITS_PER_LONG == 0);
+    /* We should never do log_clear before log_sync */
+    assert(mem->dirty_bmap);
+    if (start_delta) {
+        /* Slow path - we need to manipulate a temp bitmap */
+        bmap_clear = bitmap_new(bmap_npages);
+        bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
+                                    bmap_start, start_delta + size / psize);
+        /*
+         * We need to fill the holes at start because that was not
+         * specified by the caller and we extended the bitmap only for
+         * 64 pages alignment
+         */
+        bitmap_clear(bmap_clear, 0, start_delta);
+        d.dirty_bitmap = bmap_clear;
+    } else {
+        /* Fast path - start address aligns well with BITS_PER_LONG */
+        d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
+    }
+
+    d.first_page = bmap_start;
+    /* It should never overflow.  If it happens, say something */
+    assert(bmap_npages <= UINT32_MAX);
+    d.num_pages = bmap_npages;
+    d.slot = mem->slot | (kml->as_id << 16);
+
+    if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
+        ret = -errno;
+        error_report("%s: KVM_CLEAR_DIRTY_LOG failed, slot=%d, "
+                     "start=0x%"PRIx64", size=0x%"PRIx32", errno=%d",
+                     __func__, d.slot, (uint64_t)d.first_page,
+                     (uint32_t)d.num_pages, ret);
+    } else {
+        ret = 0;
+        trace_kvm_clear_dirty_log(d.slot, d.first_page, d.num_pages);
+    }
+
+    /*
+     * After we have updated the remote dirty bitmap, we update the
+     * cached bitmap as well for the memslot, then if another user
+     * clears the same region we know we shouldn't clear it again on
+     * the remote otherwise it's data loss as well.
+     */
+    bitmap_clear(mem->dirty_bmap, bmap_start + start_delta,
+                 size / psize);
+    /* This handles the NULL case well */
+    g_free(bmap_clear);
+
+    kvm_slots_unlock(kml);
+
+    return ret;
+}
+
 static void kvm_coalesce_mmio_region(MemoryListener *listener,
                                      MemoryRegionSection *secion,
                                      hwaddr start, hwaddr size)
@@ -894,6 +1048,22 @@ static void kvm_log_sync(MemoryListener *listener,
     }
 }
 
+static void kvm_log_clear(MemoryListener *listener,
+                          MemoryRegionSection *section)
+{
+    KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    int r;
+
+    r = kvm_physical_log_clear(kml, section);
+    if (r < 0) {
+        error_report_once("%s: kvm log clear failed: mr=%s "
+                          "offset=%"HWADDR_PRIx" size=%"PRIx64, __func__,
+                          section->mr->name, section->offset_within_region,
+                          int128_get64(section->size));
+        abort();
+    }
+}
+
 static void kvm_mem_ioeventfd_add(MemoryListener *listener,
                                   MemoryRegionSection *section,
                                   bool match_data, uint64_t data,
@@ -985,6 +1155,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
     kml->listener.log_start = kvm_log_start;
     kml->listener.log_stop = kvm_log_stop;
     kml->listener.log_sync = kvm_log_sync;
+    kml->listener.log_clear = kvm_log_clear;
     kml->listener.priority = 10;
 
     memory_listener_register(&kml->listener, as);
@@ -1709,6 +1880,17 @@ static int kvm_init(MachineState *ms)
     s->coalesced_pio = s->coalesced_mmio &&
                        kvm_check_extension(s, KVM_CAP_COALESCED_PIO);
 
+    s->manual_dirty_log_protect =
+        kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+    if (s->manual_dirty_log_protect) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0, 1);
+        if (ret) {
+            warn_report("Trying to enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 "
+                        "but failed.  Falling back to the legacy mode. ");
+            s->manual_dirty_log_protect = false;
+        }
+    }
+
 #ifdef KVM_CAP_VCPU_EVENTS
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 33c5b1b3af..4fb6e59d19 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,4 +15,5 @@ kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
 kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 18/19] migration: Split log_clear() into smaller chunks
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (16 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 17/19] kvm: Support KVM_CLEAR_DIRTY_LOG Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 14:32 ` [PULL 19/19] migration: allow private destination ram with x-ignore-shared Juan Quintela
  2019-07-12 16:33 ` [Qemu-devel] [PULL 00/19] Migration patches Peter Maydell
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peter Xu

From: Peter Xu <peterx@redhat.com>

Currently we are doing log_clear() right after log_sync() which mostly
keeps the old behavior when log_clear() was still part of log_sync().

This patch tries to further optimize the migration log_clear() code
path to split huge log_clear()s into smaller chunks.

We do this by spliting the whole guest memory region into memory
chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
example will be given below).  With that, we don't do the dirty bitmap
clear operation on the remote node (e.g., KVM) when we fetch the dirty
bitmap, instead we explicitly clear the dirty bitmap for the memory
chunk for each of the first time we send a page in that chunk.

Here comes an example.

Assuming the guest has 64G memory, then before this patch the KVM
ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory.
If after the patch, let's assume when the clear bitmap shift is 18,
then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB.  Then
instead of sending a big 64G ioctl, we'll send 64 small ioctls, each
of the ioctl will cover 1G of the guest memory.  For each of the 64
small ioctls, we'll only send if any of the page in that small chunk
was going to be sent right away.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-12-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/ram_addr.h | 76 +++++++++++++++++++++++++++++++++++++++--
 migration/migration.c   |  4 +++
 migration/migration.h   | 27 +++++++++++++++
 migration/ram.c         | 44 ++++++++++++++++++++++++
 migration/trace-events  |  1 +
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 222b4338fb..b7b2e60ff6 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -51,8 +51,70 @@ struct RAMBlock {
     unsigned long *unsentmap;
     /* bitmap of already received pages in postcopy */
     unsigned long *receivedmap;
+
+    /*
+     * bitmap to track already cleared dirty bitmap.  When the bit is
+     * set, it means the corresponding memory chunk needs a log-clear.
+     * Set this up to non-NULL to enable the capability to postpone
+     * and split clearing of dirty bitmap on the remote node (e.g.,
+     * KVM).  The bitmap will be set only when doing global sync.
+     *
+     * NOTE: this bitmap is different comparing to the other bitmaps
+     * in that one bit can represent multiple guest pages (which is
+     * decided by the `clear_bmap_shift' variable below).  On
+     * destination side, this should always be NULL, and the variable
+     * `clear_bmap_shift' is meaningless.
+     */
+    unsigned long *clear_bmap;
+    uint8_t clear_bmap_shift;
 };
 
+/**
+ * clear_bmap_size: calculate clear bitmap size
+ *
+ * @pages: number of guest pages
+ * @shift: guest page number shift
+ *
+ * Returns: number of bits for the clear bitmap
+ */
+static inline long clear_bmap_size(uint64_t pages, uint8_t shift)
+{
+    return DIV_ROUND_UP(pages, 1UL << shift);
+}
+
+/**
+ * clear_bmap_set: set clear bitmap for the page range
+ *
+ * @rb: the ramblock to operate on
+ * @start: the start page number
+ * @size: number of pages to set in the bitmap
+ *
+ * Returns: None
+ */
+static inline void clear_bmap_set(RAMBlock *rb, uint64_t start,
+                                  uint64_t npages)
+{
+    uint8_t shift = rb->clear_bmap_shift;
+
+    bitmap_set_atomic(rb->clear_bmap, start >> shift,
+                      clear_bmap_size(npages, shift));
+}
+
+/**
+ * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
+ *
+ * @rb: the ramblock to operate on
+ * @page: the page number to check
+ *
+ * Returns: true if the bit was set, false otherwise
+ */
+static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page)
+{
+    uint8_t shift = rb->clear_bmap_shift;
+
+    return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
+}
+
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
 {
     return (b && b->host && offset < b->used_length) ? true : false;
@@ -463,8 +525,18 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
             }
         }
 
-        /* TODO: split the huge bitmap into smaller chunks */
-        memory_region_clear_dirty_bitmap(rb->mr, start, length);
+        if (rb->clear_bmap) {
+            /*
+             * Postpone the dirty bitmap clear to the point before we
+             * really send the pages, also we will split the clear
+             * dirty procedure into smaller chunks.
+             */
+            clear_bmap_set(rb, start >> TARGET_PAGE_BITS,
+                           length >> TARGET_PAGE_BITS);
+        } else {
+            /* Slow path - still do that in a huge chunk */
+            memory_region_clear_dirty_bitmap(rb->mr, start, length);
+        }
     } else {
         ram_addr_t offset = rb->offset;
 
diff --git a/migration/migration.c b/migration/migration.c
index 2865ae3fa9..8a607fe1e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3362,6 +3362,8 @@ void migration_global_dump(Monitor *mon)
                    ms->send_section_footer ? "on" : "off");
     monitor_printf(mon, "decompress-error-check: %s\n",
                    ms->decompress_error_check ? "on" : "off");
+    monitor_printf(mon, "clear-bitmap-shift: %u\n",
+                   ms->clear_bitmap_shift);
 }
 
 #define DEFINE_PROP_MIG_CAP(name, x)             \
@@ -3376,6 +3378,8 @@ static Property migration_properties[] = {
                      send_section_footer, true),
     DEFINE_PROP_BOOL("decompress-error-check", MigrationState,
                       decompress_error_check, true),
+    DEFINE_PROP_UINT8("x-clear-bitmap-shift", MigrationState,
+                      clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
 
     /* Migration parameters */
     DEFINE_PROP_UINT8("x-compress-level", MigrationState,
diff --git a/migration/migration.h b/migration/migration.h
index 5e8f09c6db..1fdd7b21fd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -26,6 +26,23 @@ struct PostcopyBlocktimeContext;
 
 #define  MIGRATION_RESUME_ACK_VALUE  (1)
 
+/*
+ * 1<<6=64 pages -> 256K chunk when page size is 4K.  This gives us
+ * the benefit that all the chunks are 64 pages aligned then the
+ * bitmaps are always aligned to LONG.
+ */
+#define CLEAR_BITMAP_SHIFT_MIN             6
+/*
+ * 1<<18=256K pages -> 1G chunk when page size is 4K.  This is the
+ * default value to use if no one specified.
+ */
+#define CLEAR_BITMAP_SHIFT_DEFAULT        18
+/*
+ * 1<<31=2G pages -> 8T chunk when page size is 4K.  This should be
+ * big enough and make sure we won't overflow easily.
+ */
+#define CLEAR_BITMAP_SHIFT_MAX            31
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -232,6 +249,16 @@ struct MigrationState
      * do not trigger spurious decompression errors.
      */
     bool decompress_error_check;
+
+    /*
+     * This decides the size of guest memory chunk that will be used
+     * to track dirty bitmap clearing.  The size of memory chunk will
+     * be GUEST_PAGE_SIZE << N.  Say, N=0 means we will clear dirty
+     * bitmap for each page to send (1<<0=1); N=10 means we will clear
+     * dirty bitmap only once for 1<<10=1K continuous guest pages
+     * (which is in 4M chunk).
+     */
+    uint8_t clear_bitmap_shift;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/ram.c b/migration/ram.c
index 48969db84b..8a6ad61d3d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1664,6 +1664,33 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     bool ret;
 
     qemu_mutex_lock(&rs->bitmap_mutex);
+
+    /*
+     * Clear dirty bitmap if needed.  This _must_ be called before we
+     * send any of the page in the chunk because we need to make sure
+     * we can capture further page content changes when we sync dirty
+     * log the next time.  So as long as we are going to send any of
+     * the page in the chunk we clear the remote dirty bitmap for all.
+     * Clearing it earlier won't be a problem, but too late will.
+     */
+    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
+        uint8_t shift = rb->clear_bmap_shift;
+        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
+        hwaddr start = (page << TARGET_PAGE_BITS) & (-size);
+
+        /*
+         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+         * can make things easier sometimes since then start address
+         * of the small chunk will always be 64 pages aligned so the
+         * bitmap will always be aligned to unsigned long.  We should
+         * even be able to remove this restriction but I'm simply
+         * keeping it.
+         */
+        assert(shift >= 6);
+        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+        memory_region_clear_dirty_bitmap(rb->mr, start, size);
+    }
+
     ret = test_and_clear_bit(page, rb->bmap);
 
     if (ret) {
@@ -2687,6 +2714,8 @@ static void ram_save_cleanup(void *opaque)
     memory_global_dirty_log_stop();
 
     RAMBLOCK_FOREACH_NOT_IGNORED(block) {
+        g_free(block->clear_bmap);
+        block->clear_bmap = NULL;
         g_free(block->bmap);
         block->bmap = NULL;
         g_free(block->unsentmap);
@@ -3197,11 +3226,24 @@ static int ram_state_init(RAMState **rsp)
 
 static void ram_list_init_bitmaps(void)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block;
     unsigned long pages;
+    uint8_t shift;
 
     /* Skip setting bitmap if there is no RAM */
     if (ram_bytes_total()) {
+        shift = ms->clear_bitmap_shift;
+        if (shift > CLEAR_BITMAP_SHIFT_MAX) {
+            error_report("clear_bitmap_shift (%u) too big, using "
+                         "max value (%u)", shift, CLEAR_BITMAP_SHIFT_MAX);
+            shift = CLEAR_BITMAP_SHIFT_MAX;
+        } else if (shift < CLEAR_BITMAP_SHIFT_MIN) {
+            error_report("clear_bitmap_shift (%u) too small, using "
+                         "min value (%u)", shift, CLEAR_BITMAP_SHIFT_MIN);
+            shift = CLEAR_BITMAP_SHIFT_MIN;
+        }
+
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             pages = block->max_length >> TARGET_PAGE_BITS;
             /*
@@ -3214,6 +3256,8 @@ static void ram_list_init_bitmaps(void)
              * Here setting RAMBlock.bmap would be fine too but not necessary.
              */
             block->bmap = bitmap_new(pages);
+            block->clear_bmap_shift = shift;
+            block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
             if (migrate_postcopy_ram()) {
                 block->unsentmap = bitmap_new(pages);
                 bitmap_set(block->unsentmap, 0, pages);
diff --git a/migration/trace-events b/migration/trace-events
index cd50a1e659..d8e54c367a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -79,6 +79,7 @@ get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_
 get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs, int sent) "%s/0x%" PRIx64 " page_abs=0x%lx (sent=%d)"
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
+migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
 migration_throttle(void) ""
 multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PULL 19/19] migration: allow private destination ram with x-ignore-shared
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (17 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 18/19] migration: Split log_clear() into smaller chunks Juan Quintela
@ 2019-07-12 14:32 ` Juan Quintela
  2019-07-12 16:33 ` [Qemu-devel] [PULL 00/19] Migration patches Peter Maydell
  19 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-12 14:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Richard Henderson, Paolo Bonzini, kvm,
	Dr. David Alan Gilbert, Juan Quintela, Thomas Huth, Peng Tao,
	Yury Kotov, Jiangshan Lai, Xu Wang

From: Peng Tao <tao.peng@linux.alibaba.com>

By removing the share ram check, qemu is able to migrate
to private destination ram when x-ignore-shared capability
is on. Then we can create multiple destination VMs based
on the same source VM.

This changes the x-ignore-shared migration capability to
work similar to Lai's original bypass-shared-memory
work(https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00003.html)
which enables kata containers (https://katacontainers.io)
to implement the VM templating feature.

An example usage in kata containers(https://katacontainers.io):
1. Start the source VM:
   qemu-system-x86 -m 2G \
     -object memory-backend-file,id=mem0,size=2G,share=on,mem-path=/tmpfs/template-memory \
     -numa node,memdev=mem0
2. Stop the template VM, set migration x-ignore-shared capability,
   migrate "exec:cat>/tmpfs/state", quit it
3. Start target VM:
   qemu-system-x86 -m 2G \
     -object memory-backend-file,id=mem0,size=2G,share=off,mem-path=/tmpfs/template-memory \
     -numa node,memdev=mem0 \
     -incoming defer
4. connect to target VM qmp, set migration x-ignore-shared capability,
migrate_incoming "exec:cat /tmpfs/state"
5. create more target VMs repeating 3 and 4

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Yury Kotov <yury-kotov@yandex-team.ru>
Cc: Jiangshan Lai <laijs@hyper.sh>
Cc: Xu Wang <xu@hyper.sh>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1560494113-1141-1-git-send-email-tao.peng@linux.alibaba.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8a6ad61d3d..8622b4dc49 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3426,7 +3426,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         }
         if (migrate_ignore_shared()) {
             qemu_put_be64(f, block->mr->addr);
-            qemu_put_byte(f, ramblock_is_ignored(block) ? 1 : 0);
         }
     }
 
@@ -4393,12 +4392,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                     }
                     if (migrate_ignore_shared()) {
                         hwaddr addr = qemu_get_be64(f);
-                        bool ignored = qemu_get_byte(f);
-                        if (ignored != ramblock_is_ignored(block)) {
-                            error_report("RAM block %s should %s be migrated",
-                                         id, ignored ? "" : "not");
-                            ret = -EINVAL;
-                        }
                         if (ramblock_is_ignored(block) &&
                             block->mr->addr != addr) {
                             error_report("Mismatched GPAs for block %s "
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PULL 00/19] Migration patches
  2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
                   ` (18 preceding siblings ...)
  2019-07-12 14:32 ` [PULL 19/19] migration: allow private destination ram with x-ignore-shared Juan Quintela
@ 2019-07-12 16:33 ` Peter Maydell
  2019-07-12 17:54   ` Dr. David Alan Gilbert
  2019-07-15 11:16   ` Peter Maydell
  19 siblings, 2 replies; 30+ messages in thread
From: Peter Maydell @ 2019-07-12 16:33 UTC (permalink / raw)
  To: Juan Quintela
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Fri, 12 Jul 2019 at 15:32, Juan Quintela <quintela@redhat.com> wrote:
>
> The following changes since commit a2a9d4adabe340617a24eb73a8b2a116d28a6b38:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190712' into staging (2019-07-12 11:06:48 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/juanquintela/qemu.git tags/migration-pull-request
>
> for you to fetch changes up to a48ad5602f496236b4e1955d9e2e8228a7d0ad56:
>
>   migration: allow private destination ram with x-ignore-shared (2019-07-12 16:25:59 +0200)
>
> ----------------------------------------------------------------
> Migration pull request
>
> Fix the issues with the previous pull request and 32 bits.
>
> Please apply.
>

Still fails on aarch32 host, I'm afraid:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap <
/dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
PASS 1 migration-test /aarch64/migration/deprecated
PASS 2 migration-test /aarch64/migration/bad_dest
PASS 3 migration-test /aarch64/migration/fd_proto
PASS 4 migration-test /aarch64/migration/postcopy/unix
PASS 5 migration-test /aarch64/migration/postcopy/recovery
PASS 6 migration-test /aarch64/migration/precopy/unix
PASS 7 migration-test /aarch64/migration/precopy/tcp
PASS 8 migration-test /aarch64/migration/xbzrle/unix
malloc(): memory corruption
Broken pipe
qemu-system-aarch64: load of migration failed: Invalid argument
/home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
terminate QEMU process but encountered exit status 1
Aborted
ERROR - too few tests run (expected 9, got 8)
/home/peter.maydell/qemu/tests/Makefile.include:899: recipe for target
'check-qtest-aarch64' failed


thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PULL 00/19] Migration patches
  2019-07-12 16:33 ` [Qemu-devel] [PULL 00/19] Migration patches Peter Maydell
@ 2019-07-12 17:54   ` Dr. David Alan Gilbert
  2019-07-15 11:16   ` Peter Maydell
  1 sibling, 0 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-12 17:54 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Juan Quintela, QEMU Developers, Laurent Vivier, Thomas Huth,
	kvm-devel, Paolo Bonzini, Richard Henderson

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Fri, 12 Jul 2019 at 15:32, Juan Quintela <quintela@redhat.com> wrote:
> >
> > The following changes since commit a2a9d4adabe340617a24eb73a8b2a116d28a6b38:
> >
> >   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190712' into staging (2019-07-12 11:06:48 +0100)
> >
> > are available in the Git repository at:
> >
> >   https://github.com/juanquintela/qemu.git tags/migration-pull-request
> >
> > for you to fetch changes up to a48ad5602f496236b4e1955d9e2e8228a7d0ad56:
> >
> >   migration: allow private destination ram with x-ignore-shared (2019-07-12 16:25:59 +0200)
> >
> > ----------------------------------------------------------------
> > Migration pull request
> >
> > Fix the issues with the previous pull request and 32 bits.
> >
> > Please apply.
> >
> 
> Still fails on aarch32 host, I'm afraid:
> 
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
> QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap <
> /dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
> PASS 1 migration-test /aarch64/migration/deprecated
> PASS 2 migration-test /aarch64/migration/bad_dest
> PASS 3 migration-test /aarch64/migration/fd_proto
> PASS 4 migration-test /aarch64/migration/postcopy/unix
> PASS 5 migration-test /aarch64/migration/postcopy/recovery
> PASS 6 migration-test /aarch64/migration/precopy/unix
> PASS 7 migration-test /aarch64/migration/precopy/tcp
> PASS 8 migration-test /aarch64/migration/xbzrle/unix
> malloc(): memory corruption
> Broken pipe
> qemu-system-aarch64: load of migration failed: Invalid argument
> /home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
> terminate QEMU process but encountered exit status 1
> Aborted
> ERROR - too few tests run (expected 9, got 8)
> /home/peter.maydell/qemu/tests/Makefile.include:899: recipe for target
> 'check-qtest-aarch64' failed

Hmm, I think the only one to go near xbzrle specifically is
'migration/xbzrle: update cache and current_data in one place'

it did look right to me; but that is a subtle function so hmmmm.

Dave

> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PULL 00/19] Migration patches
  2019-07-12 16:33 ` [Qemu-devel] [PULL 00/19] Migration patches Peter Maydell
  2019-07-12 17:54   ` Dr. David Alan Gilbert
@ 2019-07-15 11:16   ` Peter Maydell
  2019-07-15 13:44     ` Juan Quintela
  2019-07-15 14:04     ` [Qemu-devel] " Daniel P. Berrangé
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Maydell @ 2019-07-15 11:16 UTC (permalink / raw)
  To: Juan Quintela
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Fri, 12 Jul 2019 at 17:33, Peter Maydell <peter.maydell@linaro.org> wrote:
> Still fails on aarch32 host, I'm afraid:
>
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
> QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap <
> /dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
> PASS 1 migration-test /aarch64/migration/deprecated
> PASS 2 migration-test /aarch64/migration/bad_dest
> PASS 3 migration-test /aarch64/migration/fd_proto
> PASS 4 migration-test /aarch64/migration/postcopy/unix
> PASS 5 migration-test /aarch64/migration/postcopy/recovery
> PASS 6 migration-test /aarch64/migration/precopy/unix
> PASS 7 migration-test /aarch64/migration/precopy/tcp
> PASS 8 migration-test /aarch64/migration/xbzrle/unix
> malloc(): memory corruption
> Broken pipe
> qemu-system-aarch64: load of migration failed: Invalid argument
> /home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
> terminate QEMU process but encountered exit status 1
> Aborted
> ERROR - too few tests run (expected 9, got 8)
> /home/peter.maydell/qemu/tests/Makefile.include:899: recipe for target
> 'check-qtest-aarch64' failed

A run with valgrind:

(armhf)pmaydell@mustang-maydell:~/qemu/build/all-a32$
QTEST_QEMU_BINARY='valgrind aarch64-softmmu/qemu-system-aarch64'
tests/migration-test -v -p '/aarch64/migration/multifd/tcp'
/aarch64/migration/multifd/tcp: ==4034== Memcheck, a memory error detector
==4034== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4034== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4034== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
chardev=char0,mode=control -machine accel=qtest -display none -machine
virt,accel=kvm:tcg,gic-version=max -name vmsource,debug-threads=on
-cpu max -m 150M -serial file:/tmp/migration-test-mSLr4A/src_serial
-kernel /tmp/migration-test-mSLr4A/bootsect
==4034==
==4040== Memcheck, a memory error detector
==4040== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4040== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4040== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
chardev=char0,mode=control -machine accel=qtest -display none -machine
virt,accel=kvm:tcg,gic-version=max -name vmdest,debug-threads=on -cpu
max -m 150M -serial file:/tmp/migration-test-mSLr4A/dest_serial
-kernel /tmp/migration-test-mSLr4A/bootsect -incoming tcp:127.0.0.1:0
==4040==
==4034== Thread 5 multifdsend_0:
==4034== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==4034==    at 0x5299F06: __libc_do_syscall (libc-do-syscall.S:47)
==4034==    by 0x5298FCB: sendmsg (sendmsg.c:28)
==4034==    by 0x60135D: qio_channel_socket_writev (channel-socket.c:544)
==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
==4034==    by 0x26BA73: multifd_send_initial_packet (ram.c:711)
==4034==    by 0x26BA73: multifd_send_thread (ram.c:1085)
==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
==4034==    by 0x5290613: start_thread (pthread_create.c:463)
==4034==    by 0x53487FB: ??? (clone.S:73)
==4034==  Address 0x2320048d is on thread 5's stack
==4034==  in frame #5, created by multifd_send_thread (ram.c:1077)
==4034==
==4034== Thread 6 multifdsend_1:
==4034== Invalid write of size 4
==4034==    at 0x26BB7C: multifd_send_fill_packet (ram.c:806)
==4034==    by 0x26BB7C: multifd_send_thread (ram.c:1101)
==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
==4034==    by 0x5290613: start_thread (pthread_create.c:463)
==4034==    by 0x53487FB: ??? (clone.S:73)
==4034==  Address 0x224ed668 is 0 bytes after a block of size 832 alloc'd
==4034==    at 0x4841BC4: calloc (vg_replace_malloc.c:711)
==4034==    by 0x5018269: g_malloc0 (in
/usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0.5600.4)
==4034==
==4034== Invalid write of size 4
==4034==    at 0x26BB82: multifd_send_fill_packet (ram.c:806)
==4034==    by 0x26BB82: multifd_send_thread (ram.c:1101)
==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
==4034==    by 0x5290613: start_thread (pthread_create.c:463)
==4034==    by 0x53487FB: ??? (clone.S:73)
==4034==  Address 0x224ed66c is 4 bytes after a block of size 832 alloc'd
==4034==    at 0x4841BC4: calloc (vg_replace_malloc.c:711)
==4034==    by 0x5018269: g_malloc0 (in
/usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0.5600.4)
==4034==
==4034== Invalid read of size 4
==4034==    at 0x5FF1DA: qio_channel_writev_full (channel.c:86)
==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
==4034==    by 0x26BBD9: multifd_send_thread (ram.c:1111)
==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
==4034==    by 0x5290613: start_thread (pthread_create.c:463)
==4034==    by 0x53487FB: ??? (clone.S:73)
==4034==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
==4034==
==4034==
==4034== Process terminating with default action of signal 11 (SIGSEGV)
==4034==  Access not within mapped region at address 0x30
==4034==    at 0x5FF1DA: qio_channel_writev_full (channel.c:86)
==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
==4034==    by 0x26BBD9: multifd_send_thread (ram.c:1111)
==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
==4034==    by 0x5290613: start_thread (pthread_create.c:463)
==4034==    by 0x53487FB: ??? (clone.S:73)
==4034==  If you believe this happened as a result of a stack
==4034==  overflow in your program's main thread (unlikely but
==4034==  possible), you can try to increase the size of the
==4034==  main thread stack using the --main-stacksize= flag.
==4034==  The main thread stack size used in this run was 8388608.
==4034==
==4034== HEAP SUMMARY:
==4034==     in use at exit: 5,994,911 bytes in 23,588 blocks
==4034==   total heap usage: 87,487 allocs, 63,899 frees, 17,732,188
bytes allocated
==4034==
==4034== LEAK SUMMARY:
==4034==    definitely lost: 56 bytes in 1 blocks
==4034==    indirectly lost: 64 bytes in 2 blocks
==4034==      possibly lost: 1,620 bytes in 26 blocks
==4034==    still reachable: 5,993,171 bytes in 23,559 blocks
==4034==         suppressed: 0 bytes in 0 blocks
==4034== Rerun with --leak-check=full to see details of leaked memory
==4034==
==4034== For counts of detected and suppressed errors, rerun with: -v
==4034== Use --track-origins=yes to see where uninitialised values come from
==4034== ERROR SUMMARY: 66 errors from 4 contexts (suppressed: 6 from 3)
Broken pipe
qemu-system-aarch64: load of migration failed: Input/output error
==4040==
==4040== HEAP SUMMARY:
==4040==     in use at exit: 4,893,269 bytes in 19,702 blocks
==4040==   total heap usage: 86,196 allocs, 66,494 frees, 17,438,183
bytes allocated
==4040==
==4040== LEAK SUMMARY:
==4040==    definitely lost: 0 bytes in 0 blocks
==4040==    indirectly lost: 0 bytes in 0 blocks
==4040==      possibly lost: 1,160 bytes in 5 blocks
==4040==    still reachable: 4,892,109 bytes in 19,697 blocks
==4040==         suppressed: 0 bytes in 0 blocks
==4040== Rerun with --leak-check=full to see details of leaked memory
==4040==
==4040== For counts of detected and suppressed errors, rerun with: -v
==4040== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 3)
/home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
terminate QEMU process but encountered exit status 1
Aborted

thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PULL 00/19] Migration patches
  2019-07-15 11:16   ` Peter Maydell
@ 2019-07-15 13:44     ` Juan Quintela
  2019-07-15 13:48       ` Peter Maydell
  2019-07-15 14:04     ` [Qemu-devel] " Daniel P. Berrangé
  1 sibling, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2019-07-15 13:44 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

Peter Maydell <peter.maydell@linaro.org> wrote:
> On Fri, 12 Jul 2019 at 17:33, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Still fails on aarch32 host, I'm afraid:

Hi

dropping the multifd test patch from now.  For "some" reason, having a
packed struct and 32bits is getting ugly, not sure yet _why_.

Resending the pull request.


>>
>> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
>> QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
>> QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap <
>> /dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
>> PASS 1 migration-test /aarch64/migration/deprecated
>> PASS 2 migration-test /aarch64/migration/bad_dest
>> PASS 3 migration-test /aarch64/migration/fd_proto
>> PASS 4 migration-test /aarch64/migration/postcopy/unix
>> PASS 5 migration-test /aarch64/migration/postcopy/recovery
>> PASS 6 migration-test /aarch64/migration/precopy/unix
>> PASS 7 migration-test /aarch64/migration/precopy/tcp
>> PASS 8 migration-test /aarch64/migration/xbzrle/unix
>> malloc(): memory corruption
>> Broken pipe
>> qemu-system-aarch64: load of migration failed: Invalid argument
>> /home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
>> terminate QEMU process but encountered exit status 1
>> Aborted
>> ERROR - too few tests run (expected 9, got 8)
>> /home/peter.maydell/qemu/tests/Makefile.include:899: recipe for target
>> 'check-qtest-aarch64' failed
>
> A run with valgrind:
>
> (armhf)pmaydell@mustang-maydell:~/qemu/build/all-a32$
> QTEST_QEMU_BINARY='valgrind aarch64-softmmu/qemu-system-aarch64'
> tests/migration-test -v -p '/aarch64/migration/multifd/tcp'
> /aarch64/migration/multifd/tcp: ==4034== Memcheck, a memory error detector
> ==4034== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==4034== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==4034== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
> unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
> chardev=char0,mode=control -machine accel=qtest -display none -machine
> virt,accel=kvm:tcg,gic-version=max -name vmsource,debug-threads=on
> -cpu max -m 150M -serial file:/tmp/migration-test-mSLr4A/src_serial
> -kernel /tmp/migration-test-mSLr4A/bootsect
> ==4034==
> ==4040== Memcheck, a memory error detector
> ==4040== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==4040== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==4040== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
> unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
> chardev=char0,mode=control -machine accel=qtest -display none -machine
> virt,accel=kvm:tcg,gic-version=max -name vmdest,debug-threads=on -cpu
> max -m 150M -serial file:/tmp/migration-test-mSLr4A/dest_serial
> -kernel /tmp/migration-test-mSLr4A/bootsect -incoming tcp:127.0.0.1:0
> ==4040==
> ==4034== Thread 5 multifdsend_0:
> ==4034== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
> ==4034==    at 0x5299F06: __libc_do_syscall (libc-do-syscall.S:47)
> ==4034==    by 0x5298FCB: sendmsg (sendmsg.c:28)
> ==4034==    by 0x60135D: qio_channel_socket_writev (channel-socket.c:544)
> ==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
> ==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
> ==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
> ==4034==    by 0x26BA73: multifd_send_initial_packet (ram.c:711)
> ==4034==    by 0x26BA73: multifd_send_thread (ram.c:1085)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x2320048d is on thread 5's stack
> ==4034==  in frame #5, created by multifd_send_thread (ram.c:1077)
> ==4034==
> ==4034== Thread 6 multifdsend_1:
> ==4034== Invalid write of size 4
> ==4034==    at 0x26BB7C: multifd_send_fill_packet (ram.c:806)
> ==4034==    by 0x26BB7C: multifd_send_thread (ram.c:1101)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x224ed668 is 0 bytes after a block of size 832 alloc'd
> ==4034==    at 0x4841BC4: calloc (vg_replace_malloc.c:711)
> ==4034==    by 0x5018269: g_malloc0 (in
> /usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0.5600.4)
> ==4034==
> ==4034== Invalid write of size 4
> ==4034==    at 0x26BB82: multifd_send_fill_packet (ram.c:806)
> ==4034==    by 0x26BB82: multifd_send_thread (ram.c:1101)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x224ed66c is 4 bytes after a block of size 832 alloc'd
> ==4034==    at 0x4841BC4: calloc (vg_replace_malloc.c:711)
> ==4034==    by 0x5018269: g_malloc0 (in
> /usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0.5600.4)
> ==4034==
> ==4034== Invalid read of size 4
> ==4034==    at 0x5FF1DA: qio_channel_writev_full (channel.c:86)
> ==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
> ==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
> ==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
> ==4034==    by 0x26BBD9: multifd_send_thread (ram.c:1111)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
> ==4034==
> ==4034==
> ==4034== Process terminating with default action of signal 11 (SIGSEGV)
> ==4034==  Access not within mapped region at address 0x30
> ==4034==    at 0x5FF1DA: qio_channel_writev_full (channel.c:86)
> ==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
> ==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
> ==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
> ==4034==    by 0x26BBD9: multifd_send_thread (ram.c:1111)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  If you believe this happened as a result of a stack
> ==4034==  overflow in your program's main thread (unlikely but
> ==4034==  possible), you can try to increase the size of the
> ==4034==  main thread stack using the --main-stacksize= flag.
> ==4034==  The main thread stack size used in this run was 8388608.
> ==4034==
> ==4034== HEAP SUMMARY:
> ==4034==     in use at exit: 5,994,911 bytes in 23,588 blocks
> ==4034==   total heap usage: 87,487 allocs, 63,899 frees, 17,732,188
> bytes allocated
> ==4034==
> ==4034== LEAK SUMMARY:
> ==4034==    definitely lost: 56 bytes in 1 blocks
> ==4034==    indirectly lost: 64 bytes in 2 blocks
> ==4034==      possibly lost: 1,620 bytes in 26 blocks
> ==4034==    still reachable: 5,993,171 bytes in 23,559 blocks
> ==4034==         suppressed: 0 bytes in 0 blocks
> ==4034== Rerun with --leak-check=full to see details of leaked memory
> ==4034==
> ==4034== For counts of detected and suppressed errors, rerun with: -v
> ==4034== Use --track-origins=yes to see where uninitialised values come from
> ==4034== ERROR SUMMARY: 66 errors from 4 contexts (suppressed: 6 from 3)
> Broken pipe
> qemu-system-aarch64: load of migration failed: Input/output error
> ==4040==
> ==4040== HEAP SUMMARY:
> ==4040==     in use at exit: 4,893,269 bytes in 19,702 blocks
> ==4040==   total heap usage: 86,196 allocs, 66,494 frees, 17,438,183
> bytes allocated
> ==4040==
> ==4040== LEAK SUMMARY:
> ==4040==    definitely lost: 0 bytes in 0 blocks
> ==4040==    indirectly lost: 0 bytes in 0 blocks
> ==4040==      possibly lost: 1,160 bytes in 5 blocks
> ==4040==    still reachable: 4,892,109 bytes in 19,697 blocks
> ==4040==         suppressed: 0 bytes in 0 blocks
> ==4040== Rerun with --leak-check=full to see details of leaked memory
> ==4040==
> ==4040== For counts of detected and suppressed errors, rerun with: -v
> ==4040== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 3)
> /home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
> terminate QEMU process but encountered exit status 1
> Aborted
>
> thanks
> -- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PULL 00/19] Migration patches
  2019-07-15 13:44     ` Juan Quintela
@ 2019-07-15 13:48       ` Peter Maydell
  2019-07-15 14:10         ` Juan Quintela
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2019-07-15 13:48 UTC (permalink / raw)
  To: Juan Quintela
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Mon, 15 Jul 2019 at 14:44, Juan Quintela <quintela@redhat.com> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > On Fri, 12 Jul 2019 at 17:33, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> Still fails on aarch32 host, I'm afraid:
>
> Hi
>
> dropping the multifd test patch from now.  For "some" reason, having a
> packed struct and 32bits is getting ugly, not sure yet _why_.

IMHO 'packed' structs are usually a bad idea. They have a bunch
of behaviours you may not be expecting (for instance they're
also not naturally aligned, and arrays of them won't be the
size you expect).

thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PULL 00/19] Migration patches
  2019-07-15 11:16   ` Peter Maydell
  2019-07-15 13:44     ` Juan Quintela
@ 2019-07-15 14:04     ` Daniel P. Berrangé
  2019-07-15 14:17       ` Peter Maydell
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2019-07-15 14:04 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Juan Quintela, Laurent Vivier, Thomas Huth, kvm-devel,
	QEMU Developers, Dr. David Alan Gilbert, Paolo Bonzini,
	Richard Henderson

On Mon, Jul 15, 2019 at 12:16:57PM +0100, Peter Maydell wrote:
> On Fri, 12 Jul 2019 at 17:33, Peter Maydell <peter.maydell@linaro.org> wrote:
> > Still fails on aarch32 host, I'm afraid:
> >
> > MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64
> > QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap <
> > /dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
> > PASS 1 migration-test /aarch64/migration/deprecated
> > PASS 2 migration-test /aarch64/migration/bad_dest
> > PASS 3 migration-test /aarch64/migration/fd_proto
> > PASS 4 migration-test /aarch64/migration/postcopy/unix
> > PASS 5 migration-test /aarch64/migration/postcopy/recovery
> > PASS 6 migration-test /aarch64/migration/precopy/unix
> > PASS 7 migration-test /aarch64/migration/precopy/tcp
> > PASS 8 migration-test /aarch64/migration/xbzrle/unix
> > malloc(): memory corruption
> > Broken pipe
> > qemu-system-aarch64: load of migration failed: Invalid argument
> > /home/peter.maydell/qemu/tests/libqtest.c:137: kill_qemu() tried to
> > terminate QEMU process but encountered exit status 1
> > Aborted
> > ERROR - too few tests run (expected 9, got 8)
> > /home/peter.maydell/qemu/tests/Makefile.include:899: recipe for target
> > 'check-qtest-aarch64' failed
> 
> A run with valgrind:
> 
> (armhf)pmaydell@mustang-maydell:~/qemu/build/all-a32$
> QTEST_QEMU_BINARY='valgrind aarch64-softmmu/qemu-system-aarch64'
> tests/migration-test -v -p '/aarch64/migration/multifd/tcp'
> /aarch64/migration/multifd/tcp: ==4034== Memcheck, a memory error detector
> ==4034== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==4034== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==4034== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
> unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
> chardev=char0,mode=control -machine accel=qtest -display none -machine
> virt,accel=kvm:tcg,gic-version=max -name vmsource,debug-threads=on
> -cpu max -m 150M -serial file:/tmp/migration-test-mSLr4A/src_serial
> -kernel /tmp/migration-test-mSLr4A/bootsect
> ==4034==
> ==4040== Memcheck, a memory error detector
> ==4040== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==4040== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==4040== Command: aarch64-softmmu/qemu-system-aarch64 -qtest
> unix:/tmp/qtest-4033.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-4033.qmp,id=char0 -mon
> chardev=char0,mode=control -machine accel=qtest -display none -machine
> virt,accel=kvm:tcg,gic-version=max -name vmdest,debug-threads=on -cpu
> max -m 150M -serial file:/tmp/migration-test-mSLr4A/dest_serial
> -kernel /tmp/migration-test-mSLr4A/bootsect -incoming tcp:127.0.0.1:0
> ==4040==
> ==4034== Thread 5 multifdsend_0:
> ==4034== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
> ==4034==    at 0x5299F06: __libc_do_syscall (libc-do-syscall.S:47)
> ==4034==    by 0x5298FCB: sendmsg (sendmsg.c:28)
> ==4034==    by 0x60135D: qio_channel_socket_writev (channel-socket.c:544)
> ==4034==    by 0x5FF995: qio_channel_writev (channel.c:207)
> ==4034==    by 0x5FF995: qio_channel_writev_all (channel.c:171)
> ==4034==    by 0x5FFA0F: qio_channel_write_all (channel.c:257)
> ==4034==    by 0x26BA73: multifd_send_initial_packet (ram.c:711)
> ==4034==    by 0x26BA73: multifd_send_thread (ram.c:1085)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x2320048d is on thread 5's stack
> ==4034==  in frame #5, created by multifd_send_thread (ram.c:1077)

This is a simple missing initialization

multifd_send_initial_packet has a local variable:

    MultiFDInit_t msg;

the code initializes 4 fields, but does *not* initialize the 2
padding fields, so we're writing random data. Harmless as the
receiving end will ignore padding too, but we should fill with
zeros really. so

    MultiFDInit_t msg = {0};

should fix it.

> ==4034==
> ==4034== Thread 6 multifdsend_1:
> ==4034== Invalid write of size 4
> ==4034==    at 0x26BB7C: multifd_send_fill_packet (ram.c:806)
> ==4034==    by 0x26BB7C: multifd_send_thread (ram.c:1101)
> ==4034==    by 0x63C0B1: qemu_thread_start (qemu-thread-posix.c:502)
> ==4034==    by 0x5290613: start_thread (pthread_create.c:463)
> ==4034==    by 0x53487FB: ??? (clone.S:73)
> ==4034==  Address 0x224ed668 is 0 bytes after a block of size 832 alloc'd
> ==4034==    at 0x4841BC4: calloc (vg_replace_malloc.c:711)
> ==4034==    by 0x5018269: g_malloc0 (in
> /usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0.5600.4)

multifd_send_fill_packet is getting the oob write in:

    for (i = 0; i < p->pages->used; i++) {
        packet->offset[i] = cpu_to_be64(p->pages->offset[i]);
    }

offset is a variable length struct field at the end of MultiFDPacket_t:

  typedef struct {
     ...snip...
    char ramblock[256];
    uint64_t offset[];
  } __attribute__((packed)) MultiFDPacket_t;

but the packet data is allocated back in multifd_save_setup using:

        p->packet_len = sizeof(MultiFDPacket_t)
                      + sizeof(ram_addr_t) * page_count;
        p->packet = g_malloc0(p->packet_len);


Notice the field in the struct is "uint64_t" but the length we're
allocating is "ram_addr_t".

Since this is a 32-bit build, I'm guessing ram_addr_t is a 32-bit
integer and thus we're under-allocating the variable length offset
field by half

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PULL 00/19] Migration patches
  2019-07-15 13:48       ` Peter Maydell
@ 2019-07-15 14:10         ` Juan Quintela
  2019-07-15 14:15           ` Peter Maydell
  0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2019-07-15 14:10 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

Peter Maydell <peter.maydell@linaro.org> wrote:
> On Mon, 15 Jul 2019 at 14:44, Juan Quintela <quintela@redhat.com> wrote:
>>
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>> > On Fri, 12 Jul 2019 at 17:33, Peter Maydell <peter.maydell@linaro.org> wrote:
>> >> Still fails on aarch32 host, I'm afraid:
>>
>> Hi
>>
>> dropping the multifd test patch from now.  For "some" reason, having a
>> packed struct and 32bits is getting ugly, not sure yet _why_.
>
> IMHO 'packed' structs are usually a bad idea. They have a bunch
> of behaviours you may not be expecting (for instance they're
> also not naturally aligned, and arrays of them won't be the
> size you expect).

I can't get everything happy O:-)
For the multifd initial packet, I used to have that I wrote the fields
by hand.  Then danp asked that I used a packed struct, and converted the
values inside it.  So ..... Imposible to have everybody happy.

Anyways, the struct is packed, both sides are i386 32bits, and it should
be exactly the same, but it appears that there is where your valgrind
problems appear.  Still investigating _where_ the problem is.  What is
even weirder is that there is no error at all on 64bits.

Thanks, Juan.

PS.  BTW, did you launched by hand the guests with valgrind, or there is
     a trick that I am missing for launching a qtest with valgrind?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PULL 00/19] Migration patches
  2019-07-15 14:10         ` Juan Quintela
@ 2019-07-15 14:15           ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2019-07-15 14:15 UTC (permalink / raw)
  To: Juan Quintela
  Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

On Mon, 15 Jul 2019 at 15:10, Juan Quintela <quintela@redhat.com> wrote:
> PS.  BTW, did you launched by hand the guests with valgrind, or there is
>      a trick that I am missing for launching a qtest with valgrind?

I quoted the command line I used:

QTEST_QEMU_BINARY='valgrind aarch64-softmmu/qemu-system-aarch64'
tests/migration-test -v -p '/aarch64/migration/multifd/tcp'

(https://wiki.qemu.org/Features/QTest lists this and some other
things you might want to do with qtest tests.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PULL 00/19] Migration patches
  2019-07-15 14:04     ` [Qemu-devel] " Daniel P. Berrangé
@ 2019-07-15 14:17       ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2019-07-15 14:17 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, Laurent Vivier, Thomas Huth, kvm-devel,
	QEMU Developers, Dr. David Alan Gilbert, Paolo Bonzini,
	Richard Henderson

On Mon, 15 Jul 2019 at 15:04, Daniel P. Berrangé <berrange@redhat.com> wrote:
>     MultiFDInit_t msg = {0};
>
> should fix it.

A minor nit, but "= {}" is our standard struct-zero-initializer
so we should prefer that, I think. (I know it is not the C-spec
recommended version but some C compilers incorrectly warn about
"= {0}" whereas no compiler we care about warns about the
gnu-extension "= {}".)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PULL 06/19] cutils: remove one unnecessary pointer operation
  2019-07-11 10:43 Juan Quintela
@ 2019-07-11 10:43 ` Juan Quintela
  0 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2019-07-11 10:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr. David Alan Gilbert, Juan Quintela, Laurent Vivier, kvm,
	Thomas Huth, Richard Henderson, Paolo Bonzini, Wei Yang

From: Wei Yang <richardw.yang@linux.intel.com>

Since we will not operate on the next address pointed by out, it is not
necessary to do addition on it.

After removing the operation, the function size reduced 16/18 bytes.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Message-Id: <20190610030852.16039-2-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 util/cutils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index dfc605f1ef..fd591cadf0 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -756,11 +756,11 @@ int uleb128_encode_small(uint8_t *out, uint32_t n)
 {
     g_assert(n <= 0x3fff);
     if (n < 0x80) {
-        *out++ = n;
+        *out = n;
         return 1;
     } else {
         *out++ = (n & 0x7f) | 0x80;
-        *out++ = n >> 7;
+        *out = n >> 7;
         return 2;
     }
 }
@@ -768,7 +768,7 @@ int uleb128_encode_small(uint8_t *out, uint32_t n)
 int uleb128_decode_small(const uint8_t *in, uint32_t *n)
 {
     if (!(*in & 0x80)) {
-        *n = *in++;
+        *n = *in;
         return 1;
     } else {
         *n = *in++ & 0x7f;
@@ -776,7 +776,7 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n)
         if (*in & 0x80) {
             return -1;
         }
-        *n |= *in++ << 7;
+        *n |= *in << 7;
         return 2;
     }
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-07-15 15:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-12 14:31 [PULL 00/19] Migration patches Juan Quintela
2019-07-12 14:31 ` [PULL 01/19] migration: fix multifd_recv event typo Juan Quintela
2019-07-12 14:31 ` [PULL 02/19] migration-test: rename parameter to parameter_int Juan Quintela
2019-07-12 14:31 ` [PULL 03/19] migration-test: Add migration multifd test Juan Quintela
2019-07-12 14:31 ` [PULL 04/19] migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS Juan Quintela
2019-07-12 14:31 ` [PULL 05/19] migration/xbzrle: update cache and current_data in one place Juan Quintela
2019-07-12 14:31 ` [PULL 06/19] cutils: remove one unnecessary pointer operation Juan Quintela
2019-07-12 14:31 ` [PULL 07/19] migration/multifd: sync packet_num after all thread are done Juan Quintela
2019-07-12 14:31 ` [PULL 08/19] migration/ram.c: reset complete_round when we gets a queued page Juan Quintela
2019-07-12 14:31 ` [PULL 09/19] migration: No need to take rcu during sync_dirty_bitmap Juan Quintela
2019-07-12 14:31 ` [PULL 10/19] memory: Don't set migration bitmap when without migration Juan Quintela
2019-07-12 14:31 ` [PULL 11/19] bitmap: Add bitmap_copy_with_{src|dst}_offset() Juan Quintela
2019-07-12 14:32 ` [PULL 12/19] memory: Pass mr into snapshot_and_clear_dirty Juan Quintela
2019-07-12 14:32 ` [PULL 13/19] memory: Introduce memory listener hook log_clear() Juan Quintela
2019-07-12 14:32 ` [PULL 14/19] kvm: Update comments for sync_dirty_bitmap Juan Quintela
2019-07-12 14:32 ` [PULL 15/19] kvm: Persistent per kvmslot dirty bitmap Juan Quintela
2019-07-12 14:32 ` [PULL 16/19] kvm: Introduce slots lock for memory listener Juan Quintela
2019-07-12 14:32 ` [PULL 17/19] kvm: Support KVM_CLEAR_DIRTY_LOG Juan Quintela
2019-07-12 14:32 ` [PULL 18/19] migration: Split log_clear() into smaller chunks Juan Quintela
2019-07-12 14:32 ` [PULL 19/19] migration: allow private destination ram with x-ignore-shared Juan Quintela
2019-07-12 16:33 ` [Qemu-devel] [PULL 00/19] Migration patches Peter Maydell
2019-07-12 17:54   ` Dr. David Alan Gilbert
2019-07-15 11:16   ` Peter Maydell
2019-07-15 13:44     ` Juan Quintela
2019-07-15 13:48       ` Peter Maydell
2019-07-15 14:10         ` Juan Quintela
2019-07-15 14:15           ` Peter Maydell
2019-07-15 14:04     ` [Qemu-devel] " Daniel P. Berrangé
2019-07-15 14:17       ` Peter Maydell
  -- strict thread matches above, loose matches on Subject: below --
2019-07-11 10:43 Juan Quintela
2019-07-11 10:43 ` [PULL 06/19] cutils: remove one unnecessary pointer operation Juan Quintela

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).