* [PATCH 0/8] migration: improve and cleanup compression
@ 2018-03-13  7:57 ` guangrong.xiao
  0 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

This is the first part of our work to improve compression and make it
more useful in production.

The first patch resolves the problem that the migration thread spends
too much CPU time compressing memory when it jumps to a new block,
which leaves the network badly underutilized.

The second patch fixes a performance issue where too many VM-exits
happen during live migration if compression is in use. It is caused
by memory being returned to the kernel frequently, as buffers are
allocated and freed for every single call to compress2().

The remaining patches clean up the code significantly.


Xiao Guangrong (8):
  migration: stop compressing page in migration thread
  migration: stop allocating and freeing memory frequently
  migration: support to detect compression and decompression errors
  migration: introduce control_save_page()
  migration: move calling control_save_page to the common place
  migration: move calling save_zero_page to the common place
  migration: introduce save_normal_page()
  migration: remove ram_save_compressed_page()

 migration/qemu-file.c |  38 ++++-
 migration/qemu-file.h |   6 +-
 migration/ram.c       | 430 +++++++++++++++++++++++++++++---------------------
 3 files changed, 290 insertions(+), 184 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

As compression is heavy work, do not do it in the migration thread;
instead, post the page out as a normal (uncompressed) page.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7266351fd0..615693f180 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
     int pages = -1;
     uint64_t bytes_xmit = 0;
     uint8_t *p;
-    int ret, blen;
+    int ret;
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
 
@@ -1162,23 +1162,23 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
         if (block != rs->last_sent_block) {
             flush_compressed_data(rs);
             pages = save_zero_page(rs, block, offset);
-            if (pages == -1) {
-                /* Make sure the first page is sent out before other pages */
-                bytes_xmit = save_page_header(rs, rs->f, block, offset |
-                                              RAM_SAVE_FLAG_COMPRESS_PAGE);
-                blen = qemu_put_compression_data(rs->f, p, TARGET_PAGE_SIZE,
-                                                 migrate_compress_level());
-                if (blen > 0) {
-                    ram_counters.transferred += bytes_xmit + blen;
-                    ram_counters.normal++;
-                    pages = 1;
-                } else {
-                    qemu_file_set_error(rs->f, blen);
-                    error_report("compressed data failed!");
-                }
-            }
             if (pages > 0) {
                 ram_release_pages(block->idstr, offset, pages);
+            } else {
+                /*
+                 * Make sure the first page is sent out before other pages.
+                 *
+                 * we post it as normal page as compression will take much
+                 * CPU resource.
+                 */
+                ram_counters.transferred += save_page_header(rs, rs->f, block,
+                                                offset | RAM_SAVE_FLAG_PAGE);
+                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
+                                      migrate_release_ram() &
+                                      migration_in_postcopy());
+                ram_counters.transferred += TARGET_PAGE_SIZE;
+                ram_counters.normal++;
+                pages = 1;
             }
         } else {
             pages = save_zero_page(rs, block, offset);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

The current code uses compress2()/uncompress() to compress/decompress
memory. These two functions manage memory allocation and release
internally, which causes large amounts of memory to be allocated and
freed very frequently.

Worse, frequently returning memory to the kernel flushes TLBs and
triggers invalidation callbacks of the MMU notifier, which interact
with the KVM MMU and dramatically reduce VM performance.

So maintain the memory ourselves and reuse it for each compression
and decompression.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/qemu-file.c |  34 ++++++++++--
 migration/qemu-file.h |   6 ++-
 migration/ram.c       | 142 +++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 140 insertions(+), 42 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2ab2bf362d..1ff33a1ffb 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
+/* return the size after compression, or negative value on error */
+static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
+                              const uint8_t *source, size_t source_len)
+{
+    int err;
+
+    err = deflateReset(stream);
+    if (err != Z_OK) {
+        return -1;
+    }
+
+    stream->avail_in = source_len;
+    stream->next_in = (uint8_t *)source;
+    stream->avail_out = dest_len;
+    stream->next_out = dest;
+
+    err = deflate(stream, Z_FINISH);
+    if (err != Z_STREAM_END) {
+        return -1;
+    }
+
+    return stream->next_out - dest;
+}
+
 /* Compress size bytes of data start at p with specific compression
  * level and store the compressed data to the buffer of f.
  *
@@ -668,8 +692,8 @@ uint64_t qemu_get_be64(QEMUFile *f)
  * data, return -1.
  */
 
-ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
-                                  int level)
+ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
+                                  const uint8_t *p, size_t size)
 {
     ssize_t blen = IO_BUF_SIZE - f->buf_index - sizeof(int32_t);
 
@@ -683,8 +707,10 @@ ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
             return -1;
         }
     }
-    if (compress2(f->buf + f->buf_index + sizeof(int32_t), (uLongf *)&blen,
-                  (Bytef *)p, size, level) != Z_OK) {
+
+    blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
+                              blen, p, size);
+    if (blen < 0) {
         error_report("Compress Failed!");
         return 0;
     }
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index aae4e5ed36..d123b21ca8 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -25,6 +25,8 @@
 #ifndef MIGRATION_QEMU_FILE_H
 #define MIGRATION_QEMU_FILE_H
 
+#include <zlib.h>
+
 /* Read a chunk of data from a file at the given position.  The pos argument
  * can be ignored if the file is only be used for streaming.  The number of
  * bytes actually read should be returned.
@@ -132,8 +134,8 @@ bool qemu_file_is_writable(QEMUFile *f);
 
 size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset);
 size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size);
-ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
-                                  int level);
+ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
+                                  const uint8_t *p, size_t size);
 int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
 
 /*
diff --git a/migration/ram.c b/migration/ram.c
index 615693f180..fff3f31e90 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -264,6 +264,7 @@ struct CompressParam {
     QemuCond cond;
     RAMBlock *block;
     ram_addr_t offset;
+    z_stream stream;
 };
 typedef struct CompressParam CompressParam;
 
@@ -275,6 +276,7 @@ struct DecompressParam {
     void *des;
     uint8_t *compbuf;
     int len;
+    z_stream stream;
 };
 typedef struct DecompressParam DecompressParam;
 
@@ -294,7 +296,7 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
 
-static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
+static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                 ram_addr_t offset);
 
 static void *do_data_compress(void *opaque)
@@ -311,7 +313,7 @@ static void *do_data_compress(void *opaque)
             param->block = NULL;
             qemu_mutex_unlock(&param->mutex);
 
-            do_compress_ram_page(param->file, block, offset);
+            do_compress_ram_page(param->file, &param->stream, block, offset);
 
             qemu_mutex_lock(&comp_done_lock);
             param->done = true;
@@ -352,10 +354,17 @@ static void compress_threads_save_cleanup(void)
     terminate_compression_threads();
     thread_count = migrate_compress_threads();
     for (i = 0; i < thread_count; i++) {
+        /* something in compress_threads_save_setup() is wrong. */
+        if (!comp_param[i].stream.opaque) {
+            break;
+        }
+
         qemu_thread_join(compress_threads + i);
         qemu_fclose(comp_param[i].file);
         qemu_mutex_destroy(&comp_param[i].mutex);
         qemu_cond_destroy(&comp_param[i].cond);
+        deflateEnd(&comp_param[i].stream);
+        comp_param[i].stream.opaque = NULL;
     }
     qemu_mutex_destroy(&comp_done_lock);
     qemu_cond_destroy(&comp_done_cond);
@@ -365,12 +374,12 @@ static void compress_threads_save_cleanup(void)
     comp_param = NULL;
 }
 
-static void compress_threads_save_setup(void)
+static int compress_threads_save_setup(void)
 {
     int i, thread_count;
 
     if (!migrate_use_compression()) {
-        return;
+        return 0;
     }
     thread_count = migrate_compress_threads();
     compress_threads = g_new0(QemuThread, thread_count);
@@ -378,6 +387,12 @@ static void compress_threads_save_setup(void)
     qemu_cond_init(&comp_done_cond);
     qemu_mutex_init(&comp_done_lock);
     for (i = 0; i < thread_count; i++) {
+        if (deflateInit(&comp_param[i].stream,
+                           migrate_compress_level()) != Z_OK) {
+            goto exit;
+        }
+        comp_param[i].stream.opaque = &comp_param[i];
+
         /* comp_param[i].file is just used as a dummy buffer to save data,
          * set its ops to empty.
          */
@@ -390,6 +405,11 @@ static void compress_threads_save_setup(void)
                            do_data_compress, comp_param + i,
                            QEMU_THREAD_JOINABLE);
     }
+    return 0;
+
+exit:
+    compress_threads_save_cleanup();
+    return -1;
 }
 
 /* Multiple fd's */
@@ -1026,7 +1046,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     return pages;
 }
 
-static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
+static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                 ram_addr_t offset)
 {
     RAMState *rs = ram_state;
@@ -1035,8 +1055,7 @@ static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
 
     bytes_sent = save_page_header(rs, f, block, offset |
                                   RAM_SAVE_FLAG_COMPRESS_PAGE);
-    blen = qemu_put_compression_data(f, p, TARGET_PAGE_SIZE,
-                                     migrate_compress_level());
+    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
     if (blen < 0) {
         bytes_sent = 0;
         qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
@@ -2209,9 +2228,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMState **rsp = opaque;
     RAMBlock *block;
 
+    if (compress_threads_save_setup()) {
+        return -1;
+    }
+
     /* migration has already setup the bitmap, reuse it. */
     if (!migration_in_colo_state()) {
         if (ram_init_all(rsp) != 0) {
+            compress_threads_save_cleanup();
             return -1;
         }
     }
@@ -2231,7 +2255,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }
 
     rcu_read_unlock();
-    compress_threads_save_setup();
 
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -2495,6 +2518,30 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/* return the size after decompression, or negative value on error */
+static int qemu_uncompress(z_stream *stream, uint8_t *dest, size_t dest_len,
+                           uint8_t *source, size_t source_len)
+{
+    int err;
+
+    err = inflateReset(stream);
+    if (err != Z_OK) {
+        return -1;
+    }
+
+    stream->avail_in = source_len;
+    stream->next_in = source;
+    stream->avail_out = dest_len;
+    stream->next_out = dest;
+
+    err = inflate(stream, Z_NO_FLUSH);
+    if (err != Z_STREAM_END) {
+        return -1;
+    }
+
+    return stream->total_out;
+}
+
 static void *do_data_decompress(void *opaque)
 {
     DecompressParam *param = opaque;
@@ -2511,13 +2558,13 @@ static void *do_data_decompress(void *opaque)
             qemu_mutex_unlock(&param->mutex);
 
             pagesize = TARGET_PAGE_SIZE;
-            /* uncompress() will return failed in some case, especially
+            /* qemu_uncompress() will return failed in some case, especially
              * when the page is dirted when doing the compression, it's
              * not a problem because the dirty page will be retransferred
              * and uncompress() won't break the data in other pages.
              */
-            uncompress((Bytef *)des, &pagesize,
-                       (const Bytef *)param->compbuf, len);
+            qemu_uncompress(&param->stream, des, pagesize,
+                            param->compbuf, len);
 
             qemu_mutex_lock(&decomp_done_lock);
             param->done = true;
@@ -2552,30 +2599,6 @@ static void wait_for_decompress_done(void)
     qemu_mutex_unlock(&decomp_done_lock);
 }
 
-static void compress_threads_load_setup(void)
-{
-    int i, thread_count;
-
-    if (!migrate_use_compression()) {
-        return;
-    }
-    thread_count = migrate_decompress_threads();
-    decompress_threads = g_new0(QemuThread, thread_count);
-    decomp_param = g_new0(DecompressParam, thread_count);
-    qemu_mutex_init(&decomp_done_lock);
-    qemu_cond_init(&decomp_done_cond);
-    for (i = 0; i < thread_count; i++) {
-        qemu_mutex_init(&decomp_param[i].mutex);
-        qemu_cond_init(&decomp_param[i].cond);
-        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
-        decomp_param[i].done = true;
-        decomp_param[i].quit = false;
-        qemu_thread_create(decompress_threads + i, "decompress",
-                           do_data_decompress, decomp_param + i,
-                           QEMU_THREAD_JOINABLE);
-    }
-}
-
 static void compress_threads_load_cleanup(void)
 {
     int i, thread_count;
@@ -2585,16 +2608,26 @@ static void compress_threads_load_cleanup(void)
     }
     thread_count = migrate_decompress_threads();
     for (i = 0; i < thread_count; i++) {
+        if (!decomp_param[i].stream.opaque) {
+            break;
+        }
+
         qemu_mutex_lock(&decomp_param[i].mutex);
         decomp_param[i].quit = true;
         qemu_cond_signal(&decomp_param[i].cond);
         qemu_mutex_unlock(&decomp_param[i].mutex);
     }
     for (i = 0; i < thread_count; i++) {
+        if (!decomp_param[i].stream.opaque) {
+            break;
+        }
+
         qemu_thread_join(decompress_threads + i);
         qemu_mutex_destroy(&decomp_param[i].mutex);
         qemu_cond_destroy(&decomp_param[i].cond);
         g_free(decomp_param[i].compbuf);
+        inflateEnd(&decomp_param[i].stream);
+        decomp_param[i].stream.opaque = NULL;
     }
     g_free(decompress_threads);
     g_free(decomp_param);
@@ -2602,6 +2635,40 @@ static void compress_threads_load_cleanup(void)
     decomp_param = NULL;
 }
 
+static int compress_threads_load_setup(void)
+{
+    int i, thread_count;
+
+    if (!migrate_use_compression()) {
+        return 0;
+    }
+
+    thread_count = migrate_decompress_threads();
+    decompress_threads = g_new0(QemuThread, thread_count);
+    decomp_param = g_new0(DecompressParam, thread_count);
+    qemu_mutex_init(&decomp_done_lock);
+    qemu_cond_init(&decomp_done_cond);
+    for (i = 0; i < thread_count; i++) {
+        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
+            goto exit;
+        }
+        decomp_param[i].stream.opaque = &decomp_param[i];
+
+        qemu_mutex_init(&decomp_param[i].mutex);
+        qemu_cond_init(&decomp_param[i].cond);
+        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
+        decomp_param[i].done = true;
+        decomp_param[i].quit = false;
+        qemu_thread_create(decompress_threads + i, "decompress",
+                           do_data_decompress, decomp_param + i,
+                           QEMU_THREAD_JOINABLE);
+    }
+    return 0;
+exit:
+    compress_threads_load_cleanup();
+    return -1;
+}
+
 static void decompress_data_with_multi_threads(QEMUFile *f,
                                                void *host, int len)
 {
@@ -2641,8 +2708,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
+    if (compress_threads_load_setup()) {
+        return -1;
+    }
+
     xbzrle_load_setup();
-    compress_threads_load_setup();
     ramblock_recv_map_init();
     return 0;
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 126+ messages in thread

             param->done = true;
@@ -2552,30 +2599,6 @@ static void wait_for_decompress_done(void)
     qemu_mutex_unlock(&decomp_done_lock);
 }
 
-static void compress_threads_load_setup(void)
-{
-    int i, thread_count;
-
-    if (!migrate_use_compression()) {
-        return;
-    }
-    thread_count = migrate_decompress_threads();
-    decompress_threads = g_new0(QemuThread, thread_count);
-    decomp_param = g_new0(DecompressParam, thread_count);
-    qemu_mutex_init(&decomp_done_lock);
-    qemu_cond_init(&decomp_done_cond);
-    for (i = 0; i < thread_count; i++) {
-        qemu_mutex_init(&decomp_param[i].mutex);
-        qemu_cond_init(&decomp_param[i].cond);
-        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
-        decomp_param[i].done = true;
-        decomp_param[i].quit = false;
-        qemu_thread_create(decompress_threads + i, "decompress",
-                           do_data_decompress, decomp_param + i,
-                           QEMU_THREAD_JOINABLE);
-    }
-}
-
 static void compress_threads_load_cleanup(void)
 {
     int i, thread_count;
@@ -2585,16 +2608,26 @@ static void compress_threads_load_cleanup(void)
     }
     thread_count = migrate_decompress_threads();
     for (i = 0; i < thread_count; i++) {
+        if (!decomp_param[i].stream.opaque) {
+            break;
+        }
+
         qemu_mutex_lock(&decomp_param[i].mutex);
         decomp_param[i].quit = true;
         qemu_cond_signal(&decomp_param[i].cond);
         qemu_mutex_unlock(&decomp_param[i].mutex);
     }
     for (i = 0; i < thread_count; i++) {
+        if (!decomp_param[i].stream.opaque) {
+            break;
+        }
+
         qemu_thread_join(decompress_threads + i);
         qemu_mutex_destroy(&decomp_param[i].mutex);
         qemu_cond_destroy(&decomp_param[i].cond);
         g_free(decomp_param[i].compbuf);
+        inflateEnd(&decomp_param[i].stream);
+        decomp_param[i].stream.opaque = NULL;
     }
     g_free(decompress_threads);
     g_free(decomp_param);
@@ -2602,6 +2635,40 @@ static void compress_threads_load_cleanup(void)
     decomp_param = NULL;
 }
 
+static int compress_threads_load_setup(void)
+{
+    int i, thread_count;
+
+    if (!migrate_use_compression()) {
+        return 0;
+    }
+
+    thread_count = migrate_decompress_threads();
+    decompress_threads = g_new0(QemuThread, thread_count);
+    decomp_param = g_new0(DecompressParam, thread_count);
+    qemu_mutex_init(&decomp_done_lock);
+    qemu_cond_init(&decomp_done_cond);
+    for (i = 0; i < thread_count; i++) {
+        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
+            goto exit;
+        }
+        decomp_param[i].stream.opaque = &decomp_param[i];
+
+        qemu_mutex_init(&decomp_param[i].mutex);
+        qemu_cond_init(&decomp_param[i].cond);
+        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
+        decomp_param[i].done = true;
+        decomp_param[i].quit = false;
+        qemu_thread_create(decompress_threads + i, "decompress",
+                           do_data_decompress, decomp_param + i,
+                           QEMU_THREAD_JOINABLE);
+    }
+    return 0;
+exit:
+    compress_threads_load_cleanup();
+    return -1;
+}
+
 static void decompress_data_with_multi_threads(QEMUFile *f,
                                                void *host, int len)
 {
@@ -2641,8 +2708,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
+    if (compress_threads_load_setup()) {
+        return -1;
+    }
+
     xbzrle_load_setup();
-    compress_threads_load_setup();
     ramblock_recv_map_init();
     return 0;
 }
-- 
2.14.3
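
The core change in the patch above is to stop using one-shot zlib calls, which
allocate and free the (large) compression window on every page, and instead
keep one stream per worker thread: initialized once at setup, merely reset
between pages, and torn down once at cleanup. Abstracted away from zlib, the
lifecycle looks like this (a minimal sketch; `stream_t` and its operations are
illustrative stand-ins, not QEMU or zlib API):

```c
#include <stdlib.h>

/* Illustrative stand-in for z_stream: an expensive-to-allocate work buffer. */
typedef struct {
    unsigned char *window;   /* allocated once, like zlib's deflate window */
    size_t calls;            /* pages processed since init */
    int initialized;
} stream_t;

/* init once per thread: the only place memory is allocated */
static int stream_init(stream_t *s, size_t window_size)
{
    s->window = malloc(window_size);
    if (!s->window) {
        return -1;
    }
    s->calls = 0;
    s->initialized = 1;
    return 0;
}

/* reset per page: reuses the window, no malloc/free, no VM-exit pressure */
static void stream_reset(stream_t *s)
{
    s->calls++;
}

/* end once per thread at cleanup */
static void stream_end(stream_t *s)
{
    free(s->window);
    s->window = NULL;
    s->initialized = 0;
}

/* process n pages with a single stream, the way do_data_compress() does */
static size_t process_pages(stream_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        stream_reset(s);     /* deflateReset()/inflateReset() analog */
        /* ... compress or decompress one page using s->window ... */
    }
    return s->calls;
}
```

The setup/cleanup paths in the patch follow the same shape: `deflateInit()` or
`inflateInit()` once per thread with `stream.opaque` marking success, and
`inflateEnd()` plus an `opaque` check on teardown so partially-initialized
thread arrays can be unwound safely.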


* [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Currently the page being compressed is allowed to be updated by the
VM on the source QEMU, and correspondingly the destination QEMU just
ignores the decompression error. However, this completely misses the
chance to catch real errors, so the VM can be corrupted silently.

To make the migration more robust, we copy the page to a buffer first
to avoid it being written by the VM, then detect and handle errors of
both compression and decompression properly.
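
The race this closes can be seen in isolation: if the compressor reads the
live page while the guest writes it, the compressed output is unpredictable
and an error is indistinguishable from a benign dirty-page race; snapshotting
the page first gives the compressor a stable input, so any failure is a real
error. A minimal sketch (the names are illustrative, not QEMU API):

```c
#include <string.h>

#define PAGE_SIZE 4096

/* Take a private snapshot of a guest page before compressing it, so
 * later guest writes to 'live' cannot change what the compressor sees. */
static void snapshot_page(unsigned char *snap, const unsigned char *live)
{
    memcpy(snap, live, PAGE_SIZE);
}

/* Stand-in "compressor": just a checksum, to show the input is stable. */
static unsigned page_checksum(const unsigned char *p)
{
    unsigned sum = 0;
    for (int i = 0; i < PAGE_SIZE; i++) {
        sum = sum * 31 + p[i];
    }
    return sum;
}
```

In the patch the snapshot is the on-stack `buf[TARGET_PAGE_SIZE]` in
do_compress_ram_page(): the guest may keep dirtying `p`, but the data handed
to qemu_put_compression_data() no longer moves underneath it.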

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/qemu-file.c |  4 ++--
 migration/ram.c       | 29 +++++++++++++++++++----------
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1ff33a1ffb..137bcc8bdc 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -711,9 +711,9 @@ ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
     blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
                               blen, p, size);
     if (blen < 0) {
-        error_report("Compress Failed!");
-        return 0;
+        return -1;
     }
+
     qemu_put_be32(f, blen);
     if (f->ops->writev_buffer) {
         add_to_iovec(f, f->buf + f->buf_index, blen, false);
diff --git a/migration/ram.c b/migration/ram.c
index fff3f31e90..c47185d38c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -273,6 +273,7 @@ struct DecompressParam {
     bool quit;
     QemuMutex mutex;
     QemuCond cond;
+    QEMUFile *file;
     void *des;
     uint8_t *compbuf;
     int len;
@@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 {
     RAMState *rs = ram_state;
     int bytes_sent, blen;
-    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
+    uint8_t buf[TARGET_PAGE_SIZE], *p;
 
+    p = block->host + (offset & TARGET_PAGE_MASK);
     bytes_sent = save_page_header(rs, f, block, offset |
                                   RAM_SAVE_FLAG_COMPRESS_PAGE);
-    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
+    memcpy(buf, p, TARGET_PAGE_SIZE);
+    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
     if (blen < 0) {
         bytes_sent = 0;
         qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
@@ -2547,7 +2550,7 @@ static void *do_data_decompress(void *opaque)
     DecompressParam *param = opaque;
     unsigned long pagesize;
     uint8_t *des;
-    int len;
+    int len, ret;
 
     qemu_mutex_lock(&param->mutex);
     while (!param->quit) {
@@ -2563,8 +2566,12 @@ static void *do_data_decompress(void *opaque)
              * not a problem because the dirty page will be retransferred
              * and uncompress() won't break the data in other pages.
              */
-            qemu_uncompress(&param->stream, des, pagesize,
-                            param->compbuf, len);
+            ret = qemu_uncompress(&param->stream, des, pagesize,
+                                  param->compbuf, len);
+            if (ret < 0) {
+                error_report("decompress data failed");
+                qemu_file_set_error(param->file, ret);
+            }
 
             qemu_mutex_lock(&decomp_done_lock);
             param->done = true;
@@ -2581,12 +2588,12 @@ static void *do_data_decompress(void *opaque)
     return NULL;
 }
 
-static void wait_for_decompress_done(void)
+static int wait_for_decompress_done(QEMUFile *f)
 {
     int idx, thread_count;
 
     if (!migrate_use_compression()) {
-        return;
+        return 0;
     }
 
     thread_count = migrate_decompress_threads();
@@ -2597,6 +2604,7 @@ static void wait_for_decompress_done(void)
         }
     }
     qemu_mutex_unlock(&decomp_done_lock);
+    return qemu_file_get_error(f);
 }
 
 static void compress_threads_load_cleanup(void)
@@ -2635,7 +2643,7 @@ static void compress_threads_load_cleanup(void)
     decomp_param = NULL;
 }
 
-static int compress_threads_load_setup(void)
+static int compress_threads_load_setup(QEMUFile *f)
 {
     int i, thread_count;
 
@@ -2654,6 +2662,7 @@ static int compress_threads_load_setup(void)
         }
         decomp_param[i].stream.opaque = &decomp_param[i];
 
+        decomp_param[i].file = f;
         qemu_mutex_init(&decomp_param[i].mutex);
         qemu_cond_init(&decomp_param[i].cond);
         decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
@@ -2708,7 +2717,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
-    if (compress_threads_load_setup()) {
+    if (compress_threads_load_setup(f)) {
         return -1;
     }
 
@@ -3063,7 +3072,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         }
     }
 
-    wait_for_decompress_done();
+    ret |= wait_for_decompress_done(f);
     rcu_read_unlock();
     trace_ram_load_complete(ret, seq_iter);
     return ret;
-- 
2.14.3
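
The decompression threads report failures through qemu_file_set_error(), and
ram_load() later picks the error up via the new wait_for_decompress_done()
return value; the property this relies on is that the first recorded error
sticks and later status queries return it. The convention can be sketched
independently of QEMUFile (a simplified model with illustrative names, not
QEMU API; the real qemu_file_set_error() also takes locking and I/O state
into account):

```c
/* Sticky first-error convention: once a file is marked failed,
 * the first error code is kept and later ones are ignored. */
typedef struct {
    int last_error;          /* 0 means "no error so far" */
} file_t;

static void file_set_error(file_t *f, int err)
{
    if (f->last_error == 0) {   /* only the first error is recorded */
        f->last_error = err;
    }
}

static int file_get_error(const file_t *f)
{
    return f->last_error;
}
```

This is why a worker thread can safely record a decompression failure
asynchronously: whichever thread fails first wins, and the main load loop
observes a single consistent error code at the end.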


* [PATCH 4/8] migration: introduce control_save_page()
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Abstract the common code into control_save_page() to clean up its
callers; no logic is changed.
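
The helper folds the three outcomes of ram_control_save_page() into one
boolean contract: return false when the control path is unsupported (the
caller falls through to the normal save path), otherwise account the transfer
and return true, with *pages telling the caller what happened. The decision
table can be exercised with a stub control path (illustrative constants and
names, not QEMU API):

```c
#include <stdbool.h>
#include <stdint.h>

enum { CTRL_NOT_SUPP = -1, CTRL_DELAYED = -2, CTRL_OK = 0 };

/* Mirror of the control_save_page() contract from the patch:
 *   false -> control path unsupported, caller must save the page itself
 *   true  -> handled; *pages is 1 if a page was written, -1 otherwise */
static bool control_save(int ret, uint64_t bytes_xmit, int *pages)
{
    *pages = -1;
    if (ret == CTRL_NOT_SUPP) {
        return false;
    }
    if (bytes_xmit) {
        *pages = 1;             /* transferred counter would grow here */
    }
    if (ret == CTRL_DELAYED) {
        return true;            /* completion is accounted later */
    }
    /* bytes_xmit > 0 counts as a normal page, == 0 as a duplicate */
    return true;
}
```

Both ram_save_page() and ram_save_compressed_page() then reduce to the same
two-line prologue: call the helper, and return early whenever it handled the
page.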

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 174 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 89 insertions(+), 85 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c47185d38c..e7b8b14c3c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -957,6 +957,44 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
     ram_discard_range(rbname, offset, pages << TARGET_PAGE_BITS);
 }
 
+/*
+ * @pages: the number of pages written by the control path,
+ *        < 0 - error
+ *        > 0 - number of pages written
+ *
+ * Return true if the page has been saved, otherwise return false.
+ */
+static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
+                              int *pages)
+{
+    uint64_t bytes_xmit = 0;
+    int ret;
+
+    *pages = -1;
+    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
+                                &bytes_xmit);
+    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
+        return false;
+    }
+
+    if (bytes_xmit) {
+        ram_counters.transferred += bytes_xmit;
+        *pages = 1;
+    }
+
+    if (ret == RAM_SAVE_CONTROL_DELAYED) {
+        return true;
+    }
+
+    if (bytes_xmit > 0) {
+        ram_counters.normal++;
+    } else if (bytes_xmit == 0) {
+        ram_counters.duplicate++;
+    }
+
+    return true;
+}
+
 /**
  * ram_save_page: send the given page to the stream
  *
@@ -973,56 +1011,36 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
 static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
 {
     int pages = -1;
-    uint64_t bytes_xmit;
-    ram_addr_t current_addr;
     uint8_t *p;
-    int ret;
     bool send_async = true;
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
+    ram_addr_t current_addr = block->offset + offset;
 
     p = block->host + offset;
     trace_ram_save_page(block->idstr, (uint64_t)offset, p);
 
-    /* In doubt sent page as normal */
-    bytes_xmit = 0;
-    ret = ram_control_save_page(rs->f, block->offset,
-                           offset, TARGET_PAGE_SIZE, &bytes_xmit);
-    if (bytes_xmit) {
-        ram_counters.transferred += bytes_xmit;
-        pages = 1;
+    if (control_save_page(rs, block, offset, &pages)) {
+        return pages;
     }
 
     XBZRLE_cache_lock();
-
-    current_addr = block->offset + offset;
-
-    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
-            if (bytes_xmit > 0) {
-                ram_counters.normal++;
-            } else if (bytes_xmit == 0) {
-                ram_counters.duplicate++;
-            }
-        }
-    } else {
-        pages = save_zero_page(rs, block, offset);
-        if (pages > 0) {
-            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
-             * page would be stale
+    pages = save_zero_page(rs, block, offset);
+    if (pages > 0) {
+        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
+         * page would be stale
+         */
+        xbzrle_cache_zero_page(rs, current_addr);
+        ram_release_pages(block->idstr, offset, pages);
+    } else if (!rs->ram_bulk_stage &&
+               !migration_in_postcopy() && migrate_use_xbzrle()) {
+        pages = save_xbzrle_page(rs, &p, current_addr, block,
+                                 offset, last_stage);
+        if (!last_stage) {
+            /* Can't send this cached data async, since the cache page
+             * might get updated before it gets to the wire
              */
-            xbzrle_cache_zero_page(rs, current_addr);
-            ram_release_pages(block->idstr, offset, pages);
-        } else if (!rs->ram_bulk_stage &&
-                   !migration_in_postcopy() && migrate_use_xbzrle()) {
-            pages = save_xbzrle_page(rs, &p, current_addr, block,
-                                     offset, last_stage);
-            if (!last_stage) {
-                /* Can't send this cached data async, since the cache page
-                 * might get updated before it gets to the wire
-                 */
-                send_async = false;
-            }
+            send_async = false;
         }
     }
 
@@ -1152,63 +1170,49 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
                                     bool last_stage)
 {
     int pages = -1;
-    uint64_t bytes_xmit = 0;
     uint8_t *p;
-    int ret;
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
 
     p = block->host + offset;
 
-    ret = ram_control_save_page(rs->f, block->offset,
-                                offset, TARGET_PAGE_SIZE, &bytes_xmit);
-    if (bytes_xmit) {
-        ram_counters.transferred += bytes_xmit;
-        pages = 1;
+    if (control_save_page(rs, block, offset, &pages)) {
+        return pages;
     }
-    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
-            if (bytes_xmit > 0) {
-                ram_counters.normal++;
-            } else if (bytes_xmit == 0) {
-                ram_counters.duplicate++;
-            }
+
+    /* When starting the process of a new block, the first page of
+     * the block should be sent out before other pages in the same
+     * block, and all the pages in last block should have been sent
+     * out, keeping this order is important, because the 'cont' flag
+     * is used to avoid resending the block name.
+     */
+    if (block != rs->last_sent_block) {
+        flush_compressed_data(rs);
+        pages = save_zero_page(rs, block, offset);
+        if (pages > 0) {
+            ram_release_pages(block->idstr, offset, pages);
+        } else {
+            /*
+             * Make sure the first page is sent out before other pages.
+             *
+             * we post it as normal page as compression will take much
+             * CPU resource.
+             */
+            ram_counters.transferred += save_page_header(rs, rs->f, block,
+                                            offset | RAM_SAVE_FLAG_PAGE);
+            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
+                                  migrate_release_ram() &
+                                  migration_in_postcopy());
+            ram_counters.transferred += TARGET_PAGE_SIZE;
+            ram_counters.normal++;
+            pages = 1;
         }
     } else {
-        /* When starting the process of a new block, the first page of
-         * the block should be sent out before other pages in the same
-         * block, and all the pages in last block should have been sent
-         * out, keeping this order is important, because the 'cont' flag
-         * is used to avoid resending the block name.
-         */
-        if (block != rs->last_sent_block) {
-            flush_compressed_data(rs);
-            pages = save_zero_page(rs, block, offset);
-            if (pages > 0) {
-                ram_release_pages(block->idstr, offset, pages);
-            } else {
-                /*
-                 * Make sure the first page is sent out before other pages.
-                 *
-                 * we post it as normal page as compression will take much
-                 * CPU resource.
-                 */
-                ram_counters.transferred += save_page_header(rs, rs->f, block,
-                                                offset | RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
-                                      migrate_release_ram() &
-                                      migration_in_postcopy());
-                ram_counters.transferred += TARGET_PAGE_SIZE;
-                ram_counters.normal++;
-                pages = 1;
-            }
+        pages = save_zero_page(rs, block, offset);
+        if (pages == -1) {
+            pages = compress_page_with_multi_thread(rs, block, offset);
         } else {
-            pages = save_zero_page(rs, block, offset);
-            if (pages == -1) {
-                pages = compress_page_with_multi_thread(rs, block, offset);
-            } else {
-                ram_release_pages(block->idstr, offset, pages);
-            }
+            ram_release_pages(block->idstr, offset, pages);
         }
     }
 
-- 
2.14.3


         } else {
-            pages = save_zero_page(rs, block, offset);
-            if (pages == -1) {
-                pages = compress_page_with_multi_thread(rs, block, offset);
-            } else {
-                ram_release_pages(block->idstr, offset, pages);
-            }
+            ram_release_pages(block->idstr, offset, pages);
         }
     }
 
-- 
2.14.3


* [PATCH 5/8] migration: move calling control_save_page to the common place
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

The function is called by both ram_save_page and ram_save_target_page,
so move it to the common caller to clean up the code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e7b8b14c3c..839665d866 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1020,10 +1020,6 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     p = block->host + offset;
     trace_ram_save_page(block->idstr, (uint64_t)offset, p);
 
-    if (control_save_page(rs, block, offset, &pages)) {
-        return pages;
-    }
-
     XBZRLE_cache_lock();
     pages = save_zero_page(rs, block, offset);
     if (pages > 0) {
@@ -1176,10 +1172,6 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
 
     p = block->host + offset;
 
-    if (control_save_page(rs, block, offset, &pages)) {
-        return pages;
-    }
-
     /* When starting the process of a new block, the first page of
      * the block should be sent out before other pages in the same
      * block, and all the pages in last block should have been sent
@@ -1472,6 +1464,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
 
     /* Check the pages is dirty and if it is send it */
     if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+        RAMBlock *block = pss->block;
+        ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
+
+        if (control_save_page(rs, block, offset, &res)) {
+            goto page_saved;
+        }
+
         /*
          * If xbzrle is on, stop using the data compression after first
          * round of migration even if compression is enabled. In theory,
@@ -1484,6 +1483,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
             res = ram_save_page(rs, pss, last_stage);
         }
 
+page_saved:
         if (res < 0) {
             return res;
         }
-- 
2.14.3


* [PATCH 6/8] migration: move calling save_zero_page to the common place
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

save_zero_page() is always the first approach we try, so move it to
the common place before calling ram_save_compressed_page
and ram_save_page.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 106 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 60 insertions(+), 46 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 839665d866..9627ce18e9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1021,15 +1021,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     trace_ram_save_page(block->idstr, (uint64_t)offset, p);
 
     XBZRLE_cache_lock();
-    pages = save_zero_page(rs, block, offset);
-    if (pages > 0) {
-        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
-         * page would be stale
-         */
-        xbzrle_cache_zero_page(rs, current_addr);
-        ram_release_pages(block->idstr, offset, pages);
-    } else if (!rs->ram_bulk_stage &&
-               !migration_in_postcopy() && migrate_use_xbzrle()) {
+    if (!rs->ram_bulk_stage && !migration_in_postcopy() &&
+           migrate_use_xbzrle()) {
         pages = save_xbzrle_page(rs, &p, current_addr, block,
                                  offset, last_stage);
         if (!last_stage) {
@@ -1172,40 +1165,23 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
 
     p = block->host + offset;
 
-    /* When starting the process of a new block, the first page of
-     * the block should be sent out before other pages in the same
-     * block, and all the pages in last block should have been sent
-     * out, keeping this order is important, because the 'cont' flag
-     * is used to avoid resending the block name.
-     */
     if (block != rs->last_sent_block) {
-        flush_compressed_data(rs);
-        pages = save_zero_page(rs, block, offset);
-        if (pages > 0) {
-            ram_release_pages(block->idstr, offset, pages);
-        } else {
-            /*
-             * Make sure the first page is sent out before other pages.
-             *
-             * we post it as normal page as compression will take much
-             * CPU resource.
-             */
-            ram_counters.transferred += save_page_header(rs, rs->f, block,
-                                            offset | RAM_SAVE_FLAG_PAGE);
-            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
-                                  migrate_release_ram() &
-                                  migration_in_postcopy());
-            ram_counters.transferred += TARGET_PAGE_SIZE;
-            ram_counters.normal++;
-            pages = 1;
-        }
+        /*
+         * Make sure the first page is sent out before other pages.
+         *
+         * we post it as normal page as compression will take much
+         * CPU resource.
+         */
+        ram_counters.transferred += save_page_header(rs, rs->f, block,
+                                        offset | RAM_SAVE_FLAG_PAGE);
+        qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
+                              migrate_release_ram() &
+                              migration_in_postcopy());
+        ram_counters.transferred += TARGET_PAGE_SIZE;
+        ram_counters.normal++;
+        pages = 1;
     } else {
-        pages = save_zero_page(rs, block, offset);
-        if (pages == -1) {
-            pages = compress_page_with_multi_thread(rs, block, offset);
-        } else {
-            ram_release_pages(block->idstr, offset, pages);
-        }
+        pages = compress_page_with_multi_thread(rs, block, offset);
     }
 
     return pages;
@@ -1447,6 +1423,25 @@ err:
     return -1;
 }
 
+static bool save_page_use_compression(RAMState *rs)
+{
+    if (!migrate_use_compression()) {
+        return false;
+    }
+
+    /*
+     * If xbzrle is on, stop using the data compression after first
+     * round of migration even if compression is enabled. In theory,
+     * xbzrle can do better than compression.
+     */
+    if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
+        return true;
+    }
+
+    return false;
+
+}
+
 /**
  * ram_save_target_page: save one target page
  *
@@ -1472,12 +1467,31 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
         }
 
         /*
-         * If xbzrle is on, stop using the data compression after first
-         * round of migration even if compression is enabled. In theory,
-         * xbzrle can do better than compression.
+         * When starting the process of a new block, the first page of
+         * the block should be sent out before other pages in the same
+         * block, and all the pages in last block should have been sent
+         * out, keeping this order is important, because the 'cont' flag
+         * is used to avoid resending the block name.
          */
-        if (migrate_use_compression() &&
-            (rs->ram_bulk_stage || !migrate_use_xbzrle())) {
+        if (block != rs->last_sent_block && save_page_use_compression(rs)) {
+            flush_compressed_data(rs);
+        }
+
+        res = save_zero_page(rs, block, offset);
+        if (res > 0) {
+            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
+             * page would be stale
+             */
+            if (!save_page_use_compression(rs)) {
+                XBZRLE_cache_lock();
+                xbzrle_cache_zero_page(rs, block->offset + offset);
+                XBZRLE_cache_unlock();
+            }
+            ram_release_pages(block->idstr, offset, res);
+            goto page_saved;
+        }
+
+        if (save_page_use_compression(rs)) {
             res = ram_save_compressed_page(rs, pss, last_stage);
         } else {
             res = ram_save_page(rs, pss, last_stage);
-- 
2.14.3


* [PATCH 7/8] migration: introduce save_normal_page()
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

It sends the page directly to the stream, neither checking for zero
pages nor using xbzrle or compression.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 50 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9627ce18e9..f778627992 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -995,6 +995,34 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     return true;
 }
 
+/*
+ * directly send the page to the stream
+ *
+ * Returns the number of pages written.
+ *
+ * @rs: current RAM state
+ * @block: block that contains the page we want to send
+ * @offset: offset inside the block for the page
+ * @buf: the page to be sent
+ * @async: send to page asyncly
+ */
+static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
+                            uint8_t *buf, bool async)
+{
+    ram_counters.transferred += save_page_header(rs, rs->f, block,
+                                                 offset | RAM_SAVE_FLAG_PAGE);
+    if (async) {
+        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
+                              migrate_release_ram() &
+                              migration_in_postcopy());
+    } else {
+        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
+    }
+    ram_counters.transferred += TARGET_PAGE_SIZE;
+    ram_counters.normal++;
+    return 1;
+}
+
 /**
  * ram_save_page: send the given page to the stream
  *
@@ -1035,18 +1063,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
 
     /* XBZRLE overflow or normal page */
     if (pages == -1) {
-        ram_counters.transferred +=
-            save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_PAGE);
-        if (send_async) {
-            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
-                                  migrate_release_ram() &
-                                  migration_in_postcopy());
-        } else {
-            qemu_put_buffer(rs->f, p, TARGET_PAGE_SIZE);
-        }
-        ram_counters.transferred += TARGET_PAGE_SIZE;
-        pages = 1;
-        ram_counters.normal++;
+        pages = save_normal_page(rs, block, offset, p, send_async);
     }
 
     XBZRLE_cache_unlock();
@@ -1172,14 +1189,7 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
          * we post it as normal page as compression will take much
          * CPU resource.
          */
-        ram_counters.transferred += save_page_header(rs, rs->f, block,
-                                        offset | RAM_SAVE_FLAG_PAGE);
-        qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
-                              migrate_release_ram() &
-                              migration_in_postcopy());
-        ram_counters.transferred += TARGET_PAGE_SIZE;
-        ram_counters.normal++;
-        pages = 1;
+        pages = save_normal_page(rs, block, offset, p, true);
     } else {
         pages = compress_page_with_multi_thread(rs, block, offset);
     }
-- 
2.14.3


* [PATCH 8/8] migration: remove ram_save_compressed_page()
  2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
@ 2018-03-13  7:57   ` guangrong.xiao
  -1 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: Xiao Guangrong, qemu-devel, kvm

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Now we can reuse the path in ram_save_page() to post the page out as a
normal page; the only thing remaining in ram_save_compressed_page() is
the compression itself, which we can move out to the caller.

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 45 ++++++++-------------------------------------
 1 file changed, 8 insertions(+), 37 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f778627992..8f4f8aca86 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1162,41 +1162,6 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
     return pages;
 }
 
-/**
- * ram_save_compressed_page: compress the given page and send it to the stream
- *
- * Returns the number of pages written.
- *
- * @rs: current RAM state
- * @block: block that contains the page we want to send
- * @offset: offset inside the block for the page
- * @last_stage: if we are at the completion stage
- */
-static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
-                                    bool last_stage)
-{
-    int pages = -1;
-    uint8_t *p;
-    RAMBlock *block = pss->block;
-    ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
-
-    p = block->host + offset;
-
-    if (block != rs->last_sent_block) {
-        /*
-         * Make sure the first page is sent out before other pages.
-         *
-         * we post it as normal page as compression will take much
-         * CPU resource.
-         */
-        pages = save_normal_page(rs, block, offset, p, true);
-    } else {
-        pages = compress_page_with_multi_thread(rs, block, offset);
-    }
-
-    return pages;
-}
-
 /**
  * find_dirty_block: find the next dirty page and update any state
  * associated with the search process.
@@ -1501,8 +1466,14 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
             goto page_saved;
         }
 
-        if (save_page_use_compression(rs)) {
-            res = ram_save_compressed_page(rs, pss, last_stage);
+        /*
+         * Make sure the first page is sent out before other pages.
+         *
+         * we post it as normal page as compression will take much
+         * CPU resource.
+         */
+        if (block == rs->last_sent_block && save_page_use_compression(rs)) {
+            res = compress_page_with_multi_thread(rs, block, offset);
         } else {
             res = ram_save_page(rs, pss, last_stage);
         }
-- 
2.14.3


* [Qemu-devel] [PATCH 8/8] migration: remove ram_save_compressed_page()
@ 2018-03-13  7:57   ` guangrong.xiao
  0 siblings, 0 replies; 126+ messages in thread
From: guangrong.xiao @ 2018-03-13  7:57 UTC (permalink / raw)
  To: pbonzini, mst, mtosatti; +Cc: qemu-devel, kvm, Xiao Guangrong

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Now, we can reuse the path in ram_save_page() to post the page out
as normal, then the only thing remained in ram_save_compressed_page()
is compression that we can move it out to the caller

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 45 ++++++++-------------------------------------
 1 file changed, 8 insertions(+), 37 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f778627992..8f4f8aca86 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1162,41 +1162,6 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
     return pages;
 }
 
-/**
- * ram_save_compressed_page: compress the given page and send it to the stream
- *
- * Returns the number of pages written.
- *
- * @rs: current RAM state
- * @block: block that contains the page we want to send
- * @offset: offset inside the block for the page
- * @last_stage: if we are at the completion stage
- */
-static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
-                                    bool last_stage)
-{
-    int pages = -1;
-    uint8_t *p;
-    RAMBlock *block = pss->block;
-    ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
-
-    p = block->host + offset;
-
-    if (block != rs->last_sent_block) {
-        /*
-         * Make sure the first page is sent out before other pages.
-         *
-         * we post it as normal page as compression will take much
-         * CPU resource.
-         */
-        pages = save_normal_page(rs, block, offset, p, true);
-    } else {
-        pages = compress_page_with_multi_thread(rs, block, offset);
-    }
-
-    return pages;
-}
-
 /**
  * find_dirty_block: find the next dirty page and update any state
  * associated with the search process.
@@ -1501,8 +1466,14 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
             goto page_saved;
         }
 
-        if (save_page_use_compression(rs)) {
-            res = ram_save_compressed_page(rs, pss, last_stage);
+        /*
+         * Make sure the first page is sent out before other pages.
+         *
+         * we post it as normal page as compression will take much
+         * CPU resource.
+         */
+        if (block == rs->last_sent_block && save_page_use_compression(rs)) {
+            res = compress_page_with_multi_thread(rs, block, offset);
         } else {
             res = ram_save_page(rs, pss, last_stage);
         }
-- 
2.14.3


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 10:25     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 10:25 UTC (permalink / raw)
  To: guangrong.xiao
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong, qemu-devel,
	mst, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> As compression is heavy work, do not do it in the migration thread;
> instead, post the page out as a normal page
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 32 ++++++++++++++++----------------

Hi,
  Do you have some performance numbers to show this helps?  Were those
taken on a normal system or were they taken with one of the compression
accelerators (which I think the compression migration was designed for)?

>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 7266351fd0..615693f180 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>      int pages = -1;
>      uint64_t bytes_xmit = 0;
>      uint8_t *p;
> -    int ret, blen;
> +    int ret;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>  
> @@ -1162,23 +1162,23 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>          if (block != rs->last_sent_block) {
>              flush_compressed_data(rs);
>              pages = save_zero_page(rs, block, offset);
> -            if (pages == -1) {
> -                /* Make sure the first page is sent out before other pages */
> -                bytes_xmit = save_page_header(rs, rs->f, block, offset |
> -                                              RAM_SAVE_FLAG_COMPRESS_PAGE);
> -                blen = qemu_put_compression_data(rs->f, p, TARGET_PAGE_SIZE,
> -                                                 migrate_compress_level());
> -                if (blen > 0) {
> -                    ram_counters.transferred += bytes_xmit + blen;
> -                    ram_counters.normal++;
> -                    pages = 1;
> -                } else {
> -                    qemu_file_set_error(rs->f, blen);
> -                    error_report("compressed data failed!");
> -                }
> -            }
>              if (pages > 0) {
>                  ram_release_pages(block->idstr, offset, pages);
> +            } else {
> +                /*
> +                 * Make sure the first page is sent out before other pages.
> +                 *
> +                 * we post it as normal page as compression will take much
> +                 * CPU resource.
> +                 */
> +                ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                                offset | RAM_SAVE_FLAG_PAGE);
> +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> +                                      migrate_release_ram() &
> +                                      migration_in_postcopy());
> +                ram_counters.transferred += TARGET_PAGE_SIZE;
> +                ram_counters.normal++;
> +                pages = 1;


However, the code and idea look OK, so

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>              }
>          } else {
>              pages = save_zero_page(rs, block, offset);
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 11:03     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 11:03 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> The current code uses compress2()/uncompress() to compress/decompress
> memory; these two functions manage memory allocation and release
> internally, which causes large amounts of memory to be allocated and
> freed very frequently
> 
> Worse, frequently returning memory to the kernel flushes TLBs and
> triggers invalidation callbacks via the mmu-notifier, which interact
> with the KVM MMU and dramatically reduce the performance of the VM
> 
> So, we maintain the memory ourselves and reuse it for each
> compression and decompression

I think
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/qemu-file.c |  34 ++++++++++--
>  migration/qemu-file.h |   6 ++-
>  migration/ram.c       | 142 +++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 140 insertions(+), 42 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 2ab2bf362d..1ff33a1ffb 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
>      return v;
>  }
>  
> +/* return the size after compression, or negative value on error */
> +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
> +                              const uint8_t *source, size_t source_len)
> +{
> +    int err;
> +
> +    err = deflateReset(stream);
> +    if (err != Z_OK) {
> +        return -1;
> +    }
> +
> +    stream->avail_in = source_len;
> +    stream->next_in = (uint8_t *)source;
> +    stream->avail_out = dest_len;
> +    stream->next_out = dest;
> +
> +    err = deflate(stream, Z_FINISH);
> +    if (err != Z_STREAM_END) {
> +        return -1;
> +    }
> +
> +    return stream->next_out - dest;
> +}
> +
>  /* Compress size bytes of data start at p with specific compression
>   * level and store the compressed data to the buffer of f.
>   *
> @@ -668,8 +692,8 @@ uint64_t qemu_get_be64(QEMUFile *f)
>   * data, return -1.
>   */
>  
> -ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
> -                                  int level)
> +ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
> +                                  const uint8_t *p, size_t size)
>  {
>      ssize_t blen = IO_BUF_SIZE - f->buf_index - sizeof(int32_t);
>  
> @@ -683,8 +707,10 @@ ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
>              return -1;
>          }
>      }
> -    if (compress2(f->buf + f->buf_index + sizeof(int32_t), (uLongf *)&blen,
> -                  (Bytef *)p, size, level) != Z_OK) {
> +
> +    blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
> +                              blen, p, size);
> +    if (blen < 0) {
>          error_report("Compress Failed!");
>          return 0;
>      }
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index aae4e5ed36..d123b21ca8 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -25,6 +25,8 @@
>  #ifndef MIGRATION_QEMU_FILE_H
>  #define MIGRATION_QEMU_FILE_H
>  
> +#include <zlib.h>
> +
>  /* Read a chunk of data from a file at the given position.  The pos argument
>   * can be ignored if the file is only be used for streaming.  The number of
>   * bytes actually read should be returned.
> @@ -132,8 +134,8 @@ bool qemu_file_is_writable(QEMUFile *f);
>  
>  size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset);
>  size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size);
> -ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
> -                                  int level);
> +ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
> +                                  const uint8_t *p, size_t size);
>  int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
>  
>  /*
> diff --git a/migration/ram.c b/migration/ram.c
> index 615693f180..fff3f31e90 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -264,6 +264,7 @@ struct CompressParam {
>      QemuCond cond;
>      RAMBlock *block;
>      ram_addr_t offset;
> +    z_stream stream;
>  };
>  typedef struct CompressParam CompressParam;
>  
> @@ -275,6 +276,7 @@ struct DecompressParam {
>      void *des;
>      uint8_t *compbuf;
>      int len;
> +    z_stream stream;
>  };
>  typedef struct DecompressParam DecompressParam;
>  
> @@ -294,7 +296,7 @@ static QemuThread *decompress_threads;
>  static QemuMutex decomp_done_lock;
>  static QemuCond decomp_done_cond;
>  
> -static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
> +static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>                                  ram_addr_t offset);
>  
>  static void *do_data_compress(void *opaque)
> @@ -311,7 +313,7 @@ static void *do_data_compress(void *opaque)
>              param->block = NULL;
>              qemu_mutex_unlock(&param->mutex);
>  
> -            do_compress_ram_page(param->file, block, offset);
> +            do_compress_ram_page(param->file, &param->stream, block, offset);
>  
>              qemu_mutex_lock(&comp_done_lock);
>              param->done = true;
> @@ -352,10 +354,17 @@ static void compress_threads_save_cleanup(void)
>      terminate_compression_threads();
>      thread_count = migrate_compress_threads();
>      for (i = 0; i < thread_count; i++) {
> +        /* something in compress_threads_save_setup() is wrong. */
> +        if (!comp_param[i].stream.opaque) {
> +            break;
> +        }
> +
>          qemu_thread_join(compress_threads + i);
>          qemu_fclose(comp_param[i].file);
>          qemu_mutex_destroy(&comp_param[i].mutex);
>          qemu_cond_destroy(&comp_param[i].cond);
> +        deflateEnd(&comp_param[i].stream);
> +        comp_param[i].stream.opaque = NULL;
>      }
>      qemu_mutex_destroy(&comp_done_lock);
>      qemu_cond_destroy(&comp_done_cond);
> @@ -365,12 +374,12 @@ static void compress_threads_save_cleanup(void)
>      comp_param = NULL;
>  }
>  
> -static void compress_threads_save_setup(void)
> +static int compress_threads_save_setup(void)
>  {
>      int i, thread_count;
>  
>      if (!migrate_use_compression()) {
> -        return;
> +        return 0;
>      }
>      thread_count = migrate_compress_threads();
>      compress_threads = g_new0(QemuThread, thread_count);
> @@ -378,6 +387,12 @@ static void compress_threads_save_setup(void)
>      qemu_cond_init(&comp_done_cond);
>      qemu_mutex_init(&comp_done_lock);
>      for (i = 0; i < thread_count; i++) {
> +        if (deflateInit(&comp_param[i].stream,
> +                           migrate_compress_level()) != Z_OK) {
> +            goto exit;
> +        }
> +        comp_param[i].stream.opaque = &comp_param[i];
> +
>          /* comp_param[i].file is just used as a dummy buffer to save data,
>           * set its ops to empty.
>           */
> @@ -390,6 +405,11 @@ static void compress_threads_save_setup(void)
>                             do_data_compress, comp_param + i,
>                             QEMU_THREAD_JOINABLE);
>      }
> +    return 0;
> +
> +exit:
> +    compress_threads_save_cleanup();
> +    return -1;
>  }
>  
>  /* Multiple fd's */
> @@ -1026,7 +1046,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      return pages;
>  }
>  
> -static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
> +static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>                                  ram_addr_t offset)
>  {
>      RAMState *rs = ram_state;
> @@ -1035,8 +1055,7 @@ static int do_compress_ram_page(QEMUFile *f, RAMBlock *block,
>  
>      bytes_sent = save_page_header(rs, f, block, offset |
>                                    RAM_SAVE_FLAG_COMPRESS_PAGE);
> -    blen = qemu_put_compression_data(f, p, TARGET_PAGE_SIZE,
> -                                     migrate_compress_level());
> +    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
>      if (blen < 0) {
>          bytes_sent = 0;
>          qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
> @@ -2209,9 +2228,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      RAMState **rsp = opaque;
>      RAMBlock *block;
>  
> +    if (compress_threads_save_setup()) {
> +        return -1;
> +    }
> +
>      /* migration has already setup the bitmap, reuse it. */
>      if (!migration_in_colo_state()) {
>          if (ram_init_all(rsp) != 0) {
> +            compress_threads_save_cleanup();
>              return -1;
>          }
>      }
> @@ -2231,7 +2255,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      }
>  
>      rcu_read_unlock();
> -    compress_threads_save_setup();
>  
>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
> @@ -2495,6 +2518,30 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>      }
>  }
>  
> +/* return the size after decompression, or negative value on error */
> +static int qemu_uncompress(z_stream *stream, uint8_t *dest, size_t dest_len,
> +                           uint8_t *source, size_t source_len)
> +{
> +    int err;
> +
> +    err = inflateReset(stream);
> +    if (err != Z_OK) {
> +        return -1;
> +    }
> +
> +    stream->avail_in = source_len;
> +    stream->next_in = source;
> +    stream->avail_out = dest_len;
> +    stream->next_out = dest;
> +
> +    err = inflate(stream, Z_NO_FLUSH);
> +    if (err != Z_STREAM_END) {
> +        return -1;
> +    }
> +
> +    return stream->total_out;
> +}
> +
>  static void *do_data_decompress(void *opaque)
>  {
>      DecompressParam *param = opaque;
> @@ -2511,13 +2558,13 @@ static void *do_data_decompress(void *opaque)
>              qemu_mutex_unlock(&param->mutex);
>  
>              pagesize = TARGET_PAGE_SIZE;
> -            /* uncompress() will return failed in some case, especially
> +            /* qemu_uncompress() will return failed in some case, especially
>               * when the page is dirted when doing the compression, it's
>               * not a problem because the dirty page will be retransferred
>               * and uncompress() won't break the data in other pages.
>               */
> -            uncompress((Bytef *)des, &pagesize,
> -                       (const Bytef *)param->compbuf, len);
> +            qemu_uncompress(&param->stream, des, pagesize,
> +                            param->compbuf, len);
>  
>              qemu_mutex_lock(&decomp_done_lock);
>              param->done = true;
> @@ -2552,30 +2599,6 @@ static void wait_for_decompress_done(void)
>      qemu_mutex_unlock(&decomp_done_lock);
>  }
>  
> -static void compress_threads_load_setup(void)
> -{
> -    int i, thread_count;
> -
> -    if (!migrate_use_compression()) {
> -        return;
> -    }
> -    thread_count = migrate_decompress_threads();
> -    decompress_threads = g_new0(QemuThread, thread_count);
> -    decomp_param = g_new0(DecompressParam, thread_count);
> -    qemu_mutex_init(&decomp_done_lock);
> -    qemu_cond_init(&decomp_done_cond);
> -    for (i = 0; i < thread_count; i++) {
> -        qemu_mutex_init(&decomp_param[i].mutex);
> -        qemu_cond_init(&decomp_param[i].cond);
> -        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> -        decomp_param[i].done = true;
> -        decomp_param[i].quit = false;
> -        qemu_thread_create(decompress_threads + i, "decompress",
> -                           do_data_decompress, decomp_param + i,
> -                           QEMU_THREAD_JOINABLE);
> -    }
> -}
> -
>  static void compress_threads_load_cleanup(void)
>  {
>      int i, thread_count;
> @@ -2585,16 +2608,26 @@ static void compress_threads_load_cleanup(void)
>      }
>      thread_count = migrate_decompress_threads();
>      for (i = 0; i < thread_count; i++) {
> +        if (!decomp_param[i].stream.opaque) {
> +            break;
> +        }
> +
>          qemu_mutex_lock(&decomp_param[i].mutex);
>          decomp_param[i].quit = true;
>          qemu_cond_signal(&decomp_param[i].cond);
>          qemu_mutex_unlock(&decomp_param[i].mutex);
>      }
>      for (i = 0; i < thread_count; i++) {
> +        if (!decomp_param[i].stream.opaque) {
> +            break;
> +        }
> +
>          qemu_thread_join(decompress_threads + i);
>          qemu_mutex_destroy(&decomp_param[i].mutex);
>          qemu_cond_destroy(&decomp_param[i].cond);
>          g_free(decomp_param[i].compbuf);
> +        inflateEnd(&decomp_param[i].stream);
> +        decomp_param[i].stream.opaque = NULL;
>      }
>      g_free(decompress_threads);
>      g_free(decomp_param);
> @@ -2602,6 +2635,40 @@ static void compress_threads_load_cleanup(void)
>      decomp_param = NULL;
>  }
>  
> +static int compress_threads_load_setup(void)
> +{
> +    int i, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return 0;
> +    }
> +
> +    thread_count = migrate_decompress_threads();
> +    decompress_threads = g_new0(QemuThread, thread_count);
> +    decomp_param = g_new0(DecompressParam, thread_count);
> +    qemu_mutex_init(&decomp_done_lock);
> +    qemu_cond_init(&decomp_done_cond);
> +    for (i = 0; i < thread_count; i++) {
> +        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
> +            goto exit;
> +        }
> +        decomp_param[i].stream.opaque = &decomp_param[i];
> +
> +        qemu_mutex_init(&decomp_param[i].mutex);
> +        qemu_cond_init(&decomp_param[i].cond);
> +        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> +        decomp_param[i].done = true;
> +        decomp_param[i].quit = false;
> +        qemu_thread_create(decompress_threads + i, "decompress",
> +                           do_data_decompress, decomp_param + i,
> +                           QEMU_THREAD_JOINABLE);
> +    }
> +    return 0;
> +exit:
> +    compress_threads_load_cleanup();

I don't think this is safe; if inflateInit(..) fails in not-the-last
thread, compress_threads_load_cleanup() will try to destroy all the
mutexes and condition variables, even though they've not yet all been
_init'd.

However, other than that I think the patch is OK; a chat with Dan
Berrange has convinced me this probably doesn't affect the stream
format, so that's OK.

One thing I would like is a comment as to how the 'opaque' field is
being used; I don't think I quite understand what you're doing there.

Dave

> +    return -1;
> +}
> +
>  static void decompress_data_with_multi_threads(QEMUFile *f,
>                                                 void *host, int len)
>  {
> @@ -2641,8 +2708,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>   */
>  static int ram_load_setup(QEMUFile *f, void *opaque)
>  {
> +    if (compress_threads_load_setup()) {
> +        return -1;
> +    }
> +
>      xbzrle_load_setup();
> -    compress_threads_load_setup();
>      ramblock_recv_map_init();
>      return 0;
>  }
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


> +    }
> +
> +    return stream->total_out;
> +}
> +
>  static void *do_data_decompress(void *opaque)
>  {
>      DecompressParam *param = opaque;
> @@ -2511,13 +2558,13 @@ static void *do_data_decompress(void *opaque)
>              qemu_mutex_unlock(&param->mutex);
>  
>              pagesize = TARGET_PAGE_SIZE;
> -            /* uncompress() will return failed in some case, especially
> +            /* qemu_uncompress() will return failed in some case, especially
>               * when the page is dirted when doing the compression, it's
>               * not a problem because the dirty page will be retransferred
>               * and uncompress() won't break the data in other pages.
>               */
> -            uncompress((Bytef *)des, &pagesize,
> -                       (const Bytef *)param->compbuf, len);
> +            qemu_uncompress(&param->stream, des, pagesize,
> +                            param->compbuf, len);
>  
>              qemu_mutex_lock(&decomp_done_lock);
>              param->done = true;
> @@ -2552,30 +2599,6 @@ static void wait_for_decompress_done(void)
>      qemu_mutex_unlock(&decomp_done_lock);
>  }
>  
> -static void compress_threads_load_setup(void)
> -{
> -    int i, thread_count;
> -
> -    if (!migrate_use_compression()) {
> -        return;
> -    }
> -    thread_count = migrate_decompress_threads();
> -    decompress_threads = g_new0(QemuThread, thread_count);
> -    decomp_param = g_new0(DecompressParam, thread_count);
> -    qemu_mutex_init(&decomp_done_lock);
> -    qemu_cond_init(&decomp_done_cond);
> -    for (i = 0; i < thread_count; i++) {
> -        qemu_mutex_init(&decomp_param[i].mutex);
> -        qemu_cond_init(&decomp_param[i].cond);
> -        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> -        decomp_param[i].done = true;
> -        decomp_param[i].quit = false;
> -        qemu_thread_create(decompress_threads + i, "decompress",
> -                           do_data_decompress, decomp_param + i,
> -                           QEMU_THREAD_JOINABLE);
> -    }
> -}
> -
>  static void compress_threads_load_cleanup(void)
>  {
>      int i, thread_count;
> @@ -2585,16 +2608,26 @@ static void compress_threads_load_cleanup(void)
>      }
>      thread_count = migrate_decompress_threads();
>      for (i = 0; i < thread_count; i++) {
> +        if (!decomp_param[i].stream.opaque) {
> +            break;
> +        }
> +
>          qemu_mutex_lock(&decomp_param[i].mutex);
>          decomp_param[i].quit = true;
>          qemu_cond_signal(&decomp_param[i].cond);
>          qemu_mutex_unlock(&decomp_param[i].mutex);
>      }
>      for (i = 0; i < thread_count; i++) {
> +        if (!decomp_param[i].stream.opaque) {
> +            break;
> +        }
> +
>          qemu_thread_join(decompress_threads + i);
>          qemu_mutex_destroy(&decomp_param[i].mutex);
>          qemu_cond_destroy(&decomp_param[i].cond);
>          g_free(decomp_param[i].compbuf);
> +        inflateEnd(&decomp_param[i].stream);
> +        decomp_param[i].stream.opaque = NULL;
>      }
>      g_free(decompress_threads);
>      g_free(decomp_param);
> @@ -2602,6 +2635,40 @@ static void compress_threads_load_cleanup(void)
>      decomp_param = NULL;
>  }
>  
> +static int compress_threads_load_setup(void)
> +{
> +    int i, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return 0;
> +    }
> +
> +    thread_count = migrate_decompress_threads();
> +    decompress_threads = g_new0(QemuThread, thread_count);
> +    decomp_param = g_new0(DecompressParam, thread_count);
> +    qemu_mutex_init(&decomp_done_lock);
> +    qemu_cond_init(&decomp_done_cond);
> +    for (i = 0; i < thread_count; i++) {
> +        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
> +            goto exit;
> +        }
> +        decomp_param[i].stream.opaque = &decomp_param[i];
> +
> +        qemu_mutex_init(&decomp_param[i].mutex);
> +        qemu_cond_init(&decomp_param[i].cond);
> +        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> +        decomp_param[i].done = true;
> +        decomp_param[i].quit = false;
> +        qemu_thread_create(decompress_threads + i, "decompress",
> +                           do_data_decompress, decomp_param + i,
> +                           QEMU_THREAD_JOINABLE);
> +    }
> +    return 0;
> +exit:
> +    compress_threads_load_cleanup();

I don't think this is safe; if inflateInit(..) fails in a thread other
than the last one, compress_threads_load_cleanup() will try to destroy
all the mutexes and condition variables, even though not all of them
have been _init'd yet.

However, other than that I think the patch is OK; a chat with Dan
Berrange has convinced me this probably doesn't affect the stream
format, so that's OK.

One thing I would like is a comment as to how the 'opaque' field is
being used; I don't think I quite understand what you're doing there.

Dave
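(For illustration, one common way to make this kind of partial-initialization cleanup safe is to count how many slots setup fully finished and have cleanup walk only those. The sketch below is hypothetical and simplified, standing in for the DecompressParam/inflateInit/qemu_mutex_init machinery rather than reproducing the actual QEMU code.)

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_SLOTS 4

/* Hypothetical per-thread state standing in for DecompressParam. */
typedef struct {
    bool mutex_ready;   /* stands in for qemu_mutex_init() having run */
    bool stream_ready;  /* stands in for inflateInit() having succeeded */
} Slot;

typedef struct {
    Slot slots[NR_SLOTS];
    int nr_initialized; /* how many slots setup fully finished */
} ThreadPool;

/* Simulated setup: initialize slots in order, "failing" at fail_at the
 * way inflateInit() can fail.  Returns 0 on success, -1 on failure. */
static int pool_setup(ThreadPool *p, int fail_at)
{
    memset(p, 0, sizeof(*p));
    for (int i = 0; i < NR_SLOTS; i++) {
        if (i == fail_at) {
            return -1;            /* inflateInit() failed for slot i */
        }
        p->slots[i].stream_ready = true;
        p->slots[i].mutex_ready = true;
        p->nr_initialized++;      /* count only fully set-up slots */
    }
    return 0;
}

/* Cleanup walks only the slots setup finished, so it never destroys a
 * mutex or condition variable that was never initialized. */
static void pool_cleanup(ThreadPool *p)
{
    for (int i = 0; i < p->nr_initialized; i++) {
        assert(p->slots[i].mutex_ready);
        p->slots[i].mutex_ready = false;
        p->slots[i].stream_ready = false;
    }
    p->nr_initialized = 0;
}
```

The patch's `stream.opaque != NULL` check is aiming at the same invariant; an explicit count just makes "fully initialized" unambiguous.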

> +    return -1;
> +}
> +
>  static void decompress_data_with_multi_threads(QEMUFile *f,
>                                                 void *host, int len)
>  {
> @@ -2641,8 +2708,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>   */
>  static int ram_load_setup(QEMUFile *f, void *opaque)
>  {
> +    if (compress_threads_load_setup()) {
> +        return -1;
> +    }
> +
>      xbzrle_load_setup();
> -    compress_threads_load_setup();
>      ramblock_recv_map_init();
>      return 0;
>  }
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 11:29     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 11:29 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Currently the page being compressed is allowed to be updated by
> the VM on the source QEMU; correspondingly, the destination QEMU
> just ignores the decompression error. However, we completely miss
> the chance to catch real errors, and the VM is corrupted silently
> 
> To make the migration more robust, we copy the page to a buffer
> first to avoid it being written by the VM, then detect and handle
> both compression and decompression errors properly
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/qemu-file.c |  4 ++--
>  migration/ram.c       | 29 +++++++++++++++++++----------
>  2 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1ff33a1ffb..137bcc8bdc 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -711,9 +711,9 @@ ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
>      blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
>                                blen, p, size);
>      if (blen < 0) {
> -        error_report("Compress Failed!");
> -        return 0;
> +        return -1;
>      }
> +
>      qemu_put_be32(f, blen);
>      if (f->ops->writev_buffer) {
>          add_to_iovec(f, f->buf + f->buf_index, blen, false);
> diff --git a/migration/ram.c b/migration/ram.c
> index fff3f31e90..c47185d38c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -273,6 +273,7 @@ struct DecompressParam {
>      bool quit;
>      QemuMutex mutex;
>      QemuCond cond;
> +    QEMUFile *file;
>      void *des;
>      uint8_t *compbuf;
>      int len;
> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>  {
>      RAMState *rs = ram_state;
>      int bytes_sent, blen;
> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
> +    uint8_t buf[TARGET_PAGE_SIZE], *p;

That should be malloc'd somewhere rather than placed on the stack; it's a
bit big, and there are architectures where TARGET_PAGE_SIZE isn't a
compile-time constant.

(Also, please use g_try_malloc rather than g_malloc for larger chunks,
since g_try_malloc returns NULL so you can fail nicely; g_malloc is
OK for small things that are very unlikely to fail.)

Other than that, I think the patch is fine.

Dave
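(The suggested pattern, i.e. allocate the bounce buffer once at setup time and fail cleanly if allocation fails, might look like the sketch below. Names are hypothetical and plain malloc() stands in for g_try_malloc(), which likewise returns NULL on failure instead of aborting; this is not the actual QEMU code.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* On some targets TARGET_PAGE_SIZE is only known at runtime, so a
 * fixed-size stack array is wrong; query the size instead. */
static size_t target_page_size(void)
{
    return 4096;  /* hypothetical stand-in; QEMU derives this per target */
}

/* Allocate the per-thread bounce buffer once at setup time.  malloc()
 * stands in for g_try_malloc(): both return NULL on failure, letting
 * setup report an error instead of aborting the process. */
static uint8_t *bounce_buffer_new(void)
{
    return malloc(target_page_size());
}

/* Copy the guest page into the stable bounce buffer before compressing,
 * so the VM dirtying the page mid-compression cannot corrupt the
 * compression stream. */
static int compress_page(uint8_t *bounce, const uint8_t *guest_page)
{
    if (!bounce) {
        return -1;  /* allocation failed earlier; report, don't crash */
    }
    memcpy(bounce, guest_page, target_page_size());
    /* ... deflate from 'bounce' rather than from guest memory ... */
    return 0;
}
```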

> +    p = block->host + (offset & TARGET_PAGE_MASK);
>      bytes_sent = save_page_header(rs, f, block, offset |
>                                    RAM_SAVE_FLAG_COMPRESS_PAGE);
> -    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
> +    memcpy(buf, p, TARGET_PAGE_SIZE);
> +    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
>      if (blen < 0) {
>          bytes_sent = 0;
>          qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
> @@ -2547,7 +2550,7 @@ static void *do_data_decompress(void *opaque)
>      DecompressParam *param = opaque;
>      unsigned long pagesize;
>      uint8_t *des;
> -    int len;
> +    int len, ret;
>  
>      qemu_mutex_lock(&param->mutex);
>      while (!param->quit) {
> @@ -2563,8 +2566,12 @@ static void *do_data_decompress(void *opaque)
>               * not a problem because the dirty page will be retransferred
>               * and uncompress() won't break the data in other pages.
>               */
> -            qemu_uncompress(&param->stream, des, pagesize,
> -                            param->compbuf, len);
> +            ret = qemu_uncompress(&param->stream, des, pagesize,
> +                                  param->compbuf, len);
> +            if (ret < 0) {
> +                error_report("decompress data failed");
> +                qemu_file_set_error(param->file, ret);
> +            }
>  
>              qemu_mutex_lock(&decomp_done_lock);
>              param->done = true;
> @@ -2581,12 +2588,12 @@ static void *do_data_decompress(void *opaque)
>      return NULL;
>  }
>  
> -static void wait_for_decompress_done(void)
> +static int wait_for_decompress_done(QEMUFile *f)
>  {
>      int idx, thread_count;
>  
>      if (!migrate_use_compression()) {
> -        return;
> +        return 0;
>      }
>  
>      thread_count = migrate_decompress_threads();
> @@ -2597,6 +2604,7 @@ static void wait_for_decompress_done(void)
>          }
>      }
>      qemu_mutex_unlock(&decomp_done_lock);
> +    return qemu_file_get_error(f);
>  }
>  
>  static void compress_threads_load_cleanup(void)
> @@ -2635,7 +2643,7 @@ static void compress_threads_load_cleanup(void)
>      decomp_param = NULL;
>  }
>  
> -static int compress_threads_load_setup(void)
> +static int compress_threads_load_setup(QEMUFile *f)
>  {
>      int i, thread_count;
>  
> @@ -2654,6 +2662,7 @@ static int compress_threads_load_setup(void)
>          }
>          decomp_param[i].stream.opaque = &decomp_param[i];
>  
> +        decomp_param[i].file = f;
>          qemu_mutex_init(&decomp_param[i].mutex);
>          qemu_cond_init(&decomp_param[i].cond);
>          decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> @@ -2708,7 +2717,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>   */
>  static int ram_load_setup(QEMUFile *f, void *opaque)
>  {
> -    if (compress_threads_load_setup()) {
> +    if (compress_threads_load_setup(f)) {
>          return -1;
>      }
>  
> @@ -3063,7 +3072,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>          }
>      }
>  
> -    wait_for_decompress_done();
> +    ret |= wait_for_decompress_done(f);
>      rcu_read_unlock();
>      trace_ram_load_complete(ret, seq_iter);
>      return ret;
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/8] migration: introduce control_save_page()
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 11:37     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 11:37 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Abstract the common function control_save_page() to clean up the code;
> no logic is changed
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

It would be good to find a better name for control_save_page, but I
can't think of one!
> ---
>  migration/ram.c | 174 +++++++++++++++++++++++++++++---------------------------
>  1 file changed, 89 insertions(+), 85 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c47185d38c..e7b8b14c3c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -957,6 +957,44 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
>      ram_discard_range(rbname, offset, pages << TARGET_PAGE_BITS);
>  }
>  
> +/*
> + * @pages: the number of pages written by the control path,
> + *        < 0 - error
> + *        > 0 - number of pages written

What about 0?

> + * Return true if the pages has been saved, otherwise false is returned.
> + */
> +static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> +                              int *pages)
> +{
> +    uint64_t bytes_xmit = 0;
> +    int ret;
> +
> +    *pages = -1;
> +    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
> +                                &bytes_xmit);
> +    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
> +        return false;
> +    }
> +
> +    if (bytes_xmit) {
> +        ram_counters.transferred += bytes_xmit;
> +        *pages = 1;
> +    }
> +
> +    if (ret == RAM_SAVE_CONTROL_DELAYED) {
> +        return true;
> +    }
> +
> +    if (bytes_xmit > 0) {
> +        ram_counters.normal++;
> +    } else if (bytes_xmit == 0) {
> +        ram_counters.duplicate++;
> +    }
> +
> +    return true;
> +}
> +
>  /**
>   * ram_save_page: send the given page to the stream
>   *
> @@ -973,56 +1011,36 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
>  static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>  {
>      int pages = -1;
> -    uint64_t bytes_xmit;
> -    ram_addr_t current_addr;
>      uint8_t *p;
> -    int ret;
>      bool send_async = true;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> +    ram_addr_t current_addr = block->offset + offset;
>  
>      p = block->host + offset;
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
> -    /* In doubt sent page as normal */
> -    bytes_xmit = 0;
> -    ret = ram_control_save_page(rs->f, block->offset,
> -                           offset, TARGET_PAGE_SIZE, &bytes_xmit);
> -    if (bytes_xmit) {
> -        ram_counters.transferred += bytes_xmit;
> -        pages = 1;
> +    if (control_save_page(rs, block, offset, &pages)) {
> +        return pages;
>      }
>  
>      XBZRLE_cache_lock();
> -
> -    current_addr = block->offset + offset;
> -
> -    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> -        if (ret != RAM_SAVE_CONTROL_DELAYED) {
> -            if (bytes_xmit > 0) {
> -                ram_counters.normal++;
> -            } else if (bytes_xmit == 0) {
> -                ram_counters.duplicate++;
> -            }
> -        }
> -    } else {
> -        pages = save_zero_page(rs, block, offset);
> -        if (pages > 0) {
> -            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> -             * page would be stale
> +    pages = save_zero_page(rs, block, offset);
> +    if (pages > 0) {
> +        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> +         * page would be stale
> +         */
> +        xbzrle_cache_zero_page(rs, current_addr);
> +        ram_release_pages(block->idstr, offset, pages);
> +    } else if (!rs->ram_bulk_stage &&
> +               !migration_in_postcopy() && migrate_use_xbzrle()) {
> +        pages = save_xbzrle_page(rs, &p, current_addr, block,
> +                                 offset, last_stage);
> +        if (!last_stage) {
> +            /* Can't send this cached data async, since the cache page
> +             * might get updated before it gets to the wire
>               */
> -            xbzrle_cache_zero_page(rs, current_addr);
> -            ram_release_pages(block->idstr, offset, pages);
> -        } else if (!rs->ram_bulk_stage &&
> -                   !migration_in_postcopy() && migrate_use_xbzrle()) {
> -            pages = save_xbzrle_page(rs, &p, current_addr, block,
> -                                     offset, last_stage);
> -            if (!last_stage) {
> -                /* Can't send this cached data async, since the cache page
> -                 * might get updated before it gets to the wire
> -                 */
> -                send_async = false;
> -            }
> +            send_async = false;
>          }
>      }
>  
> @@ -1152,63 +1170,49 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>                                      bool last_stage)
>  {
>      int pages = -1;
> -    uint64_t bytes_xmit = 0;
>      uint8_t *p;
> -    int ret;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>  
>      p = block->host + offset;
>  
> -    ret = ram_control_save_page(rs->f, block->offset,
> -                                offset, TARGET_PAGE_SIZE, &bytes_xmit);
> -    if (bytes_xmit) {
> -        ram_counters.transferred += bytes_xmit;
> -        pages = 1;
> +    if (control_save_page(rs, block, offset, &pages)) {
> +        return pages;
>      }
> -    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> -        if (ret != RAM_SAVE_CONTROL_DELAYED) {
> -            if (bytes_xmit > 0) {
> -                ram_counters.normal++;
> -            } else if (bytes_xmit == 0) {
> -                ram_counters.duplicate++;
> -            }
> +
> +    /* When starting the process of a new block, the first page of
> +     * the block should be sent out before other pages in the same
> +     * block, and all the pages in last block should have been sent
> +     * out, keeping this order is important, because the 'cont' flag
> +     * is used to avoid resending the block name.
> +     */
> +    if (block != rs->last_sent_block) {
> +        flush_compressed_data(rs);
> +        pages = save_zero_page(rs, block, offset);
> +        if (pages > 0) {
> +            ram_release_pages(block->idstr, offset, pages);
> +        } else {
> +            /*
> +             * Make sure the first page is sent out before other pages.
> +             *
> +             * we post it as normal page as compression will take much
> +             * CPU resource.
> +             */
> +            ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                            offset | RAM_SAVE_FLAG_PAGE);
> +            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> +                                  migrate_release_ram() &
> +                                  migration_in_postcopy());
> +            ram_counters.transferred += TARGET_PAGE_SIZE;
> +            ram_counters.normal++;
> +            pages = 1;
>          }
>      } else {
> -        /* When starting the process of a new block, the first page of
> -         * the block should be sent out before other pages in the same
> -         * block, and all the pages in last block should have been sent
> -         * out, keeping this order is important, because the 'cont' flag
> -         * is used to avoid resending the block name.
> -         */
> -        if (block != rs->last_sent_block) {
> -            flush_compressed_data(rs);
> -            pages = save_zero_page(rs, block, offset);
> -            if (pages > 0) {
> -                ram_release_pages(block->idstr, offset, pages);
> -            } else {
> -                /*
> -                 * Make sure the first page is sent out before other pages.
> -                 *
> -                 * we post it as normal page as compression will take much
> -                 * CPU resource.
> -                 */
> -                ram_counters.transferred += save_page_header(rs, rs->f, block,
> -                                                offset | RAM_SAVE_FLAG_PAGE);
> -                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                                      migrate_release_ram() &
> -                                      migration_in_postcopy());
> -                ram_counters.transferred += TARGET_PAGE_SIZE;
> -                ram_counters.normal++;
> -                pages = 1;
> -            }
> +        pages = save_zero_page(rs, block, offset);
> +        if (pages == -1) {
> +            pages = compress_page_with_multi_thread(rs, block, offset);
>          } else {
> -            pages = save_zero_page(rs, block, offset);
> -            if (pages == -1) {
> -                pages = compress_page_with_multi_thread(rs, block, offset);
> -            } else {
> -                ram_release_pages(block->idstr, offset, pages);
> -            }
> +            ram_release_pages(block->idstr, offset, pages);
>          }
>      }
>  
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread
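(The reviewer's "what about 0" point is easiest to see by pinning the out-parameter contract down in a small mirror of the quoted logic. This is a hedged sketch with hypothetical names, not the QEMU code: note that when the control path accepts the page but transmits zero bytes, *pages is left at -1 even though the page is accounted as a duplicate, which is the case the doc comment doesn't cover.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the RAM_SAVE_CONTROL_* results in the quoted patch. */
enum { CTRL_NOT_SUPP = -1, CTRL_DELAYED = -2, CTRL_OK = 0 };

/* Simplified mirror of control_save_page(): returns false when the
 * control path is unsupported (caller must save the page itself);
 * otherwise the page is handled here, with *pages set to 1 only if
 * bytes were transmitted, and the normal/duplicate counters updated
 * unless the transfer is delayed. */
static bool control_save(int ret, uint64_t bytes_xmit,
                         int *pages, uint64_t *normal, uint64_t *duplicate)
{
    *pages = -1;
    if (ret == CTRL_NOT_SUPP) {
        return false;
    }
    if (bytes_xmit) {
        *pages = 1;
    }
    if (ret == CTRL_DELAYED) {
        return true;                /* accounting happens later */
    }
    if (bytes_xmit > 0) {
        (*normal)++;
    } else if (bytes_xmit == 0) {
        (*duplicate)++;             /* accepted but nothing sent: duplicate,
                                     * yet *pages stays -1 */
    }
    return true;
}
```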

* Re: [Qemu-devel] [PATCH 4/8] migration: introduce control_save_page()
@ 2018-03-15 11:37     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 11:37 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: pbonzini, mst, mtosatti, Xiao Guangrong, qemu-devel, kvm

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Abstract the common function control_save_page() to clean up the code;
> no logic is changed
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

It would be good to find a better name for control_save_page, but I
can't think of one!
> ---
>  migration/ram.c | 174 +++++++++++++++++++++++++++++---------------------------
>  1 file changed, 89 insertions(+), 85 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c47185d38c..e7b8b14c3c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -957,6 +957,44 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
>      ram_discard_range(rbname, offset, pages << TARGET_PAGE_BITS);
>  }
>  
> +/*
> + * @pages: the number of pages written by the control path,
> + *        < 0 - error
> + *        > 0 - number of pages written

What about 0?

> + * Return true if the pages has been saved, otherwise false is returned.
> + */
> +static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> +                              int *pages)
> +{
> +    uint64_t bytes_xmit = 0;
> +    int ret;
> +
> +    *pages = -1;
> +    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
> +                                &bytes_xmit);
> +    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
> +        return false;
> +    }
> +
> +    if (bytes_xmit) {
> +        ram_counters.transferred += bytes_xmit;
> +        *pages = 1;
> +    }
> +
> +    if (ret == RAM_SAVE_CONTROL_DELAYED) {
> +        return true;
> +    }
> +
> +    if (bytes_xmit > 0) {
> +        ram_counters.normal++;
> +    } else if (bytes_xmit == 0) {
> +        ram_counters.duplicate++;
> +    }
> +
> +    return true;
> +}
> +
>  /**
>   * ram_save_page: send the given page to the stream
>   *
> @@ -973,56 +1011,36 @@ static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
>  static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>  {
>      int pages = -1;
> -    uint64_t bytes_xmit;
> -    ram_addr_t current_addr;
>      uint8_t *p;
> -    int ret;
>      bool send_async = true;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> +    ram_addr_t current_addr = block->offset + offset;
>  
>      p = block->host + offset;
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
> -    /* In doubt sent page as normal */
> -    bytes_xmit = 0;
> -    ret = ram_control_save_page(rs->f, block->offset,
> -                           offset, TARGET_PAGE_SIZE, &bytes_xmit);
> -    if (bytes_xmit) {
> -        ram_counters.transferred += bytes_xmit;
> -        pages = 1;
> +    if (control_save_page(rs, block, offset, &pages)) {
> +        return pages;
>      }
>  
>      XBZRLE_cache_lock();
> -
> -    current_addr = block->offset + offset;
> -
> -    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> -        if (ret != RAM_SAVE_CONTROL_DELAYED) {
> -            if (bytes_xmit > 0) {
> -                ram_counters.normal++;
> -            } else if (bytes_xmit == 0) {
> -                ram_counters.duplicate++;
> -            }
> -        }
> -    } else {
> -        pages = save_zero_page(rs, block, offset);
> -        if (pages > 0) {
> -            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> -             * page would be stale
> +    pages = save_zero_page(rs, block, offset);
> +    if (pages > 0) {
> +        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> +         * page would be stale
> +         */
> +        xbzrle_cache_zero_page(rs, current_addr);
> +        ram_release_pages(block->idstr, offset, pages);
> +    } else if (!rs->ram_bulk_stage &&
> +               !migration_in_postcopy() && migrate_use_xbzrle()) {
> +        pages = save_xbzrle_page(rs, &p, current_addr, block,
> +                                 offset, last_stage);
> +        if (!last_stage) {
> +            /* Can't send this cached data async, since the cache page
> +             * might get updated before it gets to the wire
>               */
> -            xbzrle_cache_zero_page(rs, current_addr);
> -            ram_release_pages(block->idstr, offset, pages);
> -        } else if (!rs->ram_bulk_stage &&
> -                   !migration_in_postcopy() && migrate_use_xbzrle()) {
> -            pages = save_xbzrle_page(rs, &p, current_addr, block,
> -                                     offset, last_stage);
> -            if (!last_stage) {
> -                /* Can't send this cached data async, since the cache page
> -                 * might get updated before it gets to the wire
> -                 */
> -                send_async = false;
> -            }
> +            send_async = false;
>          }
>      }
>  
> @@ -1152,63 +1170,49 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>                                      bool last_stage)
>  {
>      int pages = -1;
> -    uint64_t bytes_xmit = 0;
>      uint8_t *p;
> -    int ret;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>  
>      p = block->host + offset;
>  
> -    ret = ram_control_save_page(rs->f, block->offset,
> -                                offset, TARGET_PAGE_SIZE, &bytes_xmit);
> -    if (bytes_xmit) {
> -        ram_counters.transferred += bytes_xmit;
> -        pages = 1;
> +    if (control_save_page(rs, block, offset, &pages)) {
> +        return pages;
>      }
> -    if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> -        if (ret != RAM_SAVE_CONTROL_DELAYED) {
> -            if (bytes_xmit > 0) {
> -                ram_counters.normal++;
> -            } else if (bytes_xmit == 0) {
> -                ram_counters.duplicate++;
> -            }
> +
> +    /* When starting the process of a new block, the first page of
> +     * the block should be sent out before other pages in the same
> +     * block, and all the pages in last block should have been sent
> +     * out, keeping this order is important, because the 'cont' flag
> +     * is used to avoid resending the block name.
> +     */
> +    if (block != rs->last_sent_block) {
> +        flush_compressed_data(rs);
> +        pages = save_zero_page(rs, block, offset);
> +        if (pages > 0) {
> +            ram_release_pages(block->idstr, offset, pages);
> +        } else {
> +            /*
> +             * Make sure the first page is sent out before other pages.
> +             *
> +             * we post it as normal page as compression will take much
> +             * CPU resource.
> +             */
> +            ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                            offset | RAM_SAVE_FLAG_PAGE);
> +            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> +                                  migrate_release_ram() &
> +                                  migration_in_postcopy());
> +            ram_counters.transferred += TARGET_PAGE_SIZE;
> +            ram_counters.normal++;
> +            pages = 1;
>          }
>      } else {
> -        /* When starting the process of a new block, the first page of
> -         * the block should be sent out before other pages in the same
> -         * block, and all the pages in last block should have been sent
> -         * out, keeping this order is important, because the 'cont' flag
> -         * is used to avoid resending the block name.
> -         */
> -        if (block != rs->last_sent_block) {
> -            flush_compressed_data(rs);
> -            pages = save_zero_page(rs, block, offset);
> -            if (pages > 0) {
> -                ram_release_pages(block->idstr, offset, pages);
> -            } else {
> -                /*
> -                 * Make sure the first page is sent out before other pages.
> -                 *
> -                 * we post it as normal page as compression will take much
> -                 * CPU resource.
> -                 */
> -                ram_counters.transferred += save_page_header(rs, rs->f, block,
> -                                                offset | RAM_SAVE_FLAG_PAGE);
> -                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                                      migrate_release_ram() &
> -                                      migration_in_postcopy());
> -                ram_counters.transferred += TARGET_PAGE_SIZE;
> -                ram_counters.normal++;
> -                pages = 1;
> -            }
> +        pages = save_zero_page(rs, block, offset);
> +        if (pages == -1) {
> +            pages = compress_page_with_multi_thread(rs, block, offset);
>          } else {
> -            pages = save_zero_page(rs, block, offset);
> -            if (pages == -1) {
> -                pages = compress_page_with_multi_thread(rs, block, offset);
> -            } else {
> -                ram_release_pages(block->idstr, offset, pages);
> -            }
> +            ram_release_pages(block->idstr, offset, pages);
>          }
>      }
>  
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 5/8] migration: move calling control_save_page to the common place
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 11:47     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 11:47 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> The function is called by both ram_save_page and ram_save_compressed_page,
> so move it up to their common caller, ram_save_target_page, to clean up
> the code
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e7b8b14c3c..839665d866 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1020,10 +1020,6 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      p = block->host + offset;
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
> -    if (control_save_page(rs, block, offset, &pages)) {
> -        return pages;
> -    }
> -
>      XBZRLE_cache_lock();
>      pages = save_zero_page(rs, block, offset);
>      if (pages > 0) {
> @@ -1176,10 +1172,6 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>  
>      p = block->host + offset;
>  
> -    if (control_save_page(rs, block, offset, &pages)) {
> -        return pages;
> -    }
> -
>      /* When starting the process of a new block, the first page of
>       * the block should be sent out before other pages in the same
>       * block, and all the pages in last block should have been sent
> @@ -1472,6 +1464,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>  
>      /* Check the pages is dirty and if it is send it */
>      if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> +        RAMBlock *block = pss->block;
> +        ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> +
> +        if (control_save_page(rs, block, offset, &res)) {
> +            goto page_saved;

OK, but I'd prefer it if you avoided this forward goto; we do use goto,
but we tend to keep it just for error cases.

Dave

> +        }
> +
>          /*
>           * If xbzrle is on, stop using the data compression after first
>           * round of migration even if compression is enabled. In theory,
> @@ -1484,6 +1483,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>              res = ram_save_page(rs, pss, last_stage);
>          }
>  
> +page_saved:
>          if (res < 0) {
>              return res;
>          }
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 6/8] migration: move calling save_zero_page to the common place
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 12:27     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 12:27 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> save_zero_page() is always the first approach we try; move it to the
> common place, before the calls to ram_save_compressed_page
> and ram_save_page
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 106 ++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 60 insertions(+), 46 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 839665d866..9627ce18e9 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1021,15 +1021,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
>      XBZRLE_cache_lock();
> -    pages = save_zero_page(rs, block, offset);
> -    if (pages > 0) {
> -        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> -         * page would be stale
> -         */
> -        xbzrle_cache_zero_page(rs, current_addr);
> -        ram_release_pages(block->idstr, offset, pages);
> -    } else if (!rs->ram_bulk_stage &&
> -               !migration_in_postcopy() && migrate_use_xbzrle()) {
> +    if (!rs->ram_bulk_stage && !migration_in_postcopy() &&
> +           migrate_use_xbzrle()) {
>          pages = save_xbzrle_page(rs, &p, current_addr, block,
>                                   offset, last_stage);
>          if (!last_stage) {
> @@ -1172,40 +1165,23 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>  
>      p = block->host + offset;
>  
> -    /* When starting the process of a new block, the first page of
> -     * the block should be sent out before other pages in the same
> -     * block, and all the pages in last block should have been sent
> -     * out, keeping this order is important, because the 'cont' flag
> -     * is used to avoid resending the block name.
> -     */
>      if (block != rs->last_sent_block) {
> -        flush_compressed_data(rs);
> -        pages = save_zero_page(rs, block, offset);
> -        if (pages > 0) {
> -            ram_release_pages(block->idstr, offset, pages);
> -        } else {
> -            /*
> -             * Make sure the first page is sent out before other pages.
> -             *
> -             * we post it as normal page as compression will take much
> -             * CPU resource.
> -             */
> -            ram_counters.transferred += save_page_header(rs, rs->f, block,
> -                                            offset | RAM_SAVE_FLAG_PAGE);
> -            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                                  migrate_release_ram() &
> -                                  migration_in_postcopy());
> -            ram_counters.transferred += TARGET_PAGE_SIZE;
> -            ram_counters.normal++;
> -            pages = 1;
> -        }
> +        /*
> +         * Make sure the first page is sent out before other pages.
> +         *
> +         * we post it as normal page as compression will take much
> +         * CPU resource.
> +         */
> +        ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                        offset | RAM_SAVE_FLAG_PAGE);
> +        qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> +                              migrate_release_ram() &
> +                              migration_in_postcopy());
> +        ram_counters.transferred += TARGET_PAGE_SIZE;
> +        ram_counters.normal++;
> +        pages = 1;
>      } else {
> -        pages = save_zero_page(rs, block, offset);
> -        if (pages == -1) {
> -            pages = compress_page_with_multi_thread(rs, block, offset);
> -        } else {
> -            ram_release_pages(block->idstr, offset, pages);
> -        }
> +        pages = compress_page_with_multi_thread(rs, block, offset);
>      }
>  
>      return pages;
> @@ -1447,6 +1423,25 @@ err:
>      return -1;
>  }
>  
> +static bool save_page_use_compression(RAMState *rs)
> +{
> +    if (!migrate_use_compression()) {
> +        return false;
> +    }
> +
> +    /*
> +     * If xbzrle is on, stop using the data compression after first
> +     * round of migration even if compression is enabled. In theory,
> +     * xbzrle can do better than compression.
> +     */
> +    if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
> +        return true;
> +    }
> +
> +    return false;
> +
> +}
> +
>  /**
>   * ram_save_target_page: save one target page
>   *
> @@ -1472,12 +1467,31 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>          }
>  
>          /*
> -         * If xbzrle is on, stop using the data compression after first
> -         * round of migration even if compression is enabled. In theory,
> -         * xbzrle can do better than compression.
> +         * When starting the process of a new block, the first page of
> +         * the block should be sent out before other pages in the same
> +         * block, and all the pages in last block should have been sent
> +         * out, keeping this order is important, because the 'cont' flag
> +         * is used to avoid resending the block name.
>           */
> -        if (migrate_use_compression() &&
> -            (rs->ram_bulk_stage || !migrate_use_xbzrle())) {
> +        if (block != rs->last_sent_block && save_page_use_compression(rs)) {
> +            flush_compressed_data(rs);
> +        }
> +
> +        res = save_zero_page(rs, block, offset);
> +        if (res > 0) {
> +            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> +             * page would be stale
> +             */
> +            if (!save_page_use_compression(rs)) {

This test is quite interesting; I think the
reason it's different in the compression case is that, since we don't put
any non-0 data in the xbzrle cache, we don't need to knock any old
non-0 pages out of the cache.

> +                XBZRLE_cache_lock();
> +                xbzrle_cache_zero_page(rs, block->offset + offset);
> +                XBZRLE_cache_unlock();
> +            }
> +            ram_release_pages(block->idstr, offset, res);
> +            goto page_saved;
> +        }
> +
> +        if (save_page_use_compression(rs)) {
>              res = ram_save_compressed_page(rs, pss, last_stage);
>          } else {
>              res = ram_save_page(rs, pss, last_stage);


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 7/8] migration: introduce save_normal_page()
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 12:30     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 12:30 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> It sends the page directly to the stream, without checking for zero
> pages or using xbzrle or compression
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 50 ++++++++++++++++++++++++++++++--------------------
>  1 file changed, 30 insertions(+), 20 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9627ce18e9..f778627992 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -995,6 +995,34 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>      return true;
>  }
>  
> +/*
> + * directly send the page to the stream
> + *
> + * Returns the number of pages written.
> + *
> + * @rs: current RAM state
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page
> + * @buf: the page to be sent
> + * @async: send the page asynchronously
> + */
> +static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> +                            uint8_t *buf, bool async)
> +{
> +    ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                                 offset | RAM_SAVE_FLAG_PAGE);
> +    if (async) {
> +        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
> +                              migrate_release_ram() &
> +                              migration_in_postcopy());
> +    } else {
> +        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
> +    }
> +    ram_counters.transferred += TARGET_PAGE_SIZE;
> +    ram_counters.normal++;
> +    return 1;
> +}
> +
>  /**
>   * ram_save_page: send the given page to the stream
>   *
> @@ -1035,18 +1063,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>  
>      /* XBZRLE overflow or normal page */
>      if (pages == -1) {
> -        ram_counters.transferred +=
> -            save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_PAGE);
> -        if (send_async) {
> -            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                                  migrate_release_ram() &
> -                                  migration_in_postcopy());
> -        } else {
> -            qemu_put_buffer(rs->f, p, TARGET_PAGE_SIZE);
> -        }
> -        ram_counters.transferred += TARGET_PAGE_SIZE;
> -        pages = 1;
> -        ram_counters.normal++;
> +        pages = save_normal_page(rs, block, offset, p, send_async);
>      }
>  
>      XBZRLE_cache_unlock();
> @@ -1172,14 +1189,7 @@ static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>           * we post it as normal page as compression will take much
>           * CPU resource.
>           */
> -        ram_counters.transferred += save_page_header(rs, rs->f, block,
> -                                        offset | RAM_SAVE_FLAG_PAGE);
> -        qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                              migrate_release_ram() &
> -                              migration_in_postcopy());
> -        ram_counters.transferred += TARGET_PAGE_SIZE;
> -        ram_counters.normal++;
> -        pages = 1;
> +        pages = save_normal_page(rs, block, offset, p, true);
>      } else {
>          pages = compress_page_with_multi_thread(rs, block, offset);
>      }
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 8/8] migration: remove ram_save_compressed_page()
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-15 12:32     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-15 12:32 UTC (permalink / raw)
  To: guangrong.xiao, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Now, we can reuse the path in ram_save_page() to post the page out
> as normal, then the only thing remained in ram_save_compressed_page()
> is compression that we can move it out to the caller
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Thanks, that does simplify stuff a lot in the end!

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 45 ++++++++-------------------------------------
>  1 file changed, 8 insertions(+), 37 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index f778627992..8f4f8aca86 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1162,41 +1162,6 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
>      return pages;
>  }
>  
> -/**
> - * ram_save_compressed_page: compress the given page and send it to the stream
> - *
> - * Returns the number of pages written.
> - *
> - * @rs: current RAM state
> - * @block: block that contains the page we want to send
> - * @offset: offset inside the block for the page
> - * @last_stage: if we are at the completion stage
> - */
> -static int ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
> -                                    bool last_stage)
> -{
> -    int pages = -1;
> -    uint8_t *p;
> -    RAMBlock *block = pss->block;
> -    ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> -
> -    p = block->host + offset;
> -
> -    if (block != rs->last_sent_block) {
> -        /*
> -         * Make sure the first page is sent out before other pages.
> -         *
> -         * we post it as normal page as compression will take much
> -         * CPU resource.
> -         */
> -        pages = save_normal_page(rs, block, offset, p, true);
> -    } else {
> -        pages = compress_page_with_multi_thread(rs, block, offset);
> -    }
> -
> -    return pages;
> -}
> -
>  /**
>   * find_dirty_block: find the next dirty page and update any state
>   * associated with the search process.
> @@ -1501,8 +1466,14 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>              goto page_saved;
>          }
>  
> -        if (save_page_use_compression(rs)) {
> -            res = ram_save_compressed_page(rs, pss, last_stage);
> +        /*
> +         * Make sure the first page is sent out before other pages.
> +         *
> +         * we post it as normal page as compression will take much
> +         * CPU resource.
> +         */
> +        if (block == rs->last_sent_block && save_page_use_compression(rs)) {
> +            res = compress_page_with_multi_thread(rs, block, offset);
>          } else {
>              res = ram_save_page(rs, pss, last_stage);
>          }
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-15 10:25     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16  8:05       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-16  8:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong, qemu-devel,
	mst, pbonzini


Hi David,

Thanks for your review.

On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:

>>   migration/ram.c | 32 ++++++++++++++++----------------
> 
> Hi,
>    Do you have some performance numbers to show this helps?  Were those
> taken on a normal system or were they taken with one of the compression
> accelerators (which I think the compression migration was designed for)?

Yes, I have tested it on my desktop (i7-4790 + 16G) by locally live migrating a VM
which has 8 vCPUs + 6G memory, with max-bandwidth limited to 350.

During the migration, a workload with 8 threads repeatedly wrote to the whole
6G of memory in the VM. Before this patchset, its bandwidth was ~25 mbps; after
applying it, the bandwidth is ~50 mbps.

BTW, compression will use almost all of the available bandwidth after all of our
work, which I will post out part by part.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-15 11:03     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16  8:19       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-16  8:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/15/2018 07:03 PM, Dr. David Alan Gilbert wrote:

>> +static int compress_threads_load_setup(void)
>> +{
>> +    int i, thread_count;
>> +
>> +    if (!migrate_use_compression()) {
>> +        return 0;
>> +    }
>> +
>> +    thread_count = migrate_decompress_threads();
>> +    decompress_threads = g_new0(QemuThread, thread_count);
>> +    decomp_param = g_new0(DecompressParam, thread_count);
>> +    qemu_mutex_init(&decomp_done_lock);
>> +    qemu_cond_init(&decomp_done_cond);
>> +    for (i = 0; i < thread_count; i++) {
>> +        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
>> +            goto exit;
>> +        }
>> +        decomp_param[i].stream.opaque = &decomp_param[i];
>> +
>> +        qemu_mutex_init(&decomp_param[i].mutex);
>> +        qemu_cond_init(&decomp_param[i].cond);
>> +        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
>> +        decomp_param[i].done = true;
>> +        decomp_param[i].quit = false;
>> +        qemu_thread_create(decompress_threads + i, "decompress",
>> +                           do_data_decompress, decomp_param + i,
>> +                           QEMU_THREAD_JOINABLE);
>> +    }
>> +    return 0;
>> +exit:
>> +    compress_threads_load_cleanup();
> 
> I don't think this is safe; if inflateInit(..) fails in not-the-last
> thread, compress_threads_load_cleanup() will try and destroy all the
> mutex's and condition variables, even though they've not yet all been
> _init'd.
> 

That is exactly why we used 'opaque'; please see more below...

> However, other than that I think the patch is OK; a chat with Dan
> Berrange has convinced me this probably doesn't affect the stream
> format, so that's OK.
> 
> One thing I would like is a comment as to how the 'opaque' field is
> being used; I don't think I quite understand what you're doing there.

The zlib.h file says that:
"     The opaque value provided by the application will be passed as the first
    parameter for calls of zalloc and zfree.  This can be useful for custom
    memory management.  The compression library attaches no meaning to the
    opaque value."
So we can use it to store our private data.

Here, we use it as an indicator of whether the thread has been properly
initialized: if inflateInit() succeeds we set it to non-NULL, otherwise it
stays NULL, so the cleanup path can figure out which thread was the first
to fail inflateInit().

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-15 11:29     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16  8:25       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-16  8:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/15/2018 07:29 PM, Dr. David Alan Gilbert wrote:

>> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>>   {
>>       RAMState *rs = ram_state;
>>       int bytes_sent, blen;
>> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
>> +    uint8_t buf[TARGET_PAGE_SIZE], *p;
> 
> That should be malloc'd somewhere rather than be on the stack; it's a
> bit big and also there are architectures where TARGET_PAGE_SIZE isn't
> compile time constant.
> 

Okay, I will allocate an internal buffer for each thread...

> (Also, please use g_try_malloc rather than g_malloc on larger chunks,
> since g_try_malloc will return NULL so you can fail nicely;  g_malloc is
> OK for small things that are very unlikely to fail).
> 
> Other than that, I think the patch is fine.

Thank you, Dave!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/8] migration: introduce control_save_page()
  2018-03-15 11:37     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16  8:52       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-16  8:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, quintela
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/15/2018 07:37 PM, Dr. David Alan Gilbert wrote:
> * guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> Abstract the common function control_save_page() to cleanup the code,
>> no logic is changed
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 

Thank you, Dave!

>>   
>> +/*
>> + * @pages: the number of pages written by the control path,
>> + *        < 0 - error
>> + *        > 0 - number of pages written
> 
> What about 0 !
> 

The control path does not support 0 (which means duplication). :)

Based on the implementation of qemu_rdma_save_page(), if any data
is properly posted out, @bytes_sent is set to 1; otherwise an error
is detected...

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 5/8] migration: move calling control_save_page to the common place
  2018-03-15 11:47     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16  8:59       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-16  8:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/15/2018 07:47 PM, Dr. David Alan Gilbert wrote:

>>       /* Check the pages is dirty and if it is send it */
>>       if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>> +        RAMBlock *block = pss->block;
>> +        ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>> +
>> +        if (control_save_page(rs, block, offset, &res)) {
>> +            goto page_saved;
> 
> OK, but I'd prefer if you avoided this forward goto;  we do use goto but
> we tend to keep it just for error cases.
> 

There is a common operation, clearing the unsentmap, shared by save_control,
save_zero, save_compressed and save_normal. If we did not use 'goto', that
operation would have to be duplicated several times, or we would end up with
a big if...else if...else if... section.

So it may not be too bad to have a 'goto' in this case? :)

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-19  1:49     ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-19  1:49 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini

Hi, guangrong
> 
> +/* return the size after compression, or negative value on error */
> +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
> +                              const uint8_t *source, size_t source_len)
> +{
> +    int err;
> +
> +    err = deflateReset(stream);
> +    if (err != Z_OK) {
> +        return -1;
> +    }
> +
> +    stream->avail_in = source_len;
> +    stream->next_in = (uint8_t *)source;
> +    stream->avail_out = dest_len;
> +    stream->next_out = dest;
>+
This duplicates code in qemu_uncompress(); would initializing the stream outside
of qemu_compress_data() be better? In that case, we could pass far fewer
parameters down and avoid the duplicated code. Or could we encapsulate
some struct to ease the case?
> +    err = deflate(stream, Z_FINISH);
> +    if (err != Z_STREAM_END) {
> +        return -1;
> +    }
> +
> +    return stream->next_out - dest;
> +}
> +
> 
> @@ -683,8 +707,10 @@ ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
> return -1;
> }
> }
> -    if (compress2(f->buf + f->buf_index + sizeof(int32_t), (uLongf *)&blen,
> -                  (Bytef *)p, size, level) != Z_OK) {
> +
> +    blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
> +                              blen, p, size);
The "level" parameter is never used after this patch; could we just remove it?
On the other hand, zlib's deflate() supports a compression level too (via
deflateInit(stream, level)); should we just reuse the level properly? If not, the
*migrate parameter compress_level* will be useless.
> +    if (blen < 0) {
> error_report("Compress Failed!");
> return 0;
> }
>
> +/* return the size after decompression, or negative value on error */
> +static int qemu_uncompress(z_stream *stream, uint8_t *dest, size_t dest_len,
> +                           uint8_t *source, size_t source_len)
The name *qemu_uncompress* does not quite match *qemu_compress_data*;
would *qemu_uncompress_data* be better?
Besides, the prototype is not consistent with *qemu_compress_data* either;
should *source* be *const* here as well?
> +{
> +    int err;
> +
> +    err = inflateReset(stream);
> +    if (err != Z_OK) {
> +        return -1;
> +    }
> +
> +    stream->avail_in = source_len;
> +    stream->next_in = source;
> +    stream->avail_out = dest_len;
> +    stream->next_out = dest;
> +
> +    err = inflate(stream, Z_NO_FLUSH);
> +    if (err != Z_STREAM_END) {
> +        return -1;
> +    }
> +
> +    return stream->total_out;
> +}
> +

Jiang
Regards,

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-19  1:49     ` [Qemu-devel] " jiang.biao2
@ 2018-03-19  4:03       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-19  4:03 UTC (permalink / raw)
  To: jiang.biao2; +Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini



On 03/19/2018 09:49 AM, jiang.biao2@zte.com.cn wrote:
> Hi, guangrong
>>
>> +/* return the size after compression, or negative value on error */
>> +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
>> +                              const uint8_t *source, size_t source_len)
>> +{
>> +    int err;
>> +
>> +    err = deflateReset(stream);
>> +    if (err != Z_OK) {
>> +        return -1;
>> +    }
>> +
>> +    stream->avail_in = source_len;
>> +    stream->next_in = (uint8_t *)source;
>> +    stream->avail_out = dest_len;
>> +    stream->next_out = dest;
>> +
> duplicated code with qemu_uncompress(), would initializing stream outside
> of qemu_compress_data() be better? In that case, we could pass much less
> parameters down, and avoid the duplicated code. Or could we encapsulate
> some struct to ease the case?

There are multiple places that do compression/decompression in QEMU;
I am going to introduce common functions to clean up these places.
That can be another patchset later...

>> +    err = deflate(stream, Z_FINISH);
>> +    if (err != Z_STREAM_END) {
>> +        return -1;
>> +    }
>> +
>> +    return stream->next_out - dest;
>> +}
>> +
>>
>> @@ -683,8 +707,10 @@ ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
>> return -1;
>> }
>> }
>> -    if (compress2(f->buf + f->buf_index + sizeof(int32_t), (uLongf *)&blen,
>> -                  (Bytef *)p, size, level) != Z_OK) {
>> +
>> +    blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
>> +                              blen, p, size);
> The "level" parameter is never used after the patch, could we just removed it?
> On the other hand, deflate() of zlib supports compression level too(by
> deflateInit(stream, level)), should we just reuse the level properly?  If not, the
> *migrate parameter compress_level* will be useless.

The 'level' has been pushed to @stream:
+        if (deflateInit(&comp_param[i].stream,
+                           migrate_compress_level()) != Z_OK) {
+            goto exit;
+        }

>> +    if (blen < 0) {
>> error_report("Compress Failed!");
>> return 0;
>> }
>>
>> +/* return the size after decompression, or negative value on error */
>> +static int qemu_uncompress(z_stream *stream, uint8_t *dest, size_t dest_len,
>> +                           uint8_t *source, size_t source_len)
> The name of *qemu_uncompress* does not quite match *qemu_compress_data*,
> would *qemu_uncompress_data* be better?

Sounds good to me; I will rename it.

> Besides, the prototype is not consistent with  *qemu_compress_data* either,
> should the -*source- be -const- also here?

Okay.

Thanks!

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-19  4:03       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-19  4:48         ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-19  4:48 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini

>>> +    err = deflate(stream, Z_FINISH);
>>> +    if (err != Z_STREAM_END) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    return stream->next_out - dest;
>>> +}
>>> +
>>>
>>> @@ -683,8 +707,10 @@ ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
>>> return -1;
>>> }
>>> }
>>> -    if (compress2(f->buf + f->buf_index + sizeof(int32_t), (uLongf *)&blen,
>>> -                  (Bytef *)p, size, level) != Z_OK) {
>>> +
>>> +    blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
>>> +                              blen, p, size);
>> The "level" parameter is never used after the patch, could we just removed it?
>> On the other hand, deflate() of zlib supports compression level too(by
>> deflateInit(stream, level)), should we just reuse the level properly?  If not, the
>> *migrate parameter compress_level* will be useless.
> 
> The 'level' has been pushed to @stream:
> +        if (deflateInit(&comp_param[i].stream,
> +                           migrate_compress_level()) != Z_OK) {
> +            goto exit;
> +        }
Indeed, I missed that. 
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-19  7:56     ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-19  7:56 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini

Hi, guangrong
> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
> {
> RAMState *rs = ram_state;
> int bytes_sent, blen;
> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
> +    uint8_t buf[TARGET_PAGE_SIZE], *p;

> +    p = block->host + (offset & TARGET_PAGE_MASK);
> bytes_sent = save_page_header(rs, f, block, offset |
> RAM_SAVE_FLAG_COMPRESS_PAGE);
> -    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
> +    memcpy(buf, p, TARGET_PAGE_SIZE);
> +    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
The memory copy operation for every page to be compressed is not cheap,
especially when the number of pages is huge, and it may not be necessary for
pages that are never updated during migration.
Is there any possibility that we can distinguish real compress/decompress
errors from those caused by the source VM updating the page? For example, via
the return value of qemu_uncompress (distinguishing Z_DATA_ERROR from the
other error codes returned by inflate())?

Jiang
Regards,

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-19  7:56     ` [Qemu-devel] " jiang.biao2
@ 2018-03-19  8:01       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-19  8:01 UTC (permalink / raw)
  To: jiang.biao2; +Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini



On 03/19/2018 03:56 PM, jiang.biao2@zte.com.cn wrote:
> Hi, guangrong
>> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>> {
>> RAMState *rs = ram_state;
>> int bytes_sent, blen;
>> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
>> +    uint8_t buf[TARGET_PAGE_SIZE], *p;
> 
>> +    p = block->host + (offset & TARGET_PAGE_MASK);
>> bytes_sent = save_page_header(rs, f, block, offset |
>> RAM_SAVE_FLAG_COMPRESS_PAGE);
>> -    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
>> +    memcpy(buf, p, TARGET_PAGE_SIZE);
>> +    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
> Memory copy operation for every page to be compressed is not cheap, especially
> when the page number is huge, and it may be not necessary for pages never
> updated during migration.

This is only for a 4K page.

> Is there any possibility that we can distinguish the real compress/decompress
> errors from those being caused by source VM updating? Such as the return
> value of qemu_uncompress(distinguish Z_DATA_ERROR and other error codes
> returned by inflate())?

Unfortunately, no. :(

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-16  8:19       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-19 10:54         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-19 10:54 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, quintela, pbonzini

* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> 
> On 03/15/2018 07:03 PM, Dr. David Alan Gilbert wrote:
> 
> > > +static int compress_threads_load_setup(void)
> > > +{
> > > +    int i, thread_count;
> > > +
> > > +    if (!migrate_use_compression()) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    thread_count = migrate_decompress_threads();
> > > +    decompress_threads = g_new0(QemuThread, thread_count);
> > > +    decomp_param = g_new0(DecompressParam, thread_count);
> > > +    qemu_mutex_init(&decomp_done_lock);
> > > +    qemu_cond_init(&decomp_done_cond);
> > > +    for (i = 0; i < thread_count; i++) {
> > > +        if (inflateInit(&decomp_param[i].stream) != Z_OK) {
> > > +            goto exit;
> > > +        }
> > > +        decomp_param[i].stream.opaque = &decomp_param[i];
> > > +
> > > +        qemu_mutex_init(&decomp_param[i].mutex);
> > > +        qemu_cond_init(&decomp_param[i].cond);
> > > +        decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> > > +        decomp_param[i].done = true;
> > > +        decomp_param[i].quit = false;
> > > +        qemu_thread_create(decompress_threads + i, "decompress",
> > > +                           do_data_decompress, decomp_param + i,
> > > +                           QEMU_THREAD_JOINABLE);
> > > +    }
> > > +    return 0;
> > > +exit:
> > > +    compress_threads_load_cleanup();
> > 
> > I don't think this is safe; if inflateInit(..) fails in not-the-last
> > thread, compress_threads_load_cleanup() will try and destroy all the
> > mutex's and condition variables, even though they've not yet all been
> > _init'd.
> > 
> 
> That is exactly why we used 'opaque', please see more below...
> 
> > However, other than that I think the patch is OK; a chat with Dan
> > Berrange has convinced me this probably doesn't affect the stream
> > format, so that's OK.
> > 
> > One thing I would like is a comment as to how the 'opaque' field is
> > being used; I don't think I quite understand what you're doing there.
> 
> The zlib.h file says that:
> "     The opaque value provided by the application will be passed as the first
>    parameter for calls of zalloc and zfree.  This can be useful for custom
>    memory management.  The compression library attaches no meaning to the
>    opaque value."
> So we can use it to store our private data.
> 
> Here, we use it as an indicator which shows whether the thread is properly
> init'd or not. If inflateInit() is successful we set it to non-NULL; otherwise
> it stays NULL, so that the cleanup path can figure out the first thread that
> failed inflateInit().

OK, so I think you just need to add a comment to explain that. Put it
above the 'if (!decomp_param[i].stream.opaque) {' in
compress_threads_load_cleanup  so it'll be easy to understand.

Dave

> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-19 10:54         ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-19 12:11           ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-19 12:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, quintela, pbonzini



On 03/19/2018 06:54 PM, Dr. David Alan Gilbert wrote:

>>>> +    return 0;
>>>> +exit:
>>>> +    compress_threads_load_cleanup();
>>>
>>> I don't think this is safe; if inflateInit(..) fails in not-the-last
>>> thread, compress_threads_load_cleanup() will try and destroy all the
>>> mutex's and condition variables, even though they've not yet all been
>>> _init'd.
>>>
>>
>> That is exactly why we used 'opaque', please see more below...
>>
>>> However, other than that I think the patch is OK; a chat with Dan
>>> Berrange has convinced me this probably doesn't affect the stream
>>> format, so that's OK.
>>>
>>> One thing I would like is a comment as to how the 'opaque' field is
>>> being used; I don't think I quite understand what you're doing there.
>>
>> The zlib.h file says that:
>> "     The opaque value provided by the application will be passed as the first
>>     parameter for calls of zalloc and zfree.  This can be useful for custom
>>     memory management.  The compression library attaches no meaning to the
>>     opaque value."
>> So we can use it to store our private data.
>>
>> Here, we use it as an indicator which shows whether the thread is properly
>> init'd or not. If inflateInit() is successful we set it to non-NULL; otherwise
>> it stays NULL, so that the cleanup path can figure out the first thread that
>> failed inflateInit().
> 
> OK, so I think you just need to add a comment to explain that. Put it
> above the 'if (!decomp_param[i].stream.opaque) {' in
> compress_threads_load_cleanup  so it'll be easy to understand.

Yes, indeed, I will do that.

Thanks!

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-16  8:05       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-19 12:11         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-19 12:11 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong, qemu-devel,
	mst, pbonzini

* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> Hi David,
> 
> Thanks for your review.
> 
> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> 
> > >   migration/ram.c | 32 ++++++++++++++++----------------
> > 
> > Hi,
> >    Do you have some performance numbers to show this helps?  Were those
> > taken on a normal system or were they taken with one of the compression
> > accelerators (which I think the compression migration was designed for)?
> 
> Yes, I have tested it on my desktop (i7-4790 + 16G) by locally live migrating
> a VM which has 8 vCPUs + 6G of memory, with the max-bandwidth limited to 350.
> 
> During the migration, a workload with 8 threads repeatedly writes the whole
> 6G of memory in the VM. Before this patchset its bandwidth is ~25 mbps; after
> applying it, the bandwidth is ~50 mbps.

OK, that's good - worth adding those notes to your cover letter.
I wonder how well it works with compression acceleration hardware; I
can't see anything in this series making it worse.

> BTW, compression will use almost all of the available bandwidth after all of
> our work, which I will post out part by part.

Oh, that will be very nice.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 5/8] migration: move calling control_save_page to the common place
  2018-03-16  8:59       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-19 13:15         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-19 13:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> 
> On 03/15/2018 07:47 PM, Dr. David Alan Gilbert wrote:
> 
> > >       /* Check the pages is dirty and if it is send it */
> > >       if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> > > +        RAMBlock *block = pss->block;
> > > +        ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> > > +
> > > +        if (control_save_page(rs, block, offset, &res)) {
> > > +            goto page_saved;
> > 
> > OK, but I'd prefer if you avoided this forward goto;  we do use goto but
> > we tend to keep it just for error cases.
> > 
> 
> There is a common operation, clearing the unsentmap, shared by save_control,
> save_zero, save_compressed and save_normal. If we do not use 'goto',
> the operation would have to be duplicated several times, or we would end up
> with a big if...else if...else if... section.
> 
> So it may not be too bad to have a 'goto' in this case? :)

The problem is it always tends to creep a bit, and then you soon have
a knot of goto's.

I suggest you add a 'page_saved' bool, set it instead of taking the
goto, and then add an if (!page_saved) around the next section.
It doesn't need to nest for the last section; you just do another
if (!page_saved) around that.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-16  8:05       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-21  8:19         ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-21  8:19 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	Dr. David Alan Gilbert, qemu-devel, mst, pbonzini

On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
> 
> Hi David,
> 
> Thanks for your review.
> 
> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> 
> > >   migration/ram.c | 32 ++++++++++++++++----------------
> > 
> > Hi,
> >    Do you have some performance numbers to show this helps?  Were those
> > taken on a normal system or were they taken with one of the compression
> > accelerators (which I think the compression migration was designed for)?
> 
> Yes, I have tested it on my desktop, i7-4790 + 16G, by locally live-migrating
> a VM which has 8 vCPUs + 6G memory, with max-bandwidth limited to 350.
> 
> During the migration, a workload with 8 threads repeatedly writes to a total
> of 6G of memory in the VM. Before this patchset, its bandwidth is ~25 mbps;
> after applying it, the bandwidth is ~50 mbps.

Hi, Guangrong,

Not really review comments, but I got some questions. :)

IIUC this patch will only change the behavior when last_sent_block
changes.  I see that the performance is doubled after the change,
which is really promising.  However I don't fully understand why it
brings such a big difference, considering that, IMHO, the current code
sends dirty pages per RAMBlock.  I mean, IMHO last_sent_block should
not change frequently?  Or am I wrong?

Another follow-up question: have you measured how long it takes
to compress a 4k page, and how long to send it?  I think
"sending the page" is not really meaningful considering that we just
put the page into a buffer (which should be extremely fast since we
don't really flush it every time); however, I would be curious how
slow compressing a page would be.

Thanks,

> 
> BTW, compression will use almost all of the available bandwidth after all
> of our work, which I will post part by part.
> 

-- 
Peter Xu


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-21  9:06     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-21  9:06 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:33PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> The current code uses compress2()/uncompress() to compress/decompress
> memory; these two functions manage memory allocation and release
> internally, which causes large amounts of memory to be allocated and
> freed very frequently.
> 
> Worse still, frequently returning memory to the kernel flushes TLBs
> and triggers invalidation callbacks via the mmu-notifier, which
> interacts with the KVM MMU and dramatically reduces the performance
> of the VM.
> 
> So we maintain the memory ourselves and reuse it for each
> compression and decompression.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/qemu-file.c |  34 ++++++++++--
>  migration/qemu-file.h |   6 ++-
>  migration/ram.c       | 142 +++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 140 insertions(+), 42 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 2ab2bf362d..1ff33a1ffb 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
>      return v;
>  }
>  
> +/* return the size after compression, or negative value on error */
> +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
> +                              const uint8_t *source, size_t source_len)
> +{
> +    int err;
> +
> +    err = deflateReset(stream);

I'm not familiar with zlib, but I saw this in the manual:

 https://www.zlib.net/manual.html

 This function is equivalent to deflateEnd followed by deflateInit,
 but does not free and reallocate the internal compression state. The
 stream will leave the compression level and any other attributes that
 may have been set unchanged.

I thought it was deflateInit() that is slow?  Can we avoid the reset as
long as we make sure to call deflateInit() before doing anything else?

Meanwhile, are there any performance numbers for this single patch?
I thought the old code calls compress2(), which contains
deflateInit() and deflateEnd() too, just like what the current patch does?

It would also be nice if you could split the patch into two (decode,
encode), but that's optional.

Thanks,

-- 
Peter Xu


* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-21 10:00     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-21 10:00 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Currently the page being compressed is allowed to be updated by
> the VM on the source QEMU; correspondingly, the destination QEMU
> just ignores decompression errors. However, we then completely miss
> the chance to catch real errors, and the VM is silently corrupted.
> 
> To make the migration more robust, we copy the page to a buffer
> first to avoid it being written by the VM, then detect and handle
> both compression and decompression errors properly.

Not sure I missed anything important, but I'll just shoot my thoughts
as questions (again)...

Actually, this is a more general issue: even without
compression, we can be sending a page that is being modified.

However, IMHO we don't need to worry about that, since if the page is
modified, we'll definitely send it again, so the new page will
replace the old.  So on the destination side, even if decompress() fails
on a page it'll be fine IMHO.  Though currently we are copying the
corrupted buffer.  On that point, I fully agree that we should not - maybe
we can just drop the page entirely?

For non-compressed pages, we can't detect that, so we'll copy the page
even if it is corrupted.

The special part for compression would be: would deflate() fail if
there is a concurrent update to the buffer being compressed?  And would
that corrupt the whole compression stream, or would it only fail the
deflate() call?

Thanks,

> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/qemu-file.c |  4 ++--
>  migration/ram.c       | 29 +++++++++++++++++++----------
>  2 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1ff33a1ffb..137bcc8bdc 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -711,9 +711,9 @@ ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
>      blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
>                                blen, p, size);
>      if (blen < 0) {
> -        error_report("Compress Failed!");
> -        return 0;
> +        return -1;
>      }
> +
>      qemu_put_be32(f, blen);
>      if (f->ops->writev_buffer) {
>          add_to_iovec(f, f->buf + f->buf_index, blen, false);
> diff --git a/migration/ram.c b/migration/ram.c
> index fff3f31e90..c47185d38c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -273,6 +273,7 @@ struct DecompressParam {
>      bool quit;
>      QemuMutex mutex;
>      QemuCond cond;
> +    QEMUFile *file;
>      void *des;
>      uint8_t *compbuf;
>      int len;
> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>  {
>      RAMState *rs = ram_state;
>      int bytes_sent, blen;
> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
> +    uint8_t buf[TARGET_PAGE_SIZE], *p;
>  
> +    p = block->host + (offset & TARGET_PAGE_MASK);
>      bytes_sent = save_page_header(rs, f, block, offset |
>                                    RAM_SAVE_FLAG_COMPRESS_PAGE);
> -    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
> +    memcpy(buf, p, TARGET_PAGE_SIZE);
> +    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
>      if (blen < 0) {
>          bytes_sent = 0;
>          qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
> @@ -2547,7 +2550,7 @@ static void *do_data_decompress(void *opaque)
>      DecompressParam *param = opaque;
>      unsigned long pagesize;
>      uint8_t *des;
> -    int len;
> +    int len, ret;
>  
>      qemu_mutex_lock(&param->mutex);
>      while (!param->quit) {
> @@ -2563,8 +2566,12 @@ static void *do_data_decompress(void *opaque)
>               * not a problem because the dirty page will be retransferred
>               * and uncompress() won't break the data in other pages.
>               */
> -            qemu_uncompress(&param->stream, des, pagesize,
> -                            param->compbuf, len);
> +            ret = qemu_uncompress(&param->stream, des, pagesize,
> +                                  param->compbuf, len);
> +            if (ret < 0) {
> +                error_report("decompress data failed");
> +                qemu_file_set_error(param->file, ret);
> +            }
>  
>              qemu_mutex_lock(&decomp_done_lock);
>              param->done = true;
> @@ -2581,12 +2588,12 @@ static void *do_data_decompress(void *opaque)
>      return NULL;
>  }
>  
> -static void wait_for_decompress_done(void)
> +static int wait_for_decompress_done(QEMUFile *f)
>  {
>      int idx, thread_count;
>  
>      if (!migrate_use_compression()) {
> -        return;
> +        return 0;
>      }
>  
>      thread_count = migrate_decompress_threads();
> @@ -2597,6 +2604,7 @@ static void wait_for_decompress_done(void)
>          }
>      }
>      qemu_mutex_unlock(&decomp_done_lock);
> +    return qemu_file_get_error(f);
>  }
>  
>  static void compress_threads_load_cleanup(void)
> @@ -2635,7 +2643,7 @@ static void compress_threads_load_cleanup(void)
>      decomp_param = NULL;
>  }
>  
> -static int compress_threads_load_setup(void)
> +static int compress_threads_load_setup(QEMUFile *f)
>  {
>      int i, thread_count;
>  
> @@ -2654,6 +2662,7 @@ static int compress_threads_load_setup(void)
>          }
>          decomp_param[i].stream.opaque = &decomp_param[i];
>  
> +        decomp_param[i].file = f;
>          qemu_mutex_init(&decomp_param[i].mutex);
>          qemu_cond_init(&decomp_param[i].cond);
>          decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> @@ -2708,7 +2717,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>   */
>  static int ram_load_setup(QEMUFile *f, void *opaque)
>  {
> -    if (compress_threads_load_setup()) {
> +    if (compress_threads_load_setup(f)) {
>          return -1;
>      }
>  
> @@ -3063,7 +3072,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>          }
>      }
>  
> -    wait_for_decompress_done();
> +    ret |= wait_for_decompress_done(f);
>      rcu_read_unlock();
>      trace_ram_load_complete(ret, seq_iter);
>      return ret;
> -- 
> 2.14.3
> 
> 

-- 
Peter Xu


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-21  8:19         ` [Qemu-devel] " Peter Xu
@ 2018-03-22 11:38           ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-22 11:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	Dr. David Alan Gilbert, qemu-devel, mst, pbonzini



On 03/21/2018 04:19 PM, Peter Xu wrote:
> On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
>>
>> Hi David,
>>
>> Thanks for your review.
>>
>> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
>>
>>>>    migration/ram.c | 32 ++++++++++++++++----------------
>>>
>>> Hi,
>>>     Do you have some performance numbers to show this helps?  Were those
>>> taken on a normal system or were they taken with one of the compression
>>> accelerators (which I think the compression migration was designed for)?
>>
>> Yes, I have tested it on my desktop, i7-4790 + 16G, by locally live-migrating
>> a VM which has 8 vCPUs + 6G memory, with max-bandwidth limited to 350.
>>
>> During the migration, a workload with 8 threads repeatedly writes to a total
>> of 6G of memory in the VM. Before this patchset, its bandwidth is ~25 mbps;
>> after applying it, the bandwidth is ~50 mbps.
> 
> Hi, Guangrong,
> 
> Not really review comments, but I got some questions. :)

Your comments are always valuable to me! :)

> 
> IIUC this patch will only change the behavior when last_sent_block
> changes.  I see that the performance is doubled after the change,
> which is really promising.  However I don't fully understand why it
> brings such a big difference, considering that, IMHO, the current code
> sends dirty pages per RAMBlock.  I mean, IMHO last_sent_block should
> not change frequently?  Or am I wrong?

It depends on the configuration: each memory region that is RAM or
file backed has a RAMBlock.

Actually, more of the benefit comes from the fact that the performance and
throughput of the multiple threads have been improved, as the threads are
fed by the migration thread and the result is consumed by the migration
thread.

> 
> Another follow-up question: have you measured how long it takes
> to compress a 4k page, and how long to send it?  I think
> "sending the page" is not really meaningful considering that we just
> put the page into a buffer (which should be extremely fast since we
> don't really flush it every time); however, I would be curious how
> slow compressing a page would be.

I haven't benchmarked the performance of zlib; I think it is a CPU-intensive
workload, particularly as there is no compression accelerator (e.g., QAT) in
our production environment. BTW, we were using lzo instead of zlib, which
worked better for some workloads.

Putting a page into the buffer depends on the network, i.e., if the
network is congested it can take a long time. :)


* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-21  9:06     ` [Qemu-devel] " Peter Xu
@ 2018-03-22 11:57       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-22 11:57 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/21/2018 05:06 PM, Peter Xu wrote:
> On Tue, Mar 13, 2018 at 03:57:33PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> The current code uses compress2()/uncompress() to compress/decompress
>> memory; these two functions manage memory allocation and release
>> internally, which causes large amounts of memory to be allocated and
>> freed very frequently.
>>
>> Worse still, frequently returning memory to the kernel flushes TLBs
>> and triggers invalidation callbacks via the mmu-notifier, which
>> interacts with the KVM MMU and dramatically reduces the performance
>> of the VM.
>>
>> So we maintain the memory ourselves and reuse it for each
>> compression and decompression.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
>>   migration/qemu-file.c |  34 ++++++++++--
>>   migration/qemu-file.h |   6 ++-
>>   migration/ram.c       | 142 +++++++++++++++++++++++++++++++++++++-------------
>>   3 files changed, 140 insertions(+), 42 deletions(-)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index 2ab2bf362d..1ff33a1ffb 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
>>       return v;
>>   }
>>   
>> +/* return the size after compression, or negative value on error */
>> +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
>> +                              const uint8_t *source, size_t source_len)
>> +{
>> +    int err;
>> +
>> +    err = deflateReset(stream);
> 
> I'm not familiar with zlib, but I saw this in manual:
> 
>   https://www.zlib.net/manual.html
> 
>   This function is equivalent to deflateEnd followed by deflateInit,
>   but does not free and reallocate the internal compression state. The
>   stream will leave the compression level and any other attributes that
>   may have been set unchanged.
> 
> I thought it was deflateInit() who is slow?  Can we avoid the reset as

deflateEnd() is worse, as it frees memory back to the kernel, which
triggers TLB flushes and mmu-notifier callbacks.

> long as we make sure to deflateInit() before doing anything else?

Actually, deflateReset() is cheap... :)

> 
> Meanwhile, is there any performance number for this single patch?
> Since I thought the old code is calling compress2() which contains
> deflateInit() and deflateEnd() too, just like what current patch do?

No, after the patch we call deflateInit() / deflateEnd() only once
(in the _setup() and _cleanup() handlers).
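
A rough sketch of the two styles discussed here (Python's stdlib zlib is
used purely for illustration; QEMU's actual code is C, and Python exposes
no direct deflateReset() equivalent, so a fresh compressobj() stands in
for the rewound stream):

```python
import zlib

# One-shot style (what compress2()/uncompress() amount to): every call
# sets up and tears down the whole compression state internally.
def compress_page_oneshot(page: bytes) -> bytes:
    return zlib.compress(page, 1)

# Streaming style: in the patch, the z_stream is created once with
# deflateInit() in _setup(), rewound with deflateReset() for each page,
# and freed with deflateEnd() only in _cleanup().  The key property is
# that each page is still a complete, independently decompressible
# deflate stream.
def compress_page_stream(page: bytes) -> bytes:
    c = zlib.compressobj(level=1)
    return c.compress(page) + c.flush(zlib.Z_FINISH)

zero_page = bytes(4096)   # the common case during live migration
assert zlib.decompress(compress_page_oneshot(zero_page)) == zero_page
assert zlib.decompress(compress_page_stream(zero_page)) == zero_page
```

Either way the destination can decompress each page on its own, which is
why rewinding the persistent stream per page is safe.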

Yes. This is the perf data from our production environment.
After reverting this patch:
+  57.88%  kqemu  [kernel.kallsyms]        [k] queued_spin_lock_slowpath
+  10.55%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
+   4.83%  kqemu  [kernel.kallsyms]        [k] flush_tlb_func_common

-   1.16%  kqemu  [kernel.kallsyms]        [k] lock_acquire
    - lock_acquire
       - 15.68% _raw_spin_lock
          + 29.42% __schedule
          + 29.14% perf_event_context_sched_out
          + 23.60% tdp_page_fault
          + 10.54% do_anonymous_page
          + 2.07% kvm_mmu_notifier_invalidate_range_start
          + 1.83% zap_pte_range
          + 1.44% kvm_mmu_notifier_invalidate_range_end


After applying our work:
+  51.92%  kqemu  [kernel.kallsyms]        [k] queued_spin_lock_slowpath
+  14.82%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
+   1.47%  kqemu  [kernel.kallsyms]        [k] mark_lock.clone.0
+   1.46%  kqemu  [kernel.kallsyms]        [k] native_sched_clock
+   1.31%  kqemu  [kernel.kallsyms]        [k] lock_acquire
+   1.24%  kqemu  libc-2.12.so             [.] __memset_sse2

-  14.82%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
    - __lock_acquire
       - 99.75% lock_acquire
          - 18.38% _raw_spin_lock
             + 39.62% tdp_page_fault
             + 31.32% __schedule
             + 27.53% perf_event_context_sched_out
             + 0.58% hrtimer_interrupt


You can see that the TLB flushes and the mmu-lock contention are gone after this patch.

> 
> It would be nice too if we can split the patch into two (decode,
> encode) if you want, but that's optional.

That sounds good to me, thank you, Peter.

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-21 10:00     ` [Qemu-devel] " Peter Xu
@ 2018-03-22 12:03       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-22 12:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/21/2018 06:00 PM, Peter Xu wrote:
> On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> Currently the page being compressed is allowed to be updated by
>> the VM on the source QEMU, and correspondingly the destination QEMU
>> just ignores any decompression error. However, we completely miss
>> the chance to catch real errors, so the VM is corrupted silently
>>
>> To make the migration more robust, we copy the page to a buffer
>> first to avoid it being written by the VM, then detect and handle
>> both compression and decompression errors properly
> 
> Not sure I missed anything important, but I'll just shoot my thoughts
> as questions (again)...
> 
> Actually this is a more general question? Say, even without
> compression, we can be sending a page that is being modified.
> 
> However, IMHO we don't need to worry that, since if that page is
> modified, we'll definitely send that page again, so the new page will
> replace the old.  So on destination side, even if decompress() failed
> on a page it'll be fine IMHO.  Though now we are copying the corrupted
> buffer.  On that point, I fully agree that we should not - maybe we
> can just drop the page entirely?
> 
> For non-compress pages, we can't detect that, so we'll copy the page
> even if corrupted.
> 
> The special part for compression would be: would the deflate() fail if
> there is concurrent update to the buffer being compressed?  And would
> that corrupt the whole compression stream, or it would only fail the
> deflate() call?

It is not the same for normal pages and compressed pages.

For a normal page, the dirty-log mechanism in QEMU and the reliability
of the network transport (e.g., TCP) make sure that the modified memory
is posted to the destination without corruption.

However, nothing can guarantee that compression/decompression is
bug-free. E.g., consider the last step of migration: the vCPUs and
dirty logging are paused while the memory is compressed and posted to
the destination; if there is any error in compression/decompression,
the VM dies silently.
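
The destination-side policy argued for above can be sketched as follows
(Python's stdlib zlib for illustration only; the function names are
hypothetical, not QEMU's):

```python
import zlib

PAGE_SIZE = 4096

def compress_stable_copy(page) -> bytes:
    # Copy first so the guest cannot modify the buffer mid-compress;
    # only then can a later failure be treated as a *real* error.
    stable = bytes(page)
    return zlib.compress(stable, 1)

def decompress_page(blob: bytes) -> bytes:
    # Treat any failure as fatal instead of silently ignoring it.
    try:
        page = zlib.decompress(blob)
    except zlib.error as e:
        raise RuntimeError(f"decompression error: {e}") from e
    if len(page) != PAGE_SIZE:
        raise RuntimeError("decompressed page has wrong size")
    return page

blob = compress_stable_copy(bytearray(PAGE_SIZE))
assert decompress_page(blob) == bytes(PAGE_SIZE)
```

zlib verifies the adler32 checksum at the end of each stream, so a
truncated or damaged blob typically raises instead of handing back
garbage, which is exactly the signal the old code threw away.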

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-22 11:38           ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-26  9:02             ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-26  9:02 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	Dr. David Alan Gilbert, qemu-devel, mst, pbonzini

On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/21/2018 04:19 PM, Peter Xu wrote:
> > On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
> > > 
> > > Hi David,
> > > 
> > > Thanks for your review.
> > > 
> > > On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> > > 
> > > > >    migration/ram.c | 32 ++++++++++++++++----------------
> > > > 
> > > > Hi,
> > > >     Do you have some performance numbers to show this helps?  Were those
> > > > taken on a normal system or were they taken with one of the compression
> > > > accelerators (which I think the compression migration was designed for)?
> > > 
> > > Yes, i have tested it on my desktop, i7-4790 + 16G, by locally live migrate
> > > the VM which has 8 vCPUs + 6G memory and the max-bandwidth is limited to 350.
> > > 
> > > During the migration, a workload which has 8 threads repeatedly written total
> > > 6G memory in the VM. Before this patchset, its bandwidth is ~25 mbps, after
> > > applying, the bandwidth is ~50 mbps.
> > 
> > Hi, Guangrong,
> > 
> > Not really review comments, but I got some questions. :)
> 
> Your comments are always valuable to me! :)
> 
> > 
> > IIUC this patch will only change the behavior when last_sent_block
> > changed.  I see that the performance is doubled after the change,
> > which is really promising.  However I don't fully understand why it
> > brings such a big difference considering that IMHO current code is
> > sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
> > not change frequently?  Or am I wrong?
> 
> It's depends on the configuration, each memory-region which is ram or
> file backend has a RAMBlock.
> 
> Actually, more benefits comes from the fact that the performance & throughput
> of the multithreads has been improved as the threads is fed by the
> migration thread and the result is consumed by the migration
> thread.

I'm not sure whether I got your points - I think you mean that the
compression threads and the migration thread can form a better
pipeline if the migration thread does not do any compression at all.

I think I agree with that.

However it does not really explain to me on why a very rare event
(sending the first page of a RAMBlock, considering bitmap sync is
rare) can greatly affect the performance (it shows a doubled boost).

Btw, about the numbers: IMHO the numbers might not be really "true
numbers".  Or say, even if the bandwidth is doubled, IMHO it does not
mean the performance is doubled, because the data has changed.

Previously there were only compressed pages, and now for each cycle of
RAMBlock looping we'll send a normal page (then we'll get more things
to send).  So IMHO we don't really know whether we sent more pages
with this patch; we only know we sent more bytes (e.g., an extreme
case is that the extra 25 Mbps is all caused by those normal pages,
and we could be sending exactly the same number of pages as before, or
even fewer?).
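
That extreme case can be put in hypothetical numbers (the 8:1
compression ratio below is an assumption for illustration, not a
measurement from the thread):

```python
PAGE = 4096                   # bytes per guest page
LINK = 25_000_000 / 8         # 25 Mbps expressed in bytes/s
RATIO = 8                     # assumed 8:1 compression ratio

# If the original 25 Mbps carried only compressed pages:
compressed_pages_per_s = LINK / (PAGE / RATIO)
# If the extra 25 Mbps after the patch carried only normal pages:
extra_normal_pages_per_s = LINK / PAGE

growth = (compressed_pages_per_s + extra_normal_pages_per_s) \
         / compressed_pages_per_s
# Bandwidth doubled, yet page throughput grew only ~12.5% in this case.
assert abs(growth - (1 + 1 / RATIO)) < 1e-9
```

So a doubled bandwidth is compatible with anything from ~12% more pages
(this extreme) up to 2x as many, which is exactly the ambiguity raised
here.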

> 
> > 
> > Another follow-up question would be: have you measured how long time
> > needed to compress a 4k page, and how many time to send it?  I think
> > "sending the page" is not really meaningful considering that we just
> > put a page into the buffer (which should be extremely fast since we
> > don't really flush it every time), however I would be curious on how
> > slow would compressing a page be.
> 
> I haven't benchmark the performance of zlib, i think it is CPU intensive
> workload, particularly, there no compression-accelerator (e.g, QAT) on
> our production. BTW, we were using lzo instead of zlib which worked
> better for some workload.

Never mind. Good to know about that.

> 
> Putting a page into buffer should depend on the network, i,e, if the
> network is congested it should take long time. :)

Again, considering that I don't know much about compression (I have
hardly used it), mine are only questions, which should not block your
patches from being queued/merged/reposted when proper. :)

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread


* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-26  9:02             ` [Qemu-devel] " Peter Xu
@ 2018-03-26 15:43               ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-26 15:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	Dr. David Alan Gilbert, qemu-devel, mst, pbonzini



On 03/26/2018 05:02 PM, Peter Xu wrote:
> On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 03/21/2018 04:19 PM, Peter Xu wrote:
>>> On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for your review.
>>>>
>>>> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
>>>>
>>>>>>     migration/ram.c | 32 ++++++++++++++++----------------
>>>>>
>>>>> Hi,
>>>>>      Do you have some performance numbers to show this helps?  Were those
>>>>> taken on a normal system or were they taken with one of the compression
>>>>> accelerators (which I think the compression migration was designed for)?
>>>>
>>>> Yes, i have tested it on my desktop, i7-4790 + 16G, by locally live migrate
>>>> the VM which has 8 vCPUs + 6G memory and the max-bandwidth is limited to 350.
>>>>
>>>> During the migration, a workload which has 8 threads repeatedly written total
>>>> 6G memory in the VM. Before this patchset, its bandwidth is ~25 mbps, after
>>>> applying, the bandwidth is ~50 mbps.
>>>
>>> Hi, Guangrong,
>>>
>>> Not really review comments, but I got some questions. :)
>>
>> Your comments are always valuable to me! :)
>>
>>>
>>> IIUC this patch will only change the behavior when last_sent_block
>>> changed.  I see that the performance is doubled after the change,
>>> which is really promising.  However I don't fully understand why it
>>> brings such a big difference considering that IMHO current code is
>>> sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
>>> not change frequently?  Or am I wrong?
>>
>> It's depends on the configuration, each memory-region which is ram or
>> file backend has a RAMBlock.
>>
>> Actually, more benefits comes from the fact that the performance & throughput
>> of the multithreads has been improved as the threads is fed by the
>> migration thread and the result is consumed by the migration
>> thread.
> 
> I'm not sure whether I got your points - I think you mean that the
> compression threads and the migration thread can form a better
> pipeline if the migration thread does not do any compression at all.
> 
> I think I agree with that.
> 
> However it does not really explain to me on why a very rare event
> (sending the first page of a RAMBlock, considering bitmap sync is
> rare) can greatly affect the performance (it shows a doubled boost).
> 

I understand it is tricky indeed, but it is not very hard to explain.
With the original code, the multiple threads (8 CPUs in our test) stay
idle for a long time. After our patch, the normal page is posted out
asynchronously, which is extremely fast as you said (the network is
almost idle in the current implementation), so the CPUs have far more
time in which they can be used effectively to generate compressed data
than before.
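
The pipeline shape being described can be sketched as a plain
producer/consumer setup (Python threading for illustration only; QEMU's
C implementation and thread handoff differ):

```python
import queue
import threading
import zlib

def compress_worker(in_q: "queue.Queue", out_q: "queue.Queue") -> None:
    # A compression thread: fed by the migration thread, it stays busy
    # as long as pages keep arriving.
    while True:
        page = in_q.get()
        if page is None:          # sentinel: shut down
            break
        out_q.put(zlib.compress(page, 1))

def migrate(pages, nworkers=4):
    in_q, out_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=compress_worker, args=(in_q, out_q))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    # The "migration thread" only feeds the queue; it never blocks on
    # compression itself, which is the point of the patch.
    for p in pages:
        in_q.put(p)
    for _ in workers:
        in_q.put(None)            # one sentinel per worker
    for w in workers:
        w.join()
    return [out_q.get() for _ in range(len(pages))]
```

Since CPython's zlib releases the GIL while compressing, even this toy
version gets real parallelism once the producer stops doing any
compression of its own.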

> Btw, about the numbers: IMHO the numbers might not be really "true
> numbers".  Or say, even the bandwidth is doubled, IMHO it does not
> mean the performance is doubled. Becasue the data has changed.
> 
> Previously there were only compressed pages, and now for each cycle of
> RAMBlock looping we'll send a normal page (then we'll get more thing
> to send).  So IMHO we don't really know whether we sent more pages
> with this patch, we can only know we sent more bytes (e.g., an extreme
> case is that the extra 25Mbps/s are all caused by those normal pages,
> and we can be sending exactly the same number of pages like before, or
> even worse?).
> 

The current implementation uses the CPUs very ineffectively (improving
that is our next work to be posted out), so the network is almost idle
and posting more data out is a better choice. Furthermore, the
migration thread plays a key role in the parallelism, so it is better
to make it fast.

>>
>>>
>>> Another follow-up question would be: have you measured how long time
>>> needed to compress a 4k page, and how many time to send it?  I think
>>> "sending the page" is not really meaningful considering that we just
>>> put a page into the buffer (which should be extremely fast since we
>>> don't really flush it every time), however I would be curious on how
>>> slow would compressing a page be.
>>
>> I haven't benchmark the performance of zlib, i think it is CPU intensive
>> workload, particularly, there no compression-accelerator (e.g, QAT) on
>> our production. BTW, we were using lzo instead of zlib which worked
>> better for some workload.
> 
> Never mind. Good to know about that.
> 
>>
>> Putting a page into buffer should depend on the network, i,e, if the
>> network is congested it should take long time. :)
> 
> Again, considering that I don't know much on compression (especially I
> hardly used that) mine are only questions, which should not block your
> patches to be either queued/merged/reposted when proper. :)

Yes, I see. The discussion can potentially lead to a better solution.

Thanks for your comment, Peter!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] migration: stop compressing page in migration thread
@ 2018-03-26 15:43               ` Xiao Guangrong
  0 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-26 15:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, liang.z.li, kvm, quintela, mtosatti,
	Xiao Guangrong, qemu-devel, mst, pbonzini



On 03/26/2018 05:02 PM, Peter Xu wrote:
> On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 03/21/2018 04:19 PM, Peter Xu wrote:
>>> On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for your review.
>>>>
>>>> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
>>>>
>>>>>>     migration/ram.c | 32 ++++++++++++++++----------------
>>>>>
>>>>> Hi,
>>>>>      Do you have some performance numbers to show this helps?  Were those
>>>>> taken on a normal system or were they taken with one of the compression
>>>>> accelerators (which I think the compression migration was designed for)?
>>>>
>>>> Yes, i have tested it on my desktop, i7-4790 + 16G, by locally live migrate
>>>> the VM which has 8 vCPUs + 6G memory and the max-bandwidth is limited to 350.
>>>>
>>>> During the migration, a workload which has 8 threads repeatedly written total
>>>> 6G memory in the VM. Before this patchset, its bandwidth is ~25 mbps, after
>>>> applying, the bandwidth is ~50 mbps.
>>>
>>> Hi, Guangrong,
>>>
>>> Not really review comments, but I got some questions. :)
>>
>> Your comments are always valuable to me! :)
>>
>>>
>>> IIUC this patch will only change the behavior when last_sent_block
>>> changed.  I see that the performance is doubled after the change,
>>> which is really promising.  However I don't fully understand why it
>>> brings such a big difference considering that IMHO current code is
>>> sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
>>> not change frequently?  Or am I wrong?
>>
>> It's depends on the configuration, each memory-region which is ram or
>> file backend has a RAMBlock.
>>
>> Actually, more benefits comes from the fact that the performance & throughput
>> of the multithreads has been improved as the threads is fed by the
>> migration thread and the result is consumed by the migration
>> thread.
> 
> I'm not sure whether I got your points - I think you mean that the
> compression threads and the migration thread can form a better
> pipeline if the migration thread does not do any compression at all.
> 
> I think I agree with that.
> 
> However it does not really explain to me on why a very rare event
> (sending the first page of a RAMBlock, considering bitmap sync is
> rare) can greatly affect the performance (it shows a doubled boost).
> 

I understand it is trick indeed, but it is not very hard to explain.
Multi-threads (using 8 CPUs in our test) keep idle for a long time
for the origin code, however, after our patch, as the normal is
posted out async-ly that it's extremely fast as you said (the network
is almost idle for current implementation) so it has a long time that
the CPUs can be used effectively to generate more compressed data than
before.

> Btw, about the numbers: IMHO the numbers might not be really "true
> numbers".  Or say, even the bandwidth is doubled, IMHO it does not
> mean the performance is doubled. Becasue the data has changed.
> 
> Previously there were only compressed pages, and now for each cycle of
> RAMBlock looping we'll send a normal page (then we'll get more thing
> to send).  So IMHO we don't really know whether we sent more pages
> with this patch, we can only know we sent more bytes (e.g., an extreme
> case is that the extra 25Mbps/s are all caused by those normal pages,
> and we can be sending exactly the same number of pages like before, or
> even worse?).
> 

The current implementation uses the CPU very inefficiently (addressing
that is our next work to be posted out), so the network is almost idle
and posting more data out is a better choice. Furthermore, the
migration thread is itself one stage of the parallel pipeline, so it
is better to make it fast.

>>
>>>
>>> Another follow-up question would be: have you measured how long time
>>> needed to compress a 4k page, and how many time to send it?  I think
>>> "sending the page" is not really meaningful considering that we just
>>> put a page into the buffer (which should be extremely fast since we
>>> don't really flush it every time), however I would be curious on how
>>> slow would compressing a page be.
>>
>> I haven't benchmarked the performance of zlib; I think it is a
>> CPU-intensive workload, particularly as there is no compression
>> accelerator (e.g., QAT) in our production environment. BTW, we were
>> using lzo instead of zlib, which worked better for some workloads.
> 
> Never mind. Good to know about that.
> 
>>
>> Putting a page into the buffer depends on the network, i.e., if the
>> network is congested it can take a long time. :)
> 
> Again, considering that I don't know much on compression (especially I
> hardly used that) mine are only questions, which should not block your
> patches to be either queued/merged/reposted when proper. :)

Yes, I see. The discussion can potentially raise a better solution.

Thanks for your comment, Peter!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-27  7:22         ` [Qemu-devel] " Peter Xu
@ 2018-03-26 19:42           ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-26 19:42 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/27/2018 03:22 PM, Peter Xu wrote:
> On Thu, Mar 22, 2018 at 08:03:53PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 03/21/2018 06:00 PM, Peter Xu wrote:
>>> On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
>>>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>>
>>>> Currently the page being compressed is allowed to be updated by
>>>> the VM on the source QEMU, correspondingly the destination QEMU
>>>> just ignores the decompression error. However, we completely miss
>>>> the chance to catch real errors, then the VM is corrupted silently
>>>>
>>>> To make the migration more robuster, we copy the page to a buffer
>>>> first to avoid it being written by VM, then detect and handle the
>>>> errors of both compression and decompression errors properly
>>>
>>> Not sure I missed anything important, but I'll just shoot my thoughts
>>> as questions (again)...
>>>
>>> Actually this is a more general question? Say, even without
>>> compression, we can be sending a page that is being modified.
>>>
>>> However, IMHO we don't need to worry that, since if that page is
>>> modified, we'll definitely send that page again, so the new page will
>>> replace the old.  So on destination side, even if decompress() failed
>>> on a page it'll be fine IMHO.  Though now we are copying the corrupted
>>> buffer.  On that point, I fully agree that we should not - maybe we
>>> can just drop the page entirely?
>>>
>>> For non-compress pages, we can't detect that, so we'll copy the page
>>> even if corrupted.
>>>
>>> The special part for compression would be: would the deflate() fail if
>>> there is concurrent update to the buffer being compressed?  And would
>>> that corrupt the whole compression stream, or it would only fail the
>>> deflate() call?
>>
>> It is not the same for normal page and compressed page.
>>
>> For the normal page, the dirty-log mechanism in QEMU and the infrastructure
>> of the network (e.g, TCP) can make sure that the modified memory will
>> be posted to the destination without corruption.
>>
>> However, nothing can guarantee compression/decompression is BUG-free,
>> e,g, consider the case, in the last step, vCPUs & dirty-log are paused and
>> the memory is compressed and posted to destination, if there is any error
>> in compression/decompression, VM dies silently.
> 
> Here do you mean the compression error even if the VM is halted?  I'd
> say in that case IMHO the extra memcpy() would still help little since
> the coiped page should exactly be the same as the source page?

"compression error" means that compress2() in the original code
returns an error code.

If the data being compressed is modified at the same time, compression
can fail, and that failure is spurious (not a real error). We move the
data to an internal buffer to avoid this case, so that we can catch
the real error condition.

> 
> I'd say I don't know what we can really do if there are zlib bugs. I
> was assuming we'll definitely fail in a strange way if there is any,
> which should be hard to be detected from QEMU's POV (maybe a
> destination VM crash, as you mentioned).  It'll be easy for us to
> detect errors when we got error code returned from compress(), however
> IMHO when we say "zlib bug" it can also mean that data is corrputed
> even compress() and decompress() both returned with good state.
> 

Ah, sorry, I abused the word "BUG".

It does not mean a bug in the compression/decompression API; I mean
the failure conditions (the API returns an error code).

> It'll be understandable to me if the problem is that the compress()
> API does not allow the input buffer to be changed during the whole
> period of the call.  If that is a must, this patch for sure helps.

Yes, that is exactly what I want to say. :)

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-27 11:17             ` [Qemu-devel] " Peter Xu
@ 2018-03-27  1:20               ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-27  1:20 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini



On 03/27/2018 07:17 PM, Peter Xu wrote:
> On Tue, Mar 27, 2018 at 03:42:32AM +0800, Xiao Guangrong wrote:
> 
> [...]
> 
>>> It'll be understandable to me if the problem is that the compress()
>>> API does not allow the input buffer to be changed during the whole
>>> period of the call.  If that is a must, this patch for sure helps.
>>
>> Yes, that is exactly what i want to say. :)
> 
> So I think now I know what this patch is for. :) And yeah, it makes
> sense.
> 
> Though another question would be: if the buffer is updated during
> compress() and compress() returned error, would that pollute the whole
> z_stream or it only fails the compress() call?
> 

I guess deflateReset() can recover everything, i.e., it keeps the
z_stream as it was initialized by deflateInit().

> (Same question applies to decompress().)
> 
> If it's only a compress() error and it won't pollute z_stream (or say,
> it can be recovered after a deflateReset() and then we can continue to
> call deflate() without problem), then we'll actually have two
> alternatives to solve this "buffer update" issue:
> 
> 1. Use the approach of current patch: we copy the page every time, so
>     deflate() never fails because update never happens.  But it's slow
>     since we copy the pages every time.
> 
> 2. Use the old approach, and when compress() fail, we just ignore that
>     page (since now we know that error _must_ be caused by page update,
>     then we are 100% sure that we'll send that page again so it'll be
>     perfectly fine).
> 

No, we can't make the assumption that the "error _must_ be caused by a
page update". No documentation/ABI for compress/decompress promises
that. :)

Thanks!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/8] migration: stop allocating and freeing memory frequently
  2018-03-22 11:57       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-27  7:07         ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27  7:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Thu, Mar 22, 2018 at 07:57:54PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/21/2018 05:06 PM, Peter Xu wrote:
> > On Tue, Mar 13, 2018 at 03:57:33PM +0800, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > Current code uses compress2()/uncompress() to compress/decompress
> > > memory, these two function manager memory allocation and release
> > > internally, that causes huge memory is allocated and freed very
> > > frequently
> > > 
> > > More worse, frequently returning memory to kernel will flush TLBs
> > > and trigger invalidation callbacks on mmu-notification which
> > > interacts with KVM MMU, that dramatically reduce the performance
> > > of VM
> > > 
> > > So, we maintain the memory by ourselves and reuse it for each
> > > compression and decompression
> > > 
> > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > ---
> > >   migration/qemu-file.c |  34 ++++++++++--
> > >   migration/qemu-file.h |   6 ++-
> > >   migration/ram.c       | 142 +++++++++++++++++++++++++++++++++++++-------------
> > >   3 files changed, 140 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > > index 2ab2bf362d..1ff33a1ffb 100644
> > > --- a/migration/qemu-file.c
> > > +++ b/migration/qemu-file.c
> > > @@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
> > >       return v;
> > >   }
> > > +/* return the size after compression, or negative value on error */
> > > +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
> > > +                              const uint8_t *source, size_t source_len)
> > > +{
> > > +    int err;
> > > +
> > > +    err = deflateReset(stream);
> > 
> > I'm not familiar with zlib, but I saw this in manual:
> > 
> >   https://www.zlib.net/manual.html
> > 
> >   This function is equivalent to deflateEnd followed by deflateInit,
> >   but does not free and reallocate the internal compression state. The
> >   stream will leave the compression level and any other attributes that
> >   may have been set unchanged.
> > 
> > I thought it was deflateInit() who is slow?  Can we avoid the reset as
> 
> deflateEnd() is worse as it frees memory to kernel which triggers
> TLB flush and mmu-notifier.
> 
> > long as we make sure to deflateInit() before doing anything else?
> 
> Actually, deflateReset() is cheap... :)
> 
> > 
> > Meanwhile, is there any performance number for this single patch?
> > Since I thought the old code is calling compress2() which contains
> > deflateInit() and deflateEnd() too, just like what current patch do?
> 
> No, after the patch, we just call deflateInit() / deflateEnd() one
> time (in _setup() handler and _cleanup handler).
> 
> Yes. This is the perf data from our production,
> after revert this patch:
> +  57.88%  kqemu  [kernel.kallsyms]        [k] queued_spin_lock_slowpath
> +  10.55%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
> +   4.83%  kqemu  [kernel.kallsyms]        [k] flush_tlb_func_common
> 
> -   1.16%  kqemu  [kernel.kallsyms]        [k] lock_acquire
>    - lock_acquire
>       - 15.68% _raw_spin_lock
>          + 29.42% __schedule
>          + 29.14% perf_event_context_sched_out
>          + 23.60% tdp_page_fault
>          + 10.54% do_anonymous_page
>          + 2.07% kvm_mmu_notifier_invalidate_range_start
>          + 1.83% zap_pte_range
>          + 1.44% kvm_mmu_notifier_invalidate_range_end
> 
> 
> apply our work:
> +  51.92%  kqemu  [kernel.kallsyms]        [k] queued_spin_lock_slowpath
> +  14.82%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
> +   1.47%  kqemu  [kernel.kallsyms]        [k] mark_lock.clone.0
> +   1.46%  kqemu  [kernel.kallsyms]        [k] native_sched_clock
> +   1.31%  kqemu  [kernel.kallsyms]        [k] lock_acquire
> +   1.24%  kqemu  libc-2.12.so             [.] __memset_sse2
> 
> -  14.82%  kqemu  [kernel.kallsyms]        [k] __lock_acquire
>    - __lock_acquire
>       - 99.75% lock_acquire
>          - 18.38% _raw_spin_lock
>             + 39.62% tdp_page_fault
>             + 31.32% __schedule
>             + 27.53% perf_event_context_sched_out
>             + 0.58% hrtimer_interrupt
> 
> 
> You can see the TLB flush and mmu-lock contention have gone after this patch.

Yes.  Obviously I misunderstood the documentation for deflateReset().
It's not really a combined "End+Init", a quick glance in zlib code
shows that deflateInit() will do the mallocs, then call deflateReset()
at last.  So the buffers should be kept for reset(), as you explained.

> 
> > 
> > It would be nice too if we can split the patch into two (decode,
> > encode) if you want, but that's optional.
> 
> That's good to me, thank you, Peter.

Thanks for explaining.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-22 12:03       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-27  7:22         ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27  7:22 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Thu, Mar 22, 2018 at 08:03:53PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/21/2018 06:00 PM, Peter Xu wrote:
> > On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > Currently the page being compressed is allowed to be updated by
> > > the VM on the source QEMU, correspondingly the destination QEMU
> > > just ignores the decompression error. However, we completely miss
> > > the chance to catch real errors, then the VM is corrupted silently
> > > 
> > > To make the migration more robuster, we copy the page to a buffer
> > > first to avoid it being written by VM, then detect and handle the
> > > errors of both compression and decompression errors properly
> > 
> > Not sure I missed anything important, but I'll just shoot my thoughts
> > as questions (again)...
> > 
> > Actually this is a more general question? Say, even without
> > compression, we can be sending a page that is being modified.
> > 
> > However, IMHO we don't need to worry that, since if that page is
> > modified, we'll definitely send that page again, so the new page will
> > replace the old.  So on destination side, even if decompress() failed
> > on a page it'll be fine IMHO.  Though now we are copying the corrupted
> > buffer.  On that point, I fully agree that we should not - maybe we
> > can just drop the page entirely?
> > 
> > For non-compress pages, we can't detect that, so we'll copy the page
> > even if corrupted.
> > 
> > The special part for compression would be: would the deflate() fail if
> > there is concurrent update to the buffer being compressed?  And would
> > that corrupt the whole compression stream, or it would only fail the
> > deflate() call?
> 
> It is not the same for normal page and compressed page.
> 
> For the normal page, the dirty-log mechanism in QEMU and the infrastructure
> of the network (e.g, TCP) can make sure that the modified memory will
> be posted to the destination without corruption.
> 
> However, nothing can guarantee compression/decompression is BUG-free,
> e,g, consider the case, in the last step, vCPUs & dirty-log are paused and
> the memory is compressed and posted to destination, if there is any error
> in compression/decompression, VM dies silently.

Here do you mean a compression error even if the VM is halted?  I'd
say in that case IMHO the extra memcpy() would still help little,
since the copied page should be exactly the same as the source page?

I'd say I don't know what we can really do if there are zlib bugs. I
was assuming we'll definitely fail in a strange way if there is any,
which should be hard to detect from QEMU's POV (maybe a destination
VM crash, as you mentioned).  It'll be easy for us to detect errors
when we get an error code returned from compress(), however IMHO when
we say "zlib bug" it can also mean that data is corrupted even when
compress() and decompress() both returned with good state.

It'll be understandable to me if the problem is that the compress()
API does not allow the input buffer to be changed during the whole
period of the call.  If that is a must, this patch for sure helps.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] migration: support to detect compression and decompression errors
@ 2018-03-27  7:22         ` Peter Xu
  0 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27  7:22 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: pbonzini, mst, mtosatti, Xiao Guangrong, qemu-devel, kvm

On Thu, Mar 22, 2018 at 08:03:53PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/21/2018 06:00 PM, Peter Xu wrote:
> > On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > Currently the page being compressed is allowed to be updated by
> > > the VM on the source QEMU, correspondingly the destination QEMU
> > > just ignores the decompression error. However, we completely miss
> > > the chance to catch real errors, then the VM is corrupted silently
> > > 
> > > To make the migration more robuster, we copy the page to a buffer
> > > first to avoid it being written by VM, then detect and handle the
> > > errors of both compression and decompression errors properly
> > 
> > Not sure I missed anything important, but I'll just shoot my thoughts
> > as questions (again)...
> > 
> > Actually this is a more general question? Say, even without
> > compression, we can be sending a page that is being modified.
> > 
> > However, IMHO we don't need to worry that, since if that page is
> > modified, we'll definitely send that page again, so the new page will
> > replace the old.  So on destination side, even if decompress() failed
> > on a page it'll be fine IMHO.  Though now we are copying the corrupted
> > buffer.  On that point, I fully agree that we should not - maybe we
> > can just drop the page entirely?
> > 
> > For non-compress pages, we can't detect that, so we'll copy the page
> > even if corrupted.
> > 
> > The special part for compression would be: would the deflate() fail if
> > there is concurrent update to the buffer being compressed?  And would
> > that corrupt the whole compression stream, or it would only fail the
> > deflate() call?
> 
> It is not the same for normal page and compressed page.
> 
> For the normal page, the dirty-log mechanism in QEMU and the infrastructure
> of the network (e.g, TCP) can make sure that the modified memory will
> be posted to the destination without corruption.
> 
> However, nothing can guarantee compression/decompression is BUG-free,
> e,g, consider the case, in the last step, vCPUs & dirty-log are paused and
> the memory is compressed and posted to destination, if there is any error
> in compression/decompression, VM dies silently.

Here do you mean the compression error even if the VM is halted?  I'd
say in that case IMHO the extra memcpy() would still help little since
the copied page should be exactly the same as the source page?

I'd say I don't know what we can really do if there are zlib bugs. I
was assuming we'll definitely fail in a strange way if there is any,
which should be hard to be detected from QEMU's POV (maybe a
destination VM crash, as you mentioned).  It'll be easy for us to
detect errors when we get an error code returned from compress(), however
IMHO when we say "zlib bug" it can also mean that data is corrupted
even when compress() and decompress() both returned with good state.

It'll be understandable to me if the problem is that the compress()
API does not allow the input buffer to be changed during the whole
period of the call.  If that is a must, this patch for sure helps.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-26 15:43               ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-27  7:33                 ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27  7:33 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	Dr. David Alan Gilbert, qemu-devel, mst, pbonzini

On Mon, Mar 26, 2018 at 11:43:33PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/26/2018 05:02 PM, Peter Xu wrote:
> > On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 03/21/2018 04:19 PM, Peter Xu wrote:
> > > > On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
> > > > > 
> > > > > Hi David,
> > > > > 
> > > > > Thanks for your review.
> > > > > 
> > > > > On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> > > > > 
> > > > > > >     migration/ram.c | 32 ++++++++++++++++----------------
> > > > > > 
> > > > > > Hi,
> > > > > >      Do you have some performance numbers to show this helps?  Were those
> > > > > > taken on a normal system or were they taken with one of the compression
> > > > > > accelerators (which I think the compression migration was designed for)?
> > > > > 
> > > > > Yes, I have tested it on my desktop, i7-4790 + 16G, by locally live migrating
> > > > > the VM which has 8 vCPUs + 6G memory, with the max-bandwidth limited to 350.
> > > > > 
> > > > > During the migration, a workload with 8 threads repeatedly writes the whole
> > > > > 6G of memory in the VM. Before this patchset, its bandwidth is ~25 mbps; after
> > > > > applying, the bandwidth is ~50 mbps.
> > > > 
> > > > Hi, Guangrong,
> > > > 
> > > > Not really review comments, but I got some questions. :)
> > > 
> > > Your comments are always valuable to me! :)
> > > 
> > > > 
> > > > IIUC this patch will only change the behavior when last_sent_block
> > > > changed.  I see that the performance is doubled after the change,
> > > > which is really promising.  However I don't fully understand why it
> > > > brings such a big difference considering that IMHO current code is
> > > > sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
> > > > not change frequently?  Or am I wrong?
> > > 
> > > It depends on the configuration; each memory-region which is ram or
> > > file backed has a RAMBlock.
> > > 
> > > Actually, more benefits come from the fact that the performance & throughput
> > > of the multiple threads have been improved, as the threads are fed by the
> > > migration thread and the result is consumed by the migration
> > > thread.
> > 
> > I'm not sure whether I got your points - I think you mean that the
> > compression threads and the migration thread can form a better
> > pipeline if the migration thread does not do any compression at all.
> > 
> > I think I agree with that.
> > 
> > However it does not really explain to me on why a very rare event
> > (sending the first page of a RAMBlock, considering bitmap sync is
> > rare) can greatly affect the performance (it shows a doubled boost).
> > 
> 
> I understand it is tricky indeed, but it is not very hard to explain.
> The multiple threads (using 8 CPUs in our test) keep idle for a long time
> with the original code; however, after our patch, as the normal page is
> posted out asynchronously it's extremely fast, as you said (the network
> is almost idle in the current implementation), so there is a long period
> in which the CPUs can be used effectively to generate more compressed
> data than before.

Ah.  If the compression threads are consuming more CPU after this
patch, then it can persuade me far better than the original numbers,
since AFAICT that means it's the real part of bandwidth that is
boosted (the first pages of RAMBlocks are not sent via compression
threads), and I suppose it proves a better pipeline.

> 
> > Btw, about the numbers: IMHO the numbers might not be really "true
> > numbers".  Or say, even if the bandwidth is doubled, IMHO it does not
> > mean the performance is doubled, because the data has changed.
> > 
> > Previously there were only compressed pages, and now for each cycle of
> > RAMBlock looping we'll send a normal page (then we'll get more thing
> > to send).  So IMHO we don't really know whether we sent more pages
> > with this patch, we can only know we sent more bytes (e.g., an extreme
> > case is that the extra 25Mbps/s are all caused by those normal pages,
> > and we can be sending exactly the same number of pages like before, or
> > even worse?).
> > 
> 
> The current implementation uses CPU very inefficiently (it's our next work
> to be posted out) and the network is almost idle, so posting more data
> out is a better choice; furthermore, the migration thread plays a role in
> the parallelism, so it'd better be fast.
> 
> > > 
> > > > 
> > > > Another follow-up question would be: have you measured how long time
> > > > needed to compress a 4k page, and how many time to send it?  I think
> > > > "sending the page" is not really meaningful considering that we just
> > > > put a page into the buffer (which should be extremely fast since we
> > > > don't really flush it every time), however I would be curious on how
> > > > slow would compressing a page be.
> > > 
> > > I haven't benchmarked the performance of zlib; I think it is a CPU intensive
> > > workload, particularly as there is no compression-accelerator (e.g, QAT) in
> > > our production. BTW, we were using lzo instead of zlib, which worked
> > > better for some workloads.
> > 
> > Never mind. Good to know about that.
> > 
> > > 
> > > Putting a page into the buffer should depend on the network, i.e., if the
> > > network is congested it should take a long time. :)
> > 
> > Again, considering that I don't know much on compression (especially I
> > hardly used that) mine are only questions, which should not block your
> > patches to be either queued/merged/reposted when proper. :)
> 
> Yes, i see. The discussion can potentially raise a better solution.
> 
> Thanks for your comment, Peter!

I think I have no problem on this patch.  Please take my r-b if you
like:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/8] migration: introduce control_save_page()
  2018-03-15 11:37     ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-27  7:47       ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27  7:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, quintela,
	guangrong.xiao, pbonzini

On Thu, Mar 15, 2018 at 11:37:59AM +0000, Dr. David Alan Gilbert wrote:
> * guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > 
> > Abstract the common function control_save_page() to cleanup the code,
> > no logic is changed
> > 
> > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> It would be good to find a better name for control_save_page, but I
> can't think of one!

Yeah.  I would prefer it at least still be prefixed with ram_*, however
I don't really hope we spend too much time on naming (as always :).

Maybe we can just squash the changes into current
ram_control_save_page() directly.  But that's optional, current patch
is good to me already, so:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-26 19:42           ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-27 11:17             ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27 11:17 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 27, 2018 at 03:42:32AM +0800, Xiao Guangrong wrote:

[...]

> > It'll be understandable to me if the problem is that the compress()
> > API does not allow the input buffer to be changed during the whole
> > period of the call.  If that is a must, this patch for sure helps.
> 
> Yes, that is exactly what i want to say. :)

So I think now I know what this patch is for. :) And yeah, it makes
sense.

Though another question would be: if the buffer is updated during
compress() and compress() returned error, would that pollute the whole
z_stream or it only fails the compress() call?

(Same question applies to decompress().)

If it's only a compress() error and it won't pollute z_stream (or say,
it can be recovered after a deflateReset() and then we can continue to
call deflate() without problem), then we'll actually have two
alternatives to solve this "buffer update" issue:

1. Use the approach of current patch: we copy the page every time, so
   deflate() never fails because update never happens.  But it's slow
   since we copy the pages every time.

2. Use the old approach, and when compress() fail, we just ignore that
   page (since now we know that error _must_ be caused by page update,
   then we are 100% sure that we'll send that page again so it'll be
   perfectly fine).

If you see, IMHO method 2 has its advantage, since actually it
"detects" the page update operation by getting a failure in
compress(), then we don't really need to send that page at all (since
we'll send it later again, for sure).  Then, we not only saved the
memcpy() CPU time for every single page, meanwhile we might save some
bandwidth since we won't bother to send the page when we know the page
is modified.

But all these depend on the assumption that:

1. compress() will fail only because of buffer update, and

2. compress() failures won't pollute the whole z_stream.

Same thing would apply to decompress() side - we drop the corrupted
page (when decompress() returned errors) since we know another one
will come soon.

It's a bit tricky, but I'm still curious about it, since actually
that's mostly the old code before this patch, except that we don't
really drop corrupted pages but still use them (which won't hurt
either IMHO).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 5/8] migration: move calling control_save_page to the common place
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-27 12:35     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27 12:35 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:36PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> The function is called by both ram_save_page and ram_save_target_page,
> so move it to the common caller to cleanup the code
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 6/8] migration: move calling save_zero_page to the common place
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-27 12:49     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27 12:49 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:37PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> save_zero_page() is always our first approach to try, move it to
> the common place before calling ram_save_compressed_page
> and ram_save_page
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 106 ++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 60 insertions(+), 46 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 839665d866..9627ce18e9 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1021,15 +1021,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
>      XBZRLE_cache_lock();
> -    pages = save_zero_page(rs, block, offset);
> -    if (pages > 0) {
> -        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> -         * page would be stale
> -         */
> -        xbzrle_cache_zero_page(rs, current_addr);
> -        ram_release_pages(block->idstr, offset, pages);
> -    } else if (!rs->ram_bulk_stage &&
> -               !migration_in_postcopy() && migrate_use_xbzrle()) {
> +    if (!rs->ram_bulk_stage && !migration_in_postcopy() &&
> +           migrate_use_xbzrle()) {

Nit: indent problem?

[...]

> +static bool save_page_use_compression(RAMState *rs)
> +{
> +    if (!migrate_use_compression()) {
> +        return false;
> +    }
> +
> +    /*
> +     * If xbzrle is on, stop using the data compression after first
> +     * round of migration even if compression is enabled. In theory,
> +     * xbzrle can do better than compression.
> +     */
> +    if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
> +        return true;
> +    }
> +
> +    return false;
> +

Nit: remove this line?

Otherwise I'd say I like this patch... :)

Better with the nit fixed:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 7/8] migration: introduce save_normal_page()
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-27 12:54     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27 12:54 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:38PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> It directly sends the page to the stream neither checking zero nor
> using xbzrle or compression
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 8/8] migration: remove ram_save_compressed_page()
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-27 12:56     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-27 12:56 UTC (permalink / raw)
  To: guangrong.xiao; +Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel, pbonzini

On Tue, Mar 13, 2018 at 03:57:39PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Now, we can reuse the path in ram_save_page() to post the page out
> as normal, then the only thing remained in ram_save_compressed_page()
> is compression that we can move it out to the caller
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-28  0:43                 ` [Qemu-devel] " jiang.biao2
@ 2018-03-27 14:35                   ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-27 14:35 UTC (permalink / raw)
  To: jiang.biao2
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, peterx, pbonzini



On 03/28/2018 08:43 AM, jiang.biao2@zte.com.cn wrote:
>> On 03/27/2018 07:17 PM, Peter Xu wrote:
>>> On Tue, Mar 27, 2018 at 03:42:32AM +0800, Xiao Guangrong wrote:
>>>
>>> [...]
>>>
>>>>> It'll be understandable to me if the problem is that the compress()
>>>>> API does not allow the input buffer to be changed during the whole
>>>>> period of the call.  If that is a must, this patch for sure helps.
>>>>
>>>> Yes, that is exactly what i want to say. :)
>>>
>>> So I think now I know what this patch is for. :) And yeah, it makes
>>> sense.
>>>
>>> Though another question would be: if the buffer is updated during
>>> compress() and compress() returned error, would that pollute the whole
>>> z_stream or it only fails the compress() call?
>>>
>>
>> I guess deflateReset() can recover everything, i.e, keep z_stream as
>> it is init'ed by deflate_init().
>>
>>> (Same question applies to decompress().)
>>>
>>> If it's only a compress() error and it won't pollute z_stream (or say,
>>> it can be recovered after a deflateReset() and then we can continue to
>>> call deflate() without problem), then we'll actually have two
>>> alternatives to solve this "buffer update" issue:
>>>
>>> 1. Use the approach of current patch: we copy the page every time, so
>>>      deflate() never fails because update never happens.  But it's slow
>>>      since we copy the pages every time.
>>>
>>> 2. Use the old approach, and when compress() fail, we just ignore that
>>>      page (since now we know that error _must_ be caused by page update,
>>>      then we are 100% sure that we'll send that page again so it'll be
>>>      perfectly fine).
>>>
>>
>> No, we can't make the assumption that "error _must_ be caused by page update".
>> No document/ABI about compress/decompress promised it. :)
> So, as I mentioned before, can we just distinguish the decompress/compress errors
> from errors caused by page update by the return code of inflate/deflate?
> According to the zlib manual, there seems to be several error codes for different
> cases,
> #define Z_ERRNO        (-1)
> #define Z_STREAM_ERROR (-2)
> #define Z_DATA_ERROR   (-3)
> #define Z_MEM_ERROR    (-4)
> #define Z_BUF_ERROR    (-5)
> #define Z_VERSION_ERROR (-6)
> Did you check the return code when a silent failure (not caused by page update)
> happened before? :)

I am afraid there is no such error code, and I guess zlib is not designed to
compress data which is being modified.
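As an illustrative aside (a Python sketch using its bundled zlib module, not the QEMU C code), the safety property the patch relies on is that the compressor must see stable input for the whole call; taking a snapshot copy first (the memcpy() being discussed) guarantees that even if the guest keeps writing the page:

```python
import zlib

PAGE_SIZE = 4096

def compress_page_safely(page: bytearray) -> bytes:
    # Snapshot the (possibly concurrently-modified) guest page first, so the
    # compressor never sees its input change mid-call. This is the memcpy().
    snapshot = bytes(page)
    return zlib.compress(snapshot, 1)

page = bytearray(b"\x00" * PAGE_SIZE)
blob = compress_page_safely(page)

# The snapshot round-trips correctly even if the page changes afterwards;
# the later write will simply be caught by the next dirty-bitmap sync.
page[0] = 0xFF
assert zlib.decompress(blob) == b"\x00" * PAGE_SIZE
```

The cost is one page copy per compression, which (as noted below in the thread) is cheap relative to the compression itself.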

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-28  3:01     ` [Qemu-devel] " Wang, Wei W
@ 2018-03-27 15:24       ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-27 15:24 UTC (permalink / raw)
  To: Wang, Wei W, pbonzini, mst, mtosatti
  Cc: Peter Xu, Xiao Guangrong, qemu-devel, kvm, Dr. David Alan Gilbert



On 03/28/2018 11:01 AM, Wang, Wei W wrote:
> On Tuesday, March 13, 2018 3:58 PM, Xiao Guangrong wrote:
>>
>> As compression is heavy work, do not do it in the migration thread; instead, we
>> post it out as a normal page
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> 
> Hi Guangrong,
> 
> Dave asked me to help review your patch, so I will just drop my 2 cents wherever possible, and hope that could be inspiring for your work.

Thank you both for the nice help on the work. :)

> 
> 
>> ---
>>   migration/ram.c | 32 ++++++++++++++++----------------
>>   1 file changed, 16 insertions(+), 16 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c index
>> 7266351fd0..615693f180 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState
>> *rs, PageSearchStatus *pss,
>>       int pages = -1;
>>       uint64_t bytes_xmit = 0;
>>       uint8_t *p;
>> -    int ret, blen;
>> +    int ret;
>>       RAMBlock *block = pss->block;
>>       ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>>
>> @@ -1162,23 +1162,23 @@ static int
>> ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>>           if (block != rs->last_sent_block) {
>>               flush_compressed_data(rs);
>>               pages = save_zero_page(rs, block, offset);
>> -            if (pages == -1) {
>> -                /* Make sure the first page is sent out before other pages */
>> -                bytes_xmit = save_page_header(rs, rs->f, block, offset |
>> -                                              RAM_SAVE_FLAG_COMPRESS_PAGE);
>> -                blen = qemu_put_compression_data(rs->f, p, TARGET_PAGE_SIZE,
>> -                                                 migrate_compress_level());
>> -                if (blen > 0) {
>> -                    ram_counters.transferred += bytes_xmit + blen;
>> -                    ram_counters.normal++;
>> -                    pages = 1;
>> -                } else {
>> -                    qemu_file_set_error(rs->f, blen);
>> -                    error_report("compressed data failed!");
>> -                }
>> -            }
>>               if (pages > 0) {
>>                   ram_release_pages(block->idstr, offset, pages);
>> +            } else {
>> +                /*
>> +                 * Make sure the first page is sent out before other pages.
>> +                 *
>> +                 * we post it as normal page as compression will take much
>> +                 * CPU resource.
>> +                 */
>> +                ram_counters.transferred += save_page_header(rs, rs->f, block,
>> +                                                offset | RAM_SAVE_FLAG_PAGE);
>> +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
>> +                                      migrate_release_ram() &
>> +                                      migration_in_postcopy());
>> +                ram_counters.transferred += TARGET_PAGE_SIZE;
>> +                ram_counters.normal++;
>> +                pages = 1;
>>               }
>>           } else {
>>               pages = save_zero_page(rs, block, offset);
>> --
> 
> I agree that this patch is an improvement for the current implementation. So just pile up mine here:
> Reviewed-by: Wei Wang <wei.w.wang@intel.com>

Thanks.

> 
> 
> If you are interested in something more aggressive, I can share an alternative approach, which I think would be better. Please see below.
> 
> Actually, we can use the multi-threaded compression for the first page as well, which will not block the migration thread progress. The advantage is that we can enjoy the compression benefit for the first page and meanwhile not blocking the migration thread - the page is given to a compression thread and compressed asynchronously to the migration thread execution.
> 

Yes, it is a good point.

> The main barrier to achieving the above is that we need to make sure the first page of each block is sent first in the multi-threaded environment. We can twist the current implementation to achieve that, which is not hard:
> 
> For example, we can add a new flag to RAMBlock - bool first_page_added. In each thread of compression, they need
> 1) check if this is the first page of the block.
> 2) If it is the first page, set block->first_page_added after sending the page;
> 3) If it is not the first page, wait to send the page until block->first_page_added is set.


So there is another barrier introduced which hurts parallelism...

Hmm, this point needs more deliberate consideration; let me think it over after this work.

Thank you.
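The three steps proposed above can be sketched as a toy (Python, purely illustrative; `first_page_sent` here stands in for the proposed `block->first_page_added` flag):

```python
import threading
import zlib

class Block:
    """Stands in for a RAMBlock carrying the proposed first-page flag."""
    def __init__(self, name):
        self.name = name
        self.first_page_sent = threading.Event()

wire = []                   # pages in the order they reach the stream
send_lock = threading.Lock()

def compress_and_send(block, page_index, data):
    payload = zlib.compress(data, 1)     # compression runs fully in parallel
    if page_index != 0:
        # Step 3: non-first pages wait until the block's first page is out.
        block.first_page_sent.wait()
    with send_lock:
        wire.append((block.name, page_index, len(payload)))
    if page_index == 0:
        # Step 2: mark the first page as sent, releasing the waiters.
        block.first_page_sent.set()

blk = Block("ram0")
threads = [threading.Thread(target=compress_and_send,
                            args=(blk, i, bytes(4096)))
           for i in (2, 1, 0)]           # deliberately start out of order
for t in threads:
    t.start()
for t in threads:
    t.join()

assert wire[0][1] == 0   # the first page always reaches the stream first
```

Note that the `wait()` in step 3 is exactly the extra serialization point Guangrong worries about: compression still runs in parallel, but the send side of every other worker stalls behind the first page.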

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-28  4:20                         ` [Qemu-devel] " Peter Xu
@ 2018-03-27 18:44                           ` Xiao Guangrong
  -1 siblings, 0 replies; 126+ messages in thread
From: Xiao Guangrong @ 2018-03-27 18:44 UTC (permalink / raw)
  To: Peter Xu, jiang.biao2
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini



On 03/28/2018 12:20 PM, Peter Xu wrote:
> On Wed, Mar 28, 2018 at 12:08:19PM +0800, jiang.biao2@zte.com.cn wrote:
>>>
>>> On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
>>>
>>>>>> No, we can't make the assumption that "error _must_ be caused by page update".
>>>>>> No document/ABI about compress/decompress promised it. :)
>>>
>>> Indeed, I found no good documents about the errors below that jiang.biao
>>> pointed out.
>> Hi, Peter
>> The description about the errors comes from here,
>> http://www.zlib.net/manual.html
>> And about the error codes returned by inflate(), they are described as,
>> ** inflate() returns
>> Z_OK if some progress has been made (more input processed or more output produced),
>> Z_STREAM_END if the end of the compressed data has been reached and all uncompressed output has been produced,
>> Z_NEED_DICT if a preset dictionary is needed at this point,
>> Z_DATA_ERROR if the input data was corrupted (input stream not conforming to the zlib format or incorrect check value, in which case strm->msg points to a string with a more specific error),
>> Z_STREAM_ERROR if the stream structure was inconsistent (for example next_in or next_out was Z_NULL, or the state was inadvertently written over by the application),
>> Z_MEM_ERROR if there was not enough memory,
>> Z_BUF_ERROR if no progress was possible or if there was not enough room in the output buffer when Z_FINISH is used. ...
>> **
> 
> Ah yes.  My bad to be so uncareful. :)
> 
> According to the above description, the error caused by page update looks
> more likely to be Z_DATA_ERROR, but I do not have an env to verify that. :)

No, we still lack information to confirm that compressing data which is being
updated is the only case that returns Z_DATA_ERROR. And nothing guarantees
that no other error condition which corrupts data will be squeezed into this
error code.

>> As I understand it, the real compress/decompress error cases other than that
>> caused by page update should be rare, maybe the error code is enough to
> distinguish those if we can verify the error codes returned by page update
>> and other silent failures by test. If so, we can cut the cost of memcpy.

Please note, compared with other operations, e.g., compression, detecting zero
pages, etc., memcpy() is not a hot function at all.

>> If not, I agree with Guangrong's idea too. I never read the zlib code and all my
> information comes from the manual, so if anything is inaccurate, please ignore my
> opinion. :)
> 
> So I suppose all of us know that alternative now, we just need a solid
> way to confirm the uncertainty.  I'll leave this to Guangrong.

Yes, I still prefer memcpy() to make it safe enough to protect our production
unless we get enough certainty to figure out the error conditions.

Thanks!
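For illustration (a Python sketch over its bundled zlib module, not the QEMU code): corrupted input does surface as Z_DATA_ERROR, but as argued above, the same error covers any kind of corruption, so the code alone cannot prove the cause was a concurrent page update:

```python
import zlib

good = zlib.compress(b"\x00" * 4096)
corrupted = b"\xde\xad" + good[2:]   # damage the stream header

error_message = None
try:
    zlib.decompress(corrupted)
except zlib.error as exc:
    error_message = str(exc)

# Any corrupted input surfaces as the same zlib.error (Z_DATA_ERROR, -3);
# nothing in the error identifies *why* the stream was inconsistent, which
# is exactly why relying on the error code to mean "page was updated"
# would be unsafe.
assert error_message is not None
```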

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-26 15:43               ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-27 19:12                 ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-27 19:12 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong, qemu-devel,
	Peter Xu, mst, pbonzini

* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> 
> On 03/26/2018 05:02 PM, Peter Xu wrote:
> > On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 03/21/2018 04:19 PM, Peter Xu wrote:
> > > > On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
> > > > > 
> > > > > Hi David,
> > > > > 
> > > > > Thanks for your review.
> > > > > 
> > > > > On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> > > > > 
> > > > > > >     migration/ram.c | 32 ++++++++++++++++----------------
> > > > > > 
> > > > > > Hi,
> > > > > >      Do you have some performance numbers to show this helps?  Were those
> > > > > > taken on a normal system or were they taken with one of the compression
> > > > > > accelerators (which I think the compression migration was designed for)?
> > > > > 
> > > > > Yes, i have tested it on my desktop, i7-4790 + 16G, by locally live migrate
> > > > > the VM which has 8 vCPUs + 6G memory and the max-bandwidth is limited to 350.
> > > > > 
> > > > > During the migration, a workload with 8 threads repeatedly wrote a total of
> > > > > 6G of memory in the VM. Before this patchset, its bandwidth was ~25 mbps; after
> > > > > applying, the bandwidth is ~50 mbps.
> > > > 
> > > > Hi, Guangrong,
> > > > 
> > > > Not really review comments, but I got some questions. :)
> > > 
> > > Your comments are always valuable to me! :)
> > > 
> > > > 
> > > > IIUC this patch will only change the behavior when last_sent_block
> > > > changed.  I see that the performance is doubled after the change,
> > > > which is really promising.  However I don't fully understand why it
> > > > brings such a big difference considering that IMHO current code is
> > > > sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
> > > > not change frequently?  Or am I wrong?
> > > 
> > > It depends on the configuration: each memory region which is ram or
> > > file backed has a RAMBlock.
> > > 
> > > Actually, more benefits come from the fact that the performance & throughput
> > > of the multiple threads have been improved, as the threads are fed by the
> > > migration thread and the result is consumed by the migration
> > > thread.
> > 
> > I'm not sure whether I got your points - I think you mean that the
> > compression threads and the migration thread can form a better
> > pipeline if the migration thread does not do any compression at all.
> > 
> > I think I agree with that.
> > 
> > However it does not really explain to me on why a very rare event
> > (sending the first page of a RAMBlock, considering bitmap sync is
> > rare) can greatly affect the performance (it shows a doubled boost).
> > 
> 
> I understand it is tricky indeed, but it is not very hard to explain.
> The multiple threads (using 8 CPUs in our test) stay idle for a long time
> with the original code; however, after our patch, the normal page is
> posted out asynchronously, which is extremely fast as you said (the network
> is almost idle in the current implementation), so there is a long window in
> which the CPUs can be used effectively to generate more compressed data than
> before.

One thing to try, to explain Peter's worry, would be, for testing, to
add a counter to see how often this case triggers, and perhaps add
some debug to see when;  Peter's right that flipping between the
RAMBlocks seems odd, unless you're either doing lots of iterations or
have lots of separate RAMBlocks for some reason.

Dave

> > Btw, about the numbers: IMHO the numbers might not be really "true
> > numbers".  Or say, even the bandwidth is doubled, IMHO it does not
> > mean the performance is doubled. Because the data has changed.
> > 
> > Previously there were only compressed pages, and now for each cycle of
> > RAMBlock looping we'll send a normal page (then we'll get more thing
> > to send).  So IMHO we don't really know whether we sent more pages
> > with this patch, we can only know we sent more bytes (e.g., an extreme
> > case is that the extra 25 Mbps are all caused by those normal pages,
> > and we can be sending exactly the same number of pages like before, or
> > even worse?).
> > 
> 
> The current implementation uses the CPU very ineffectively (addressing that is
> our next work to be posted out) and the network is almost idle, so posting more
> data out is a better choice; furthermore, the migration thread plays a role in
> the parallelism, so it'd better be fast.
> 
> > > 
> > > > 
> > > > Another follow-up question would be: have you measured how long time
> > > > needed to compress a 4k page, and how many time to send it?  I think
> > > > "sending the page" is not really meaningful considering that we just
> > > > put a page into the buffer (which should be extremely fast since we
> > > > don't really flush it every time), however I would be curious on how
> > > > slow would compressing a page be.
> > > 
> > > I haven't benchmarked the performance of zlib; I think it is a CPU-intensive
> > > workload, particularly as there is no compression accelerator (e.g., QAT) on
> > > our production systems. BTW, we were using lzo instead of zlib, which worked
> > > better for some workloads.
> > 
> > Never mind. Good to know about that.
> > 
> > > 
> > > Putting a page into the buffer should depend on the network, i.e., if the
> > > network is congested it should take a long time. :)
> > 
> > Again, considering that I don't know much on compression (especially I
> > hardly used that) mine are only questions, which should not block your
> > patches to be either queued/merged/reposted when proper. :)
> 
> Yes, i see. The discussion can potentially raise a better solution.
> 
> Thanks for your comment, Peter!
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
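The pipeline effect discussed above can be sketched as a toy (Python, purely illustrative, not the QEMU code): the migration thread only enqueues pages and keeps scanning, while worker threads compress in parallel, so the workers stay fed instead of idling:

```python
import queue
import threading
import zlib

page_queue = queue.Queue()   # migration thread -> compression workers
results = queue.Queue()      # compressed payloads ready for the wire

def compression_worker():
    while True:
        page = page_queue.get()
        if page is None:     # shutdown sentinel
            break
        results.put(zlib.compress(page, 1))

workers = [threading.Thread(target=compression_worker) for _ in range(2)]
for w in workers:
    w.start()

# The "migration thread": hand pages to the workers and keep scanning
# instead of compressing inline. (In the patch, the first page of a new
# block would be posted out uncompressed here, without waiting at all.)
pages = [bytes([i]) * 4096 for i in range(8)]
for p in pages:
    page_queue.put(p)

for _ in workers:            # one sentinel per worker
    page_queue.put(None)
for w in workers:
    w.join()

assert results.qsize() == len(pages)
```

With the old code, the migration thread occasionally compressed a page itself (the first page of a block), stalling exactly the thread that feeds this queue; the patch removes that stall.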

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] migration: stop compressing page in migration thread
@ 2018-03-27 19:12                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 126+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-27 19:12 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Peter Xu, liang.z.li, kvm, quintela, mtosatti, Xiao Guangrong,
	qemu-devel, mst, pbonzini

* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> 
> On 03/26/2018 05:02 PM, Peter Xu wrote:
> > On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 03/21/2018 04:19 PM, Peter Xu wrote:
> > > > On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
> > > > > 
> > > > > Hi David,
> > > > > 
> > > > > Thanks for your review.
> > > > > 
> > > > > On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
> > > > > 
> > > > > > >     migration/ram.c | 32 ++++++++++++++++----------------
> > > > > > 
> > > > > > Hi,
> > > > > >      Do you have some performance numbers to show this helps?  Were those
> > > > > > taken on a normal system or were they taken with one of the compression
> > > > > > accelerators (which I think the compression migration was designed for)?
> > > > > 
> > > > > Yes, i have tested it on my desktop, i7-4790 + 16G, by locally live migrate
> > > > > the VM which has 8 vCPUs + 6G memory and the max-bandwidth is limited to 350.
> > > > > 
> > > > > During the migration, a workload which has 8 threads repeatedly written total
> > > > > 6G memory in the VM. Before this patchset, its bandwidth is ~25 mbps, after
> > > > > applying, the bandwidth is ~50 mbps.
> > > > 
> > > > Hi, Guangrong,
> > > > 
> > > > Not really review comments, but I got some questions. :)
> > > 
> > > Your comments are always valuable to me! :)
> > > 
> > > > 
> > > > IIUC this patch will only change the behavior when last_sent_block
> > > > changed.  I see that the performance is doubled after the change,
> > > > which is really promising.  However I don't fully understand why it
> > > > brings such a big difference considering that IMHO current code is
> > > > sending dirty pages per-RAMBlock.  I mean, IMHO last_sent_block should
> > > > not change frequently?  Or am I wrong?
> > > 
> > > It's depends on the configuration, each memory-region which is ram or
> > > file backend has a RAMBlock.
> > > 
> > > Actually, more benefits comes from the fact that the performance & throughput
> > > of the multithreads has been improved as the threads is fed by the
> > > migration thread and the result is consumed by the migration
> > > thread.
> > 
> > I'm not sure whether I got your points - I think you mean that the
> > compression threads and the migration thread can form a better
> > pipeline if the migration thread does not do any compression at all.
> > 
> > I think I agree with that.
> > 
> > However it does not really explain to me on why a very rare event
> > (sending the first page of a RAMBlock, considering bitmap sync is
> > rare) can greatly affect the performance (it shows a doubled boost).
> > 
> 
> I understand it is trick indeed, but it is not very hard to explain.
> Multi-threads (using 8 CPUs in our test) keep idle for a long time
> for the origin code, however, after our patch, as the normal is
> posted out async-ly that it's extremely fast as you said (the network
> is almost idle for current implementation) so it has a long time that
> the CPUs can be used effectively to generate more compressed data than
> before.

One thing to try, to explain Peter's worry, would be, for testing, to
add a counter to see how often this case triggers, and perhaps add
some debug to see when;  Peter's right that flipping between the
RAMBlocks seems odd, unless you're either doing lots of iterations or
have lots of separate RAMBlocks for some reason.

Dave

> > Btw, about the numbers: IMHO the numbers might not be really "true
> > numbers".  Or say, even the bandwidth is doubled, IMHO it does not
> > mean the performance is doubled. Becasue the data has changed.
> > 
> > Previously there were only compressed pages, and now for each cycle of
> > RAMBlock looping we'll send a normal page (then we'll get more thing
> > to send).  So IMHO we don't really know whether we sent more pages
> > with this patch, we can only know we sent more bytes (e.g., an extreme
> > case is that the extra 25 Mbps are all caused by those normal pages,
> > and we can be sending exactly the same number of pages like before, or
> > even worse?).
> > 
> 
> The current implementation uses the CPU very ineffectively (addressing
> that is our next work to be posted out) and the network is almost idle,
> so posting more data out is a better choice. Furthermore, the migration
> thread plays a role in the parallelism, so it had better be fast.
> 
> > > 
> > > > 
> > > > Another follow-up question would be: have you measured how long time
> > > > needed to compress a 4k page, and how many time to send it?  I think
> > > > "sending the page" is not really meaningful considering that we just
> > > > put a page into the buffer (which should be extremely fast since we
> > > > don't really flush it every time), however I would be curious on how
> > > > slow would compressing a page be.
> > > 
> > > I haven't benchmarked the performance of zlib; I think it is a
> > > CPU-intensive workload, particularly as there is no compression
> > > accelerator (e.g., QAT) in our production environment. BTW, we were
> > > using lzo instead of zlib, which worked better for some workloads.
> > 
> > Never mind. Good to know about that.
> > 
> > > 
> > > Putting a page into the buffer should depend on the network, i.e., if the
> > > network is congested it should take a long time. :)
> > 
> > Again, considering that I don't know much on compression (especially I
> > hardly used that) mine are only questions, which should not block your
> > patches to be either queued/merged/reposted when proper. :)
> 
> Yes, I see. The discussion can potentially raise a better solution.
> 
> Thanks for your comment, Peter!
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-27  1:20               ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-28  0:43                 ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-28  0:43 UTC (permalink / raw)
  To: guangrong.xiao
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, peterx, pbonzini

> On 03/27/2018 07:17 PM, Peter Xu wrote:
>> On Tue, Mar 27, 2018 at 03:42:32AM +0800, Xiao Guangrong wrote:
>> 
>> [...]
>> 
>>>> It'll be understandable to me if the problem is that the compress()
>>>> API does not allow the input buffer to be changed during the whole
>>>> period of the call.  If that is a must, this patch for sure helps.
>>>
>>> Yes, that is exactly what i want to say. :)
>> 
>> So I think now I know what this patch is for. :) And yeah, it makes
>> sense.
>> 
>> Though another question would be: if the buffer is updated during
>> compress() and compress() returned error, would that pollute the whole
>> z_stream or it only fails the compress() call?
>> 
>
> I guess deflateReset() can recover everything, i.e., keep z_stream as
> it was init'ed by deflateInit().
>
>> (Same question applies to decompress().)
>> 
>> If it's only a compress() error and it won't pollute z_stream (or say,
>> it can be recovered after a deflateReset() and then we can continue to
>> call deflate() without problem), then we'll actually have two
>> alternatives to solve this "buffer update" issue:
>> 
>> 1. Use the approach of current patch: we copy the page every time, so
>>     deflate() never fails because update never happens.  But it's slow
>>     since we copy the pages every time.
>> 
>> 2. Use the old approach, and when compress() fail, we just ignore that
>>     page (since now we know that error _must_ be caused by page update,
>>     then we are 100% sure that we'll send that page again so it'll be
>>     perfectly fine).
>> 
>
> No, we can't make the assumption that "error _must_ be caused by page update". 
> No document/ABI about compress/decompress promised it. :)
So, as I mentioned before, can we just distinguish the decompress/compress errors
from errors caused by page updates by the return code of inflate/deflate?
According to the zlib manual, there seem to be several error codes for different
cases:
#define Z_ERRNO        (-1) 
#define Z_STREAM_ERROR (-2) 
#define Z_DATA_ERROR   (-3) 
#define Z_MEM_ERROR    (-4)
#define Z_BUF_ERROR    (-5)
#define Z_VERSION_ERROR (-6)
Did you check the return code when a silent failure (not caused by a page update)
happened before? :)

Regards,
Jiang

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
@ 2018-03-28  3:01     ` Wang, Wei W
  -1 siblings, 0 replies; 126+ messages in thread
From: Wang, Wei W @ 2018-03-28  3:01 UTC (permalink / raw)
  To: 'guangrong.xiao@gmail.com', pbonzini, mst, mtosatti
  Cc: Xiao Guangrong, qemu-devel, kvm

On Tuesday, March 13, 2018 3:58 PM, Xiao Guangrong wrote:
> 
> As compression is heavy work, do not do it in the migration thread; instead,
> we post the page out as a normal page
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>


Hi Guangrong,

Dave asked me to help review your patch, so I will just drop my 2 cents wherever possible, and hope that could be inspiring for your work.


> ---
>  migration/ram.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c index
> 7266351fd0..615693f180 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState
> *rs, PageSearchStatus *pss,
>      int pages = -1;
>      uint64_t bytes_xmit = 0;
>      uint8_t *p;
> -    int ret, blen;
> +    int ret;
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> 
> @@ -1162,23 +1162,23 @@ static int
> ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>          if (block != rs->last_sent_block) {
>              flush_compressed_data(rs);
>              pages = save_zero_page(rs, block, offset);
> -            if (pages == -1) {
> -                /* Make sure the first page is sent out before other pages */
> -                bytes_xmit = save_page_header(rs, rs->f, block, offset |
> -                                              RAM_SAVE_FLAG_COMPRESS_PAGE);
> -                blen = qemu_put_compression_data(rs->f, p, TARGET_PAGE_SIZE,
> -                                                 migrate_compress_level());
> -                if (blen > 0) {
> -                    ram_counters.transferred += bytes_xmit + blen;
> -                    ram_counters.normal++;
> -                    pages = 1;
> -                } else {
> -                    qemu_file_set_error(rs->f, blen);
> -                    error_report("compressed data failed!");
> -                }
> -            }
>              if (pages > 0) {
>                  ram_release_pages(block->idstr, offset, pages);
> +            } else {
> +                /*
> +                 * Make sure the first page is sent out before other pages.
> +                 *
> +                 * we post it as normal page as compression will take much
> +                 * CPU resource.
> +                 */
> +                ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                                offset | RAM_SAVE_FLAG_PAGE);
> +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> +                                      migrate_release_ram() &
> +                                      migration_in_postcopy());
> +                ram_counters.transferred += TARGET_PAGE_SIZE;
> +                ram_counters.normal++;
> +                pages = 1;
>              }
>          } else {
>              pages = save_zero_page(rs, block, offset);
> --

I agree that this patch is an improvement over the current implementation, so I will just pile mine up here:
Reviewed-by: Wei Wang <wei.w.wang@intel.com>


If you are interested in something more aggressive, I can share an alternative approach, which I think would be better. Please see below.

Actually, we can use the multi-threaded compression for the first page as well, which will not block the migration thread progress. The advantage is that we can enjoy the compression benefit for the first page and meanwhile not blocking the migration thread - the page is given to a compression thread and compressed asynchronously to the migration thread execution.

The main barrier to achieving the above is that we need to make sure the first page of each block is sent first in the multi-threaded environment. We can tweak the current implementation to achieve that, which is not hard:

For example, we can add a new flag to RAMBlock - bool first_page_added. Each compression thread needs to
1) check if this is the first page of the block;
2) if it is the first page, set block->first_page_added after sending the page;
3) if it is not the first page, wait to send the page until block->first_page_added is set.

Best,
Wei

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-27 14:35                   ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-28  3:03                     ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-28  3:03 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini, jiang.biao2

On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
> 
> 
> On 03/28/2018 08:43 AM, jiang.biao2@zte.com.cn wrote:
> > > On 03/27/2018 07:17 PM, Peter Xu wrote:
> > > > On Tue, Mar 27, 2018 at 03:42:32AM +0800, Xiao Guangrong wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > It'll be understandable to me if the problem is that the compress()
> > > > > > API does not allow the input buffer to be changed during the whole
> > > > > > period of the call.  If that is a must, this patch for sure helps.
> > > > > 
> > > > > Yes, that is exactly what i want to say. :)
> > > > 
> > > > So I think now I know what this patch is for. :) And yeah, it makes
> > > > sense.
> > > > 
> > > > Though another question would be: if the buffer is updated during
> > > > compress() and compress() returned error, would that pollute the whole
> > > > z_stream or it only fails the compress() call?
> > > > 
> > > 
> > > I guess deflateReset() can recover everything, i.e, keep z_stream as
> > > it is init'ed by deflate_init().
> > > 
> > > > (Same question applies to decompress().)
> > > > 
> > > > If it's only a compress() error and it won't pollute z_stream (or say,
> > > > it can be recovered after a deflateReset() and then we can continue to
> > > > call deflate() without problem), then we'll actually have two
> > > > alternatives to solve this "buffer update" issue:
> > > > 
> > > > 1. Use the approach of current patch: we copy the page every time, so
> > > >      deflate() never fails because update never happens.  But it's slow
> > > >      since we copy the pages every time.
> > > > 
> > > > 2. Use the old approach, and when compress() fail, we just ignore that
> > > >      page (since now we know that error _must_ be caused by page update,
> > > >      then we are 100% sure that we'll send that page again so it'll be
> > > >      perfectly fine).
> > > > 
> > > 
> > > No, we can't make the assumption that "error _must_ be caused by page update".
> > > No document/ABI about compress/decompress promised it. :)

Indeed, I found no good documents about the errors below that jiang.biao
pointed out.

> > So, as I metioned before, can we just distingush the decompress/compress errors
> > from errors caused by page update by the return code of inflate/deflate?
> > According to the zlib manual, there seems to be several error codes for different
> > cases,
> > #define Z_ERRNO        (-1)
> > #define Z_STREAM_ERROR (-2)
> > #define Z_DATA_ERROR   (-3)
> > #define Z_MEM_ERROR    (-4)
> > #define Z_BUF_ERROR    (-5)
> > #define Z_VERSION_ERROR (-6)
> > Did you check the return code when silent failure(not caused by page update)
> > happened before? :)
> 
> I am afraid there is no such error code, and I guess zlib is not designed to
> compress data that is being modified.

So I agree with you; maybe the only right way for now is to copy the
page, until we know zlib better and find something useful.

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-28  3:03                     ` [Qemu-devel] " Peter Xu
@ 2018-03-28  4:08                       ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-28  4:08 UTC (permalink / raw)
  To: peterx
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, guangrong.xiao, pbonzini

> 
> On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
>
>> > > No, we can't make the assumption that "error _must_ be caused by page update".
>> > > No document/ABI about compress/decompress promised it. :)
>
> Indeed, I found no good documents about below errors that jiang.biao
> pointed out.
Hi, Peter
The description of the errors comes from here:
http://www.zlib.net/manual.html
And the error codes returned by inflate() are described as:
** inflate() returns 
Z_OK if some progress has been made (more input processed or more output produced),
Z_STREAM_END if the end of the compressed data has been reached and all uncompressed output has been produced, 
Z_NEED_DICT if a preset dictionary is needed at this point, 
Z_DATA_ERROR if the input data was corrupted (input stream not conforming to the zlib format or incorrect check value, in which case strm->msg points to a string with a more specific error), 
Z_STREAM_ERROR if the stream structure was inconsistent (for example next_in or next_out was Z_NULL, or the state was inadvertently written over by the application), 
Z_MEM_ERROR if there was not enough memory, 
Z_BUF_ERROR if no progress was possible or if there was not enough room in the output buffer when Z_FINISH is used. ... 
**
According to the above description, the error caused by a page update looks 
more likely to be Z_DATA_ERROR, but I do not have an environment to verify that. :)
As I understand it, real compress/decompress error cases other than those 
caused by page updates should be rare; maybe the error code is enough to
distinguish them, if we can verify by test the error codes returned on page
updates and on other silent failures. If so, we can cut the cost of the memcpy.
If not, I agree with Guangrong's idea too. I have never read the zlib code and all my
information comes from the manual, so if anything is inaccurate, please ignore my
opinion. :)

Regards,
Jiang

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-28  4:08                       ` [Qemu-devel] " jiang.biao2
@ 2018-03-28  4:20                         ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-28  4:20 UTC (permalink / raw)
  To: jiang.biao2
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, guangrong.xiao, pbonzini

On Wed, Mar 28, 2018 at 12:08:19PM +0800, jiang.biao2@zte.com.cn wrote:
> > 
> > On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
> >
> >> > > No, we can't make the assumption that "error _must_ be caused by page update".
> >> > > No document/ABI about compress/decompress promised it. :)
> >
> > Indeed, I found no good documents about below errors that jiang.biao
> > pointed out.
> Hi, Peter
> The description about the errors comes from here,
> http://www.zlib.net/manual.html
> And about the error codes returned by inflate(), they are described as,
> ** inflate() returns 
> Z_OK if some progress has been made (more input processed or more output produced),
> Z_STREAM_END if the end of the compressed data has been reached and all uncompressed output has been produced, 
> Z_NEED_DICT if a preset dictionary is needed at this point, 
> Z_DATA_ERROR if the input data was corrupted (input stream not conforming to the zlib format or incorrect check value, in which case strm->msg points to a string with a more specific error), 
> Z_STREAM_ERROR if the stream structure was inconsistent (for example next_in or next_out was Z_NULL, or the state was inadvertently written over by the application), 
> Z_MEM_ERROR if there was not enough memory, 
> Z_BUF_ERROR if no progress was possible or if there was not enough room in the output buffer when Z_FINISH is used. ... 
> **

Ah yes.  My bad for being so careless. :)

> According to the above description, the error caused by page update looks 
> more like tend to return Z_DATA_ERROR, but I do not have env to verify that. :)
> As I understand it, the real compress/decompress error cases other than that 
> caused by page update should be rare, maybe the error code is enough to
> distinguish those if we can verify the the error codes returned by page update
> and other silent failures by test. If so, we can cut the cost of memcpy.  
> If not, I agree with Guangrong's idea too. I never read the zlib code and all my
> information comes from the manual, so if anything inaccurate, pls ignore my
> option. :)

So I suppose all of us know that alternative now, we just need a solid
way to confirm the uncertainty.  I'll leave this to Guangrong.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-27 15:24       ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-28  7:30         ` Wei Wang
  -1 siblings, 0 replies; 126+ messages in thread
From: Wei Wang @ 2018-03-28  7:30 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, mst, mtosatti
  Cc: Peter Xu, Xiao Guangrong, qemu-devel, kvm, Dr. David Alan Gilbert

On 03/27/2018 11:24 PM, Xiao Guangrong wrote:
>
>
> On 03/28/2018 11:01 AM, Wang, Wei W wrote:
>> On Tuesday, March 13, 2018 3:58 PM, Xiao Guangrong wrote:
>>>
>>> As compression is a heavy work, do not do it in migration thread, 
>>> instead, we
>>> post it out as a normal page
>>>
>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>>
>> Hi Guangrong,
>>
>> Dave asked me to help review your patch, so I will just drop my 2 
>> cents wherever possible, and hope that could be inspiring for your work.
>
> Thank you both for the nice help on the work. :)
>
>>
>>
>>> ---
>>>   migration/ram.c | 32 ++++++++++++++++----------------
>>>   1 file changed, 16 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c index
>>> 7266351fd0..615693f180 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState
>>> *rs, PageSearchStatus *pss,
>>>       int pages = -1;
>>>       uint64_t bytes_xmit = 0;
>>>       uint8_t *p;
>>> -    int ret, blen;
>>> +    int ret;
>>>       RAMBlock *block = pss->block;
>>>       ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>>>
>>> @@ -1162,23 +1162,23 @@ static int
>>> ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>>>           if (block != rs->last_sent_block) {
>>>               flush_compressed_data(rs);
>>>               pages = save_zero_page(rs, block, offset);
>>> -            if (pages == -1) {
>>> -                /* Make sure the first page is sent out before 
>>> other pages */
>>> -                bytes_xmit = save_page_header(rs, rs->f, block, 
>>> offset |
>>> - RAM_SAVE_FLAG_COMPRESS_PAGE);
>>> -                blen = qemu_put_compression_data(rs->f, p, 
>>> TARGET_PAGE_SIZE,
>>> - migrate_compress_level());
>>> -                if (blen > 0) {
>>> -                    ram_counters.transferred += bytes_xmit + blen;
>>> -                    ram_counters.normal++;
>>> -                    pages = 1;
>>> -                } else {
>>> -                    qemu_file_set_error(rs->f, blen);
>>> -                    error_report("compressed data failed!");
>>> -                }
>>> -            }
>>>               if (pages > 0) {
>>>                   ram_release_pages(block->idstr, offset, pages);
>>> +            } else {
>>> +                /*
>>> +                 * Make sure the first page is sent out before 
>>> other pages.
>>> +                 *
>>> +                 * we post it as normal page as compression will 
>>> take much
>>> +                 * CPU resource.
>>> +                 */
>>> +                ram_counters.transferred += save_page_header(rs, 
>>> rs->f, block,
>>> +                                                offset | 
>>> RAM_SAVE_FLAG_PAGE);
>>> +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
>>> +                                      migrate_release_ram() &
>>> + migration_in_postcopy());
>>> +                ram_counters.transferred += TARGET_PAGE_SIZE;
>>> +                ram_counters.normal++;
>>> +                pages = 1;
>>>               }
>>>           } else {
>>>               pages = save_zero_page(rs, block, offset);
>>> -- 
>>
>> I agree that this patch is an improvement for the current 
>> implementation. So just pile up mine here:
>> Reviewed-by: Wei Wang <wei.w.wang@intel.com>
>
> Thanks.
>
>>
>>
>> If you are interested in something more aggressive, I can share an 
>> alternative approach, which I think would be better. Please see below.
>>
>> Actually, we can use the multi-threaded compression for the first 
>> page as well, which will not block the migration thread progress. The 
>> advantage is that we can enjoy the compression benefit for the first 
>> page and meanwhile not blocking the migration thread - the page is 
>> given to a compression thread and compressed asynchronously to the 
>> migration thread execution.
>>
>
> Yes, it is a good point.
>
>> The main barrier to achieving the above is that we need to make 
>> sure the first page of each block is sent first in the multi-threaded 
>> environment. We can twist the current implementation to achieve that, 
>> which is not hard:
>>
>> For example, we can add a new flag to RAMBlock - bool 
>> first_page_added. In each thread of compression, they need
>> 1) check if this is the first page of the block.
>> 2) If it is the first page, set block->first_page_added after sending 
>> the page;
>> 3) If it is not the first page, wait to send the page only when 
>> block->first_page_added is set.
>
>
> So there is another barrier introduced which hurts the parallel...
>
> Hmm, we need more deliberate consideration on this point, let me think 
> it over after this work.
>

Sure. Just a reminder: this doesn't have to be a barrier to the 
compression, it is just used to serialize sending the pages.

Btw, this reminds me of a possible bug in this patch (also in the current 
upstream code): there appears to be no guarantee that the first page 
will be sent before others. The migration thread and the compression 
thread use different buffers. The migration thread puts the first 
page into its buffer first, and the second page is put into the compression 
thread's buffer later. There appears to be no guarantee that the migration 
thread will flush its buffer before the compression thread does.

Best,
Wei

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-28  7:30         ` [Qemu-devel] " Wei Wang
@ 2018-03-28  7:37           ` Peter Xu
  -1 siblings, 0 replies; 126+ messages in thread
From: Peter Xu @ 2018-03-28  7:37 UTC (permalink / raw)
  To: Wei Wang
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel,
	Dr. David Alan Gilbert, Xiao Guangrong, pbonzini

On Wed, Mar 28, 2018 at 03:30:06PM +0800, Wei Wang wrote:
> On 03/27/2018 11:24 PM, Xiao Guangrong wrote:
> > 
> > 
> > On 03/28/2018 11:01 AM, Wang, Wei W wrote:
> > > On Tuesday, March 13, 2018 3:58 PM, Xiao Guangrong wrote:
> > > > 
> > > > As compression is a heavy work, do not do it in migration
> > > > thread, instead, we
> > > > post it out as a normal page
> > > > 
> > > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > 
> > > Hi Guangrong,
> > > 
> > > Dave asked me to help review your patch, so I will just drop my 2
> > > cents wherever possible, and hope that could be inspiring for your
> > > work.
> > 
> > Thank you both for the nice help on the work. :)
> > 
> > > 
> > > 
> > > > ---
> > > >   migration/ram.c | 32 ++++++++++++++++----------------
> > > >   1 file changed, 16 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/migration/ram.c b/migration/ram.c index
> > > > 7266351fd0..615693f180 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState
> > > > *rs, PageSearchStatus *pss,
> > > >       int pages = -1;
> > > >       uint64_t bytes_xmit = 0;
> > > >       uint8_t *p;
> > > > -    int ret, blen;
> > > > +    int ret;
> > > >       RAMBlock *block = pss->block;
> > > >       ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> > > > 
> > > > @@ -1162,23 +1162,23 @@ static int
> > > > ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
> > > >           if (block != rs->last_sent_block) {
> > > >               flush_compressed_data(rs);
> > > >               pages = save_zero_page(rs, block, offset);
> > > > -            if (pages == -1) {
> > > > -                /* Make sure the first page is sent out before
> > > > other pages */
> > > > -                bytes_xmit = save_page_header(rs, rs->f, block,
> > > > offset |
> > > > - RAM_SAVE_FLAG_COMPRESS_PAGE);
> > > > -                blen = qemu_put_compression_data(rs->f, p,
> > > > TARGET_PAGE_SIZE,
> > > > - migrate_compress_level());
> > > > -                if (blen > 0) {
> > > > -                    ram_counters.transferred += bytes_xmit + blen;
> > > > -                    ram_counters.normal++;
> > > > -                    pages = 1;
> > > > -                } else {
> > > > -                    qemu_file_set_error(rs->f, blen);
> > > > -                    error_report("compressed data failed!");
> > > > -                }
> > > > -            }
> > > >               if (pages > 0) {
> > > >                   ram_release_pages(block->idstr, offset, pages);
> > > > +            } else {
> > > > +                /*
> > > > +                 * Make sure the first page is sent out before
> > > > other pages.
> > > > +                 *
> > > > +                 * we post it as normal page as compression
> > > > will take much
> > > > +                 * CPU resource.
> > > > +                 */
> > > > +                ram_counters.transferred +=
> > > > save_page_header(rs, rs->f, block,
> > > > +                                                offset |
> > > > RAM_SAVE_FLAG_PAGE);
> > > > +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> > > > +                                      migrate_release_ram() &
> > > > + migration_in_postcopy());
> > > > +                ram_counters.transferred += TARGET_PAGE_SIZE;
> > > > +                ram_counters.normal++;
> > > > +                pages = 1;
> > > >               }
> > > >           } else {
> > > >               pages = save_zero_page(rs, block, offset);
> > > > -- 
> > > 
> > > I agree that this patch is an improvement for the current
> > > implementation. So just pile up mine here:
> > > Reviewed-by: Wei Wang <wei.w.wang@intel.com>
> > 
> > Thanks.
> > 
> > > 
> > > 
> > > If you are interested in something more aggressive, I can share an
> > > alternative approach, which I think would be better. Please see
> > > below.
> > > 
> > > Actually, we can use the multi-threaded compression for the first
> > > page as well, which will not block the migration thread progress.
> > > The advantage is that we can enjoy the compression benefit for the
> > > first page and meanwhile not blocking the migration thread - the
> > > page is given to a compression thread and compressed asynchronously
> > > to the migration thread execution.
> > > 
> > 
> > Yes, it is a good point.
> > 
> > > The main barrier to achieving the above is that we need to make
> > > sure the first page of each block is sent first in the
> > > multi-threaded environment. We can twist the current implementation
> > > to achieve that, which is not hard:
> > > 
> > > For example, we can add a new flag to RAMBlock - bool
> > > first_page_added. In each thread of compression, they need
> > > 1) check if this is the first page of the block.
> > > 2) If it is the first page, set block->first_page_added after
> > > sending the page;
> > > 3) If it is not the first page, wait to send the page only when
> > > block->first_page_added is set.
> > 
> > 
> > So there is another barrier introduced which hurts the parallel...
> > 
> > Hmm, we need more deliberate consideration on this point, let me think
> > it over after this work.
> > 
> 
> Sure. Just a reminder, this doesn't have to be a barrier to the compression,
> it is just used to serialize sending the pages.
> 
> Btw, this reminds me a possible bug in this patch (also in the current
> upstream code): there appears to be no guarantee that the first page will be
> sent before others. The migration thread and the compression thread use
> different buffers. The migration thread just puts the first page into its
> buffer first,  the second page is put to the compression thread buffer
> later. There appears to be no guarantee that the migration thread will flush
> its buffer before the compression thread.

IIUC finally the compression buffers will be queued into the migration
IO stream, so they are still serialized.

In compress_page_with_multi_thread() there is:

        bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);

comp_param[idx].file should be the compression buffer.

rs->f should be the migration IO stream. Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/8] migration: support to detect compression and decompression errors
  2018-03-27 18:44                           ` [Qemu-devel] " Xiao Guangrong
@ 2018-03-28  8:07                             ` jiang.biao2
  -1 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-28  8:07 UTC (permalink / raw)
  To: guangrong.xiao
  Cc: kvm, mst, mtosatti, xiaoguangrong, qemu-devel, peterx, pbonzini

> On 03/28/2018 12:20 PM, Peter Xu wrote:
>> On Wed, Mar 28, 2018 at 12:08:19PM +0800, jiang.biao2@zte.com.cn wrote:
>>>>
>>>> On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
>>>>
>>>>>>> No, we can't make the assumption that "error _must_ be caused by page update".
>>>>>>> No document/ABI about compress/decompress promised it. :)
>>>>
>>>> Indeed, I found no good documents about below errors that jiang.biao
>>>> pointed out.
>>> Hi, Peter
>>> The description about the errors comes from here,
>>> http://www.zlib.net/manual.html
>>> And about the error codes returned by inflate(), they are described as,
>>> ** inflate() returns
>>> Z_OK if some progress has been made (more input processed or more output produced),
>>> Z_STREAM_END if the end of the compressed data has been reached and all uncompressed output has been produced,
>>> Z_NEED_DICT if a preset dictionary is needed at this point,
>>> Z_DATA_ERROR if the input data was corrupted (input stream not conforming to the zlib format or incorrect check value, in which case strm->msg points to a string with a >more specific error),
>>> Z_STREAM_ERROR if the stream structure was inconsistent (for example next_in or next_out was Z_NULL, or the state was inadvertently written over by the application),
>>> Z_MEM_ERROR if there was not enough memory,
>>> Z_BUF_ERROR if no progress was possible or if there was not enough room in the output buffer when Z_FINISH is used. ...
>>> **
>>
>> Ah yes.  My bad to be so uncareful. :)
>>
>>> According to the above description, the error caused by page update looks
>>> more likely to return Z_DATA_ERROR, but I do not have an env to verify that. :)
>
> No, we still lack info to confirm that the case of compressing data being
> updated is the only one to return Z_DATA_ERROR. And nothing promises
> that no other error condition causing corrupted data will be squeezed
> into this error code.
>
>>> As I understand it, the real compress/decompress error cases other than that
>>> caused by page update should be rare, maybe the error code is enough to
>>> distinguish those if we can verify the error codes returned by page update
>>> and other silent failures by test. If so, we can cut the cost of memcpy.
>
> Please note, compared with other operations, e.g., compression, zero page
> detection, etc., memcpy() is not a hot function at all.

Just out of curiosity, what amount of memory needs to be copied in 
normal cases? KBs, MBs? 

>>> If not, I agree with Guangrong's idea too. I have never read the zlib code and all my
>>> information comes from the manual, so if anything is inaccurate, pls ignore my
>>> opinion. :)
>>
>> So I suppose all of us know that alternative now, we just need a solid
>> way to confirm the uncertainty.  I'll leave this to Guangrong.
>
> Yes, I still prefer memcpy() to make it safe enough to protect our production
> unless we get enough certainty to figure out the error conditions.

Indeed, there is no guarantee for that currently, so to be safe, we need memcpy(). 
Never mind, please just ignore my opinion. :)  
Thanks!

Regards,
Jiang

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] migration: support todetectcompressionand decompression errors
@ 2018-03-28  8:07                             ` jiang.biao2
  0 siblings, 0 replies; 126+ messages in thread
From: jiang.biao2 @ 2018-03-28  8:07 UTC (permalink / raw)
  To: guangrong.xiao
  Cc: peterx, kvm, mst, mtosatti, xiaoguangrong, qemu-devel, pbonzini

> On 03/28/2018 12:20 PM, Peter Xu wrote:
>> On Wed, Mar 28, 2018 at 12:08:19PM +0800, jiang.biao2@zte.com.cn wrote:
>>>>
>>>> On Tue, Mar 27, 2018 at 10:35:29PM +0800, Xiao Guangrong wrote:
>>>>
>>>>>>> No, we can't make the assumption that "error _must_ be caused by page update".
>>>>>>> No document/ABI about compress/decompress promised it. :)
>>>>
>>>> Indeed, I found no good documents about below errors that jiang.biao
>>>> pointed out.
>>> Hi, Peter
>>> The description about the errors comes from here,
>>> http://www.zlib.net/manual.html
>>> And about the error codes returned by inflate(), they are described as,
>>> ** inflate() returns
>>> Z_OK if some progress has been made (more input processed or more output produced),
>>> Z_STREAM_END if the end of the compressed data has been reached and all uncompressed output has been produced,
>>> Z_NEED_DICT if a preset dictionary is needed at this point,
>>> Z_DATA_ERROR if the input data was corrupted (input stream not conforming to the zlib format or incorrect check value, in which case strm->msg points to a string with a >more specific error),
>>> Z_STREAM_ERROR if the stream structure was inconsistent (for example next_in or next_out was Z_NULL, or the state was inadvertently written over by the application),
>>> Z_MEM_ERROR if there was not enough memory,
>>> Z_BUF_ERROR if no progress was possible or if there was not enough room in the output buffer when Z_FINISH is used. ...
>>> **
>>
>> Ah yes.  My bad to be so uncareful. :)
>>
>>> According to the above description, the error caused by page update looks
>>> more like tend to return Z_DATA_ERROR, but I do not have env to verify that. :)
>
> No, we still lack the info to confirm that compressing data being
> updated is the only case that returns Z_DATA_ERROR. And nothing
> guarantees that no other error condition that corrupts data will be
> squeezed into this error code.
>
>>> As I understand it, real compress/decompress errors other than those
>>> caused by page updates should be rare; maybe the error code is enough to
>>> distinguish them, if we can verify by test the error codes returned for page
>>> updates and for other silent failures. If so, we can cut the cost of memcpy.
>
> Please note, compared with other operations, e.g., compression, zero-page
> detection, etc., memcpy() is not a hot function at all.

Just out of curiosity, what amount of memory needs to be copied in
normal cases? KBs, MBs?

>>> If not, I agree with Guangrong's idea too. I have never read the zlib code and
>>> all my information comes from the manual, so if anything is inaccurate, pls
>>> ignore my opinion. :)
>>
>> So I suppose all of us know the alternative now; we just need a solid
>> way to confirm the uncertainty.  I'll leave this to Guangrong.
>
> Yes, I still prefer memcpy() to make it safe enough to protect our production,
> unless we get enough certainty to figure out the error conditions.

Indeed, there is no guarantee for that currently, so to be safe, we need memcpy().
Never mind, pls just ignore my opinion. :)
Thanks!

Regards,
Jiang

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/8] migration: stop compressing page in migration thread
  2018-03-28  7:37           ` [Qemu-devel] " Peter Xu
@ 2018-03-28  8:30             ` Wei Wang
  -1 siblings, 0 replies; 126+ messages in thread
From: Wei Wang @ 2018-03-28  8:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, mst, mtosatti, Xiao Guangrong, qemu-devel,
	Dr. David Alan Gilbert, Xiao Guangrong, pbonzini

On 03/28/2018 03:37 PM, Peter Xu wrote:
> On Wed, Mar 28, 2018 at 03:30:06PM +0800, Wei Wang wrote:
>> On 03/27/2018 11:24 PM, Xiao Guangrong wrote:
>>>
>>> On 03/28/2018 11:01 AM, Wang, Wei W wrote:
>>>> On Tuesday, March 13, 2018 3:58 PM, Xiao Guangrong wrote:
>>>>> As compression is a heavy work, do not do it in migration
>>>>> thread, instead, we
>>>>> post it out as a normal page
>>>>>
>>>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>>
>>>> Hi Guangrong,
>>>>
>>>> Dave asked me to help review your patch, so I will just drop my 2
>>>> cents wherever possible, and hope that could be inspiring for your
>>>> work.
>>> Thank you both for the nice help on the work. :)
>>>
>>>>
>>>>> ---
>>>>>    migration/ram.c | 32 ++++++++++++++++----------------
>>>>>    1 file changed, 16 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/migration/ram.c b/migration/ram.c index
>>>>> 7266351fd0..615693f180 100644
>>>>> --- a/migration/ram.c
>>>>> +++ b/migration/ram.c
>>>>> @@ -1132,7 +1132,7 @@ static int ram_save_compressed_page(RAMState
>>>>> *rs, PageSearchStatus *pss,
>>>>>        int pages = -1;
>>>>>        uint64_t bytes_xmit = 0;
>>>>>        uint8_t *p;
>>>>> -    int ret, blen;
>>>>> +    int ret;
>>>>>        RAMBlock *block = pss->block;
>>>>>        ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>>>>>
>>>>> @@ -1162,23 +1162,23 @@ static int
>>>>> ram_save_compressed_page(RAMState *rs, PageSearchStatus *pss,
>>>>>            if (block != rs->last_sent_block) {
>>>>>                flush_compressed_data(rs);
>>>>>                pages = save_zero_page(rs, block, offset);
>>>>> -            if (pages == -1) {
>>>>> -                /* Make sure the first page is sent out before
>>>>> other pages */
>>>>> -                bytes_xmit = save_page_header(rs, rs->f, block,
>>>>> offset |
>>>>> - RAM_SAVE_FLAG_COMPRESS_PAGE);
>>>>> -                blen = qemu_put_compression_data(rs->f, p,
>>>>> TARGET_PAGE_SIZE,
>>>>> - migrate_compress_level());
>>>>> -                if (blen > 0) {
>>>>> -                    ram_counters.transferred += bytes_xmit + blen;
>>>>> -                    ram_counters.normal++;
>>>>> -                    pages = 1;
>>>>> -                } else {
>>>>> -                    qemu_file_set_error(rs->f, blen);
>>>>> -                    error_report("compressed data failed!");
>>>>> -                }
>>>>> -            }
>>>>>                if (pages > 0) {
>>>>>                    ram_release_pages(block->idstr, offset, pages);
>>>>> +            } else {
>>>>> +                /*
>>>>> +                 * Make sure the first page is sent out before
>>>>> other pages.
>>>>> +                 *
>>>>> +                 * we post it as normal page as compression
>>>>> will take much
>>>>> +                 * CPU resource.
>>>>> +                 */
>>>>> +                ram_counters.transferred +=
>>>>> save_page_header(rs, rs->f, block,
>>>>> +                                                offset |
>>>>> RAM_SAVE_FLAG_PAGE);
>>>>> +                qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
>>>>> +                                      migrate_release_ram() &
>>>>> + migration_in_postcopy());
>>>>> +                ram_counters.transferred += TARGET_PAGE_SIZE;
>>>>> +                ram_counters.normal++;
>>>>> +                pages = 1;
>>>>>                }
>>>>>            } else {
>>>>>                pages = save_zero_page(rs, block, offset);
>>>>> -- 
>>>> I agree that this patch is an improvement for the current
>>>> implementation. So just pile up mine here:
>>>> Reviewed-by: Wei Wang <wei.w.wang@intel.com>
>>> Thanks.
>>>
>>>>
>>>> If you are interested in something more aggressive, I can share an
>>>> alternative approach, which I think would be better. Please see
>>>> below.
>>>>
>>>> Actually, we can use the multi-threaded compression for the first
>>>> page as well, which will not block the migration thread progress.
>>>> The advantage is that we can enjoy the compression benefit for the
>>>> first page and meanwhile not blocking the migration thread - the
>>>> page is given to a compression thread and compressed asynchronously
>>>> to the migration thread execution.
>>>>
>>> Yes, it is a good point.
>>>
>>>> The main barrier to achieving the above is that we need to make
>>>> sure the first page of each block is sent first in the
>>>> multi-threaded environment. We can twist the current implementation
>>>> to achieve that, which is not hard:
>>>>
>>>> For example, we can add a new flag to RAMBlock - bool
>>>> first_page_added. In each thread of compression, they need
>>>> 1) check if this is the first page of the block.
>>>> 2) If it is the first page, set block->first_page_added after
>>>> sending the page;
>>>> 3) If it is not the first page, wait to send the page only when
>>>> block->first_page_added is set.
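Steps 1)-3) above could be sketched with a per-block flag plus a condition variable (all names hypothetical; these are not existing QEMU structures):

```c
/* Hypothetical sketch of the proposal: workers compress in parallel,
 * but only the worker holding a block's first page sends immediately;
 * the others wait until first_page_added is set. */
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool first_page_added;
} BlockSendState;

static void send_compressed_page(BlockSendState *b, bool is_first_page)
{
    pthread_mutex_lock(&b->lock);
    if (is_first_page) {
        /* 2) send the page, then unblock the waiters */
        /* ... put the compressed page on the wire here ... */
        b->first_page_added = true;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* 3) wait until the block's first page has gone out */
        while (!b->first_page_added)
            pthread_cond_wait(&b->cond, &b->lock);
        /* ... put the compressed page on the wire here ... */
    }
    pthread_mutex_unlock(&b->lock);
}
```

Note that holding the lock across the send also serializes the sends themselves, which is the extra synchronization cost being weighed here.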
>>>
>>> So there is another barrier introduced which hurts the parallel...
>>>
>>> Hmm, we need more deliberate consideration on this point, let me think
>>> it over after this work.
>>>
>> Sure. Just a reminder, this doesn't have to be a barrier to the compression,
>> it is just used to serialize sending the pages.
>>
>> Btw, this reminds me of a possible bug in this patch (also in the current
>> upstream code): there appears to be no guarantee that the first page will be
>> sent before others. The migration thread and the compression threads use
>> different buffers. The migration thread just puts the first page into its
>> buffer first; the second page is put into the compression thread's buffer
>> later. There appears to be no guarantee that the migration thread will flush
>> its buffer before the compression thread does.
> IIUC finally the compression buffers will be queued into the migration
> IO stream, so they are still serialized.
>
> In compress_page_with_multi_thread() there is:
>
>          bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
>
> comp_param[idx].file should be the compression buffer.
>
> rs->f should be the migration IO stream.
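A rough model of why the per-thread buffers still come out serialized (invented names; the point is only that a single thread performs all writes into the output stream, in the order it drains the workers' buffers):

```c
/* Model: compression workers fill private buffers; one thread copies
 * each finished buffer into the single output stream, so the stream
 * order is exactly the drain order. */
#include <string.h>

#define BUFSZ 4096

typedef struct {
    unsigned char data[BUFSZ];
    size_t len;
} CompBuffer;

/* Called only by the migration thread; being the sole writer of
 * `stream` is what keeps the output serialized. */
static size_t drain_to_stream(unsigned char *stream, size_t pos,
                              CompBuffer *buf)
{
    memcpy(stream + pos, buf->data, buf->len);
    pos += buf->len;
    buf->len = 0;           /* hand the buffer back to its worker */
    return pos;
}
```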

OK, thanks. It turns out that the comp_param[idx].file is not writable 
currently. This needs an extra copy, which could be avoided with the 
above approach.

Best,
Wei

^ permalink raw reply	[flat|nested] 126+ messages in thread

Thread overview: 126+ messages
2018-03-13  7:57 [PATCH 0/8] migration: improve and cleanup compression guangrong.xiao
2018-03-13  7:57 ` [Qemu-devel] " guangrong.xiao
2018-03-13  7:57 ` [PATCH 1/8] migration: stop compressing page in migration thread guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 10:25   ` Dr. David Alan Gilbert
2018-03-15 10:25     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16  8:05     ` Xiao Guangrong
2018-03-16  8:05       ` [Qemu-devel] " Xiao Guangrong
2018-03-19 12:11       ` Dr. David Alan Gilbert
2018-03-19 12:11         ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-21  8:19       ` Peter Xu
2018-03-21  8:19         ` [Qemu-devel] " Peter Xu
2018-03-22 11:38         ` Xiao Guangrong
2018-03-22 11:38           ` [Qemu-devel] " Xiao Guangrong
2018-03-26  9:02           ` Peter Xu
2018-03-26  9:02             ` [Qemu-devel] " Peter Xu
2018-03-26 15:43             ` Xiao Guangrong
2018-03-26 15:43               ` [Qemu-devel] " Xiao Guangrong
2018-03-27  7:33               ` Peter Xu
2018-03-27  7:33                 ` [Qemu-devel] " Peter Xu
2018-03-27 19:12               ` Dr. David Alan Gilbert
2018-03-27 19:12                 ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-28  3:01   ` Wang, Wei W
2018-03-28  3:01     ` [Qemu-devel] " Wang, Wei W
2018-03-27 15:24     ` Xiao Guangrong
2018-03-27 15:24       ` [Qemu-devel] " Xiao Guangrong
2018-03-28  7:30       ` Wei Wang
2018-03-28  7:30         ` [Qemu-devel] " Wei Wang
2018-03-28  7:37         ` Peter Xu
2018-03-28  7:37           ` [Qemu-devel] " Peter Xu
2018-03-28  8:30           ` Wei Wang
2018-03-28  8:30             ` [Qemu-devel] " Wei Wang
2018-03-13  7:57 ` [PATCH 2/8] migration: stop allocating and freeing memory frequently guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 11:03   ` Dr. David Alan Gilbert
2018-03-15 11:03     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16  8:19     ` Xiao Guangrong
2018-03-16  8:19       ` [Qemu-devel] " Xiao Guangrong
2018-03-19 10:54       ` Dr. David Alan Gilbert
2018-03-19 10:54         ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-19 12:11         ` Xiao Guangrong
2018-03-19 12:11           ` [Qemu-devel] " Xiao Guangrong
2018-03-19  1:49   ` [PATCH 2/8] migration: stop allocating and freeing memory frequently jiang.biao2
2018-03-19  1:49     ` [Qemu-devel] " jiang.biao2
2018-03-19  4:03     ` Xiao Guangrong
2018-03-19  4:03       ` [Qemu-devel] " Xiao Guangrong
2018-03-19  4:48       ` [PATCH 2/8] migration: stop allocating and freeing memory frequently jiang.biao2
2018-03-19  4:48         ` [Qemu-devel] " jiang.biao2
2018-03-21  9:06   ` [PATCH 2/8] migration: stop allocating and freeing memory frequently Peter Xu
2018-03-21  9:06     ` [Qemu-devel] " Peter Xu
2018-03-22 11:57     ` Xiao Guangrong
2018-03-22 11:57       ` [Qemu-devel] " Xiao Guangrong
2018-03-27  7:07       ` Peter Xu
2018-03-27  7:07         ` [Qemu-devel] " Peter Xu
2018-03-13  7:57 ` [PATCH 3/8] migration: support to detect compression and decompression errors guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 11:29   ` Dr. David Alan Gilbert
2018-03-15 11:29     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16  8:25     ` Xiao Guangrong
2018-03-16  8:25       ` [Qemu-devel] " Xiao Guangrong
2018-03-19  7:56   ` [PATCH 3/8] migration: support to detect compression and " jiang.biao2
2018-03-19  7:56     ` [Qemu-devel] " jiang.biao2
2018-03-19  8:01     ` Xiao Guangrong
2018-03-19  8:01       ` [Qemu-devel] " Xiao Guangrong
2018-03-21 10:00   ` [PATCH 3/8] migration: support to detect compression and " Peter Xu
2018-03-21 10:00     ` [Qemu-devel] " Peter Xu
2018-03-22 12:03     ` Xiao Guangrong
2018-03-22 12:03       ` [Qemu-devel] " Xiao Guangrong
2018-03-27  7:22       ` Peter Xu
2018-03-27  7:22         ` [Qemu-devel] " Peter Xu
2018-03-26 19:42         ` Xiao Guangrong
2018-03-26 19:42           ` [Qemu-devel] " Xiao Guangrong
2018-03-27 11:17           ` Peter Xu
2018-03-27 11:17             ` [Qemu-devel] " Peter Xu
2018-03-27  1:20             ` Xiao Guangrong
2018-03-27  1:20               ` [Qemu-devel] " Xiao Guangrong
2018-03-28  0:43               ` [PATCH 3/8] migration: support to detect compression " jiang.biao2
2018-03-28  0:43                 ` [Qemu-devel] " jiang.biao2
2018-03-27 14:35                 ` Xiao Guangrong
2018-03-27 14:35                   ` [Qemu-devel] " Xiao Guangrong
2018-03-28  3:03                   ` Peter Xu
2018-03-28  3:03                     ` [Qemu-devel] " Peter Xu
2018-03-28  4:08                     ` [PATCH 3/8] migration: support to detect compression " jiang.biao2
2018-03-28  4:08                       ` [Qemu-devel] " jiang.biao2
2018-03-28  4:20                       ` Peter Xu
2018-03-28  4:20                         ` [Qemu-devel] " Peter Xu
2018-03-27 18:44                         ` Xiao Guangrong
2018-03-27 18:44                           ` [Qemu-devel] " Xiao Guangrong
2018-03-28  8:07                           ` [PATCH 3/8] migration: support to detect compression and " jiang.biao2
2018-03-28  8:07                             ` [Qemu-devel] " jiang.biao2
2018-03-13  7:57 ` [PATCH 4/8] migration: introduce control_save_page() guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 11:37   ` Dr. David Alan Gilbert
2018-03-15 11:37     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16  8:52     ` Xiao Guangrong
2018-03-16  8:52       ` [Qemu-devel] " Xiao Guangrong
2018-03-27  7:47     ` Peter Xu
2018-03-27  7:47       ` [Qemu-devel] " Peter Xu
2018-03-13  7:57 ` [PATCH 5/8] migration: move calling control_save_page to the common place guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 11:47   ` Dr. David Alan Gilbert
2018-03-15 11:47     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16  8:59     ` Xiao Guangrong
2018-03-16  8:59       ` [Qemu-devel] " Xiao Guangrong
2018-03-19 13:15       ` Dr. David Alan Gilbert
2018-03-19 13:15         ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-27 12:35   ` Peter Xu
2018-03-27 12:35     ` [Qemu-devel] " Peter Xu
2018-03-13  7:57 ` [PATCH 6/8] migration: move calling save_zero_page " guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 12:27   ` Dr. David Alan Gilbert
2018-03-15 12:27     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-27 12:49   ` Peter Xu
2018-03-27 12:49     ` [Qemu-devel] " Peter Xu
2018-03-13  7:57 ` [PATCH 7/8] migration: introduce save_normal_page() guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 12:30   ` Dr. David Alan Gilbert
2018-03-15 12:30     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-27 12:54   ` Peter Xu
2018-03-27 12:54     ` [Qemu-devel] " Peter Xu
2018-03-13  7:57 ` [PATCH 8/8] migration: remove ram_save_compressed_page() guangrong.xiao
2018-03-13  7:57   ` [Qemu-devel] " guangrong.xiao
2018-03-15 12:32   ` Dr. David Alan Gilbert
2018-03-15 12:32     ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-27 12:56   ` Peter Xu
2018-03-27 12:56     ` [Qemu-devel] " Peter Xu
