[PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO

qemu-devel.nongnu.org archive mirror

* [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO
@ 2021-04-01  7:47 leirao
  2021-04-01  7:47 ` [PATCH v5 01/10] Remove some duplicate trace code leirao
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao,Lei, qemu-devel

From: Rao,Lei <lei.rao@intel.com>

Changes since v4:
        --Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD in colo_bitmap_clear_dirty.
        --Fixed some minor issues with variable definitions.
        --Added performance test data to the commit message.

Changes since v3:
        --Removed cpu_throttle_stop from mig_throttle_counter_reset.

Changes since v2:
        --Added a function named packet_new_nocopy.
        --Further optimized colo_flush_ram_cache.

Changes since v1:
        --Reset the state of the auto-converge counters at every checkpoint instead of disabling them directly.
        --Treat filter_send returning zero as a normal case.

This patch series:
        Fixes several bugs that crash QEMU.
        Optimizes some code to reduce checkpoint time.
        Removes some unnecessary code to improve COLO.

Rao, Lei (10):
  Remove some duplicate trace code.
  Fix the qemu crash when guest shutdown during checkpoint
  Optimize the function of filter_send
  Remove migrate_set_block_enabled in checkpoint
  Add a function named packet_new_nocopy for COLO.
  Add the function of colo_compare_cleanup
  Reset the auto-converge counter at every checkpoint.
  Reduce the PVM stop time during Checkpoint
  Add the function of colo_bitmap_clear_dirty
  Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()

 migration/colo.c      | 10 +++----
 migration/migration.c |  4 +++
 migration/ram.c       | 83 +++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/ram.h       |  1 +
 net/colo-compare.c    | 25 +++++++---------
 net/colo-compare.h    |  1 +
 net/colo.c            | 23 ++++++++++++++
 net/colo.h            |  1 +
 net/filter-mirror.c   |  8 ++---
 net/filter-rewriter.c |  3 +-
 net/net.c             |  4 +++
 softmmu/runstate.c    |  1 +
 12 files changed, 135 insertions(+), 29 deletions(-)

-- 
1.8.3.1




* [PATCH v5 01/10] Remove some duplicate trace code.
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  5:21   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint leirao
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

The same trace code already exists in colo_compare_packet_payload(),
so remove the duplicate from colo_packet_compare_other().

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 net/colo-compare.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 9d1ad99..c142c08 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -590,19 +590,6 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
     uint16_t offset = ppkt->vnet_hdr_len;
 
     trace_colo_compare_main("compare other");
-    if (trace_event_get_state_backends(TRACE_COLO_COMPARE_IP_INFO)) {
-        char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
-
-        strcpy(pri_ip_src, inet_ntoa(ppkt->ip->ip_src));
-        strcpy(pri_ip_dst, inet_ntoa(ppkt->ip->ip_dst));
-        strcpy(sec_ip_src, inet_ntoa(spkt->ip->ip_src));
-        strcpy(sec_ip_dst, inet_ntoa(spkt->ip->ip_dst));
-
-        trace_colo_compare_ip_info(ppkt->size, pri_ip_src,
-                                   pri_ip_dst, spkt->size,
-                                   sec_ip_src, sec_ip_dst);
-    }
-
     if (ppkt->size != spkt->size) {
         trace_colo_compare_main("Other: payload size of packets are different");
         return -1;
-- 
1.8.3.1




* [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
  2021-04-01  7:47 ` [PATCH v5 01/10] Remove some duplicate trace code leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  5:21   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 03/10] Optimize the function of filter_send leirao
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

This patch fixes the following:
    qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
    Aborted (core dumped)
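
The abort above comes from a table-driven check: a state change that is
not listed in the transition table is rejected. The following is a minimal
sketch of that pattern, not QEMU's code. The reduced SketchState enum,
sketch_transitions[] and sketch_transition_allowed() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states (not QEMU's RunState enum). */
typedef enum {
    SKETCH_STATE_COLO,
    SKETCH_STATE_RUNNING,
    SKETCH_STATE_SHUTDOWN
} SketchState;

typedef struct {
    SketchState from;
    SketchState to;
} SketchTransition;

/* Allowed transitions; the second entry models the one this patch adds. */
static const SketchTransition sketch_transitions[] = {
    { SKETCH_STATE_COLO, SKETCH_STATE_RUNNING },
    { SKETCH_STATE_COLO, SKETCH_STATE_SHUTDOWN },
    { SKETCH_STATE_RUNNING, SKETCH_STATE_SHUTDOWN },
};

/* Return true only if the (from, to) pair is listed in the table. */
static bool sketch_transition_allowed(SketchState from, SketchState to)
{
    size_t n = sizeof(sketch_transitions) / sizeof(sketch_transitions[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (sketch_transitions[i].from == from &&
            sketch_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

With the COLO to SHUTDOWN pair in the table the lookup succeeds; without
it, a caller that treats a failed lookup as fatal would abort as shown in
the log above.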

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 softmmu/runstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index ce8977c..1564057 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -126,6 +126,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
     { RUN_STATE_COLO, RUN_STATE_RUNNING },
+    { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
 
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
-- 
1.8.3.1




* [PATCH v5 03/10] Optimize the function of filter_send
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
  2021-04-01  7:47 ` [PATCH v5 01/10] Remove some duplicate trace code leirao
  2021-04-01  7:47 ` [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  5:23   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint leirao
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

filter_send() already calculates iov_size, so it can return the size
directly. This way, there is no need to repeat the calculation in
filter_redirector_receive_iov().
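
The return convention can be sketched as follows: the sender returns the
number of bytes on success (possibly zero) and a negative errno value on
failure, so callers must test for "ret < 0" rather than "ret != 0". This
is a standalone illustration; sketch_send(), sketch_receive() and
total_iov_len() are hypothetical names, and nothing is actually written
to a chardev here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum the lengths of all iovec entries. */
static size_t total_iov_len(const struct iovec *iov, int iovcnt)
{
    size_t len = 0;
    int i;

    for (i = 0; i < iovcnt; i++) {
        len += iov[i].iov_len;
    }
    return len;
}

/*
 * Hypothetical sender: returns the number of bytes on success (which may
 * be zero for an empty vector) or a negative errno value on failure.
 */
static ssize_t sketch_send(const struct iovec *iov, int iovcnt)
{
    if (iov == NULL || iovcnt < 0) {
        return -EINVAL;
    }
    /* A real sender would write the data out at this point. */
    return (ssize_t)total_iov_len(iov, iovcnt);
}

/* Caller: only a negative value is an error; the size is passed through. */
static ssize_t sketch_receive(const struct iovec *iov, int iovcnt)
{
    ssize_t ret = sketch_send(iov, iovcnt);

    if (ret < 0) {
        /* An error would be reported here, e.g. using strerror(-ret). */
    }
    return ret;
}
```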

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 net/filter-mirror.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index f8e6500..f20240c 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -88,7 +88,7 @@ static int filter_send(MirrorState *s,
         goto err;
     }
 
-    return 0;
+    return size;
 
 err:
     return ret < 0 ? ret : -EIO;
@@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
     int ret;
 
     ret = filter_send(s, iov, iovcnt);
-    if (ret) {
+    if (ret < 0) {
         error_report("filter mirror send failed(%s)", strerror(-ret));
     }
 
@@ -182,10 +182,10 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
 
     if (qemu_chr_fe_backend_connected(&s->chr_out)) {
         ret = filter_send(s, iov, iovcnt);
-        if (ret) {
+        if (ret < 0) {
             error_report("filter redirector send failed(%s)", strerror(-ret));
         }
-        return iov_size(iov, iovcnt);
+        return ret;
     } else {
         return 0;
     }
-- 
1.8.3.1




* [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (2 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 03/10] Optimize the function of filter_send leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  5:25   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO leirao
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

We can detect disk migration in migrate_prepare(). If disk migration
is enabled in COLO mode, we can report an error directly, so there
is no need to disable block migration at every checkpoint.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/colo.c      | 6 ------
 migration/migration.c | 4 ++++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index de27662..1aaf316 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -435,12 +435,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (failover_get_state() != FAILOVER_STATUS_NONE) {
         goto out;
     }
-
-    /* Disable block migration */
-    migrate_set_block_enabled(false, &local_err);
-    if (local_err) {
-        goto out;
-    }
     qemu_mutex_lock_iothread();
 
 #ifdef CONFIG_REPLICATION
diff --git a/migration/migration.c b/migration/migration.c
index ca8b97b..4578f22 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2219,6 +2219,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
     }
 
     if (blk || blk_inc) {
+        if (migrate_colo_enabled()) {
+            error_setg(errp, "No disk migration is required in COLO mode");
+            return false;
+        }
         if (migrate_use_block() || migrate_use_block_incremental()) {
             error_setg(errp, "Command options are incompatible with "
                        "current migration capabilities");
-- 
1.8.3.1




* [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO.
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (3 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  5:30   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 06/10] Add the function of colo_compare_cleanup leirao
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

Use packet_new_nocopy() instead of packet_new() in the
filter-rewriter module. This saves one memory copy for
each network packet that is processed.
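
The difference between the two constructors is one of buffer ownership.
A minimal sketch, assuming a reduced packet type and plain malloc/free
instead of GLib; SketchPacket and the sketch_* functions are hypothetical
names, not the QEMU API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced packet type; not QEMU's Packet struct. */
typedef struct {
    unsigned char *data;
    size_t size;
} SketchPacket;

/* Copying constructor: the caller keeps ownership of 'data'. */
static SketchPacket *sketch_packet_new(const void *data, size_t size)
{
    SketchPacket *pkt = malloc(sizeof(*pkt));

    if (!pkt) {
        return NULL;
    }
    pkt->data = malloc(size ? size : 1);
    if (!pkt->data) {
        free(pkt);
        return NULL;
    }
    memcpy(pkt->data, data, size);
    pkt->size = size;
    return pkt;
}

/*
 * Non-copying constructor: the packet takes ownership of the heap buffer,
 * so the caller must not free 'data' afterwards.
 */
static SketchPacket *sketch_packet_new_nocopy(void *data, size_t size)
{
    SketchPacket *pkt = malloc(sizeof(*pkt));

    if (!pkt) {
        return NULL;
    }
    pkt->data = data;
    pkt->size = size;
    return pkt;
}

/* Releases the packet and the buffer it owns. */
static void sketch_packet_destroy(SketchPacket *pkt)
{
    if (!pkt) {
        return;
    }
    free(pkt->data);
    free(pkt);
}
```

In the copying variant the caller still has to free its own buffer, which
is why the g_free(buf) call disappears from the caller in this patch.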

Signed-off-by: Lei Rao <lei.rao@intel.com>
---
 net/colo.c            | 23 +++++++++++++++++++++++
 net/colo.h            |  1 +
 net/filter-rewriter.c |  3 +--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index ef00609..58106a8 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -174,6 +174,29 @@ Packet *packet_new(const void *data, int size, int vnet_hdr_len)
     return pkt;
 }
 
+/*
+ * packet_new_nocopy will not copy data, so the caller can't release
+ * the data. And it will be released in packet_destroy.
+ */
+Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len)
+{
+    Packet *pkt = g_slice_new(Packet);
+
+    pkt->data = data;
+    pkt->size = size;
+    pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    pkt->vnet_hdr_len = vnet_hdr_len;
+    pkt->tcp_seq = 0;
+    pkt->tcp_ack = 0;
+    pkt->seq_end = 0;
+    pkt->header_size = 0;
+    pkt->payload_size = 0;
+    pkt->offset = 0;
+    pkt->flags = 0;
+
+    return pkt;
+}
+
 void packet_destroy(void *opaque, void *user_data)
 {
     Packet *pkt = opaque;
diff --git a/net/colo.h b/net/colo.h
index 573ab91..d91cd24 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -101,6 +101,7 @@ bool connection_has_tracked(GHashTable *connection_track_table,
                             ConnectionKey *key);
 void connection_hashtable_reset(GHashTable *connection_track_table);
 Packet *packet_new(const void *data, int size, int vnet_hdr_len);
+Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
 void packet_destroy_partial(void *opaque, void *user_data);
 
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 10fe393..cb3a96c 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -270,8 +270,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
         vnet_hdr_len = nf->netdev->vnet_hdr_len;
     }
 
-    pkt = packet_new(buf, size, vnet_hdr_len);
-    g_free(buf);
+    pkt = packet_new_nocopy(buf, size, vnet_hdr_len);
 
     /*
      * if we get tcp packet
-- 
1.8.3.1




* [PATCH v5 06/10] Add the function of colo_compare_cleanup
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (4 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  8:48   ` Zhang, Chen
  2021-04-01  7:47 ` [PATCH v5 07/10] Reset the auto-converge counter at every checkpoint leirao
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

This patch fixes the following:
    #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    #1  0x00007f6ae4559859 in __GI_abort () at abort.c:79
    #2  0x0000559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0 <__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36
    #3  0x0000559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828) at util/qemu-thread-posix.c:69
    #4  0x0000559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at chardev/char.c:285
    #5  0x0000559aaa23318a in object_deinit (obj=0x559aabffe800, type=0x559aabfd7d20) at qom/object.c:606
    #6  0x0000559aaa2331b8 in object_deinit (obj=0x559aabffe800, type=0x559aabfd9060) at qom/object.c:610
    #7  0x0000559aaa233200 in object_finalize (data=0x559aabffe800) at qom/object.c:620
    #8  0x0000559aaa234202 in object_unref (obj=0x559aabffe800) at qom/object.c:1074
    #9  0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0dac10, name=0x559aac778760 "compare0-0", opaque=0x559aabffe800) at qom/object.c:1584
    #10 0x0000559aaa232f70 in object_property_del_all (obj=0x559aac0dac10) at qom/object.c:557
    #11 0x0000559aaa2331ed in object_finalize (data=0x559aac0dac10) at qom/object.c:619
    #12 0x0000559aaa234202 in object_unref (obj=0x559aac0dac10) at qom/object.c:1074
    #13 0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs", opaque=0x559aac0dac10) at qom/object.c:1584
    #14 0x0000559aaa233071 in object_property_del_child (obj=0x559aac0c75c0, child=0x559aac0dac10, errp=0x0) at qom/object.c:580
    #15 0x0000559aaa233155 in object_unparent (obj=0x559aac0dac10) at qom/object.c:599
    #16 0x0000559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159
    #17 0x0000559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998, envp=0x7ffeb62fab50) at vl.c:4539

When a chardev is cleaned up, its chr_write_lock needs to be destroyed.
However, when the guest powers off, the colo-compare module has not been
cleaned up by that point and is still holding chr_write_lock, which
crashes QEMU. To fix the bug, add colo_compare_cleanup() and call it
before qemu_chr_cleanup().
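
The cleanup loop in this patch uses a "safe" list traversal, which matters
whenever elements may be removed while the list is being walked. The
sketch below shows the idea on a hand-rolled singly linked list rather
than QEMU's QTAILQ macros; Node, list_push() and list_cleanup() are
hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int id;
    struct Node *next;
} Node;

/* Push a new node at the head; returns the old head if allocation fails. */
static Node *list_push(Node *head, int id)
{
    Node *n = malloc(sizeof(*n));

    if (!n) {
        return head;
    }
    n->id = id;
    n->next = head;
    return n;
}

/*
 * Destroy every node and return how many were freed. The next pointer is
 * saved before the current node is released, which is the same idea a
 * "safe" foreach macro implements.
 */
static int list_cleanup(Node **head)
{
    Node *cur = *head;
    Node *next;
    int freed = 0;

    while (cur) {
        next = cur->next;   /* read before cur becomes invalid */
        free(cur);
        freed++;
        cur = next;
    }
    *head = NULL;
    return freed;
}
```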

Signed-off-by: Lei Rao <lei.rao@intel.com>
---
 net/colo-compare.c | 10 ++++++++++
 net/colo-compare.h |  1 +
 net/net.c          |  4 ++++
 3 files changed, 15 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index c142c08..5b538f4 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1402,6 +1402,16 @@ static void colo_compare_init(Object *obj)
                              compare_set_vnet_hdr);
 }
 
+void colo_compare_cleanup(void)
+{
+    CompareState *tmp = NULL;
+    CompareState *n = NULL;
+
+    QTAILQ_FOREACH_SAFE(tmp, &net_compares, next, n) {
+        object_unparent(OBJECT(tmp));
+    }
+}
+
 static void colo_compare_finalize(Object *obj)
 {
     CompareState *s = COLO_COMPARE(obj);
diff --git a/net/colo-compare.h b/net/colo-compare.h
index 22ddd51..b055270 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -20,5 +20,6 @@
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
 void colo_compare_register_notifier(Notifier *notify);
 void colo_compare_unregister_notifier(Notifier *notify);
+void colo_compare_cleanup(void);
 
 #endif /* QEMU_COLO_COMPARE_H */
diff --git a/net/net.c b/net/net.c
index 725a4e1..8fcb2e7 100644
--- a/net/net.c
+++ b/net/net.c
@@ -53,6 +53,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
+#include "net/colo-compare.h"
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/hmp-output-visitor.h"
@@ -1463,6 +1464,9 @@ void net_cleanup(void)
 {
     NetClientState *nc;
 
+    /*cleanup colo compare module for COLO*/
+    colo_compare_cleanup();
+
     /* We may del multiple entries during qemu_del_net_client(),
      * so QTAILQ_FOREACH_SAFE() is also not safe here.
      */
-- 
1.8.3.1




* [PATCH v5 07/10] Reset the auto-converge counter at every checkpoint.
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (5 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 06/10] Add the function of colo_compare_cleanup leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-01  7:47 ` [PATCH v5 08/10] Reduce the PVM stop time during Checkpoint leirao
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

If we don't reset the auto-converge counter,
auto-converge keeps running while COLO is running,
and eventually the system hangs because the
CPU throttle reaches DEFAULT_MIGRATE_MAX_CPU_THROTTLE.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c | 4 ++++
 migration/ram.c  | 9 +++++++++
 migration/ram.h  | 1 +
 3 files changed, 14 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 1aaf316..723ffb8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (ret < 0) {
         goto out;
     }
+
+    if (migrate_auto_converge()) {
+        mig_throttle_counter_reset();
+    }
     /*
      * Only save VM's live state, which not including device state.
      * TODO: We may need a timeout mechanism to prevent COLO process
diff --git a/migration/ram.c b/migration/ram.c
index 40e7895..c69a8e0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -652,6 +652,15 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
     }
 }
 
+void mig_throttle_counter_reset(void)
+{
+    RAMState *rs = ram_state;
+
+    rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    rs->num_dirty_pages_period = 0;
+    rs->bytes_xfer_prev = ram_counters.transferred;
+}
+
 /**
  * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
  *
diff --git a/migration/ram.h b/migration/ram.h
index 6378bb3..3f78175 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block);
 int xbzrle_cache_resize(uint64_t new_size, Error **errp);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
+void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-- 
1.8.3.1




* [PATCH v5 08/10] Reduce the PVM stop time during Checkpoint
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (6 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 07/10] Reset the auto-converge counter at every checkpoint leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-01  7:47 ` [PATCH v5 09/10] Add the function of colo_bitmap_clear_dirty leirao
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

When flushing memory from the ram cache to ram at every checkpoint
on the secondary VM, we can copy contiguous chunks of memory instead of
4096 bytes at a time, which reduces the VM stop time during a checkpoint.
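
The idea can be sketched in isolation: find a run of contiguous dirty
pages, then issue one memcpy for the whole run. This is a simplified
illustration that uses a bool array instead of a packed bitmap and a tiny
page size; find_dirty_run(), flush_cache() and SKETCH_PAGE_SIZE are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4   /* tiny pages keep the example readable */

/*
 * Find the first dirty page at or after 'start' and count how many dirty
 * pages follow it contiguously. Returns 'npages' with *num == 0 if no
 * dirty page is left.
 */
static size_t find_dirty_run(const bool *dirty, size_t npages,
                             size_t start, size_t *num)
{
    size_t first = start;
    size_t next;

    *num = 0;
    while (first < npages && !dirty[first]) {
        first++;
    }
    if (first >= npages) {
        return npages;
    }
    next = first + 1;
    while (next < npages && dirty[next]) {
        next++;
    }
    *num = next - first;
    return first;
}

/*
 * Copy all dirty pages from 'cache' to 'ram', one memcpy per contiguous
 * run, clearing the dirty flags. Returns the number of memcpy calls.
 */
static size_t flush_cache(unsigned char *ram, const unsigned char *cache,
                          bool *dirty, size_t npages)
{
    size_t offset = 0;
    size_t num = 0;
    size_t copies = 0;
    size_t i;

    while ((offset = find_dirty_run(dirty, npages, offset, &num)) < npages) {
        for (i = 0; i < num; i++) {
            dirty[offset + i] = false;
        }
        memcpy(ram + offset * SKETCH_PAGE_SIZE,
               cache + offset * SKETCH_PAGE_SIZE,
               num * SKETCH_PAGE_SIZE);
        copies++;
        offset += num;
    }
    return copies;
}
```

For a dirty pattern of pages 1, 2, 3 and 5 out of six, this issues two
copies instead of four.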

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c69a8e0..317fa4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -822,6 +822,41 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return next;
 }
 
+/*
+ * colo_bitmap_find_dirty: find contiguous dirty pages from start
+ *
+ * Returns the page offset within memory region of the start of the contiguous
+ * dirty pages
+ *
+ * @rs: current RAM state
+ * @rb: RAMBlock where to search for dirty pages
+ * @start: page where we start the search
+ * @num: the number of contiguous dirty pages
+ */
+static inline
+unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
+                                     unsigned long start, unsigned long *num)
+{
+    unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
+    unsigned long *bitmap = rb->bmap;
+    unsigned long first, next;
+
+    *num = 0;
+
+    if (ramblock_is_ignored(rb)) {
+        return size;
+    }
+
+    first = find_next_bit(bitmap, size, start);
+    if (first >= size) {
+        return first;
+    }
+    next = find_next_zero_bit(bitmap, size, first + 1);
+    assert(next >= first);
+    *num = next - first;
+    return first;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -3679,19 +3714,26 @@ void colo_flush_ram_cache(void)
         block = QLIST_FIRST_RCU(&ram_list.blocks);
 
         while (block) {
-            offset = migration_bitmap_find_dirty(ram_state, block, offset);
+            unsigned long num = 0;
 
+            offset = colo_bitmap_find_dirty(ram_state, block, offset, &num);
             if (((ram_addr_t)offset) << TARGET_PAGE_BITS
                 >= block->used_length) {
                 offset = 0;
+                num = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-                migration_bitmap_clear_dirty(ram_state, block, offset);
+                unsigned long i = 0;
+
+                for (i = 0; i < num; i++) {
+                    migration_bitmap_clear_dirty(ram_state, block, offset + i);
+                }
                 dst_host = block->host
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
                 src_host = block->colo_cache
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
-                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num);
+                offset += num;
             }
         }
     }
-- 
1.8.3.1




* [PATCH v5 09/10] Add the function of colo_bitmap_clear_dirty
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (7 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 08/10] Reduce the PVM stop time during Checkpoint leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-01  7:47 ` [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info() leirao
  2021-04-04 10:22 ` [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO Lukas Straub
  10 siblings, 0 replies; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

When we use a contiguous dirty memory copy to flush the ram cache on
the secondary VM, we can also clear the bitmap for the contiguous dirty
pages in one pass. This also reduces the VM stop time during a checkpoint.

The performance test results for COLO are as follows:

Server configuration:
CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
MEM: 251G (type: DDR4, speed: 2666 MT/s)
SSD: Intel 730 and DC S35x0/3610/3700 Series SSDs

dirty pages:3189376  migration_bitmap_clear_dirty time consuming(ns):105194000
dirty pages:3189784  migration_bitmap_clear_dirty time consuming(ns):105297000
dirty pages:3190501  migration_bitmap_clear_dirty time consuming(ns):105410000
dirty pages:3188734  migration_bitmap_clear_dirty time consuming(ns):105138000
dirty pages:3189464  migration_bitmap_clear_dirty time consuming(ns):111736000
dirty pages:3188558  migration_bitmap_clear_dirty time consuming(ns):105079000
dirty pages:3239489  migration_bitmap_clear_dirty time consuming(ns):106761000

dirty pages:3190240  colo_bitmap_clear_dirty time consuming(ns):8369000
dirty pages:3189293  colo_bitmap_clear_dirty time consuming(ns):8388000
dirty pages:3189171  colo_bitmap_clear_dirty time consuming(ns):8641000
dirty pages:3189099  colo_bitmap_clear_dirty time consuming(ns):8280000
dirty pages:3189974  colo_bitmap_clear_dirty time consuming(ns):8352000
dirty pages:3189471  colo_bitmap_clear_dirty time consuming(ns):8348000
dirty pages:3189681  colo_bitmap_clear_dirty time consuming(ns):8426000

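The data shows that colo_bitmap_clear_dirty is more efficient: for about
3.19 million dirty pages it takes about 8.4 ms on average, compared with
about 106 ms for migration_bitmap_clear_dirty, which is roughly 12 times
faster.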
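
Clearing a whole range in one call can be sketched on a packed bitmap as
shown below. This is a non-atomic, lock-free illustration only; the
sketch_* helpers are hypothetical and are not the kernel-style bitops or
the locking used in QEMU.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Clear bit 'nr' and report whether it was set before (non-atomic). */
static bool sketch_test_and_clear_bit(size_t nr, unsigned long *bitmap)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &bitmap[nr / BITS_PER_WORD];
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/*
 * Clear 'num' bits starting at 'start'. For every bit that was set,
 * decrement *dirty_pages. Returns the number of bits actually cleared.
 */
static size_t sketch_clear_dirty_range(unsigned long *bitmap, size_t start,
                                       size_t num, size_t *dirty_pages)
{
    size_t cleared = 0;
    size_t i;

    for (i = 0; i < num; i++) {
        if (sketch_test_and_clear_bit(start + i, bitmap)) {
            (*dirty_pages)--;
            cleared++;
        }
    }
    return cleared;
}
```

Handling the range in a single call means any per-call overhead, such as
taking a lock, is paid once per run of pages rather than once per page.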

Signed-off-by: Lei Rao <lei.rao@intel.com>
---
 migration/ram.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 317fa4e..570ffa4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -857,6 +857,36 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return first;
 }
 
+/**
+ * colo_bitmap_clear_dirty: when we flush the ram cache to ram, we use a
+ * contiguous memory copy, so we can also clear the bitmap of the contiguous
+ * dirty memory.
+ */
+static inline bool colo_bitmap_clear_dirty(RAMState *rs,
+                                           RAMBlock *rb,
+                                           unsigned long start,
+                                           unsigned long num)
+{
+    bool ret;
+    unsigned long i = 0;
+
+    /*
+     * Flushing the ram cache to ram can only happen on the Secondary VM,
+     * and the clear bitmap is always NULL on the destination side.
+     * Therefore, there is no need to check whether the
+     * clear_bitmap needs to be cleared.
+     */
+    QEMU_LOCK_GUARD(&rs->bitmap_mutex);
+    for (i = 0; i < num; i++) {
+        ret = test_and_clear_bit(start + i, rb->bmap);
+        if (ret) {
+            rs->migration_dirty_pages--;
+        }
+    }
+
+    return ret;
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -3723,11 +3753,7 @@ void colo_flush_ram_cache(void)
                 num = 0;
                 block = QLIST_NEXT_RCU(block, next);
             } else {
-                unsigned long i = 0;
-
-                for (i = 0; i < num; i++) {
-                    migration_bitmap_clear_dirty(ram_state, block, offset + i);
-                }
+                colo_bitmap_clear_dirty(ram_state, block, offset, num);
                 dst_host = block->host
                          + (((ram_addr_t)offset) << TARGET_PAGE_BITS);
                 src_host = block->colo_cache
-- 
1.8.3.1




* [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (8 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 09/10] Add the function of colo_bitmap_clear_dirty leirao
@ 2021-04-01  7:47 ` leirao
  2021-04-08  8:49   ` Zhang, Chen
  2021-04-04 10:22 ` [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO Lukas Straub
  10 siblings, 1 reply; 19+ messages in thread
From: leirao @ 2021-04-01  7:47 UTC (permalink / raw)
  To: chen.zhang, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: Rao, Lei, qemu-devel

From: "Rao, Lei" <lei.rao@intel.com>

The data pointer has already skipped vnet_hdr_len in
parse_packet_early(), so we must not subtract vnet_hdr_len again
when calculating pkt->header_size in fill_pkt_tcp_info(). Otherwise,
it causes network packet comparison errors and greatly increases
the frequency of checkpoints.
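
A worked sketch of the arithmetic, assuming a buffer laid out as a vnet
header followed by Ethernet, IP and TCP headers, with the transport-header
pointer already computed past the vnet header. SketchPkt and the sketch_*
functions are hypothetical and only model the size calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14   /* Ethernet header length in bytes */

typedef struct {
    uint8_t *data;              /* start of buffer, vnet header included */
    size_t size;                /* total bytes in the buffer */
    size_t vnet_hdr_len;        /* length of the leading vnet header */
    uint8_t *transport_header;  /* start of the TCP header */
    size_t header_size;
    size_t payload_size;
} SketchPkt;

/* Locate the TCP header; the vnet header is skipped here, exactly once. */
static void sketch_parse_early(SketchPkt *pkt, size_t ip_hdr_len)
{
    pkt->transport_header = pkt->data + pkt->vnet_hdr_len
                            + SKETCH_ETH_HLEN + ip_hdr_len;
}

/* th_off is the TCP data offset in 32-bit words, so "<< 2" gives bytes. */
static void sketch_fill_tcp_info(SketchPkt *pkt, unsigned int th_off)
{
    pkt->header_size = (size_t)(pkt->transport_header - pkt->data)
                       + ((size_t)th_off << 2);
    pkt->payload_size = pkt->size - pkt->header_size;
}
```

With a 12-byte vnet header, a 20-byte IP header, th_off = 5 and a 166-byte
buffer, header_size is 12 + 14 + 20 + 20 = 66 and payload_size is 100.
Subtracting vnet_hdr_len once more would give a header_size of 54 and a
payload_size of 112, so 12 header bytes would be compared as payload.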

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 net/colo-compare.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 5b538f4..b100e7b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -211,7 +211,7 @@ static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
     pkt->tcp_ack = ntohl(tcphd->th_ack);
     *max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
     pkt->header_size = pkt->transport_header - (uint8_t *)pkt->data
-                       + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
+                       + (tcphd->th_off << 2);
     pkt->payload_size = pkt->size - pkt->header_size;
     pkt->seq_end = pkt->tcp_seq + pkt->payload_size;
     pkt->flags = tcphd->th_flags;
-- 
1.8.3.1




* Re: [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO
  2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
                   ` (9 preceding siblings ...)
  2021-04-01  7:47 ` [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info() leirao
@ 2021-04-04 10:22 ` Lukas Straub
  10 siblings, 0 replies; 19+ messages in thread
From: Lukas Straub @ 2021-04-04 10:22 UTC (permalink / raw)
  To: leirao
  Cc: lizhijian, quintela, jasowang, dgilbert, qemu-devel, chen.zhang,
	pbonzini


On Thu,  1 Apr 2021 15:47:19 +0800
leirao <lei.rao@intel.com> wrote:

> From: Rao,Lei <lei.rao@intel.com>
> 
> Changes since v4:
>         --Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD in colo_bitmap_clear_dirty.
>         --Modify some minor issues about variable definition.
>         --Add some performance test data in the commit message.
> 
> Changes since v3:
>         --Remove cpu_throttle_stop from mig_throttle_counter_reset.
> 
> Changes since v2:
>         --Add a function named packet_new_nocopy.
>         --Continue to optimize the function of colo_flush_ram_cache.
> 
> Changes since v1:
>         --Reset the state of the auto-converge counters at every checkpoint instead of directly disabling.
>         --Treat the filter_send function returning zero as a normal case.
> 
> The series of patches include:
>         Fixed some bugs of qemu crash.
>         Optimized some code to reduce the time of checkpoint.
>         Remove some unnecessary code to improve COLO.
> 
> Rao, Lei (10):
>   Remove some duplicate trace code.
>   Fix the qemu crash when guest shutdown during checkpoint
>   Optimize the function of filter_send
>   Remove migrate_set_block_enabled in checkpoint
>   Add a function named packet_new_nocopy for COLO.
>   Add the function of colo_compare_cleanup
>   Reset the auto-converge counter at every checkpoint.
>   Reduce the PVM stop time during Checkpoint
>   Add the function of colo_bitmap_clear_dirty
>   Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()
> 
>  migration/colo.c      | 10 +++----
>  migration/migration.c |  4 +++
>  migration/ram.c       | 83 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  migration/ram.h       |  1 +
>  net/colo-compare.c    | 25 +++++++---------
>  net/colo-compare.h    |  1 +
>  net/colo.c            | 23 ++++++++++++++
>  net/colo.h            |  1 +
>  net/filter-mirror.c   |  8 ++---
>  net/filter-rewriter.c |  3 +-
>  net/net.c             |  4 +++
>  softmmu/runstate.c    |  1 +
>  12 files changed, 135 insertions(+), 29 deletions(-)
> 

Looks good to me and works well in my tests:

Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>

Regards,
Lukas Straub

-- 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 01/10] Remove some duplicate trace code.
  2021-04-01  7:47 ` [PATCH v5 01/10] Remove some duplicate trace code leirao
@ 2021-04-08  5:21   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  5:21 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 01/10] Remove some duplicate trace code.
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> The same trace code already exists in colo_compare_packet_payload().
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

Thanks
Chen

> ---
>  net/colo-compare.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 9d1ad99..c142c08 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -590,19 +590,6 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt)
>      uint16_t offset = ppkt->vnet_hdr_len;
> 
>      trace_colo_compare_main("compare other");
> -    if (trace_event_get_state_backends(TRACE_COLO_COMPARE_IP_INFO)) {
> -        char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
> -
> -        strcpy(pri_ip_src, inet_ntoa(ppkt->ip->ip_src));
> -        strcpy(pri_ip_dst, inet_ntoa(ppkt->ip->ip_dst));
> -        strcpy(sec_ip_src, inet_ntoa(spkt->ip->ip_src));
> -        strcpy(sec_ip_dst, inet_ntoa(spkt->ip->ip_dst));
> -
> -        trace_colo_compare_ip_info(ppkt->size, pri_ip_src,
> -                                   pri_ip_dst, spkt->size,
> -                                   sec_ip_src, sec_ip_dst);
> -    }
> -
>      if (ppkt->size != spkt->size) {
>          trace_colo_compare_main("Other: payload size of packets are different");
>          return -1;
> --
> 1.8.3.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint
  2021-04-01  7:47 ` [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint leirao
@ 2021-04-08  5:21   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  5:21 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 02/10] Fix the qemu crash when guest shutdown during
> checkpoint
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> This patch fixes the following:
>     qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
>     Aborted (core dumped)
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  softmmu/runstate.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index ce8977c..1564057 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -126,6 +126,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> 
>      { RUN_STATE_COLO, RUN_STATE_RUNNING },
> +    { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
> 
>      { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>      { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
> --
> 1.8.3.1
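
The effect of the added table entry can be sketched with a reduced transition table. This is an illustration under assumptions: the enum, struct and function names are hypothetical, and only the two COLO transitions from the hunk are modelled. It shows a lookup that rejects 'colo' -> 'shutdown' before the patch and accepts it afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of run states; not the QEMU RunState enum. */
enum run_state_sketch { RS_RUNNING, RS_COLO, RS_SHUTDOWN };

struct transition_sketch {
    enum run_state_sketch from;
    enum run_state_sketch to;
};

/* COLO transitions before the patch. */
static const struct transition_sketch before_patch[] = {
    { RS_COLO, RS_RUNNING },
};

/* COLO transitions after the patch: shutdown is now reachable from colo. */
static const struct transition_sketch after_patch[] = {
    { RS_COLO, RS_RUNNING },
    { RS_COLO, RS_SHUTDOWN },
};

/* Linear lookup: a transition is allowed only if the table lists it. */
static bool transition_allowed(const struct transition_sketch *table, size_t n,
                               enum run_state_sketch from,
                               enum run_state_sketch to)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].from == from && table[i].to == to) {
            return true;
        }
    }
    return false;
}
```

A guest that shuts down while the VM is in the colo state hits the rejected case in the first table, which corresponds to the "invalid runstate transition" abort quoted in the commit message.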



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 03/10] Optimize the function of filter_send
  2021-04-01  7:47 ` [PATCH v5 03/10] Optimize the function of filter_send leirao
@ 2021-04-08  5:23   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  5:23 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 03/10] Optimize the function of filter_send
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> The iov_size has already been calculated in filter_send(), so we can return
> the size directly. This way, there is no need to repeat the calculation in
> filter_redirector_receive_iov().
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  net/filter-mirror.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/filter-mirror.c b/net/filter-mirror.c
> index f8e6500..f20240c 100644
> --- a/net/filter-mirror.c
> +++ b/net/filter-mirror.c
> @@ -88,7 +88,7 @@ static int filter_send(MirrorState *s,
>          goto err;
>      }
> 
> -    return 0;
> +    return size;
> 
>  err:
>      return ret < 0 ? ret : -EIO;
> @@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
>      int ret;
> 
>      ret = filter_send(s, iov, iovcnt);
> -    if (ret) {
> +    if (ret < 0) {
>          error_report("filter mirror send failed(%s)", strerror(-ret));
>      }
> 
> @@ -182,10 +182,10 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
> 
>      if (qemu_chr_fe_backend_connected(&s->chr_out)) {
>          ret = filter_send(s, iov, iovcnt);
> -        if (ret) {
> +        if (ret < 0) {
>              error_report("filter redirector send failed(%s)", strerror(-ret));
>          }
> -        return iov_size(iov, iovcnt);
> +        return ret;
>      } else {
>          return 0;
>      }
> --
> 1.8.3.1
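
The return convention this patch moves to (byte count on success, negative errno on failure, zero treated as a normal result) can be sketched as follows. The names send_sketch and receive_sketch and the simulate_failure flag are hypothetical stand-ins, not the filter-mirror code. The point is that the caller tests ret < 0 instead of ret != 0 and reuses the size already computed by the sender.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum of all segment lengths in the vector. */
static size_t iov_total(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;

    assert(iovcnt >= 0);
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/* Hypothetical sender: the size is computed once and returned on success. */
static ssize_t send_sketch(const struct iovec *iov, int iovcnt,
                           int simulate_failure)
{
    size_t size = iov_total(iov, iovcnt);

    if (simulate_failure) {
        return -EIO;
    }
    return (ssize_t)size;
}

/* Hypothetical caller: only negative values are errors; zero is normal. */
static ssize_t receive_sketch(const struct iovec *iov, int iovcnt,
                              int simulate_failure)
{
    ssize_t ret = send_sketch(iov, iovcnt, simulate_failure);

    if (ret < 0) {
        /* failure: a real caller would log strerror(-ret) before returning */
        return ret;
    }
    return ret; /* zero or more bytes: success, no second size calculation */
}
```

Checking ret != 0 against this convention would misreport every successful non-empty send as a failure, which is why both callers in the patch switch to ret < 0.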



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint
  2021-04-01  7:47 ` [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint leirao
@ 2021-04-08  5:25   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  5:25 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 04/10] Remove migrate_set_block_enabled in
> checkpoint
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> We can detect disk migration in migrate_prepare(). If disk migration is
> enabled in COLO mode, we can report an error directly, so there is no need
> to disable block migration at every checkpoint.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  migration/colo.c      | 6 ------
>  migration/migration.c | 4 ++++
>  2 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index de27662..1aaf316 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -435,12 +435,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>      if (failover_get_state() != FAILOVER_STATUS_NONE) {
>          goto out;
>      }
> -
> -    /* Disable block migration */
> -    migrate_set_block_enabled(false, &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
>      qemu_mutex_lock_iothread();
> 
>  #ifdef CONFIG_REPLICATION
> diff --git a/migration/migration.c b/migration/migration.c
> index ca8b97b..4578f22 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2219,6 +2219,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>      }
> 
>      if (blk || blk_inc) {
> +        if (migrate_colo_enabled()) {
> +            error_setg(errp, "No disk migration is required in COLO mode");
> +            return false;
> +        }
>          if (migrate_use_block() || migrate_use_block_incremental()) {
>              error_setg(errp, "Command options are incompatible with "
>                         "current migration capabilities");
> --
> 1.8.3.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO.
  2021-04-01  7:47 ` [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO leirao
@ 2021-04-08  5:30   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  5:30 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 05/10] Add a function named packet_new_nocopy for
> COLO.
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> Use packet_new_nocopy() instead of packet_new() in the filter-rewriter
> module. This saves one memory copy for each network packet processed.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> ---
>  net/colo.c            | 23 +++++++++++++++++++++++
>  net/colo.h            |  1 +
>  net/filter-rewriter.c |  3 +--
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/colo.c b/net/colo.c
> index ef00609..58106a8 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -174,6 +174,29 @@ Packet *packet_new(const void *data, int size, int vnet_hdr_len)
>      return pkt;
>  }
> 
> +/*
> + * packet_new_nocopy will not copy data, so the caller can't release
> + * the data. And it will be released in packet_destroy.
> + */
> +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len)
> +{
> +    Packet *pkt = g_slice_new(Packet);

We can use g_slice_new0() here to avoid the "pkt->xxx = 0" assignments.
The original packet_new() code should get the same optimization.

Thanks
Chen

> +
> +    pkt->data = data;
> +    pkt->size = size;
> +    pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    pkt->vnet_hdr_len = vnet_hdr_len;
> +    pkt->tcp_seq = 0;
> +    pkt->tcp_ack = 0;
> +    pkt->seq_end = 0;
> +    pkt->header_size = 0;
> +    pkt->payload_size = 0;
> +    pkt->offset = 0;
> +    pkt->flags = 0;
> +
> +    return pkt;
> +}
> +
>  void packet_destroy(void *opaque, void *user_data)
>  {
>      Packet *pkt = opaque;
> diff --git a/net/colo.h b/net/colo.h
> index 573ab91..d91cd24 100644
> --- a/net/colo.h
> +++ b/net/colo.h
> @@ -101,6 +101,7 @@ bool connection_has_tracked(GHashTable *connection_track_table,
>                              ConnectionKey *key);
>  void connection_hashtable_reset(GHashTable *connection_track_table);
>  Packet *packet_new(const void *data, int size, int vnet_hdr_len);
> +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len);
>  void packet_destroy(void *opaque, void *user_data);
>  void packet_destroy_partial(void *opaque, void *user_data);
> 
> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
> index 10fe393..cb3a96c 100644
> --- a/net/filter-rewriter.c
> +++ b/net/filter-rewriter.c
> @@ -270,8 +270,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
>          vnet_hdr_len = nf->netdev->vnet_hdr_len;
>      }
> 
> -    pkt = packet_new(buf, size, vnet_hdr_len);
> -    g_free(buf);
> +    pkt = packet_new_nocopy(buf, size, vnet_hdr_len);
> 
>      /*
>       * if we get tcp packet
> --
> 1.8.3.1
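
The ownership difference between the copying and the no-copy constructor, together with the zero-initializing allocation suggested in the review, can be sketched in plain C. This is an illustration only: struct packet_sketch and the packet_sketch_* functions are hypothetical, calloc() stands in for g_slice_new0(), and malloc()/free() stand in for the GLib allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for Packet. */
struct packet_sketch {
    void *data;
    int size;
    int vnet_hdr_len;
    unsigned tcp_seq;
    unsigned tcp_ack;
};

/* Copying constructor: the caller keeps ownership of its own buffer. */
static struct packet_sketch *packet_sketch_new(const void *data, int size,
                                               int vnet_hdr_len)
{
    assert(data != NULL && size > 0);
    /* calloc zeroes every field, so no per-field "= 0" lines are needed. */
    struct packet_sketch *pkt = calloc(1, sizeof(*pkt));
    if (!pkt) {
        return NULL;
    }
    pkt->data = malloc((size_t)size);
    if (!pkt->data) {
        free(pkt);
        return NULL;
    }
    memcpy(pkt->data, data, (size_t)size);
    pkt->size = size;
    pkt->vnet_hdr_len = vnet_hdr_len;
    return pkt;
}

/* No-copy constructor: ownership of data moves to the packet. */
static struct packet_sketch *packet_sketch_new_nocopy(void *data, int size,
                                                      int vnet_hdr_len)
{
    assert(data != NULL && size > 0);
    struct packet_sketch *pkt = calloc(1, sizeof(*pkt));
    if (!pkt) {
        return NULL;
    }
    pkt->data = data; /* the caller must not free data after this point */
    pkt->size = size;
    pkt->vnet_hdr_len = vnet_hdr_len;
    return pkt;
}

/* Destructor releases the buffer in both cases. */
static void packet_sketch_destroy(struct packet_sketch *pkt)
{
    if (!pkt) {
        return;
    }
    free(pkt->data);
    free(pkt);
}
```

This matches the filter-rewriter hunk, where the g_free(buf) after packet_new() disappears once the buffer is handed over to the packet.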



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 06/10] Add the function of colo_compare_cleanup
  2021-04-01  7:47 ` [PATCH v5 06/10] Add the function of colo_compare_cleanup leirao
@ 2021-04-08  8:48   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  8:48 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 06/10] Add the function of colo_compare_cleanup
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> This patch fixes the following:
>     #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>     #1  0x00007f6ae4559859 in __GI_abort () at abort.c:79
>     #2  0x0000559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0
> <__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36
>     #3  0x0000559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828)
> at util/qemu-thread-posix.c:69
>     #4  0x0000559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at
> chardev/char.c:285
>     #5  0x0000559aaa23318a in object_deinit (obj=0x559aabffe800,
> type=0x559aabfd7d20) at qom/object.c:606
>     #6  0x0000559aaa2331b8 in object_deinit (obj=0x559aabffe800,
> type=0x559aabfd9060) at qom/object.c:610
>     #7  0x0000559aaa233200 in object_finalize (data=0x559aabffe800) at
> qom/object.c:620
>     #8  0x0000559aaa234202 in object_unref (obj=0x559aabffe800) at
> qom/object.c:1074
>     #9  0x0000559aaa2356b6 in object_finalize_child_property
> (obj=0x559aac0dac10, name=0x559aac778760 "compare0-0",
> opaque=0x559aabffe800) at qom/object.c:1584
>     #10 0x0000559aaa232f70 in object_property_del_all (obj=0x559aac0dac10)
> at qom/object.c:557
>     #11 0x0000559aaa2331ed in object_finalize (data=0x559aac0dac10) at
> qom/object.c:619
>     #12 0x0000559aaa234202 in object_unref (obj=0x559aac0dac10) at
> qom/object.c:1074
>     #13 0x0000559aaa2356b6 in object_finalize_child_property
> (obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs",
> opaque=0x559aac0dac10) at qom/object.c:1584
>     #14 0x0000559aaa233071 in object_property_del_child
> (obj=0x559aac0c75c0, child=0x559aac0dac10, errp=0x0) at qom/object.c:580
>     #15 0x0000559aaa233155 in object_unparent (obj=0x559aac0dac10) at
> qom/object.c:599
>     #16 0x0000559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159
>     #17 0x0000559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998,
> envp=0x7ffeb62fab50) at vl.c:4539
> 
> When a chardev is cleaned up, chr_write_lock needs to be destroyed. But when
> the guest powers off, the colo-compare module has not been cleaned up first
> and is still holding chr_write_lock, which causes QEMU to crash. So we add
> colo_compare_cleanup() and run it before qemu_chr_cleanup() to fix the bug.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

Thanks
Chen

> ---
>  net/colo-compare.c | 10 ++++++++++
>  net/colo-compare.h |  1 +
>  net/net.c          |  4 ++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index c142c08..5b538f4 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -1402,6 +1402,16 @@ static void colo_compare_init(Object *obj)
>                               compare_set_vnet_hdr);
>  }
> 
> +void colo_compare_cleanup(void)
> +{
> +    CompareState *tmp = NULL;
> +    CompareState *n = NULL;
> +
> +    QTAILQ_FOREACH_SAFE(tmp, &net_compares, next, n) {
> +        object_unparent(OBJECT(tmp));
> +    }
> +}
> +
>  static void colo_compare_finalize(Object *obj)
>  {
>      CompareState *s = COLO_COMPARE(obj);
> diff --git a/net/colo-compare.h b/net/colo-compare.h
> index 22ddd51..b055270 100644
> --- a/net/colo-compare.h
> +++ b/net/colo-compare.h
> @@ -20,5 +20,6 @@
>  void colo_notify_compares_event(void *opaque, int event, Error **errp);
>  void colo_compare_register_notifier(Notifier *notify);
>  void colo_compare_unregister_notifier(Notifier *notify);
> +void colo_compare_cleanup(void);
> 
>  #endif /* QEMU_COLO_COMPARE_H */
> diff --git a/net/net.c b/net/net.c
> index 725a4e1..8fcb2e7 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -53,6 +53,7 @@
>  #include "sysemu/sysemu.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
> +#include "net/colo-compare.h"
>  #include "net/filter.h"
>  #include "qapi/string-output-visitor.h"
>  #include "qapi/hmp-output-visitor.h"
> @@ -1463,6 +1464,9 @@ void net_cleanup(void)
>  {
>      NetClientState *nc;
> 
> +    /*cleanup colo compare module for COLO*/
> +    colo_compare_cleanup();
> +
>      /* We may del multiple entries during qemu_del_net_client(),
>       * so QTAILQ_FOREACH_SAFE() is also not safe here.
>       */
> --
> 1.8.3.1
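
The ordering problem described in the commit message can be sketched with a toy model. Everything here is hypothetical: the lock_holders counter stands in for chr_write_lock being held, the hand-written list stands in for net_compares, and chardev_destroy() only mimics a mutex destroy that fails while the lock is in use (err=16 in frame #2 of the backtrace is EBUSY on Linux). The traversal saves the next pointer before freeing the current node, which is the same idea as QTAILQ_FOREACH_SAFE in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical chardev: counts users that still hold its write lock. */
struct chardev_sketch {
    int lock_holders;
};

/* Hypothetical compare object linked into a singly linked list. */
struct compare_sketch {
    struct chardev_sketch *chr;
    struct compare_sketch *next;
};

/* Register a compare object that keeps using the chardev. */
static int compare_add(struct compare_sketch **head, struct chardev_sketch *chr)
{
    assert(head != NULL && chr != NULL);
    struct compare_sketch *c = malloc(sizeof(*c));
    if (!c) {
        return -ENOMEM;
    }
    c->chr = chr;
    c->next = *head;
    *head = c;
    chr->lock_holders++;
    return 0;
}

/* Tear down every compare object; next is saved before the node is freed. */
static void compares_cleanup(struct compare_sketch **head)
{
    struct compare_sketch *cur = *head;

    while (cur) {
        struct compare_sketch *next = cur->next;
        cur->chr->lock_holders--;
        free(cur);
        cur = next;
    }
    *head = NULL;
}

/* Destroying the chardev fails while any user still holds the lock. */
static int chardev_destroy(const struct chardev_sketch *chr)
{
    return chr->lock_holders > 0 ? -EBUSY : 0;
}
```

In this model, destroying the chardev first returns -EBUSY, while running compares_cleanup() beforehand lets the destroy succeed, which is the ordering the patch establishes by calling colo_compare_cleanup() from net_cleanup().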



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()
  2021-04-01  7:47 ` [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info() leirao
@ 2021-04-08  8:49   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2021-04-08  8:49 UTC (permalink / raw)
  To: Rao, Lei, lizhijian, jasowang, quintela, dgilbert, pbonzini,
	lukasstraub2
  Cc: qemu-devel



> -----Original Message-----
> From: Rao, Lei <lei.rao@intel.com>
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com;
> jasowang@redhat.com; quintela@redhat.com; dgilbert@redhat.com;
> pbonzini@redhat.com; lukasstraub2@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei <lei.rao@intel.com>
> Subject: [PATCH v5 10/10] Fixed calculation error of pkt->header_size in
> fill_pkt_tcp_info()
> 
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> The data pointer has already skipped vnet_hdr_len in parse_packet_early(),
> so we must not subtract vnet_hdr_len again when calculating
> pkt->header_size in fill_pkt_tcp_info(). Otherwise, network packet
> comparison fails and the frequency of checkpoints increases greatly.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  net/colo-compare.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 5b538f4..b100e7b 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -211,7 +211,7 @@ static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
>      pkt->tcp_ack = ntohl(tcphd->th_ack);
>      *max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
>      pkt->header_size = pkt->transport_header - (uint8_t *)pkt->data
> -                       + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
> +                       + (tcphd->th_off << 2);
>      pkt->payload_size = pkt->size - pkt->header_size;
>      pkt->seq_end = pkt->tcp_seq + pkt->payload_size;
>      pkt->flags = tcphd->th_flags;
> --
> 1.8.3.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-04-08  8:52 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01  7:47 [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO leirao
2021-04-01  7:47 ` [PATCH v5 01/10] Remove some duplicate trace code leirao
2021-04-08  5:21   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint leirao
2021-04-08  5:21   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 03/10] Optimize the function of filter_send leirao
2021-04-08  5:23   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint leirao
2021-04-08  5:25   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO leirao
2021-04-08  5:30   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 06/10] Add the function of colo_compare_cleanup leirao
2021-04-08  8:48   ` Zhang, Chen
2021-04-01  7:47 ` [PATCH v5 07/10] Reset the auto-converge counter at every checkpoint leirao
2021-04-01  7:47 ` [PATCH v5 08/10] Reduce the PVM stop time during Checkpoint leirao
2021-04-01  7:47 ` [PATCH v5 09/10] Add the function of colo_bitmap_clear_dirty leirao
2021-04-01  7:47 ` [PATCH v5 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info() leirao
2021-04-08  8:49   ` Zhang, Chen
2021-04-04 10:22 ` [PATCH v5 00/10] Fixed some bugs and optimized some codes for COLO Lukas Straub
