All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover
@ 2019-03-03 14:50 Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 1/7] Migration/colo.c: Fix double close bug when occur COLO failover Zhang Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

This series focus on COLO failover bug fix and optimization.

V3:
 - Fix grammer issues in patch 4/7.
 - Add more information in commit log of patch 5/7.

V2:
 - Add patch 4/7 to handle failover state.
 - Add new patches to optimize failover status.

V1:
 - Init patch.


Zhang Chen (7):
  Migration/colo.c: Fix double close bug when occur COLO failover
  Migration/colo.c: Fix COLO failover status error
  Migration/colo.c: Make COLO node running after failover
  Migration/colo.c: Add new COLOExitReason to handle all failover state
  qapi/migration.json: Remove a variable that doesn't exist in example
  Migration/colo.c: Add the necessary checks for colo_do_failover
  Migration/colo.c: Make user obtain the COLO mode info after failover

 migration/colo.c      | 69 ++++++++++++++++++++++++++++++-------------
 migration/migration.c |  3 ++
 qapi/migration.json   | 17 ++++++-----
 3 files changed, 61 insertions(+), 28 deletions(-)

-- 
2.17.GIT

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 1/7] Migration/colo.c: Fix double close bug when occur COLO failover
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error Zhang Chen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

In migration_incoming_state_destroy(void) will check the mis->to_src_file
to double close the mis->to_src_file when occur COLO failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/colo.c b/migration/colo.c
index 398b239d1c..a916dc178c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -872,6 +872,7 @@ out:
     /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
+        mis->to_src_file = NULL;
     }
     migration_incoming_disable_colo();
 
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 1/7] Migration/colo.c: Fix double close bug when occur COLO failover Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-08 17:32   ` Dr. David Alan Gilbert
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 3/7] Migration/colo.c: Make COLO node running after failover Zhang Chen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

When finished COLO failover, the status is FAILOVER_STATUS_COMPLETED.
The origin codes misunderstand the FAILOVER_STATUS_REQUIRE.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index a916dc178c..a13acac192 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -121,6 +121,7 @@ static void secondary_vm_do_failover(void)
     }
     /* Notify COLO incoming thread that failover work is finished */
     qemu_sem_post(&mis->colo_incoming_sem);
+
     /* For Secondary VM, jump to incoming co */
     if (mis->migration_incoming_co) {
         qemu_coroutine_enter(mis->migration_incoming_co);
@@ -262,7 +263,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)
     case FAILOVER_STATUS_NONE:
         s->reason = COLO_EXIT_REASON_NONE;
         break;
-    case FAILOVER_STATUS_REQUIRE:
+    case FAILOVER_STATUS_COMPLETED:
         s->reason = COLO_EXIT_REASON_REQUEST;
         break;
     default:
@@ -582,7 +583,7 @@ out:
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                   COLO_EXIT_REASON_ERROR);
         break;
-    case FAILOVER_STATUS_REQUIRE:
+    case FAILOVER_STATUS_COMPLETED:
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                   COLO_EXIT_REASON_REQUEST);
         break;
@@ -854,7 +855,7 @@ out:
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
                                   COLO_EXIT_REASON_ERROR);
         break;
-    case FAILOVER_STATUS_REQUIRE:
+    case FAILOVER_STATUS_COMPLETED:
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
                                   COLO_EXIT_REASON_REQUEST);
         break;
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 3/7] Migration/colo.c: Make COLO node running after failover
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 1/7] Migration/colo.c: Fix double close bug when occur COLO failover Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state Zhang Chen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Delay to close COLO for auto start VM after failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c      | 1 -
 migration/migration.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index a13acac192..89325952c7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -875,7 +875,6 @@ out:
         qemu_fclose(mis->to_src_file);
         mis->to_src_file = NULL;
     }
-    migration_incoming_disable_colo();
 
     rcu_unregister_thread();
     return NULL;
diff --git a/migration/migration.c b/migration/migration.c
index 37e06b76dc..cec5f529c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -383,6 +383,9 @@ static void process_incoming_migration_bh(void *opaque)
         } else {
             runstate_set(RUN_STATE_PAUSED);
         }
+    } else if (migration_incoming_colo_enabled()) {
+        migration_incoming_disable_colo();
+        vm_start();
     } else {
         runstate_set(global_state_get_runstate());
     }
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
                   ` (2 preceding siblings ...)
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 3/7] Migration/colo.c: Make COLO node running after failover Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-08 17:39   ` Dr. David Alan Gilbert
  2019-03-09 17:06   ` Markus Armbruster
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 5/7] qapi/migration.json: Remove a variable that doesn't exist in example Zhang Chen
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

In this patch we add the processing state for COLOExitReason,
because we have to identify COLO in the failover processing state or
failover error state. In the way, we can handle all the failover state.
We have improved the description of the COLOExitReason by the way.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c    | 24 +++++++++++++-----------
 qapi/migration.json | 15 +++++++++------
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 89325952c7..dbe2b88807 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -267,7 +267,11 @@ COLOStatus *qmp_query_colo_status(Error **errp)
         s->reason = COLO_EXIT_REASON_REQUEST;
         break;
     default:
-        s->reason = COLO_EXIT_REASON_ERROR;
+        if (migration_in_colo_state()) {
+            s->reason = COLO_EXIT_REASON_PROCESSING;
+        } else {
+            s->reason = COLO_EXIT_REASON_ERROR;
+        }
     }
 
     return s;
@@ -579,16 +583,13 @@ out:
      * or the user triggered failover.
      */
     switch (failover_get_state()) {
-    case FAILOVER_STATUS_NONE:
-        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
-                                  COLO_EXIT_REASON_ERROR);
-        break;
     case FAILOVER_STATUS_COMPLETED:
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                   COLO_EXIT_REASON_REQUEST);
         break;
     default:
-        abort();
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_ERROR);
     }
 
     /* Hope this not to be too long to wait here */
@@ -850,17 +851,18 @@ out:
         error_report_err(local_err);
     }
 
+    /*
+     * There are only two reasons we can get here, some error happened
+     * or the user triggered failover.
+     */
     switch (failover_get_state()) {
-    case FAILOVER_STATUS_NONE:
-        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
-                                  COLO_EXIT_REASON_ERROR);
-        break;
     case FAILOVER_STATUS_COMPLETED:
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
                                   COLO_EXIT_REASON_REQUEST);
         break;
     default:
-        abort();
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_ERROR);
     }
 
     if (fb) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 7a795ecc16..089ed67807 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -983,19 +983,22 @@
 ##
 # @COLOExitReason:
 #
-# The reason for a COLO exit
+# Describe the reason for COLO exit.
 #
-# @none: no failover has ever happened. This can't occur in the
-# COLO_EXIT event, only in the result of query-colo-status.
+# @none: failover has never happened. This state does not occur
+# in the COLO_EXIT event, and is only visible in the result of
+# query-colo-status.
 #
-# @request: COLO exit is due to an external request
+# @request: COLO exit caused by an external request.
 #
-# @error: COLO exit is due to an internal error
+# @error: COLO exit caused by an internal error.
+#
+# @processing: COLO is currently handling a failover (since 4.0).
 #
 # Since: 3.1
 ##
 { 'enum': 'COLOExitReason',
-  'data': [ 'none', 'request', 'error' ] }
+  'data': [ 'none', 'request', 'error' , 'processing' ] }
 
 ##
 # @x-colo-lost-heartbeat:
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 5/7] qapi/migration.json: Remove a variable that doesn't exist in example
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
                   ` (3 preceding siblings ...)
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover Zhang Chen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Remove the "active" variable in example for query-colo-status.
It is a doc bug from commit f56c0065

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi/migration.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 089ed67807..18929f1154 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1342,7 +1342,7 @@
 # Example:
 #
 # -> { "execute": "query-colo-status" }
-# <- { "return": { "mode": "primary", "active": true, "reason": "request" } }
+# <- { "return": { "mode": "primary", "reason": "request" } }
 #
 # Since: 3.1
 ##
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
                   ` (4 preceding siblings ...)
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 5/7] qapi/migration.json: Remove a variable that doesn't exist in example Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-08 17:41   ` Dr. David Alan Gilbert
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover Zhang Chen
  2019-03-05 15:22 ` [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Dr. David Alan Gilbert
  7 siblings, 1 reply; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index dbe2b88807..d1ae2e6d11 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -197,10 +197,16 @@ void colo_do_failover(MigrationState *s)
         vm_stop_force_state(RUN_STATE_COLO);
     }
 
-    if (get_colo_mode() == COLO_MODE_PRIMARY) {
+    switch (get_colo_mode()) {
+    case COLO_MODE_PRIMARY:
         primary_vm_do_failover();
-    } else {
+        break;
+    case COLO_MODE_SECONDARY:
         secondary_vm_do_failover();
+        break;
+    default:
+        error_report("colo_do_failover failed because the colo mode"
+                     " could not be obtained");
     }
 }
 
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
                   ` (5 preceding siblings ...)
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover Zhang Chen
@ 2019-03-03 14:50 ` Zhang Chen
  2019-03-08 17:45   ` Dr. David Alan Gilbert
  2019-03-05 15:22 ` [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Dr. David Alan Gilbert
  7 siblings, 1 reply; 19+ messages in thread
From: Zhang Chen @ 2019-03-03 14:50 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Markus Armbruster, Eric Blake, qemu-dev
  Cc: Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Add the last_colo_mode to save the status after failover.
This patch can solve the issue that user got nothing to call
query_colo_status after failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d1ae2e6d11..6eba8e06f2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -38,6 +38,9 @@
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
 
+/* User need to know colo mode after COLO failover */
+static COLOMode last_colo_mode;
+
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
 bool migration_in_colo_state(void)
@@ -197,7 +200,10 @@ void colo_do_failover(MigrationState *s)
         vm_stop_force_state(RUN_STATE_COLO);
     }
 
-    switch (get_colo_mode()) {
+    /* Update last_COLO_mode to avoid unexpectedly exit COLO status */
+    last_colo_mode = get_colo_mode();
+
+    switch (last_colo_mode) {
     case COLO_MODE_PRIMARY:
         primary_vm_do_failover();
         break;
@@ -263,7 +269,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)
 {
     COLOStatus *s = g_new0(COLOStatus, 1);
 
-    s->mode = get_colo_mode();
+    s->mode = last_colo_mode;
 
     switch (failover_get_state()) {
     case FAILOVER_STATUS_NONE:
@@ -515,6 +521,12 @@ static void colo_process_checkpoint(MigrationState *s)
     Error *local_err = NULL;
     int ret;
 
+    last_colo_mode = get_colo_mode();
+    if (last_colo_mode != COLO_MODE_PRIMARY) {
+        error_report("COLO mode must be COLO_MODE_PRIMARY");
+        return;
+    }
+
     failover_init_state();
 
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
@@ -682,12 +694,18 @@ void *colo_process_incoming_thread(void *opaque)
     Error *local_err = NULL;
     int ret;
 
-    rcu_register_thread();
-    qemu_sem_init(&mis->colo_incoming_sem, 0);
-
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    last_colo_mode = get_colo_mode();
+    if (last_colo_mode != COLO_MODE_SECONDARY) {
+        error_report("COLO mode must be COLO_MODE_SECONDARY");
+        return NULL;
+    }
+
+    rcu_register_thread();
+    qemu_sem_init(&mis->colo_incoming_sem, 0);
+
     failover_init_state();
 
     mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
-- 
2.17.GIT

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover
  2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
                   ` (6 preceding siblings ...)
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover Zhang Chen
@ 2019-03-05 15:22 ` Dr. David Alan Gilbert
  2019-03-05 15:33   ` Zhang, Chen
  7 siblings, 1 reply; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-05 15:22 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> This series focus on COLO failover bug fix and optimization.
> 
> V3:
>  - Fix grammer issues in patch 4/7.
>  - Add more information in commit log of patch 5/7.

I've queued 1,3,5 but not the others yet, I need to look 
to do the reviews on them.

Dave

> V2:
>  - Add patch 4/7 to handle failover state.
>  - Add new patches to optimize failover status.
> 
> V1:
>  - Init patch.
> 
> 
> Zhang Chen (7):
>   Migration/colo.c: Fix double close bug when occur COLO failover
>   Migration/colo.c: Fix COLO failover status error
>   Migration/colo.c: Make COLO node running after failover
>   Migration/colo.c: Add new COLOExitReason to handle all failover state
>   qapi/migration.json: Remove a variable that doesn't exist in example
>   Migration/colo.c: Add the necessary checks for colo_do_failover
>   Migration/colo.c: Make user obtain the COLO mode info after failover
> 
>  migration/colo.c      | 69 ++++++++++++++++++++++++++++++-------------
>  migration/migration.c |  3 ++
>  qapi/migration.json   | 17 ++++++-----
>  3 files changed, 61 insertions(+), 28 deletions(-)
> 
> -- 
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover
  2019-03-05 15:22 ` [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Dr. David Alan Gilbert
@ 2019-03-05 15:33   ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2019-03-05 15:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev


-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Tuesday, March 5, 2019 11:23 PM
To: Zhang, Chen <chen.zhang@intel.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>; Zhang Chen <zhangckid@gmail.com>; Juan Quintela <quintela@redhat.com>; zhanghailiang <zhang.zhanghailiang@huawei.com>; Markus Armbruster <armbru@redhat.com>; Eric Blake <eblake@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
Subject: Re: [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> This series focus on COLO failover bug fix and optimization.
> 
> V3:
>  - Fix grammer issues in patch 4/7.
>  - Add more information in commit log of patch 5/7.

I've queued 1,3,5 but not the others yet, I need to look to do the reviews on them.

Sure~ Thank you for your help~

Thanks
Zhang Chen

Dave

> V2:
>  - Add patch 4/7 to handle failover state.
>  - Add new patches to optimize failover status.
> 
> V1:
>  - Init patch.
> 
> 
> Zhang Chen (7):
>   Migration/colo.c: Fix double close bug when occur COLO failover
>   Migration/colo.c: Fix COLO failover status error
>   Migration/colo.c: Make COLO node running after failover
>   Migration/colo.c: Add new COLOExitReason to handle all failover state
>   qapi/migration.json: Remove a variable that doesn't exist in example
>   Migration/colo.c: Add the necessary checks for colo_do_failover
>   Migration/colo.c: Make user obtain the COLO mode info after failover
> 
>  migration/colo.c      | 69 ++++++++++++++++++++++++++++++-------------
>  migration/migration.c |  3 ++
>  qapi/migration.json   | 17 ++++++-----
>  3 files changed, 61 insertions(+), 28 deletions(-)
> 
> --
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error Zhang Chen
@ 2019-03-08 17:32   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-08 17:32 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> When finished COLO failover, the status is FAILOVER_STATUS_COMPLETED.
> The origin codes misunderstand the FAILOVER_STATUS_REQUIRE.
> 
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>

OK, don't fully understand the colo states, but:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index a916dc178c..a13acac192 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -121,6 +121,7 @@ static void secondary_vm_do_failover(void)
>      }
>      /* Notify COLO incoming thread that failover work is finished */
>      qemu_sem_post(&mis->colo_incoming_sem);
> +
>      /* For Secondary VM, jump to incoming co */
>      if (mis->migration_incoming_co) {
>          qemu_coroutine_enter(mis->migration_incoming_co);
> @@ -262,7 +263,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)
>      case FAILOVER_STATUS_NONE:
>          s->reason = COLO_EXIT_REASON_NONE;
>          break;
> -    case FAILOVER_STATUS_REQUIRE:
> +    case FAILOVER_STATUS_COMPLETED:
>          s->reason = COLO_EXIT_REASON_REQUEST;
>          break;
>      default:
> @@ -582,7 +583,7 @@ out:
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                    COLO_EXIT_REASON_ERROR);
>          break;
> -    case FAILOVER_STATUS_REQUIRE:
> +    case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
> @@ -854,7 +855,7 @@ out:
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
>                                    COLO_EXIT_REASON_ERROR);
>          break;
> -    case FAILOVER_STATUS_REQUIRE:
> +    case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
> -- 
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state Zhang Chen
@ 2019-03-08 17:39   ` Dr. David Alan Gilbert
  2019-03-09 17:06   ` Markus Armbruster
  1 sibling, 0 replies; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-08 17:39 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> In this patch we add the processing state for COLOExitReason,
> because we have to identify COLO in the failover processing state or
> failover error state. In the way, we can handle all the failover state.
> We have improved the description of the COLOExitReason by the way.
> 
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c    | 24 +++++++++++++-----------
>  qapi/migration.json | 15 +++++++++------
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 89325952c7..dbe2b88807 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -267,7 +267,11 @@ COLOStatus *qmp_query_colo_status(Error **errp)
>          s->reason = COLO_EXIT_REASON_REQUEST;
>          break;
>      default:
> -        s->reason = COLO_EXIT_REASON_ERROR;
> +        if (migration_in_colo_state()) {
> +            s->reason = COLO_EXIT_REASON_PROCESSING;
> +        } else {
> +            s->reason = COLO_EXIT_REASON_ERROR;
> +        }
>      }
>  
>      return s;
> @@ -579,16 +583,13 @@ out:
>       * or the user triggered failover.
>       */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      /* Hope this not to be too long to wait here */
> @@ -850,17 +851,18 @@ out:
>          error_report_err(local_err);
>      }
>  
> +    /*
> +     * There are only two reasons we can get here, some error happened
> +     * or the user triggered failover.
> +     */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      if (fb) {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 7a795ecc16..089ed67807 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -983,19 +983,22 @@
>  ##
>  # @COLOExitReason:
>  #
> -# The reason for a COLO exit
> +# Describe the reason for COLO exit.
>  #
> -# @none: no failover has ever happened. This can't occur in the
> -# COLO_EXIT event, only in the result of query-colo-status.
> +# @none: failover has never happened. This state does not occur
> +# in the COLO_EXIT event, and is only visible in the result of
> +# query-colo-status.
>  #
> -# @request: COLO exit is due to an external request
> +# @request: COLO exit caused by an external request.
>  #
> -# @error: COLO exit is due to an internal error
> +# @error: COLO exit caused by an internal error.
> +#
> +# @processing: COLO is currently handling a failover (since 4.0).
>  #
>  # Since: 3.1
>  ##
>  { 'enum': 'COLOExitReason',
> -  'data': [ 'none', 'request', 'error' ] }
> +  'data': [ 'none', 'request', 'error' , 'processing' ] }
>  
>  ##
>  # @x-colo-lost-heartbeat:
> -- 
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover Zhang Chen
@ 2019-03-08 17:41   ` Dr. David Alan Gilbert
  2019-03-08 18:36     ` Zhang, Chen
  0 siblings, 1 reply; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-08 17:41 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>

OK, we should make that properly return an error.
(Actually we should make the failover command be one of the new OOB
commands; so that it can work even if the main loop is blocked- that
would make COLO robust properly against network failures)


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index dbe2b88807..d1ae2e6d11 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -197,10 +197,16 @@ void colo_do_failover(MigrationState *s)
>          vm_stop_force_state(RUN_STATE_COLO);
>      }
>  
> -    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> +    switch (get_colo_mode()) {
> +    case COLO_MODE_PRIMARY:
>          primary_vm_do_failover();
> -    } else {
> +        break;
> +    case COLO_MODE_SECONDARY:
>          secondary_vm_do_failover();
> +        break;
> +    default:
> +        error_report("colo_do_failover failed because the colo mode"
> +                     " could not be obtained");
>      }
>  }
>  
> -- 
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover Zhang Chen
@ 2019-03-08 17:45   ` Dr. David Alan Gilbert
  2019-03-08 18:47     ` Zhang, Chen
  0 siblings, 1 reply; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-08 17:45 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> Add the last_colo_mode to save the status after failover.
> This patch can solve the issue that user got nothing to call
> query_colo_status after failover.

I don't understand what this is doing; cna you show me an example
of the way the state looks before and after the patch?

Dave

> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  migration/colo.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d1ae2e6d11..6eba8e06f2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -38,6 +38,9 @@
>  static bool vmstate_loading;
>  static Notifier packets_compare_notifier;
>  
> +/* User need to know colo mode after COLO failover */
> +static COLOMode last_colo_mode;
> +
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
>  bool migration_in_colo_state(void)
> @@ -197,7 +200,10 @@ void colo_do_failover(MigrationState *s)
>          vm_stop_force_state(RUN_STATE_COLO);
>      }
>  
> -    switch (get_colo_mode()) {
> +    /* Update last_COLO_mode to avoid unexpectedly exit COLO status */
> +    last_colo_mode = get_colo_mode();
> +
> +    switch (last_colo_mode) {
>      case COLO_MODE_PRIMARY:
>          primary_vm_do_failover();
>          break;
> @@ -263,7 +269,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)
>  {
>      COLOStatus *s = g_new0(COLOStatus, 1);
>  
> -    s->mode = get_colo_mode();
> +    s->mode = last_colo_mode;
>  
>      switch (failover_get_state()) {
>      case FAILOVER_STATUS_NONE:
> @@ -515,6 +521,12 @@ static void colo_process_checkpoint(MigrationState *s)
>      Error *local_err = NULL;
>      int ret;
>  
> +    last_colo_mode = get_colo_mode();
> +    if (last_colo_mode != COLO_MODE_PRIMARY) {
> +        error_report("COLO mode must be COLO_MODE_PRIMARY");
> +        return;
> +    }
> +
>      failover_init_state();
>  
>      s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> @@ -682,12 +694,18 @@ void *colo_process_incoming_thread(void *opaque)
>      Error *local_err = NULL;
>      int ret;
>  
> -    rcu_register_thread();
> -    qemu_sem_init(&mis->colo_incoming_sem, 0);
> -
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>  
> +    last_colo_mode = get_colo_mode();
> +    if (last_colo_mode != COLO_MODE_SECONDARY) {
> +        error_report("COLO mode must be COLO_MODE_SECONDARY");
> +        return NULL;
> +    }
> +
> +    rcu_register_thread();
> +    qemu_sem_init(&mis->colo_incoming_sem, 0);
> +
>      failover_init_state();
>  
>      mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
> -- 
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover
  2019-03-08 17:41   ` Dr. David Alan Gilbert
@ 2019-03-08 18:36     ` Zhang, Chen
  2019-03-08 19:02       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 19+ messages in thread
From: Zhang, Chen @ 2019-03-08 18:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev



-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Saturday, March 9, 2019 1:41 AM
To: Zhang, Chen <chen.zhang@intel.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>; Zhang Chen <zhangckid@gmail.com>; Juan Quintela <quintela@redhat.com>; zhanghailiang <zhang.zhanghailiang@huawei.com>; Markus Armbruster <armbru@redhat.com>; Eric Blake <eblake@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
Subject: Re: [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>

OK, we should make that properly return an error.
(Actually we should make the failover command be one of the new OOB commands; so that it can work even if the main loop is blocked- that would make COLO robust properly against network failures)

Good idea~ which OOB command can be the demo in current Qemu?
Maybe I can do this job in the future.

Thanks
Zhang Chen

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index 
> dbe2b88807..d1ae2e6d11 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -197,10 +197,16 @@ void colo_do_failover(MigrationState *s)
>          vm_stop_force_state(RUN_STATE_COLO);
>      }
>  
> -    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> +    switch (get_colo_mode()) {
> +    case COLO_MODE_PRIMARY:
>          primary_vm_do_failover();
> -    } else {
> +        break;
> +    case COLO_MODE_SECONDARY:
>          secondary_vm_do_failover();
> +        break;
> +    default:
> +        error_report("colo_do_failover failed because the colo mode"
> +                     " could not be obtained");
>      }
>  }
>  
> --
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover
  2019-03-08 17:45   ` Dr. David Alan Gilbert
@ 2019-03-08 18:47     ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2019-03-08 18:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev




-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Saturday, March 9, 2019 1:46 AM
To: Zhang, Chen <chen.zhang@intel.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>; Zhang Chen <zhangckid@gmail.com>; Juan Quintela <quintela@redhat.com>; zhanghailiang <zhang.zhanghailiang@huawei.com>; Markus Armbruster <armbru@redhat.com>; Eric Blake <eblake@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
Subject: Re: [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover

* Zhang Chen (chen.zhang@intel.com) wrote:
> From: Zhang Chen <chen.zhang@intel.com>
> 
> Add the last_colo_mode to save the status after failover.
> This patch can solve the issue that user got nothing to call 
> query_colo_status after failover.

I don't understand what this is doing; cna you show me an example of the way the state looks before and after the patch?

Sure.
In current codes, when we use the " query_colo_status "  after failover, the mode field will always return "none".
Because after failover we can't use  " get_colo_mode()" to get the mode, we need the last_colo_mode args to save the last state.
I think we will use this args in new colo feature development.

Thanks
Zhang Chen

Dave

> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  migration/colo.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index 
> d1ae2e6d11..6eba8e06f2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -38,6 +38,9 @@
>  static bool vmstate_loading;
>  static Notifier packets_compare_notifier;
>  
> +/* User need to know colo mode after COLO failover */ static COLOMode 
> +last_colo_mode;
> +
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
>  bool migration_in_colo_state(void)
> @@ -197,7 +200,10 @@ void colo_do_failover(MigrationState *s)
>          vm_stop_force_state(RUN_STATE_COLO);
>      }
>  
> -    switch (get_colo_mode()) {
> +    /* Update last_COLO_mode to avoid unexpectedly exit COLO status */
> +    last_colo_mode = get_colo_mode();
> +
> +    switch (last_colo_mode) {
>      case COLO_MODE_PRIMARY:
>          primary_vm_do_failover();
>          break;
> @@ -263,7 +269,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)  {
>      COLOStatus *s = g_new0(COLOStatus, 1);
>  
> -    s->mode = get_colo_mode();
> +    s->mode = last_colo_mode;
>  
>      switch (failover_get_state()) {
>      case FAILOVER_STATUS_NONE:
> @@ -515,6 +521,12 @@ static void colo_process_checkpoint(MigrationState *s)
>      Error *local_err = NULL;
>      int ret;
>  
> +    last_colo_mode = get_colo_mode();
> +    if (last_colo_mode != COLO_MODE_PRIMARY) {
> +        error_report("COLO mode must be COLO_MODE_PRIMARY");
> +        return;
> +    }
> +
>      failover_init_state();
>  
>      s->rp_state.from_dst_file = 
> qemu_file_get_return_path(s->to_dst_file);
> @@ -682,12 +694,18 @@ void *colo_process_incoming_thread(void *opaque)
>      Error *local_err = NULL;
>      int ret;
>  
> -    rcu_register_thread();
> -    qemu_sem_init(&mis->colo_incoming_sem, 0);
> -
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>  
> +    last_colo_mode = get_colo_mode();
> +    if (last_colo_mode != COLO_MODE_SECONDARY) {
> +        error_report("COLO mode must be COLO_MODE_SECONDARY");
> +        return NULL;
> +    }
> +
> +    rcu_register_thread();
> +    qemu_sem_init(&mis->colo_incoming_sem, 0);
> +
>      failover_init_state();
>  
>      mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
> --
> 2.17.GIT
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover
  2019-03-08 18:36     ` Zhang, Chen
@ 2019-03-08 19:02       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2019-03-08 19:02 UTC (permalink / raw)
  To: Zhang, Chen
  Cc: Li Zhijian, Zhang Chen, Juan Quintela, zhanghailiang,
	Markus Armbruster, Eric Blake, qemu-dev

* Zhang, Chen (chen.zhang@intel.com) wrote:
> 
> 
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
> Sent: Saturday, March 9, 2019 1:41 AM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Li Zhijian <lizhijian@cn.fujitsu.com>; Zhang Chen <zhangckid@gmail.com>; Juan Quintela <quintela@redhat.com>; zhanghailiang <zhang.zhanghailiang@huawei.com>; Markus Armbruster <armbru@redhat.com>; Eric Blake <eblake@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
> Subject: Re: [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover
> 
> * Zhang Chen (chen.zhang@intel.com) wrote:
> > From: Zhang Chen <chen.zhang@intel.com>
> > 
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> 
>>  OK, we should make that properly return an error.
>>  (Actually we should make the failover command be one of the new OOB commands; so that it can work even if the main loop is blocked- that would make COLO robust properly against network failures)
> 
> Good idea~ which OOB command can be the demo in current Qemu?
>  Maybe I can do this job in the future.

The 'migration-pause' command I think is the only real one so far.
Remember the important thing is the code that runs as part of it must
not take any lock that could be blocked by the main loop or any
part of the replication that could block based on the other host
having died.

Dave

> 
> Thanks
> Zhang Chen
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > ---
> >  migration/colo.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/migration/colo.c b/migration/colo.c index 
> > dbe2b88807..d1ae2e6d11 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -197,10 +197,16 @@ void colo_do_failover(MigrationState *s)
> >          vm_stop_force_state(RUN_STATE_COLO);
> >      }
> >  
> > -    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> > +    switch (get_colo_mode()) {
> > +    case COLO_MODE_PRIMARY:
> >          primary_vm_do_failover();
> > -    } else {
> > +        break;
> > +    case COLO_MODE_SECONDARY:
> >          secondary_vm_do_failover();
> > +        break;
> > +    default:
> > +        error_report("colo_do_failover failed because the colo mode"
> > +                     " could not be obtained");
> >      }
> >  }
> >  
> > --
> > 2.17.GIT
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state
  2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state Zhang Chen
  2019-03-08 17:39   ` Dr. David Alan Gilbert
@ 2019-03-09 17:06   ` Markus Armbruster
  2019-03-11  9:08     ` Zhang, Chen
  1 sibling, 1 reply; 19+ messages in thread
From: Markus Armbruster @ 2019-03-09 17:06 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Eric Blake, qemu-dev

Zhang Chen <chen.zhang@intel.com > writes:

> From: Zhang Chen <chen.zhang@intel.com>
>
> In this patch we add the processing state for COLOExitReason,
> because we have to identify COLO in the failover processing state or
> failover error state. In the way, we can handle all the failover state.
> We have improved the description of the COLOExitReason by the way.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  migration/colo.c    | 24 +++++++++++++-----------
>  qapi/migration.json | 15 +++++++++------
>  2 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 89325952c7..dbe2b88807 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -267,7 +267,11 @@ COLOStatus *qmp_query_colo_status(Error **errp)
       switch (failover_get_state()) {

failover_get_state() returns FailoverStatus, i.e. one of
FAILOVER_STATUS_NONE, _REQUIRE, _ACTIVE, _COMPLETED, _RELAUNCH.

       case FAILOVER_STATUS_NONE:
           s->reason = COLO_EXIT_REASON_NONE;
           break;
       case FAILOVER_STATUS_REQUIRE:
>          s->reason = COLO_EXIT_REASON_REQUEST;
>          break;
>      default:
> -        s->reason = COLO_EXIT_REASON_ERROR;
> +        if (migration_in_colo_state()) {
> +            s->reason = COLO_EXIT_REASON_PROCESSING;
> +        } else {
> +            s->reason = COLO_EXIT_REASON_ERROR;
> +        }
>      }
>  
>      return s;

In which FailoverStatus can migration_in_colo_state() return true?

> @@ -579,16 +583,13 @@ out:
       /*
        * There are only two reasons we can get here, some error happened
>       * or the user triggered failover.
>       */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      /* Hope this not to be too long to wait here */

Change of behavior for FAILOVER_STATUS_REQUIRE, _ACTIVE, _COMPLETED,
_RELAUNCH: send a COLO_EXIT event instead of calling abort().  Why?

> @@ -850,17 +851,18 @@ out:
>          error_report_err(local_err);
>      }
>  
> +    /*
> +     * There are only two reasons we can get here, some error happened
> +     * or the user triggered failover.
> +     */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      if (fb) {

Same question.

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 7a795ecc16..089ed67807 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -983,19 +983,22 @@
>  ##
>  # @COLOExitReason:
>  #
> -# The reason for a COLO exit
> +# Describe the reason for COLO exit.

The old text was better.

>  #
> -# @none: no failover has ever happened. This can't occur in the
> -# COLO_EXIT event, only in the result of query-colo-status.
> +# @none: failover has never happened. This state does not occur
> +# in the COLO_EXIT event, and is only visible in the result of
> +# query-colo-status.

This might be a small improvment.

>  #
> -# @request: COLO exit is due to an external request
> +# @request: COLO exit caused by an external request.
>  #
> -# @error: COLO exit is due to an internal error
> +# @error: COLO exit caused by an internal error.

I like the old text better.

> +#
> +# @processing: COLO is currently handling a failover (since 4.0).
>  #
>  # Since: 3.1
>  ##
>  { 'enum': 'COLOExitReason',
> -  'data': [ 'none', 'request', 'error' ] }
> +  'data': [ 'none', 'request', 'error' , 'processing' ] }
>  
>  ##
>  # @x-colo-lost-heartbeat:

The patch conflates addition of a new member with doc improvements.
Tolerable since both are small.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state
  2019-03-09 17:06   ` Markus Armbruster
@ 2019-03-11  9:08     ` Zhang, Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Zhang, Chen @ 2019-03-11  9:08 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Li Zhijian, Zhang Chen, Dr. David Alan Gilbert, Juan Quintela,
	zhanghailiang, Eric Blake, qemu-dev


-----Original Message-----
From: Markus Armbruster [mailto:armbru@redhat.com] 
Sent: Sunday, March 10, 2019 1:06 AM
To: Zhang, Chen <chen.zhang@intel.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>; Zhang Chen <zhangckid@gmail.com>; Dr. David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; zhanghailiang <zhang.zhanghailiang@huawei.com>; Eric Blake <eblake@redhat.com>; qemu-dev <qemu-devel@nongnu.org>; Zhang, Chen <chen.zhang@intel.com>
Subject: Re: [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state

Zhang Chen <chen.zhang@intel.com > writes:

> From: Zhang Chen <chen.zhang@intel.com>
>
> In this patch we add the processing state for COLOExitReason, because 
> we have to identify COLO in the failover processing state or failover 
> error state. In the way, we can handle all the failover state.
> We have improved the description of the COLOExitReason by the way.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  migration/colo.c    | 24 +++++++++++++-----------
>  qapi/migration.json | 15 +++++++++------
>  2 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c index 
> 89325952c7..dbe2b88807 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -267,7 +267,11 @@ COLOStatus *qmp_query_colo_status(Error **errp)
       switch (failover_get_state()) {

failover_get_state() returns FailoverStatus, i.e. one of FAILOVER_STATUS_NONE, _REQUIRE, _ACTIVE, _COMPLETED, _RELAUNCH.

Yes, have any concern here?

       case FAILOVER_STATUS_NONE:
           s->reason = COLO_EXIT_REASON_NONE;
           break;
       case FAILOVER_STATUS_REQUIRE:
>          s->reason = COLO_EXIT_REASON_REQUEST;
>          break;
>      default:
> -        s->reason = COLO_EXIT_REASON_ERROR;
> +        if (migration_in_colo_state()) {
> +            s->reason = COLO_EXIT_REASON_PROCESSING;
> +        } else {
> +            s->reason = COLO_EXIT_REASON_ERROR;
> +        }
>      }
>  
>      return s;

In which FailoverStatus can migration_in_colo_state() return true?

It is no close connection about the FailoverStatus,  except the "FAILOVER_STATUS_REQUIRE", 
other status still have the possibility be triggered here , if migration_in_colo_state() return true means
we still in COLO failover process, otherwise means we have exit COLO with some failover error.

> @@ -579,16 +583,13 @@ out:
       /*
        * There are only two reasons we can get here, some error happened
>       * or the user triggered failover.
>       */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      /* Hope this not to be too long to wait here */

Change of behavior for FAILOVER_STATUS_REQUIRE, _ACTIVE, _COMPLETED,
_RELAUNCH: send a COLO_EXIT event instead of calling abort().  Why?

Because if COLO occur failover error, we want to make the VM back to normal running status rather than just crash VM.


> @@ -850,17 +851,18 @@ out:
>          error_report_err(local_err);
>      }
>  
> +    /*
> +     * There are only two reasons we can get here, some error happened
> +     * or the user triggered failover.
> +     */
>      switch (failover_get_state()) {
> -    case FAILOVER_STATUS_NONE:
> -        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> -                                  COLO_EXIT_REASON_ERROR);
> -        break;
>      case FAILOVER_STATUS_COMPLETED:
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
>                                    COLO_EXIT_REASON_REQUEST);
>          break;
>      default:
> -        abort();
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> +                                  COLO_EXIT_REASON_ERROR);
>      }
>  
>      if (fb) {

Same question.

> diff --git a/qapi/migration.json b/qapi/migration.json index 
> 7a795ecc16..089ed67807 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -983,19 +983,22 @@
>  ##
>  # @COLOExitReason:
>  #
> -# The reason for a COLO exit
> +# Describe the reason for COLO exit.

The old text was better.

OK, I will remove it.

>  #
> -# @none: no failover has ever happened. This can't occur in the -# 
> COLO_EXIT event, only in the result of query-colo-status.
> +# @none: failover has never happened. This state does not occur # in 
> +the COLO_EXIT event, and is only visible in the result of # 
> +query-colo-status.

This might be a small improvment.

>  #
> -# @request: COLO exit is due to an external request
> +# @request: COLO exit caused by an external request.
>  #
> -# @error: COLO exit is due to an internal error
> +# @error: COLO exit caused by an internal error.

I like the old text better.

Sorry, I'm not a native speaker.
I will remove it in next version.

> +#
> +# @processing: COLO is currently handling a failover (since 4.0).
>  #
>  # Since: 3.1
>  ##
>  { 'enum': 'COLOExitReason',
> -  'data': [ 'none', 'request', 'error' ] }
> +  'data': [ 'none', 'request', 'error' , 'processing' ] }
>  
>  ##
>  # @x-colo-lost-heartbeat:

The patch conflates addition of a new member with doc improvements.
Tolerable since both are small.

Yes, Just tiny fix...

Thanks
Zhang Chen

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-03-11  9:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-03 14:50 [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Zhang Chen
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 1/7] Migration/colo.c: Fix double close bug when occur COLO failover Zhang Chen
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 2/7] Migration/colo.c: Fix COLO failover status error Zhang Chen
2019-03-08 17:32   ` Dr. David Alan Gilbert
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 3/7] Migration/colo.c: Make COLO node running after failover Zhang Chen
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 4/7] Migration/colo.c: Add new COLOExitReason to handle all failover state Zhang Chen
2019-03-08 17:39   ` Dr. David Alan Gilbert
2019-03-09 17:06   ` Markus Armbruster
2019-03-11  9:08     ` Zhang, Chen
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 5/7] qapi/migration.json: Remove a variable that doesn't exist in example Zhang Chen
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 6/7] Migration/colo.c: Add the necessary checks for colo_do_failover Zhang Chen
2019-03-08 17:41   ` Dr. David Alan Gilbert
2019-03-08 18:36     ` Zhang, Chen
2019-03-08 19:02       ` Dr. David Alan Gilbert
2019-03-03 14:50 ` [Qemu-devel] [PATCH V3 7/7] Migration/colo.c: Make user obtain the COLO mode info after failover Zhang Chen
2019-03-08 17:45   ` Dr. David Alan Gilbert
2019-03-08 18:47     ` Zhang, Chen
2019-03-05 15:22 ` [Qemu-devel] [PATCH V3 0/7] Migration/colo: Fix upstream bugs when occur failover Dr. David Alan Gilbert
2019-03-05 15:33   ` Zhang, Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.