qemu-devel.nongnu.org archive mirror
* [PATCH 1/2] migration/colo: Optimize COLO start code path
@ 2021-11-10 17:41 Zhang Chen
  2021-11-10 17:41 ` [PATCH 2/2] migration/colo: More accurate update checkpoint time Zhang Chen
  2021-11-16 16:27 ` [PATCH 1/2] migration/colo: Optimize COLO start code path Juan Quintela
  0 siblings, 2 replies; 6+ messages in thread
From: Zhang Chen @ 2021-11-10 17:41 UTC (permalink / raw)
  To: Hailiang Zhang, Juan Quintela, Dr . David Alan Gilbert
  Cc: Zhang Chen, qemu-dev

There is no need to start COLO through MIGRATION_STATUS_ACTIVE.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c      |  2 --
 migration/migration.c | 18 +++++++++++-------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2415325262..ad1a4426b3 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -667,8 +667,6 @@ void migrate_start_colo_process(MigrationState *s)
                                 colo_checkpoint_notify, s);
 
     qemu_sem_init(&s->colo_exit_sem, 0);
-    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
-                      MIGRATION_STATUS_COLO);
     colo_process_checkpoint(s);
     qemu_mutex_lock_iothread();
 }
diff --git a/migration/migration.c b/migration/migration.c
index abaf6f9e3d..4c8662a839 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3222,7 +3222,10 @@ static void migration_completion(MigrationState *s)
         goto fail_invalidate;
     }
 
-    if (!migrate_colo_enabled()) {
+    if (migrate_colo_enabled()) {
+        migrate_set_state(&s->state, current_active_state,
+                          MIGRATION_STATUS_COLO);
+    } else {
         migrate_set_state(&s->state, current_active_state,
                           MIGRATION_STATUS_COMPLETED);
     }
@@ -3607,12 +3610,7 @@ static void migration_iteration_finish(MigrationState *s)
         migration_calculate_complete(s);
         runstate_set(RUN_STATE_POSTMIGRATE);
         break;
-
-    case MIGRATION_STATUS_ACTIVE:
-        /*
-         * We should really assert here, but since it's during
-         * migration, let's try to reduce the usage of assertions.
-         */
+    case MIGRATION_STATUS_COLO:
         if (!migrate_colo_enabled()) {
             error_report("%s: critical error: calling COLO code without "
                          "COLO enabled", __func__);
@@ -3622,6 +3620,12 @@ static void migration_iteration_finish(MigrationState *s)
          * Fixme: we will run VM in COLO no matter its old running state.
          * After exited COLO, we will keep running.
          */
+         /* Fallthrough */
+    case MIGRATION_STATUS_ACTIVE:
+        /*
+         * We should really assert here, but since it's during
+         * migration, let's try to reduce the usage of assertions.
+         */
         s->vm_was_running = true;
         /* Fallthrough */
     case MIGRATION_STATUS_FAILED:
-- 
2.25.1
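
A note on the state handling in the patch above: migrate_set_state() takes an
old state and a new state and only switches when the current state matches the
old one. Once migration_completion() has moved the primary to
MIGRATION_STATUS_COLO, the removed ACTIVE -> COLO call would have nothing left
to do. Below is a minimal standalone sketch of that compare-and-set behaviour.
It is not the QEMU implementation: every sketch_* name is hypothetical, and it
uses plain C11 atomics instead of QEMU's own helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values. */
enum sketch_status {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_COLO,
    SKETCH_STATUS_COMPLETED,
};

/*
 * Move *state from old_state to new_state only if it currently holds
 * old_state. Returns true when the transition actually happened.
 */
bool sketch_set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/*
 * Replays the completion step as the patch writes it: with COLO enabled
 * the state goes ACTIVE -> COLO, otherwise ACTIVE -> COMPLETED.
 * Returns the resulting state.
 */
int sketch_completion(bool colo_enabled)
{
    atomic_int state;

    atomic_init(&state, SKETCH_STATUS_ACTIVE);
    sketch_set_state(&state, SKETCH_STATUS_ACTIVE,
                     colo_enabled ? SKETCH_STATUS_COLO
                                  : SKETCH_STATUS_COMPLETED);
    return atomic_load(&state);
}
```

In this sketch a second ACTIVE -> COLO request after completion returns false
and leaves the state untouched, which mirrors why the call in
migrate_start_colo_process() becomes redundant once completion sets the state.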




* [PATCH 2/2] migration/colo: More accurate update checkpoint time
  2021-11-10 17:41 [PATCH 1/2] migration/colo: Optimize COLO start code path Zhang Chen
@ 2021-11-10 17:41 ` Zhang Chen
  2021-11-16 16:30   ` Juan Quintela
  2021-11-16 16:27 ` [PATCH 1/2] migration/colo: Optimize COLO start code path Juan Quintela
  1 sibling, 1 reply; 6+ messages in thread
From: Zhang Chen @ 2021-11-10 17:41 UTC (permalink / raw)
  To: Hailiang Zhang, Juan Quintela, Dr . David Alan Gilbert
  Cc: Zhang Chen, qemu-dev

Previous operations (like vm_start and replication_start_all) consume
extra time before the timer is updated, so read the clock at the point
where the timer is set.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index ad1a4426b3..e3c8cecc24 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -530,7 +530,6 @@ static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
     QEMUFile *fb = NULL;
-    int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     Error *local_err = NULL;
     int ret;
 
@@ -578,8 +577,8 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    timer_mod(s->colo_delay_timer,
-            current_time + s->parameters.x_checkpoint_delay);
+    timer_mod(s->colo_delay_timer, qemu_clock_get_ms(QEMU_CLOCK_HOST) +
+              s->parameters.x_checkpoint_delay);
 
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
-- 
2.25.1
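
The effect of the patch above can be illustrated with a small standalone
sketch. It is not QEMU code: the sketch_* names and the fake millisecond clock
are hypothetical stand-ins for qemu_clock_get_ms(QEMU_CLOCK_HOST) and for slow
setup work such as vm_start(), and the 300 ms and 2000 ms figures used in the
note afterwards are made-up illustrative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fake host clock, in milliseconds. */
int64_t sketch_now_ms;

int64_t sketch_clock_get_ms(void)
{
    return sketch_now_ms;
}

/* Stand-in for setup work that takes cost_ms of wall-clock time. */
void sketch_slow_setup(int64_t cost_ms)
{
    sketch_now_ms += cost_ms;
}

/* Old ordering: sample the clock first, compute the deadline after setup. */
int64_t sketch_deadline_sampled_early(int64_t setup_cost_ms, int64_t delay_ms)
{
    int64_t current_time = sketch_clock_get_ms();

    sketch_slow_setup(setup_cost_ms);
    return current_time + delay_ms;
}

/* New ordering: sample the clock at the moment the timer is armed. */
int64_t sketch_deadline_sampled_late(int64_t setup_cost_ms, int64_t delay_ms)
{
    sketch_slow_setup(setup_cost_ms);
    return sketch_clock_get_ms() + delay_ms;
}

/* Time left until the deadline, measured from the current fake clock. */
int64_t sketch_remaining_ms(int64_t deadline_ms)
{
    return deadline_ms - sketch_clock_get_ms();
}
```

With a setup cost of 300 ms and a delay of 2000 ms, the early-sampled deadline
leaves only 1700 ms once the timer is armed, while the late-sampled deadline
leaves the full 2000 ms.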




* Re: [PATCH 1/2] migration/colo: Optimize COLO start code path
  2021-11-10 17:41 [PATCH 1/2] migration/colo: Optimize COLO start code path Zhang Chen
  2021-11-10 17:41 ` [PATCH 2/2] migration/colo: More accurate update checkpoint time Zhang Chen
@ 2021-11-16 16:27 ` Juan Quintela
  2021-11-17  3:21   ` Zhang, Chen
  1 sibling, 1 reply; 6+ messages in thread
From: Juan Quintela @ 2021-11-16 16:27 UTC (permalink / raw)
  To: Zhang Chen; +Cc: qemu-dev, Hailiang Zhang, Dr . David Alan Gilbert

Zhang Chen <chen.zhang@intel.com> wrote:
> There is no need to start COLO through MIGRATION_STATUS_ACTIVE.

Hi

I don't understand what you are trying to do.  In my reading, at least
the commit message is wrong:

void migrate_start_colo_process(MigrationState *s)
{
    ...
    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                      MIGRATION_STATUS_COLO);
    ...
}

and

void *colo_process_incoming_thread(void *opaque)
{
    ...
    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                      MIGRATION_STATUS_COLO);

So colo starts with MIGRATION_STATUS_ACTIVE.


> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  migration/colo.c      |  2 --
>  migration/migration.c | 18 +++++++++++-------
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 2415325262..ad1a4426b3 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -667,8 +667,6 @@ void migrate_start_colo_process(MigrationState *s)
>                                  colo_checkpoint_notify, s);
>  
>      qemu_sem_init(&s->colo_exit_sem, 0);
> -    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> -                      MIGRATION_STATUS_COLO);
>      colo_process_checkpoint(s);
>      qemu_mutex_lock_iothread();
>  }
> diff --git a/migration/migration.c b/migration/migration.c
> index abaf6f9e3d..4c8662a839 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3222,7 +3222,10 @@ static void migration_completion(MigrationState *s)
>          goto fail_invalidate;
>      }
>  
> -    if (!migrate_colo_enabled()) {
> +    if (migrate_colo_enabled()) {
> +        migrate_set_state(&s->state, current_active_state,
> +                          MIGRATION_STATUS_COLO);
> +    } else {
>          migrate_set_state(&s->state, current_active_state,
>                            MIGRATION_STATUS_COMPLETED);
>      }

This moves the setup to MIGRATION_STATUS_COLO to completion time instead
of the beginning of the process.  I have no clue why.  I guess you can
put a comment/commit message to say what you are trying to do.

> @@ -3607,12 +3610,7 @@ static void migration_iteration_finish(MigrationState *s)
>          migration_calculate_complete(s);
>          runstate_set(RUN_STATE_POSTMIGRATE);
>          break;
> -
> -    case MIGRATION_STATUS_ACTIVE:
> -        /*
> -         * We should really assert here, but since it's during
> -         * migration, let's try to reduce the usage of assertions.
> -         */
> +    case MIGRATION_STATUS_COLO:
>          if (!migrate_colo_enabled()) {
>              error_report("%s: critical error: calling COLO code without "
>                           "COLO enabled", __func__);
> @@ -3622,6 +3620,12 @@ static void migration_iteration_finish(MigrationState *s)
>           * Fixme: we will run VM in COLO no matter its old running state.
>           * After exited COLO, we will keep running.
>           */
> +         /* Fallthrough */
> +    case MIGRATION_STATUS_ACTIVE:
> +        /*
> +         * We should really assert here, but since it's during
> +         * migration, let's try to reduce the usage of assertions.
> +         */
>          s->vm_was_running = true;
>          /* Fallthrough */
>      case MIGRATION_STATUS_FAILED:

I guess this change is related to the previous one, but I don't
understand colo enough to review it.

Later, Juan.




* Re: [PATCH 2/2] migration/colo: More accurate update checkpoint time
  2021-11-10 17:41 ` [PATCH 2/2] migration/colo: More accurate update checkpoint time Zhang Chen
@ 2021-11-16 16:30   ` Juan Quintela
  0 siblings, 0 replies; 6+ messages in thread
From: Juan Quintela @ 2021-11-16 16:30 UTC (permalink / raw)
  To: Zhang Chen; +Cc: qemu-dev, Hailiang Zhang, Dr . David Alan Gilbert

Zhang Chen <chen.zhang@intel.com> wrote:
> Previous operations (like vm_start and replication_start_all) consume
> extra time before the timer is updated, so read the clock at the point
> where the timer is set.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Queued for 7.0.




* RE: [PATCH 1/2] migration/colo: Optimize COLO start code path
  2021-11-16 16:27 ` [PATCH 1/2] migration/colo: Optimize COLO start code path Juan Quintela
@ 2021-11-17  3:21   ` Zhang, Chen
  2021-11-17  8:17     ` Juan Quintela
  0 siblings, 1 reply; 6+ messages in thread
From: Zhang, Chen @ 2021-11-17  3:21 UTC (permalink / raw)
  To: quintela; +Cc: qemu-dev, Hailiang Zhang, Dr . David Alan Gilbert



> -----Original Message-----
> From: Juan Quintela <quintela@redhat.com>
> Sent: Wednesday, November 17, 2021 12:28 AM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>; Dr . David Alan
> Gilbert <dgilbert@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
> Subject: Re: [PATCH 1/2] migration/colo: Optimize COLO start code path
> 
> Zhang Chen <chen.zhang@intel.com> wrote:
> > There is no need to start COLO through MIGRATION_STATUS_ACTIVE.
> 
> Hi
> 
> I don't understand what you are trying to do.  In my reading, at least the
> commit message is wrong:
> 
> void migrate_start_colo_process(MigrationState *s) {
>     ...
>     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                       MIGRATION_STATUS_COLO);
>     ...
> }
> 
> and
> 
> void *colo_process_incoming_thread(void *opaque) {
>     ...
>     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                       MIGRATION_STATUS_COLO);
> 
> So colo starts with MIGRATION_STATUS_ACTIVE.

Yes, this patch only optimizes the COLO primary code path (migrate_start_colo_process()).
As you can see, the patch removes the call
 migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
from migrate_start_colo_process().

Current COLO status path:
MIGRATION_STATUS_XXX ---> MIGRATION_STATUS_ACTIVE ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED

This patch tries to remove the redundant "MIGRATION_STATUS_ACTIVE" step in COLO start:
MIGRATION_STATUS_XXX ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED

Actually the COLO primary code does nothing while in "MIGRATION_STATUS_ACTIVE".
The COLO secondary (colo_process_incoming_thread()) shares some code with normal migration, so it does not need this change.

So I will change the commit message to:
Optimize COLO primary start path to:
MIGRATION_STATUS_XXX   ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED
No need to start primary COLO through "MIGRATION_STATUS_ACTIVE".

How about it?

> 
> 
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > ---
> >  migration/colo.c      |  2 --
> >  migration/migration.c | 18 +++++++++++-------
> >  2 files changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 2415325262..ad1a4426b3 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -667,8 +667,6 @@ void migrate_start_colo_process(MigrationState *s)
> >                                  colo_checkpoint_notify, s);
> >
> >      qemu_sem_init(&s->colo_exit_sem, 0);
> > -    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> > -                      MIGRATION_STATUS_COLO);
> >      colo_process_checkpoint(s);
> >      qemu_mutex_lock_iothread();
> >  }
> > diff --git a/migration/migration.c b/migration/migration.c
> > index abaf6f9e3d..4c8662a839 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3222,7 +3222,10 @@ static void migration_completion(MigrationState *s)
> >          goto fail_invalidate;
> >      }
> >
> > -    if (!migrate_colo_enabled()) {
> > +    if (migrate_colo_enabled()) {
> > +        migrate_set_state(&s->state, current_active_state,
> > +                          MIGRATION_STATUS_COLO);
> > +    } else {
> >          migrate_set_state(&s->state, current_active_state,
> >                            MIGRATION_STATUS_COMPLETED);
> >      }
> 
> This moves the setup to MIGRATION_STATUS_COLO to completion time
> instead of the beginning of the process.  I have no clue why.  I guess you can
> put a comment/commit message to say what you are trying to do.

You are right, there is no need to set it up here.
I will remove this in the next version.

> 
> > @@ -3607,12 +3610,7 @@ static void migration_iteration_finish(MigrationState *s)
> >          migration_calculate_complete(s);
> >          runstate_set(RUN_STATE_POSTMIGRATE);
> >          break;
> > -
> > -    case MIGRATION_STATUS_ACTIVE:
> > -        /*
> > -         * We should really assert here, but since it's during
> > -         * migration, let's try to reduce the usage of assertions.
> > -         */
> > +    case MIGRATION_STATUS_COLO:
> >          if (!migrate_colo_enabled()) {
> >              error_report("%s: critical error: calling COLO code without "
> >                           "COLO enabled", __func__);
> > @@ -3622,6 +3620,12 @@ static void migration_iteration_finish(MigrationState *s)
> >           * Fixme: we will run VM in COLO no matter its old running state.
> >           * After exited COLO, we will keep running.
> >           */
> > +         /* Fallthrough */
> > +    case MIGRATION_STATUS_ACTIVE:
> > +        /*
> > +         * We should really assert here, but since it's during
> > +         * migration, let's try to reduce the usage of assertions.
> > +         */
> >          s->vm_was_running = true;
> >          /* Fallthrough */
> >      case MIGRATION_STATUS_FAILED:
> 
> I guess this change is related to the previous one, but I don't understand colo
> enough to review it.

I think this patch touches only general code, so little background is needed.
You can simply think of COLO as two VMs (primary node and secondary node) that have entered a state of cyclic migration.
Thanks for your comments.

Thanks
Chen
 

> 
> Later, Juan.




* Re: [PATCH 1/2] migration/colo: Optimize COLO start code path
  2021-11-17  3:21   ` Zhang, Chen
@ 2021-11-17  8:17     ` Juan Quintela
  0 siblings, 0 replies; 6+ messages in thread
From: Juan Quintela @ 2021-11-17  8:17 UTC (permalink / raw)
  To: Zhang, Chen; +Cc: qemu-dev, Hailiang Zhang, Dr . David Alan Gilbert

"Zhang, Chen" <chen.zhang@intel.com> wrote:
>> -----Original Message-----
>> From: Juan Quintela <quintela@redhat.com>
>> Sent: Wednesday, November 17, 2021 12:28 AM
>> To: Zhang, Chen <chen.zhang@intel.com>
>> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>; Dr . David Alan
>> Gilbert <dgilbert@redhat.com>; qemu-dev <qemu-devel@nongnu.org>
>> Subject: Re: [PATCH 1/2] migration/colo: Optimize COLO start code path
>> 
>> Zhang Chen <chen.zhang@intel.com> wrote:
>> > There is no need to start COLO through MIGRATION_STATUS_ACTIVE.
>> 
>> Hi
>> 
>> I don't understand what you are trying to do.  In my reading, at least the
>> commit message is wrong:
>> 
>> void migrate_start_colo_process(MigrationState *s) {
>>     ...
>>     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>>                       MIGRATION_STATUS_COLO);
>>     ...
>> }
>> 
>> and
>> 
>> void *colo_process_incoming_thread(void *opaque) {
>>     ...
>>     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>                       MIGRATION_STATUS_COLO);
>> 
>> So colo starts with MIGRATION_STATUS_ACTIVE.
>
> Yes, this patch only optimizes the COLO primary code path (migrate_start_colo_process()).
> As you can see, the patch removes the call
>  migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                       MIGRATION_STATUS_COLO);
> from migrate_start_colo_process().
>
> Current COLO status path:
> MIGRATION_STATUS_XXX ---> MIGRATION_STATUS_ACTIVE ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED
>
> This patch tries to remove the redundant "MIGRATION_STATUS_ACTIVE" step in COLO start:
> MIGRATION_STATUS_XXX ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED
>
> Actually the COLO primary code does nothing while in "MIGRATION_STATUS_ACTIVE".
> The COLO secondary (colo_process_incoming_thread()) shares some code with normal migration, so it does not need this change.
>
> So I will change the commit message to:
> Optimize COLO primary start path to:
> MIGRATION_STATUS_XXX   ---> MIGRATION_STATUS_COLO ---> MIGRATION_STATUS_COMPLETED
> No need to start primary COLO through "MIGRATION_STATUS_ACTIVE".
>
> How about it?

Much better, thanks.

>> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>> > ---
>> >  migration/colo.c      |  2 --
>> >  migration/migration.c | 18 +++++++++++-------
>> >  2 files changed, 11 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/migration/colo.c b/migration/colo.c
>> > index 2415325262..ad1a4426b3 100644
>> > --- a/migration/colo.c
>> > +++ b/migration/colo.c
>> > @@ -667,8 +667,6 @@ void migrate_start_colo_process(MigrationState *s)
>> >                                  colo_checkpoint_notify, s);
>> >
>> >      qemu_sem_init(&s->colo_exit_sem, 0);
>> > -    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>> > -                      MIGRATION_STATUS_COLO);
>> >      colo_process_checkpoint(s);
>> >      qemu_mutex_lock_iothread();
>> >  }
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index abaf6f9e3d..4c8662a839 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -3222,7 +3222,10 @@ static void migration_completion(MigrationState *s)
>> >          goto fail_invalidate;
>> >      }
>> >
>> > -    if (!migrate_colo_enabled()) {
>> > +    if (migrate_colo_enabled()) {
>> > +        migrate_set_state(&s->state, current_active_state,
>> > +                          MIGRATION_STATUS_COLO);
>> > +    } else {
>> >          migrate_set_state(&s->state, current_active_state,
>> >                            MIGRATION_STATUS_COMPLETED);
>> >      }
>> 
>> This moves the setup to MIGRATION_STATUS_COLO to completion time
>> instead of the beginning of the process.  I have no clue why.  I guess you can
>> put a comment/commit message to say what you are trying to do.
>
> You are right, there is no need to set it up here.
> I will remove this in the next version.

Thanks.

>> > @@ -3607,12 +3610,7 @@ static void migration_iteration_finish(MigrationState *s)
>> >          migration_calculate_complete(s);
>> >          runstate_set(RUN_STATE_POSTMIGRATE);
>> >          break;
>> > -
>> > -    case MIGRATION_STATUS_ACTIVE:
>> > -        /*
>> > -         * We should really assert here, but since it's during
>> > -         * migration, let's try to reduce the usage of assertions.
>> > -         */
>> > +    case MIGRATION_STATUS_COLO:
>> >          if (!migrate_colo_enabled()) {
>> >              error_report("%s: critical error: calling COLO code without "
>> >                           "COLO enabled", __func__);
>> > @@ -3622,6 +3620,12 @@ static void migration_iteration_finish(MigrationState *s)
>> >           * Fixme: we will run VM in COLO no matter its old running state.
>> >           * After exited COLO, we will keep running.
>> >           */
>> > +         /* Fallthrough */
>> > +    case MIGRATION_STATUS_ACTIVE:
>> > +        /*
>> > +         * We should really assert here, but since it's during
>> > +         * migration, let's try to reduce the usage of assertions.
>> > +         */
>> >          s->vm_was_running = true;
>> >          /* Fallthrough */
>> >      case MIGRATION_STATUS_FAILED:
>> 
>> I guess this change is related to the previous one, but I don't understand colo
>> enough to review it.
>
> I think this patch touches only general code, so little background is needed.
> You can simply think of COLO as two VMs (primary node and secondary node) that have entered a state of cyclic migration.
> Thanks for your comments.

Later, Juan.




Thread overview: 6+ messages
2021-11-10 17:41 [PATCH 1/2] migration/colo: Optimize COLO start code path Zhang Chen
2021-11-10 17:41 ` [PATCH 2/2] migration/colo: More accurate update checkpoint time Zhang Chen
2021-11-16 16:30   ` Juan Quintela
2021-11-16 16:27 ` [PATCH 1/2] migration/colo: Optimize COLO start code path Juan Quintela
2021-11-17  3:21   ` Zhang, Chen
2021-11-17  8:17     ` Juan Quintela
