qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] migration/postcopy: cleanup related to postcopy
@ 2019-10-01 10:01 Wei Yang
  2019-10-01 10:01 ` [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Wei Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Wei Yang @ 2019-10-01 10:01 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, Wei Yang

The first one just tries to make function name more easy for reading and
understanding.

The next two patches are related to PostcopyState.

Wei Yang (3):
  migration/postcopy: rename postcopy_ram_enable_notify to
    postcopy_ram_incoming_setup
  migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup
    when state is ADVISE
  migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case
    properly

 migration/migration.c    |  3 +--
 migration/postcopy-ram.c | 17 +++++++++++------
 migration/postcopy-ram.h |  5 +++--
 migration/savevm.c       | 13 +++++++------
 4 files changed, 22 insertions(+), 16 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup
  2019-10-01 10:01 [PATCH 0/3] migration/postcopy: cleanup related to postcopy Wei Yang
@ 2019-10-01 10:01 ` Wei Yang
  2019-10-08 14:17   ` Dr. David Alan Gilbert
  2019-10-01 10:01 ` [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE Wei Yang
  2019-10-01 10:01 ` [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly Wei Yang
  2 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-01 10:01 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, Wei Yang

Function postcopy_ram_incoming_setup and postcopy_ram_incoming_cleanup
is a pair. Rename to make it clear for audience.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 migration/postcopy-ram.c | 4 ++--
 migration/postcopy-ram.h | 2 +-
 migration/savevm.c       | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1f63e65ed7..b24c4a10c2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1094,7 +1094,7 @@ retry:
     return NULL;
 }
 
-int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 {
     /* Open the fd for the kernel to give us userfaults */
     mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
@@ -1321,7 +1321,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
     return -1;
 }
 
-int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 {
     assert(0);
     return -1;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 9c8bd2bae0..d2668cc820 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -20,7 +20,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
  * and wire up anything necessary to deal with it.
  */
-int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+int postcopy_ram_incoming_setup(MigrationIncomingState *mis);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/migration/savevm.c b/migration/savevm.c
index adad938f57..f3292eb003 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1865,7 +1865,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
      * shouldn't be doing anything yet so don't actually expect requests
      */
     if (migrate_postcopy_ram()) {
-        if (postcopy_ram_enable_notify(mis)) {
+        if (postcopy_ram_incoming_setup(mis)) {
             postcopy_ram_incoming_cleanup(mis);
             return -1;
         }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE
  2019-10-01 10:01 [PATCH 0/3] migration/postcopy: cleanup related to postcopy Wei Yang
  2019-10-01 10:01 ` [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Wei Yang
@ 2019-10-01 10:01 ` Wei Yang
  2019-10-08 16:02   ` Dr. David Alan Gilbert
  2019-10-01 10:01 ` [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly Wei Yang
  2 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-01 10:01 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, Wei Yang

postcopy_ram_incoming_cleanup() does cleanup for
postcopy_ram_incoming_setup(), while the setup happens only after
migration enters LISTEN state.

This means there is nothing to cleanup when migration is still ADVISE
state.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5f7e4d15e9..34d5e66f06 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -461,7 +461,6 @@ static void process_incoming_migration_co(void *opaque)
              * but managed to complete within the precopy period, we can use
              * the normal exit.
              */
-            postcopy_ram_incoming_cleanup(mis);
         } else if (ret >= 0) {
             /*
              * Postcopy was started, cleanup should happen at the end of the
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-01 10:01 [PATCH 0/3] migration/postcopy: cleanup related to postcopy Wei Yang
  2019-10-01 10:01 ` [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Wei Yang
  2019-10-01 10:01 ` [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE Wei Yang
@ 2019-10-01 10:01 ` Wei Yang
  2019-10-08 16:40   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-01 10:01 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, Wei Yang

Currently, we set PostcopyState blindly to RUNNING, even we found the
previous state is not LISTENING. This will lead to a corner case.

First let's look at the code flow:

qemu_loadvm_state_main()
    ret = loadvm_process_command()
        loadvm_postcopy_handle_run()
            return -1;
    if (ret < 0) {
        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
            ...
    }

From above snippet, the corner case is loadvm_postcopy_handle_run()
always sets state to RUNNING. And then it checks the previous state. If
the previous state is not LISTENING, it will return -1. But at this
moment, PostcopyState is already been set to RUNNING.

Then ret is checked in qemu_loadvm_state_main(), when it is -1
PostcopyState is checked. Current logic would pause postcopy and retry
if PostcopyState is RUNNING. This is not what we expect, because
postcopy is not active yet.

This patch makes sure state is set to RUNNING only previous state is
LISTENING by introducing an old_state parameter in postcopy_state_set().
New state only would be set when current state equals to old_state.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 migration/migration.c    |  2 +-
 migration/postcopy-ram.c | 13 +++++++++----
 migration/postcopy-ram.h |  3 ++-
 migration/savevm.c       | 11 ++++++-----
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 34d5e66f06..369cf3826e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -447,7 +447,7 @@ static void process_incoming_migration_co(void *opaque)
     assert(mis->from_src_file);
     mis->migration_incoming_co = qemu_coroutine_self();
     mis->largest_page_size = qemu_ram_pagesize_largest();
-    postcopy_state_set(POSTCOPY_INCOMING_NONE);
+    postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
     migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
                       MIGRATION_STATUS_ACTIVE);
     ret = qemu_loadvm_state(mis->from_src_file);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b24c4a10c2..8f741d636d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -577,7 +577,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         }
     }
 
-    postcopy_state_set(POSTCOPY_INCOMING_END);
+    postcopy_state_set(POSTCOPY_INCOMING_END, NULL);
 
     if (mis->postcopy_tmp_page) {
         munmap(mis->postcopy_tmp_page, mis->largest_page_size);
@@ -626,7 +626,7 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
         return -1;
     }
 
-    postcopy_state_set(POSTCOPY_INCOMING_DISCARD);
+    postcopy_state_set(POSTCOPY_INCOMING_DISCARD, NULL);
 
     return 0;
 }
@@ -1457,9 +1457,14 @@ PostcopyState  postcopy_state_get(void)
 }
 
 /* Set the state and return the old state */
-PostcopyState postcopy_state_set(PostcopyState new_state)
+PostcopyState postcopy_state_set(PostcopyState new_state,
+                                 const PostcopyState *old_state)
 {
-    return atomic_xchg(&incoming_postcopy_state, new_state);
+    if (!old_state) {
+        return atomic_xchg(&incoming_postcopy_state, new_state);
+    } else {
+        return atomic_cmpxchg(&incoming_postcopy_state, *old_state, new_state);
+    }
 }
 
 /* Register a handler for external shared memory postcopy
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index d2668cc820..e3dde32155 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -109,7 +109,8 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis);
 
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
-PostcopyState postcopy_state_set(PostcopyState new_state);
+PostcopyState postcopy_state_set(PostcopyState new_state,
+                                 const PostcopyState *old_state);
 
 void postcopy_fault_thread_notify(MigrationIncomingState *mis);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index f3292eb003..45474d9c95 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1599,7 +1599,7 @@ enum LoadVMExitCodes {
 static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
                                          uint16_t len)
 {
-    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
+    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE, NULL);
     uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
     Error *local_err = NULL;
 
@@ -1628,7 +1628,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
     }
 
     if (!postcopy_ram_supported_by_host(mis)) {
-        postcopy_state_set(POSTCOPY_INCOMING_NONE);
+        postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
         return -1;
     }
 
@@ -1841,7 +1841,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 /* After this message we must be able to immediately receive postcopy data */
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
-    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
+    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING, NULL);
     trace_loadvm_postcopy_handle_listen();
     Error *local_err = NULL;
 
@@ -1929,10 +1929,11 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 /* After all discards we can start running and asking for pages */
 static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 {
-    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
+    PostcopyState old_ps = POSTCOPY_INCOMING_LISTENING;
+    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING, &old_ps);
 
     trace_loadvm_postcopy_handle_run();
-    if (ps != POSTCOPY_INCOMING_LISTENING) {
+    if (ps != old_ps) {
         error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
         return -1;
     }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup
  2019-10-01 10:01 ` [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Wei Yang
@ 2019-10-08 14:17   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-08 14:17 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, quintela

* Wei Yang (richardw.yang@linux.intel.com) wrote:
> Function postcopy_ram_incoming_setup and postcopy_ram_incoming_cleanup
> is a pair. Rename to make it clear for audience.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

Yes OK.
It really initially just setup the userfault, but it also kicks off the
fault thread as well so it has got a bit more hairy.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/postcopy-ram.c | 4 ++--
>  migration/postcopy-ram.h | 2 +-
>  migration/savevm.c       | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 1f63e65ed7..b24c4a10c2 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1094,7 +1094,7 @@ retry:
>      return NULL;
>  }
>  
> -int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> +int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>  {
>      /* Open the fd for the kernel to give us userfaults */
>      mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> @@ -1321,7 +1321,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>      return -1;
>  }
>  
> -int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> +int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>  {
>      assert(0);
>      return -1;
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 9c8bd2bae0..d2668cc820 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -20,7 +20,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>   * Make all of RAM sensitive to accesses to areas that haven't yet been written
>   * and wire up anything necessary to deal with it.
>   */
> -int postcopy_ram_enable_notify(MigrationIncomingState *mis);
> +int postcopy_ram_incoming_setup(MigrationIncomingState *mis);
>  
>  /*
>   * Initialise postcopy-ram, setting the RAM to a state where we can go into
> diff --git a/migration/savevm.c b/migration/savevm.c
> index adad938f57..f3292eb003 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1865,7 +1865,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>       * shouldn't be doing anything yet so don't actually expect requests
>       */
>      if (migrate_postcopy_ram()) {
> -        if (postcopy_ram_enable_notify(mis)) {
> +        if (postcopy_ram_incoming_setup(mis)) {
>              postcopy_ram_incoming_cleanup(mis);
>              return -1;
>          }
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE
  2019-10-01 10:01 ` [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE Wei Yang
@ 2019-10-08 16:02   ` Dr. David Alan Gilbert
  2019-10-09  0:55     ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-08 16:02 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, quintela

* Wei Yang (richardw.yang@linux.intel.com) wrote:
> postcopy_ram_incoming_cleanup() does cleanup for
> postcopy_ram_incoming_setup(), while the setup happens only after
> migration enters LISTEN state.
> 
> This means there is nothing to cleanup when migration is still ADVISE
> state.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  migration/migration.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5f7e4d15e9..34d5e66f06 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -461,7 +461,6 @@ static void process_incoming_migration_co(void *opaque)
>               * but managed to complete within the precopy period, we can use
>               * the normal exit.
>               */
> -            postcopy_ram_incoming_cleanup(mis);
>          } else if (ret >= 0) {
>              /*
>               * Postcopy was started, cleanup should happen at the end of the

I think that misses the cleanup of mlock that corresponds to the
munlockall in postcopy_ram_supported_by_host - that's called very early
on; I think in the advise stage.

Dave

> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-01 10:01 ` [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly Wei Yang
@ 2019-10-08 16:40   ` Dr. David Alan Gilbert
  2019-10-09  1:02     ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-08 16:40 UTC (permalink / raw)
  To: Wei Yang, peterx; +Cc: qemu-devel, quintela

* Wei Yang (richardw.yang@linux.intel.com) wrote:
> Currently, we set PostcopyState blindly to RUNNING, even we found the
> previous state is not LISTENING. This will lead to a corner case.
> 
> First let's look at the code flow:
> 
> qemu_loadvm_state_main()
>     ret = loadvm_process_command()
>         loadvm_postcopy_handle_run()
>             return -1;
>     if (ret < 0) {
>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
>             ...
>     }
> 
> From above snippet, the corner case is loadvm_postcopy_handle_run()
> always sets state to RUNNING. And then it checks the previous state. If
> the previous state is not LISTENING, it will return -1. But at this
> moment, PostcopyState is already been set to RUNNING.
> 
> Then ret is checked in qemu_loadvm_state_main(), when it is -1
> PostcopyState is checked. Current logic would pause postcopy and retry
> if PostcopyState is RUNNING. This is not what we expect, because
> postcopy is not active yet.
> 
> This patch makes sure state is set to RUNNING only previous state is
> LISTENING by introducing an old_state parameter in postcopy_state_set().
> New state only would be set when current state equals to old_state.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

OK, it's a shame to use a pointer there, but it works.
Note, something else; using '-1' as the return value and checking for it
is something we do a lot; but in this case it's an example of an error
we could never recover from so it never makes sense to try and recover.
We should probably look at different types of error.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

> ---
>  migration/migration.c    |  2 +-
>  migration/postcopy-ram.c | 13 +++++++++----
>  migration/postcopy-ram.h |  3 ++-
>  migration/savevm.c       | 11 ++++++-----
>  4 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 34d5e66f06..369cf3826e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -447,7 +447,7 @@ static void process_incoming_migration_co(void *opaque)
>      assert(mis->from_src_file);
>      mis->migration_incoming_co = qemu_coroutine_self();
>      mis->largest_page_size = qemu_ram_pagesize_largest();
> -    postcopy_state_set(POSTCOPY_INCOMING_NONE);
> +    postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
>      migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
>                        MIGRATION_STATUS_ACTIVE);
>      ret = qemu_loadvm_state(mis->from_src_file);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index b24c4a10c2..8f741d636d 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -577,7 +577,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>          }
>      }
>  
> -    postcopy_state_set(POSTCOPY_INCOMING_END);
> +    postcopy_state_set(POSTCOPY_INCOMING_END, NULL);
>  
>      if (mis->postcopy_tmp_page) {
>          munmap(mis->postcopy_tmp_page, mis->largest_page_size);
> @@ -626,7 +626,7 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    postcopy_state_set(POSTCOPY_INCOMING_DISCARD);
> +    postcopy_state_set(POSTCOPY_INCOMING_DISCARD, NULL);
>  
>      return 0;
>  }
> @@ -1457,9 +1457,14 @@ PostcopyState  postcopy_state_get(void)
>  }
>  
>  /* Set the state and return the old state */
> -PostcopyState postcopy_state_set(PostcopyState new_state)
> +PostcopyState postcopy_state_set(PostcopyState new_state,
> +                                 const PostcopyState *old_state)
>  {
> -    return atomic_xchg(&incoming_postcopy_state, new_state);
> +    if (!old_state) {
> +        return atomic_xchg(&incoming_postcopy_state, new_state);
> +    } else {
> +        return atomic_cmpxchg(&incoming_postcopy_state, *old_state, new_state);
> +    }
>  }
>  
>  /* Register a handler for external shared memory postcopy
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index d2668cc820..e3dde32155 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -109,7 +109,8 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis);
>  
>  PostcopyState postcopy_state_get(void);
>  /* Set the state and return the old state */
> -PostcopyState postcopy_state_set(PostcopyState new_state);
> +PostcopyState postcopy_state_set(PostcopyState new_state,
> +                                 const PostcopyState *old_state);
>  
>  void postcopy_fault_thread_notify(MigrationIncomingState *mis);
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index f3292eb003..45474d9c95 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1599,7 +1599,7 @@ enum LoadVMExitCodes {
>  static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>                                           uint16_t len)
>  {
> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE, NULL);
>      uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
>      Error *local_err = NULL;
>  
> @@ -1628,7 +1628,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>      }
>  
>      if (!postcopy_ram_supported_by_host(mis)) {
> -        postcopy_state_set(POSTCOPY_INCOMING_NONE);
> +        postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
>          return -1;
>      }
>  
> @@ -1841,7 +1841,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  /* After this message we must be able to immediately receive postcopy data */
>  static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>  {
> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING, NULL);
>      trace_loadvm_postcopy_handle_listen();
>      Error *local_err = NULL;
>  
> @@ -1929,10 +1929,11 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>  /* After all discards we can start running and asking for pages */
>  static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>  {
> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
> +    PostcopyState old_ps = POSTCOPY_INCOMING_LISTENING;
> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING, &old_ps);
>  
>      trace_loadvm_postcopy_handle_run();
> -    if (ps != POSTCOPY_INCOMING_LISTENING) {
> +    if (ps != old_ps) {
>          error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
>          return -1;
>      }
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE
  2019-10-08 16:02   ` Dr. David Alan Gilbert
@ 2019-10-09  0:55     ` Wei Yang
  2019-10-09  9:03       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-09  0:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Wei Yang, quintela

On Tue, Oct 08, 2019 at 05:02:02PM +0100, Dr. David Alan Gilbert wrote:
>* Wei Yang (richardw.yang@linux.intel.com) wrote:
>> postcopy_ram_incoming_cleanup() does cleanup for
>> postcopy_ram_incoming_setup(), while the setup happens only after
>> migration enters LISTEN state.
>> 
>> This means there is nothing to cleanup when migration is still ADVISE
>> state.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  migration/migration.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 5f7e4d15e9..34d5e66f06 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -461,7 +461,6 @@ static void process_incoming_migration_co(void *opaque)
>>               * but managed to complete within the precopy period, we can use
>>               * the normal exit.
>>               */
>> -            postcopy_ram_incoming_cleanup(mis);
>>          } else if (ret >= 0) {
>>              /*
>>               * Postcopy was started, cleanup should happen at the end of the
>
>I think that misses the cleanup of mlock that corresponds to the
>munlockall in postcopy_ram_supported_by_host - that's called very early
>on; I think in the advise stage.
>

Thanks you are right.

BTW, do we need to check enable_mlock when calling munlockall() in
postcopy_ram_supported_by_host() ?

>Dave
>
>> -- 
>> 2.17.1
>> 
>--
>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-08 16:40   ` Dr. David Alan Gilbert
@ 2019-10-09  1:02     ` Wei Yang
  2019-10-09  4:12       ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-09  1:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Wei Yang, peterx, quintela

On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
>* Wei Yang (richardw.yang@linux.intel.com) wrote:
>> Currently, we set PostcopyState blindly to RUNNING, even we found the
>> previous state is not LISTENING. This will lead to a corner case.
>> 
>> First let's look at the code flow:
>> 
>> qemu_loadvm_state_main()
>>     ret = loadvm_process_command()
>>         loadvm_postcopy_handle_run()
>>             return -1;
>>     if (ret < 0) {
>>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
>>             ...
>>     }
>> 
>> From above snippet, the corner case is loadvm_postcopy_handle_run()
>> always sets state to RUNNING. And then it checks the previous state. If
>> the previous state is not LISTENING, it will return -1. But at this
>> moment, PostcopyState is already been set to RUNNING.
>> 
>> Then ret is checked in qemu_loadvm_state_main(), when it is -1
>> PostcopyState is checked. Current logic would pause postcopy and retry
>> if PostcopyState is RUNNING. This is not what we expect, because
>> postcopy is not active yet.
>> 
>> This patch makes sure state is set to RUNNING only previous state is
>> LISTENING by introducing an old_state parameter in postcopy_state_set().
>> New state only would be set when current state equals to old_state.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>
>OK, it's a shame to use a pointer there, but it works.

You mean second parameter of postcopy_state_set()?

I don't have a better idea. Or we introduce a new state
POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?

>Note, something else; using '-1' as the return value and checking for it
>is something we do a lot; but in this case it's an example of an error
>we could never recover from so it never makes sense to try and recover.
>We should probably look at different types of error.
>
>
>Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>Dave
>
>> ---
>>  migration/migration.c    |  2 +-
>>  migration/postcopy-ram.c | 13 +++++++++----
>>  migration/postcopy-ram.h |  3 ++-
>>  migration/savevm.c       | 11 ++++++-----
>>  4 files changed, 18 insertions(+), 11 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 34d5e66f06..369cf3826e 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -447,7 +447,7 @@ static void process_incoming_migration_co(void *opaque)
>>      assert(mis->from_src_file);
>>      mis->migration_incoming_co = qemu_coroutine_self();
>>      mis->largest_page_size = qemu_ram_pagesize_largest();
>> -    postcopy_state_set(POSTCOPY_INCOMING_NONE);
>> +    postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
>>      migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
>>                        MIGRATION_STATUS_ACTIVE);
>>      ret = qemu_loadvm_state(mis->from_src_file);
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index b24c4a10c2..8f741d636d 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -577,7 +577,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>>          }
>>      }
>>  
>> -    postcopy_state_set(POSTCOPY_INCOMING_END);
>> +    postcopy_state_set(POSTCOPY_INCOMING_END, NULL);
>>  
>>      if (mis->postcopy_tmp_page) {
>>          munmap(mis->postcopy_tmp_page, mis->largest_page_size);
>> @@ -626,7 +626,7 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
>>          return -1;
>>      }
>>  
>> -    postcopy_state_set(POSTCOPY_INCOMING_DISCARD);
>> +    postcopy_state_set(POSTCOPY_INCOMING_DISCARD, NULL);
>>  
>>      return 0;
>>  }
>> @@ -1457,9 +1457,14 @@ PostcopyState  postcopy_state_get(void)
>>  }
>>  
>>  /* Set the state and return the old state */
>> -PostcopyState postcopy_state_set(PostcopyState new_state)
>> +PostcopyState postcopy_state_set(PostcopyState new_state,
>> +                                 const PostcopyState *old_state)
>>  {
>> -    return atomic_xchg(&incoming_postcopy_state, new_state);
>> +    if (!old_state) {
>> +        return atomic_xchg(&incoming_postcopy_state, new_state);
>> +    } else {
>> +        return atomic_cmpxchg(&incoming_postcopy_state, *old_state, new_state);
>> +    }
>>  }
>>  
>>  /* Register a handler for external shared memory postcopy
>> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
>> index d2668cc820..e3dde32155 100644
>> --- a/migration/postcopy-ram.h
>> +++ b/migration/postcopy-ram.h
>> @@ -109,7 +109,8 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis);
>>  
>>  PostcopyState postcopy_state_get(void);
>>  /* Set the state and return the old state */
>> -PostcopyState postcopy_state_set(PostcopyState new_state);
>> +PostcopyState postcopy_state_set(PostcopyState new_state,
>> +                                 const PostcopyState *old_state);
>>  
>>  void postcopy_fault_thread_notify(MigrationIncomingState *mis);
>>  
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index f3292eb003..45474d9c95 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1599,7 +1599,7 @@ enum LoadVMExitCodes {
>>  static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>>                                           uint16_t len)
>>  {
>> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
>> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE, NULL);
>>      uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
>>      Error *local_err = NULL;
>>  
>> @@ -1628,7 +1628,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>>      }
>>  
>>      if (!postcopy_ram_supported_by_host(mis)) {
>> -        postcopy_state_set(POSTCOPY_INCOMING_NONE);
>> +        postcopy_state_set(POSTCOPY_INCOMING_NONE, NULL);
>>          return -1;
>>      }
>>  
>> @@ -1841,7 +1841,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>  /* After this message we must be able to immediately receive postcopy data */
>>  static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>>  {
>> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
>> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING, NULL);
>>      trace_loadvm_postcopy_handle_listen();
>>      Error *local_err = NULL;
>>  
>> @@ -1929,10 +1929,11 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>>  /* After all discards we can start running and asking for pages */
>>  static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>>  {
>> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
>> +    PostcopyState old_ps = POSTCOPY_INCOMING_LISTENING;
>> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING, &old_ps);
>>  
>>      trace_loadvm_postcopy_handle_run();
>> -    if (ps != POSTCOPY_INCOMING_LISTENING) {
>> +    if (ps != old_ps) {
>>          error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
>>          return -1;
>>      }
>> -- 
>> 2.17.1
>> 
>--
>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  1:02     ` Wei Yang
@ 2019-10-09  4:12       ` Peter Xu
  2019-10-09  5:07         ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2019-10-09  4:12 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, Dr. David Alan Gilbert, quintela

On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
> >> previous state is not LISTENING. This will lead to a corner case.
> >> 
> >> First let's look at the code flow:
> >> 
> >> qemu_loadvm_state_main()
> >>     ret = loadvm_process_command()
> >>         loadvm_postcopy_handle_run()
> >>             return -1;
> >>     if (ret < 0) {
> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
> >>             ...
> >>     }
> >> 
> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
> >> always sets state to RUNNING. And then it checks the previous state. If
> >> the previous state is not LISTENING, it will return -1. But at this
> >> moment, PostcopyState is already been set to RUNNING.
> >> 
> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
> >> PostcopyState is checked. Current logic would pause postcopy and retry
> >> if PostcopyState is RUNNING. This is not what we expect, because
> >> postcopy is not active yet.
> >> 
> >> This patch makes sure state is set to RUNNING only previous state is
> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
> >> New state only would be set when current state equals to old_state.
> >> 
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >
> >OK, it's a shame to use a pointer there, but it works.
> 
> You mean second parameter of postcopy_state_set()?
> 
> I don't have a better idea. Or we introduce a new state
> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?

Maybe simply fix loadvm_postcopy_handle_run() to set the state after
the POSTCOPY_INCOMING_LISTENING check?

> 
> >Note, something else; using '-1' as the return value and checking for it
> >is something we do a lot; but in this case it's an example of an error
> >we could never recover from so it never makes sense to try and recover.
> >We should probably look at different types of error.

It is true that we might hang on some real errors, but IMHO it might
be no where better to quit QEMU if we're in postcopy...

(What I'm thinking in mind here is that sometimes even if postcopy
 failed we might still have a chance to recover a full VM by dumping
 both src/dst of the during-postcopy VM instances and manually merge
 them by black magic, in very extreme cases)

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  4:12       ` Peter Xu
@ 2019-10-09  5:07         ` Wei Yang
  2019-10-09  5:36           ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-09  5:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, quintela, Wei Yang, Dr. David Alan Gilbert

On Wed, Oct 09, 2019 at 12:12:25PM +0800, Peter Xu wrote:
>On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
>> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
>> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
>> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
>> >> previous state is not LISTENING. This will lead to a corner case.
>> >> 
>> >> First let's look at the code flow:
>> >> 
>> >> qemu_loadvm_state_main()
>> >>     ret = loadvm_process_command()
>> >>         loadvm_postcopy_handle_run()
>> >>             return -1;
>> >>     if (ret < 0) {
>> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
>> >>             ...
>> >>     }
>> >> 
>> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
>> >> always sets state to RUNNING. And then it checks the previous state. If
>> >> the previous state is not LISTENING, it will return -1. But at this
>> >> moment, PostcopyState is already been set to RUNNING.
>> >> 
>> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
>> >> PostcopyState is checked. Current logic would pause postcopy and retry
>> >> if PostcopyState is RUNNING. This is not what we expect, because
>> >> postcopy is not active yet.
>> >> 
>> >> This patch makes sure state is set to RUNNING only previous state is
>> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
>> >> New state only would be set when current state equals to old_state.
>> >> 
>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >
>> >OK, it's a shame to use a pointer there, but it works.
>> 
>> You mean second parameter of postcopy_state_set()?
>> 
>> I don't have a better idea. Or we introduce a new state
>> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?
>
>Maybe simply fix loadvm_postcopy_handle_run() to set the state after
>the POSTCOPY_INCOMING_LISTENING check?
>

Set state back to ps if ps is not POSTCOPY_INCOMING_LISTENING?

Sounds like another option.


-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  5:07         ` Wei Yang
@ 2019-10-09  5:36           ` Peter Xu
  2019-10-09  6:07             ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2019-10-09  5:36 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, Dr. David Alan Gilbert, quintela

On Wed, Oct 09, 2019 at 01:07:56PM +0800, Wei Yang wrote:
> On Wed, Oct 09, 2019 at 12:12:25PM +0800, Peter Xu wrote:
> >On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
> >> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
> >> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
> >> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
> >> >> previous state is not LISTENING. This will lead to a corner case.
> >> >> 
> >> >> First let's look at the code flow:
> >> >> 
> >> >> qemu_loadvm_state_main()
> >> >>     ret = loadvm_process_command()
> >> >>         loadvm_postcopy_handle_run()
> >> >>             return -1;
> >> >>     if (ret < 0) {
> >> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
> >> >>             ...
> >> >>     }
> >> >> 
> >> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
> >> >> always sets state to RUNNING. And then it checks the previous state. If
> >> >> the previous state is not LISTENING, it will return -1. But at this
> >> >> moment, PostcopyState is already been set to RUNNING.
> >> >> 
> >> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
> >> >> PostcopyState is checked. Current logic would pause postcopy and retry
> >> >> if PostcopyState is RUNNING. This is not what we expect, because
> >> >> postcopy is not active yet.
> >> >> 
> >> >> This patch makes sure state is set to RUNNING only previous state is
> >> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
> >> >> New state only would be set when current state equals to old_state.
> >> >> 
> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >
> >> >OK, it's a shame to use a pointer there, but it works.
> >> 
> >> You mean second parameter of postcopy_state_set()?
> >> 
> >> I don't have a better idea. Or we introduce a new state
> >> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?
> >
> >Maybe simply fix loadvm_postcopy_handle_run() to set the state after
> >the POSTCOPY_INCOMING_LISTENING check?
> >
> 
> Set state back to ps if ps is not POSTCOPY_INCOMING_LISTENING?
> 
> Sounds like another option.

Even simpler?

  ps = postcopy_state_get();
  if (ps != INCOMING)
    return -1;
  postcopy_state_set(RUNNING);

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  5:36           ` Peter Xu
@ 2019-10-09  6:07             ` Wei Yang
  2019-10-09  9:08               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2019-10-09  6:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, quintela, Wei Yang, Dr. David Alan Gilbert

On Wed, Oct 09, 2019 at 01:36:34PM +0800, Peter Xu wrote:
>On Wed, Oct 09, 2019 at 01:07:56PM +0800, Wei Yang wrote:
>> On Wed, Oct 09, 2019 at 12:12:25PM +0800, Peter Xu wrote:
>> >On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
>> >> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
>> >> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
>> >> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
>> >> >> previous state is not LISTENING. This will lead to a corner case.
>> >> >> 
>> >> >> First let's look at the code flow:
>> >> >> 
>> >> >> qemu_loadvm_state_main()
>> >> >>     ret = loadvm_process_command()
>> >> >>         loadvm_postcopy_handle_run()
>> >> >>             return -1;
>> >> >>     if (ret < 0) {
>> >> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
>> >> >>             ...
>> >> >>     }
>> >> >> 
>> >> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
>> >> >> always sets state to RUNNING. And then it checks the previous state. If
>> >> >> the previous state is not LISTENING, it will return -1. But at this
>> >> >> moment, PostcopyState is already been set to RUNNING.
>> >> >> 
>> >> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
>> >> >> PostcopyState is checked. Current logic would pause postcopy and retry
>> >> >> if PostcopyState is RUNNING. This is not what we expect, because
>> >> >> postcopy is not active yet.
>> >> >> 
>> >> >> This patch makes sure state is set to RUNNING only previous state is
>> >> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
>> >> >> New state only would be set when current state equals to old_state.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >
>> >> >OK, it's a shame to use a pointer there, but it works.
>> >> 
>> >> You mean second parameter of postcopy_state_set()?
>> >> 
>> >> I don't have a better idea. Or we introduce a new state
>> >> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?
>> >
>> >Maybe simply fix loadvm_postcopy_handle_run() to set the state after
>> >the POSTCOPY_INCOMING_LISTENING check?
>> >
>> 
>> Set state back to ps if ps is not POSTCOPY_INCOMING_LISTENING?
>> 
>> Sounds like another option.
>
>Even simpler?
>
>  ps = postcopy_state_get();
>  if (ps != INCOMING)
>    return -1;
>  postcopy_state_set(RUNNING);
>

Looks good to me.

Dave,

Do you feel good with it?

>Thanks,
>
>-- 
>Peter Xu

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE
  2019-10-09  0:55     ` Wei Yang
@ 2019-10-09  9:03       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-09  9:03 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, quintela

* Wei Yang (richardw.yang@linux.intel.com) wrote:
> On Tue, Oct 08, 2019 at 05:02:02PM +0100, Dr. David Alan Gilbert wrote:
> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
> >> postcopy_ram_incoming_cleanup() does cleanup for
> >> postcopy_ram_incoming_setup(), while the setup happens only after
> >> migration enters LISTEN state.
> >> 
> >> This means there is nothing to cleanup when migration is still ADVISE
> >> state.
> >> 
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> ---
> >>  migration/migration.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >> 
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 5f7e4d15e9..34d5e66f06 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -461,7 +461,6 @@ static void process_incoming_migration_co(void *opaque)
> >>               * but managed to complete within the precopy period, we can use
> >>               * the normal exit.
> >>               */
> >> -            postcopy_ram_incoming_cleanup(mis);
> >>          } else if (ret >= 0) {
> >>              /*
> >>               * Postcopy was started, cleanup should happen at the end of the
> >
> >I think that misses the cleanup of mlock that corresponds to the
> >munlockall in postcopy_ram_supported_by_host - that's called very early
> >on; I think in the advise stage.
> >
> 
> Thanks you are right.
> 
> BTW, do we need to check enable_mlock when calling munlockall() in
> postcopy_ram_supported_by_host() ?

I don't think so; it does an extra munlock in that case when nothing
should be locked anyway, no harm.

Dave

> >Dave
> >
> >> -- 
> >> 2.17.1
> >> 
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Wei Yang
> Help you, Help me
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  6:07             ` Wei Yang
@ 2019-10-09  9:08               ` Dr. David Alan Gilbert
  2019-10-10  0:54                 ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-09  9:08 UTC (permalink / raw)
  To: Wei Yang; +Cc: qemu-devel, Peter Xu, quintela

* Wei Yang (richardw.yang@linux.intel.com) wrote:
> On Wed, Oct 09, 2019 at 01:36:34PM +0800, Peter Xu wrote:
> >On Wed, Oct 09, 2019 at 01:07:56PM +0800, Wei Yang wrote:
> >> On Wed, Oct 09, 2019 at 12:12:25PM +0800, Peter Xu wrote:
> >> >On Wed, Oct 09, 2019 at 09:02:04AM +0800, Wei Yang wrote:
> >> >> On Tue, Oct 08, 2019 at 05:40:46PM +0100, Dr. David Alan Gilbert wrote:
> >> >> >* Wei Yang (richardw.yang@linux.intel.com) wrote:
> >> >> >> Currently, we set PostcopyState blindly to RUNNING, even we found the
> >> >> >> previous state is not LISTENING. This will lead to a corner case.
> >> >> >> 
> >> >> >> First let's look at the code flow:
> >> >> >> 
> >> >> >> qemu_loadvm_state_main()
> >> >> >>     ret = loadvm_process_command()
> >> >> >>         loadvm_postcopy_handle_run()
> >> >> >>             return -1;
> >> >> >>     if (ret < 0) {
> >> >> >>         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
> >> >> >>             ...
> >> >> >>     }
> >> >> >> 
> >> >> >> From above snippet, the corner case is loadvm_postcopy_handle_run()
> >> >> >> always sets state to RUNNING. And then it checks the previous state. If
> >> >> >> the previous state is not LISTENING, it will return -1. But at this
> >> >> >> moment, PostcopyState is already been set to RUNNING.
> >> >> >> 
> >> >> >> Then ret is checked in qemu_loadvm_state_main(), when it is -1
> >> >> >> PostcopyState is checked. Current logic would pause postcopy and retry
> >> >> >> if PostcopyState is RUNNING. This is not what we expect, because
> >> >> >> postcopy is not active yet.
> >> >> >> 
> >> >> >> This patch makes sure state is set to RUNNING only previous state is
> >> >> >> LISTENING by introducing an old_state parameter in postcopy_state_set().
> >> >> >> New state only would be set when current state equals to old_state.
> >> >> >> 
> >> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >> >
> >> >> >OK, it's a shame to use a pointer there, but it works.
> >> >> 
> >> >> You mean second parameter of postcopy_state_set()?
> >> >> 
> >> >> I don't have a better idea. Or we introduce a new state
> >> >> POSTCOPY_INCOMING_NOCHECK. Do you feel better with this?
> >> >
> >> >Maybe simply fix loadvm_postcopy_handle_run() to set the state after
> >> >the POSTCOPY_INCOMING_LISTENING check?
> >> >
> >> 
> >> Set state back to ps if ps is not POSTCOPY_INCOMING_LISTENING?
> >> 
> >> Sounds like another option.
> >
> >Even simpler?
> >
> >  ps = postcopy_state_get();
> >  if (ps != INCOMING)

  ^^ LISTENING

> >    return -1;
> >  postcopy_state_set(RUNNING);
> >
> 
> Looks good to me.
> 
> Dave,
> 
> Do you feel good with it?

Yes, I think so; it's simpler.

Dave

> >Thanks,
> >
> >-- 
> >Peter Xu
> 
> -- 
> Wei Yang
> Help you, Help me
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly
  2019-10-09  9:08               ` Dr. David Alan Gilbert
@ 2019-10-10  0:54                 ` Wei Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2019-10-10  0:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Wei Yang, Peter Xu, quintela

On Wed, Oct 09, 2019 at 10:08:42AM +0100, Dr. David Alan Gilbert wrote:
>> >> >
>> >> >Maybe simply fix loadvm_postcopy_handle_run() to set the state after
>> >> >the POSTCOPY_INCOMING_LISTENING check?
>> >> >
>> >> 
>> >> Set state back to ps if ps is not POSTCOPY_INCOMING_LISTENING?
>> >> 
>> >> Sounds like another option.
>> >
>> >Even simpler?
>> >
>> >  ps = postcopy_state_get();
>> >  if (ps != INCOMING)
>
>  ^^ LISTENING
>

oops

>> >    return -1;
>> >  postcopy_state_set(RUNNING);
>> >
>> 
>> Looks good to me.
>> 
>> Dave,
>> 
>> Do you feel good with it?
>
>Yes, I think so; it's simpler.
>

I will prepare v2.

>Dave
>
>> >Thanks,
>> >
>> >-- 
>> >Peter Xu
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>--
>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-10-10  0:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-01 10:01 [PATCH 0/3] migration/postcopy: cleanup related to postcopy Wei Yang
2019-10-01 10:01 ` [PATCH 1/3] migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Wei Yang
2019-10-08 14:17   ` Dr. David Alan Gilbert
2019-10-01 10:01 ` [PATCH 2/3] migration/postcopy: not necessary to do postcopy_ram_incoming_cleanup when state is ADVISE Wei Yang
2019-10-08 16:02   ` Dr. David Alan Gilbert
2019-10-09  0:55     ` Wei Yang
2019-10-09  9:03       ` Dr. David Alan Gilbert
2019-10-01 10:01 ` [PATCH 3/3] migration/postcopy: handle POSTCOPY_INCOMING_RUNNING corner case properly Wei Yang
2019-10-08 16:40   ` Dr. David Alan Gilbert
2019-10-09  1:02     ` Wei Yang
2019-10-09  4:12       ` Peter Xu
2019-10-09  5:07         ` Wei Yang
2019-10-09  5:36           ` Peter Xu
2019-10-09  6:07             ` Wei Yang
2019-10-09  9:08               ` Dr. David Alan Gilbert
2019-10-10  0:54                 ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).