All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-9.0 0/2] migration: Two migration bug fixes
@ 2024-03-28 14:02 Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

Hello,

This small series fixes two migration bugs I stumbled upon recently.
Comments are welcome, thanks for reviewing.

Avihai Horon (2):
  migration: Set migration error in migration_completion()
  migration/postcopy: Ensure postcopy_start() sets errp if it fails

 migration/migration.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
  2024-03-28 15:09   ` Peter Xu
  2024-03-28 15:21   ` Cédric Le Goater
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
  2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
  2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }
 
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
  2024-03-28 15:04   ` Cédric Le Goater
  2024-03-28 15:10   ` Peter Xu
  2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
  2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

There are several places where postcopy_start() fails without setting
errp. This can cause a null pointer de-reference, as in case of error,
the caller of postcopy_start() copies/prints the error set in errp.

Fix it by setting errp in all of postcopy_start() error paths.

Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/migration.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b73ae3a72c4..86bf76e9258 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
         migration_wait_main_channel(ms);
         if (postcopy_preempt_establish_channel(ms)) {
             migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            error_setg(errp, "%s: Failed to establish preempt channel",
+                       __func__);
             return -1;
         }
     }
@@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
         goto fail;
     }
 
     ret = migration_maybe_pause(ms, &cur_state,
                                 MIGRATION_STATUS_POSTCOPY_ACTIVE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
+                         __func__);
         goto fail;
     }
 
     ret = bdrv_inactivate_all();
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
+                         __func__);
         goto fail;
     }
     restart_block = true;
@@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     /* Now send that blob */
     if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
+        error_setg(errp, "%s: Failed to send packaged data", __func__);
         goto fail_closefb;
     }
     qemu_fclose(fb);
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 15:04   ` Cédric Le Goater
  2024-03-28 15:10   ` Peter Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:04 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

On 3/28/24 15:02, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
> 
> Fix it by setting errp in all of postcopy_start() error paths.
> 
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   migration/migration.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b73ae3a72c4..86bf76e9258 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>           migration_wait_main_channel(ms);
>           if (postcopy_preempt_establish_channel(ms)) {
>               migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> +            error_setg(errp, "%s: Failed to establish preempt channel",
> +                       __func__);
>               return -1;
>           }
>       }
> @@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>   
>       ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
>           goto fail;
>       }
>   
>       ret = migration_maybe_pause(ms, &cur_state,
>                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
> +                         __func__);
>           goto fail;
>       }
>   
>       ret = bdrv_inactivate_all();
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
> +                         __func__);
>           goto fail;
>       }
>       restart_block = true;
> @@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>   
>       /* Now send that blob */
>       if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
> +        error_setg(errp, "%s: Failed to send packaged data", __func__);
>           goto fail_closefb;
>       }
>       qemu_fclose(fb);



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 15:09   ` Peter Xu
  2024-03-28 15:54     ` Avihai Horon
  2024-03-28 15:21   ` Cédric Le Goater
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:09 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
> 
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.
> 
> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
> 
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

I'll attach this if it looks all right to you:

Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Thanks,

> ---
>  migration/migration.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>  {
>      int ret = 0;
>      int current_active_state = s->state;
> +    Error *local_err = NULL;
>  
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>      return;
>  
>  fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>      migration_completion_failed(s, current_active_state);
>  }
>  
> -- 
> 2.26.3
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
  2024-03-28 15:04   ` Cédric Le Goater
@ 2024-03-28 15:10   ` Peter Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:10 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:52PM +0200, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
> 
> Fix it by setting errp in all of postcopy_start() error paths.
> 
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")

I think this should need:

Cc: qemu-stable <qemu-stable@nongnu.org>

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
  2024-03-28 15:09   ` Peter Xu
@ 2024-03-28 15:21   ` Cédric Le Goater
  2024-03-28 15:50     ` Avihai Horon
  1 sibling, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:21 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
> 
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set

Out of curiosity, could you describe a bit more the context ? Did
vfio_save_complete_precopy() fail ? why ?

We should propagate errors of .save_live_complete_precopy() handlers as
it was done .save_setup handlers(). For 9.1.

> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.

yes, but this test :

     if (ret < 0) {
         goto fail;
     }

will skip the close_return_path_on_source() call. Won't it ? So I don't
understand how it can be an issue. Am I missing something ?

> Fix it by ensuring migration error is set in case of error in
> migration_completion().

Why didn't you add a reference to commit 9425ef3f990a ?


> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   migration/migration.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>   {
>       int ret = 0;
>       int current_active_state = s->state;
> +    Error *local_err = NULL;
>   
>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>           ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>       return;
>   
>   fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");

The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.




> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>       migration_completion_failed(s, current_active_state);
>   }
>   



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:21   ` Cédric Le Goater
@ 2024-03-28 15:50     ` Avihai Horon
  2024-03-28 16:29       ` Cédric Le Goater
  0 siblings, 1 reply; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:50 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel; +Cc: Peter Xu, Fabiano Rosas


On 28/03/2024 17:21, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Avihai,
>
> On 3/28/24 15:02, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set
>
> Out of curiosity, could you describe a bit more the context ? Did
> vfio_save_complete_precopy() fail ? why ?

Yep, vfio_save_complete_precopy() failed (but it failed while I was 
experimenting with an unofficial debug FW).

>
> We should propagate errors of .save_live_complete_precopy() handlers as
> it was done .save_setup handlers(). For 9.1.

Agreed.

>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>
> yes, but this test :
>
>     if (ret < 0) {
>         goto fail;
>     }
>
> will skip the close_return_path_on_source() call. Won't it ? So I don't
> understand how it can be an issue. Am I missing something ?

It will skip the close_return_path_on_source() call in 
migration_completion(), but there is another 
close_return_path_on_source() call in migrate_fd_cleanup().

>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>
> Why didn't you add a reference to commit 9425ef3f990a ?

I thought this commit didn't introduce this bug, but looking again in 
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the 
return path thread before releasing to_dst_file") and then 9425ef3f990a 
re-introduced the bug, but only for migration_completion() case.
So, you are right, a fixes line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/

>
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   migration/migration.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState 
>> *s)
>>   {
>>       int ret = 0;
>>       int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>           ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void 
>> migration_completion(MigrationState *s)
>>       return;
>>
>>   fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration 
>> completion");
>
> The 'ret = -1' case could be improved with error_setg(). As a followup.
>
> Thanks,
>
> C.
>
>
>
>
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>       migration_completion_failed(s, current_active_state);
>>   }
>>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:09   ` Peter Xu
@ 2024-03-28 15:54     ` Avihai Horon
  0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater


On 28/03/2024 17:09, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set.
>>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> I'll attach this if it looks all right to you:
>
> Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Yes, sure, go ahead.

Thanks.

>
> Thanks,
>
>> ---
>>   migration/migration.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>   {
>>       int ret = 0;
>>       int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>           ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>       return;
>>
>>   fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>       migration_completion_failed(s, current_active_state);
>>   }
>>
>> --
>> 2.26.3
>>
>>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:50     ` Avihai Horon
@ 2024-03-28 16:29       ` Cédric Le Goater
  0 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 16:29 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

On 3/28/24 16:50, Avihai Horon wrote:
> 
> On 28/03/2024 17:21, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Avihai,
>>
>> On 3/28/24 15:02, Avihai Horon wrote:
>>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>>> close_return_path_on_source()"), close_return_path_on_source() assumes
>>> that migration error is set if an error occurs during migration.
>>>
>>> This may not be true if migration errors in migration_completion(). For
>>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>>> will not be set
>>
>> Out of curiosity, could you describe a bit more the context ? Did
>> vfio_save_complete_precopy() fail ? why ?
> 
> Yep, vfio_save_complete_precopy() failed (but it failed while I was experimenting with an unofficial debug FW).
> 
>>
>> We should propagate errors of .save_live_complete_precopy() handlers as
>> it was done .save_setup handlers(). For 9.1.
> 
> Agreed.
> 
>>
>>> This in turn, will cause a migration hang bug, similar to the bug that
>>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>>> thread before releasing to_dst_file"), as shutdown() will not be issued
>>> for the return-path channel.
>>
>> yes, but this test :
>>
>>     if (ret < 0) {
>>         goto fail;
>>     }
>>
>> will skip the close_return_path_on_source() call. Won't it ? So I don't
>> understand how it can be an issue. Am I missing something ?
> 
> It will skip the close_return_path_on_source() call in migration_completion(), but there is another close_return_path_on_source() call in migrate_fd_cleanup().

OK. Found it. This is a code path I hadn't explored yet.

Acked-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> 
>>
>>> Fix it by ensuring migration error is set in case of error in
>>> migration_completion().
>>
>> Why didn't you add a reference to commit 9425ef3f990a ?
> 
> I thought this commit didn't introduce this bug, but looking again in the mailing list [1], it kinda did:
> The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the return path thread before releasing to_dst_file") and then 9425ef3f990a re-introduced the bug, but only for migration_completion() case.
> So, you are right, a fixes line with 9425ef3f990a should be added.
> 
> Thanks.
> 
> [1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/
> 
>>
>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>>   migration/migration.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9fe8fd2afd7..b73ae3a72c4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>>   {
>>>       int ret = 0;
>>>       int current_active_state = s->state;
>>> +    Error *local_err = NULL;
>>>
>>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>           ret = migration_completion_precopy(s, &current_active_state);
>>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>>       return;
>>>
>>>   fail:
>>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    } else if (ret) {
>>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>>
>> The 'ret = -1' case could be improved with error_setg(). As a followup.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    }
>>> +
>>>       migration_completion_failed(s, current_active_state);
>>>   }
>>>
>>
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 0/2] migration: Two migration bug fixes
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 17:04 ` Peter Xu
  2 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 17:04 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:50PM +0200, Avihai Horon wrote:
> Hello,
> 
> This small series fixes two migration bugs I stumbled upon recently.
> Comments are welcome, thanks for reviewing.
> 
> Avihai Horon (2):
>   migration: Set migration error in migration_completion()
>   migration/postcopy: Ensure postcopy_start() sets errp if it fails

queued for 9.0-rc2, thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-28 17:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09   ` Peter Xu
2024-03-28 15:54     ` Avihai Horon
2024-03-28 15:21   ` Cédric Le Goater
2024-03-28 15:50     ` Avihai Horon
2024-03-28 16:29       ` Cédric Le Goater
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04   ` Cédric Le Goater
2024-03-28 15:10   ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.