All of lore.kernel.org
 help / color / mirror / Atom feed
From: Avihai Horon <avihaih@nvidia.com>
To: "Cédric Le Goater" <clg@redhat.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
Date: Thu, 28 Mar 2024 17:50:22 +0200	[thread overview]
Message-ID: <9d3a476d-9e99-4ed8-8320-8bf86269dbf3@nvidia.com> (raw)
In-Reply-To: <9aea916e-ea64-424b-bb48-4d1ad75e85ac@redhat.com>


On 28/03/2024 17:21, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Avihai,
>
> On 3/28/24 15:02, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set
>
> Out of curiosity, could you describe a bit more the context ? Did
> vfio_save_complete_precopy() fail ? why ?

Yep, vfio_save_complete_precopy() failed (but it failed while I was 
experimenting with an unofficial debug FW).

>
> We should propagate errors of .save_live_complete_precopy() handlers as
> it was done .save_setup handlers(). For 9.1.

Agreed.

>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>
> yes, but this test :
>
>     if (ret < 0) {
>         goto fail;
>     }
>
> will skip the close_return_path_on_source() call. Won't it ? So I don't
> understand how it can be an issue. Am I missing something ?

It will skip the close_return_path_on_source() call in 
migration_completion(), but there is another 
close_return_path_on_source() call in migrate_fd_cleanup().

>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>
> Why didn't you add a reference to commit 9425ef3f990a ?

I thought this commit didn't introduce this bug, but looking again in 
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the 
return path thread before releasing to_dst_file") and then 9425ef3f990a 
re-introduced the bug, but only for migration_completion() case.
So, you are right, a fixes line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/

>
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   migration/migration.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState 
>> *s)
>>   {
>>       int ret = 0;
>>       int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>           ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void 
>> migration_completion(MigrationState *s)
>>       return;
>>
>>   fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration 
>> completion");
>
> The 'ret = -1' case could be improved with error_setg(). As a followup.
>
> Thanks,
>
> C.
>
>
>
>
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>       migration_completion_failed(s, current_active_state);
>>   }
>>
>


  reply	other threads:[~2024-03-28 15:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09   ` Peter Xu
2024-03-28 15:54     ` Avihai Horon
2024-03-28 15:21   ` Cédric Le Goater
2024-03-28 15:50     ` Avihai Horon [this message]
2024-03-28 16:29       ` Cédric Le Goater
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04   ` Cédric Le Goater
2024-03-28 15:10   ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9d3a476d-9e99-4ed8-8320-8bf86269dbf3@nvidia.com \
    --to=avihaih@nvidia.com \
    --cc=clg@redhat.com \
    --cc=farosas@suse.de \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.