From: Fei Li <fli@suse.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, quintela@redhat.com, dgilbert@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC 0/2] Fix migration issues
Date: Thu, 25 Oct 2018 17:04:00 +0800
Message-ID: <45194647-7b30-df97-5517-1c758947a91e@suse.com>
In-Reply-To: <20181024212742.GB30830@xz-x1.hotspot.internet-for-guests.com>



On 10/25/2018 05:27 AM, Peter Xu wrote:
> On Mon, Oct 22, 2018 at 07:08:52PM +0800, Fei Li wrote:
>> Hi,
>> these two patches are to fix live migration issues. The first is
>> about multifd, and the second is to fix some error handling.
>>
>> But I have a question about using multifd migration.
>> In our current code, when multifd is used during migration, if there
>> is an error before the destination receives all new channels (I mean
>> multifd_recv_new_channel(ioc)), the destination does not exit but
>> keeps waiting (Hang in recvmsg() in qio_channel_socket_readv) until
>> the source exits.
>>
>> My question is about the state of the destination host if it fails during
>> this period. I did a test, after applying [1/2] patch, if
>> multifd_new_send_channel_async() fails, the destination host hangs for
>> a while then later pops up a window saying
>>      "'QEMU (...) [stopped]' is not responding.
>>      You may choose to wait a short while for it to continue or force
>>      the application to quit entirely."
>> But after closing the window by clicking, the qemu on the dest still
>> hangs there until I exclusively kill the qemu on the source.
>>
>> The source host keeps running as expected, but I guess the hang
>> phenomenon in the dest is not right.
>> Would someone kindly give some suggestions on this? Thanks a lot.
> Note that it's during KVM forum so the response from anyone might be
> slow (it ends this week).
Thanks for the kind reminder. :)
> I think the thing you described seems normal since we can't guarantee
> the network is always stable, normally I'll expect that the migration
> will fail but it won't matter much since after all it's a precopy so
> we lose nothing.  So I'm curious about when the error you mentioned
> happens (e.g., total channel number is N, you only got M channels
> connected, with M < N) could you just simply kill the destination?
> Then AFAIU the source can just continue to run, right?
Yes, for the M < N situation, IMO the destination can simply be killed by
adding exit(EXIT_FAILURE) when it fails to receive a packet via some
channel. The code below has been tested: the source continues to run and
the destination exits.
If the code/idea below is acceptable, I'd like to send it as a separate
patch to fix the hang issue.

@@ -1325,22 +1325,24 @@ bool multifd_recv_all_channels_created(void)
  /* Return true if multifd is ready for the migration, otherwise false */
  bool multifd_recv_new_channel(QIOChannel *ioc)
  {
+    MigrationIncomingState *mis = migration_incoming_get_current();
      MultiFDRecvParams *p;
      Error *local_err = NULL;
      int id;

      id = multifd_recv_initial_packet(ioc, &local_err);
      if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
-        return false;
+        error_reportf_err(local_err,
+                          "failed to receive packet via multifd channel %x: ",
+                          multifd_recv_state->count);
+        goto fail;
      }

      p = &multifd_recv_state->params[id];
      if (p->c != NULL) {
          error_setg(&local_err, "multifd: received id '%d' already setup'",
                     id);
-        multifd_recv_terminate_threads(local_err);
-        return false;
+        goto fail;
      }
      p->c = ioc;
      object_ref(OBJECT(ioc));
@@ -1352,6 +1354,11 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
                         QEMU_THREAD_JOINABLE);
      atomic_inc(&multifd_recv_state->count);
      return multifd_recv_state->count == migrate_multifd_channels();
+fail:
+    multifd_recv_terminate_threads(local_err);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+    exit(EXIT_FAILURE);
  }
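
To make the control flow easier to look at in isolation, here is a minimal
standalone sketch of the same "single fail label" idea. It is not QEMU code:
SketchRecvState, SketchResult, SKETCH_CHANNELS and sketch_recv_new_channel()
are hypothetical stand-ins, and the failure path returns SKETCH_FAILED where
the diff above would call exit(EXIT_FAILURE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for migrate_multifd_channels(), i.e. N. */
#define SKETCH_CHANNELS 4

/* Hypothetical stand-in for the receive-side state; not QEMU's struct. */
typedef struct {
    bool in_use[SKETCH_CHANNELS]; /* stands in for params[id].c != NULL */
    int count;                    /* channels accepted so far, i.e. M */
    bool torn_down;               /* set once the shared fail path has run */
} SketchRecvState;

typedef enum {
    SKETCH_WAITING, /* M < N: more channels are still expected */
    SKETCH_READY,   /* M == N: all channels are connected */
    SKETCH_FAILED   /* the diff above would exit(EXIT_FAILURE) here */
} SketchResult;

/*
 * Accept one new channel announcing itself as 'id'. A negative or
 * out-of-range id models a failed read of the initial packet.
 */
SketchResult sketch_recv_new_channel(SketchRecvState *s, int id)
{
    assert(s != NULL);

    if (id < 0 || id >= SKETCH_CHANNELS) {
        goto fail; /* initial packet unreadable or carrying a bad id */
    }
    if (s->in_use[id]) {
        goto fail; /* the same id was announced twice */
    }

    s->in_use[id] = true;
    s->count++;
    return s->count == SKETCH_CHANNELS ? SKETCH_READY : SKETCH_WAITING;

fail:
    /* One place for cleanup: stop the threads, close the incoming stream. */
    s->torn_down = true;
    return SKETCH_FAILED;
}
```

With a zero-initialised state, accepting ids 0..3 in turn yields
SKETCH_WAITING three times and then SKETCH_READY, while a negative or
repeated id takes the shared fail path.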

Have a nice day, thanks a lot
Fei
>>
>> Fei Li (2):
>>    migration: fix the multifd code
>>    migration: fix some error handling
>>
>>   migration/migration.c    |  5 +----
>>   migration/postcopy-ram.c |  3 +++
>>   migration/ram.c          | 33 +++++++++++++++++++++++----------
>>   migration/ram.h          |  2 +-
>>   4 files changed, 28 insertions(+), 15 deletions(-)
>>
>> -- 
>> 2.13.7
>>
> Regards,
>
