Date: Thu, 26 Mar 2015 11:05:18 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20150326110517.GA2370@work-vm>
References: <1424883128-9841-1-git-send-email-dgilbert@redhat.com> <1424883128-9841-31-git-send-email-dgilbert@redhat.com> <20150323042012.GN25043@voom.fritz.box>
In-Reply-To: <20150323042012.GN25043@voom.fritz.box>
Subject: Re: [Qemu-devel] [PATCH v5 30/45] Postcopy: Postcopy startup in migration thread
To: David Gibson
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, yanghy@cn.fujitsu.com

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Feb 25, 2015 at 04:51:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert"
> >
> > Rework the migration thread to setup and start postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert
> > ---
> >  include/migration/migration.h |   3 +
> >  migration/migration.c         | 161 ++++++++++++++++++++++++++++++++++++++++--
> >  trace-events                  |   4 ++
> >  3 files changed, 164 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 821d561..2c607e7 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -131,6 +131,9 @@ struct MigrationState
> >      /* Flag set once the migration has been asked to enter postcopy */
> >      bool start_postcopy;
> >
> > +    /* Flag set once the migration thread is running (and needs joining) */
> > +    bool started_migration_thread;
> > +
> >      /* bitmap of pages that have been sent at least once
> >       * only maintained and used in postcopy at the moment
> >       * where it's used to send the dirtymap at the start
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b1ad7b1..6bf9c8d 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -468,7 +468,10 @@ static void migrate_fd_cleanup(void *opaque)
> >      if (s->file) {
> >          trace_migrate_fd_cleanup();
> >          qemu_mutex_unlock_iothread();
> > -        qemu_thread_join(&s->thread);
> > +        if (s->started_migration_thread) {
> > +            qemu_thread_join(&s->thread);
> > +            s->started_migration_thread = false;
> > +        }
> >          qemu_mutex_lock_iothread();
> >
> >          qemu_fclose(s->file);
> > @@ -874,7 +877,6 @@ out:
> >      return NULL;
> >  }
> >
> > -__attribute__ (( unused )) /* Until later in patch series */
> >  static int open_outgoing_return_path(MigrationState *ms)
> >  {
> >
> > @@ -911,23 +913,141 @@ static void await_outgoing_return_path_close(MigrationState *ms)
> >  }
> >
> >  /*
> > + * Switch from normal iteration to postcopy
> > + * Returns non-0 on error
> > + */
> > +static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> > +{
> > +    int ret;
> > +    const QEMUSizedBuffer *qsb;
> > +    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +
> > +    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> > +
> > +    trace_postcopy_start();
> > +    qemu_mutex_lock_iothread();
> > +    trace_postcopy_start_set_run();
> > +
> > +    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > +    *old_vm_running = runstate_is_running();

> I think that needs some explanation.  Why are you doing a wakeup on
> the source host?

This matches the existing code in migration_thread for the end of precopy;
Paolo's explanation of what it does is here:
https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg04880.html

> > +    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +
> > +    /*
> > +     * in Finish migrate and with the io-lock held everything should
> > +     * be quiet, but we've potentially still got dirty pages and we
> > +     * need to tell the destination to throw any pages it's already received
> > +     * that are dirty
> > +     */
> > +    if (ram_postcopy_send_discard_bitmap(ms)) {
> > +        error_report("postcopy send discard bitmap failed");
> > +        goto fail;
> > +    }
> > +
> > +    /*
> > +     * send rest of state - note things that are doing postcopy
> > +     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> > +     * wrap their state up here
> > +     */
> > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> > +    /* Ping just for debugging, helps line traces up */
> > +    qemu_savevm_send_ping(ms->file, 2);
> > +
> > +    /*
> > +     * We need to leave the fd free for page transfers during the
> > +     * loading of the device state, so wrap all the remaining
> > +     * commands and state into a package that gets sent in one go
> > +     */
> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
> > +    if (!fb) {
> > +        error_report("Failed to create buffered file");
> > +        goto fail;
> > +    }
> > +
> > +    qemu_savevm_state_complete(fb);
> > +    qemu_savevm_send_ping(fb, 3);
> > +
> > +    qemu_savevm_send_postcopy_run(fb);
> > +
> > +    /* <><> end of stuff going into the package */
> > +    qsb = qemu_buf_get(fb);
> > +
> > +    /* Now send that blob */
> > +    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> > +        error_report("postcopy_start: Unreasonably large packaged state: %lu",
> > +                     (unsigned long)(qsb_get_length(qsb)));
> > +        goto fail_closefb;
> > +    }
> > +    qemu_savevm_send_packaged(ms->file, qsb);
> > +    qemu_fclose(fb);
> > +    ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> > +
> > +    qemu_mutex_unlock_iothread();
> > +
> > +    /*
> > +     * Although this ping is just for debug, it could potentially be
> > +     * used for getting a better measurement of downtime at the source.
> > +     */
> > +    qemu_savevm_send_ping(ms->file, 4);
> > +
> > +    ret = qemu_file_get_error(ms->file);
> > +    if (ret) {
> > +        error_report("postcopy_start: Migration stream errored");
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +    }
> > +
> > +    return ret;
> > +
> > +fail_closefb:
> > +    qemu_fclose(fb);
> > +fail:
> > +    migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +    qemu_mutex_unlock_iothread();
> > +    return -1;
> > +}
> > +
> > +/*
> >   * Master migration thread on the source VM.
> >   * It drives the migration and pumps the data down the outgoing channel.
> >   */
> >  static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> > +    /* Used by the bandwidth calcs, updated later */
> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      int64_t initial_bytes = 0;
> >      int64_t max_size = 0;
> >      int64_t start_time = initial_time;
> >      bool old_vm_running = false;
> > +    bool entered_postcopy = false;
> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> >
> >      qemu_savevm_state_header(s->file);
> > +
> > +    if (migrate_postcopy_ram()) {
> > +        /* Now tell the dest that it should open its end so it can reply */
> > +        qemu_savevm_send_open_return_path(s->file);
> > +
> > +        /* And do a ping that will make stuff easier to debug */
> > +        qemu_savevm_send_ping(s->file, 1);
> > +
> > +        /*
> > +         * Tell the destination that we *might* want to do postcopy later;
> > +         * if the other end can't do postcopy it should fail now, nice and
> > +         * early.
> > +         */
> > +        qemu_savevm_send_postcopy_advise(s->file);
> > +    }
> > +
> >      qemu_savevm_state_begin(s->file, &s->params);
> >
> >      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    current_active_type = MIG_STATE_ACTIVE;
> >      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
> >
> >      trace_migration_thread_setup_complete();
> > @@ -946,6 +1066,22 @@ static void *migration_thread(void *opaque)
> >              trace_migrate_pending(pending_size, max_size,
> >                                    pend_post, pend_nonpost);
> >              if (pending_size && pending_size >= max_size) {
> > +                /* Still a significant amount to transfer */
> > +
> > +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +                if (migrate_postcopy_ram() &&
> > +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> > +                    pend_nonpost <= max_size &&
> > +                    atomic_read(&s->start_postcopy)) {
> > +
> > +                    if (!postcopy_start(s, &old_vm_running)) {
> > +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> > +                        entered_postcopy = true;

> Do you need entered_postcopy, or could you just use the existing
> MIG_STATE variable?

I need the separate flag, because it is used at the end of migration (when
the existing state is MIG_STATE_COMPLETED) to know that there has been a
postcopy stage, and it stops the recalculation of the 'downtime' which was
previously incorrect.  See below.
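To make that concrete, the flag/state interaction can be sketched standalone (a minimal sketch with hypothetical names, not the QEMU code itself): both the precopy and postcopy paths end with the state collapsed to COMPLETED, so the state alone cannot say whether a postcopy phase happened, and only the flag prevents the final accounting from overwriting the downtime that postcopy entry already recorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of the logic discussed above. */
enum Phase { PH_ACTIVE, PH_POSTCOPY_ACTIVE, PH_COMPLETED };

struct Mig {
    enum Phase state;
    bool entered_postcopy;  /* survives the collapse to PH_COMPLETED */
    long downtime;
};

/* Entering postcopy records the downtime itself and sets the flag
 * (as postcopy_start() does in the patch). */
static void enter_postcopy(struct Mig *m, long now, long stop_time)
{
    m->state = PH_POSTCOPY_ACTIVE;
    m->entered_postcopy = true;
    m->downtime = now - stop_time;
}

/* End of migration: both precopy and postcopy arrive here in
 * PH_COMPLETED, so only the flag stops the precopy downtime
 * calculation from clobbering the recorded value. */
static void finish(struct Mig *m, long end_time, long start_time)
{
    m->state = PH_COMPLETED;
    if (!m->entered_postcopy) {
        m->downtime = end_time - start_time;
    }
}
```

A postcopy run that stopped the VM at t=100, packaged state until t=110, and completed at t=500 keeps downtime=10 rather than recomputing 500 at the end.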
> > +                    }
> > +
> > +                    continue;
> > +                }
> > +                /* Just another iteration step */
> >                  qemu_savevm_state_iterate(s->file);
> >              } else {
> >                  int ret;
> > @@ -975,7 +1111,8 @@ static void *migration_thread(void *opaque)
> >          }
> >
> >          if (qemu_file_get_error(s->file)) {
> > -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> > +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> > +            trace_migration_thread_file_err();
> >              break;
> >          }
> >          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > @@ -1006,12 +1143,15 @@ static void *migration_thread(void *opaque)
> >          }
> >      }
> >
> > +    trace_migration_thread_after_loop();
> >      qemu_mutex_lock_iothread();
> >      if (s->state == MIG_STATE_COMPLETED) {
> >          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >          uint64_t transferred_bytes = qemu_ftell(s->file);
> >          s->total_time = end_time - s->total_time;
> > -        s->downtime = end_time - start_time;
> > +        if (!entered_postcopy) {
> > +            s->downtime = end_time - start_time;
> > +        }

Here's the use of entered_postcopy, and you can see that s->state is
always MIG_STATE_COMPLETED here.

Dave

> >          if (s->total_time) {
> >              s->mbps = (((double) transferred_bytes * 8.0) /
> >                         ((double) s->total_time)) / 1000;
> > @@ -1043,8 +1183,21 @@ void migrate_fd_connect(MigrationState *s)
> >      /* Notify before starting migration thread */
> >      notifier_list_notify(&migration_state_notifiers, s);
> >
> > +    /* Open the return path; currently for postcopy but other things might
> > +     * also want it.
> > +     */
> > +    if (migrate_postcopy_ram()) {
> > +        if (open_outgoing_return_path(s)) {
> > +            error_report("Unable to open return-path for postcopy");
> > +            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
> > +            migrate_fd_cleanup(s);
> > +            return;
> > +        }
> > +    }
> > +
> >      qemu_thread_create(&s->thread, "migration", migration_thread, s,
> >                         QEMU_THREAD_JOINABLE);
> > +    s->started_migration_thread = true;
> >  }
> >
> >  PostcopyState postcopy_state_get(MigrationIncomingState *mis)
> > diff --git a/trace-events b/trace-events
> > index 59dea4c..ed8bbe2 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1404,9 +1404,13 @@ migrate_fd_error(void) ""
> >  migrate_fd_cancel(void) ""
> >  migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
> >  migrate_send_rp_message(int cmd, uint16_t len) "cmd=%d, len=%d"
> > +migration_thread_after_loop(void) ""
> > +migration_thread_file_err(void) ""
> >  migration_thread_setup_complete(void) ""
> >  open_outgoing_return_path(void) ""
> >  open_outgoing_return_path_continue(void) ""
> > +postcopy_start(void) ""
> > +postcopy_start_set_run(void) ""
> >  source_return_path_thread_bad_end(void) ""
> >  source_return_path_bad_header_com(void) ""
> >  source_return_path_thread_end(void) ""
> > --

> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
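[Editorial note: the started_migration_thread guard added in migrate_fd_cleanup() above follows a general pattern worth spelling out. The sketch below uses hypothetical names and plain pthreads, not the QEMU code: joining a thread object that was never created is undefined behaviour, so cleanup only joins when the flag says a thread exists, and clears the flag so a repeated cleanup call is a safe no-op.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified version of the guard discussed above. */
struct State {
    pthread_t thread;
    bool started_thread;  /* set only once the thread really exists */
};

static void *worker(void *opaque)
{
    (void)opaque;
    return NULL;
}

static int state_start(struct State *s)
{
    int ret = pthread_create(&s->thread, NULL, worker, s);
    if (ret == 0) {
        s->started_thread = true;
    }
    return ret;
}

static void state_cleanup(struct State *s)
{
    /* Joining a never-created pthread_t is undefined behaviour, hence
     * the flag check; clearing it makes repeated cleanup harmless
     * (mirroring the migrate_fd_cleanup() hunk in the patch). */
    if (s->started_thread) {
        pthread_join(s->thread, NULL);
        s->started_thread = false;
    }
}
```

The same shape applies whenever a setup path can fail before thread creation (as migrate_fd_connect() can when opening the return path fails) but shares one cleanup routine with the success path.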