Date: Thu, 26 Mar 2015 11:05:18 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20150326110517.GA2370@work-vm>
References: <1424883128-9841-1-git-send-email-dgilbert@redhat.com> <1424883128-9841-31-git-send-email-dgilbert@redhat.com> <20150323042012.GN25043@voom.fritz.box>
In-Reply-To: <20150323042012.GN25043@voom.fritz.box>
Subject: Re: [Qemu-devel] [PATCH v5 30/45] Postcopy: Postcopy startup in migration thread
To: David Gibson
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, yanghy@cn.fujitsu.com

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Feb 25, 2015 at 04:51:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert"
> >
> > Rework the migration thread to setup and start postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert
> > ---
> >  include/migration/migration.h |   3 +
> >  migration/migration.c         | 161 ++++++++++++++++++++++++++++++++++++++++--
> >  trace-events                  |   4 ++
> >  3 files changed, 164 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 821d561..2c607e7 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -131,6 +131,9 @@ struct MigrationState
> >      /* Flag set once the migration has been asked to enter postcopy */
> >      bool start_postcopy;
> >
> > +    /* Flag set once the migration thread is running (and needs joining) */
> > +    bool started_migration_thread;
> > +
> >      /* bitmap of pages that have been sent at least once
> >       * only maintained and used in postcopy at the moment
> >       * where it's used to send the dirtymap at the start
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b1ad7b1..6bf9c8d 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -468,7 +468,10 @@ static void migrate_fd_cleanup(void *opaque)
> >      if (s->file) {
> >          trace_migrate_fd_cleanup();
> >          qemu_mutex_unlock_iothread();
> > -        qemu_thread_join(&s->thread);
> > +        if (s->started_migration_thread) {
> > +            qemu_thread_join(&s->thread);
> > +            s->started_migration_thread = false;
> > +        }
> >          qemu_mutex_lock_iothread();
> >
> >          qemu_fclose(s->file);
> > @@ -874,7 +877,6 @@ out:
> >      return NULL;
> >  }
> >
> > -__attribute__ (( unused )) /* Until later in patch series */
> >  static int open_outgoing_return_path(MigrationState *ms)
> >  {
> >
> > @@ -911,23 +913,141 @@ static void await_outgoing_return_path_close(MigrationState *ms)
> >  }
> >
> >  /*
> > + * Switch from normal iteration to postcopy
> > + * Returns non-0 on error
> > + */
> > +static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> > +{
> > +    int ret;
> > +    const QEMUSizedBuffer *qsb;
> > +    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +
> > +    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> > +
> > +    trace_postcopy_start();
> > +    qemu_mutex_lock_iothread();
> > +    trace_postcopy_start_set_run();
> > +
> > +    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > +    *old_vm_running = runstate_is_running();

> I think that needs some explanation.  Why are you doing a wakeup on
> the source host?

This matches the existing code in migration_thread for the end of precopy;
Paolo's explanation of what it does is here:
https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg04880.html

> > +    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +
> > +    /*
> > +     * in Finish migrate and with the io-lock held everything should
> > +     * be quiet, but we've potentially still got dirty pages and we
> > +     * need to tell the destination to throw any pages it's already received
> > +     * that are dirty
> > +     */
> > +    if (ram_postcopy_send_discard_bitmap(ms)) {
> > +        error_report("postcopy send discard bitmap failed");
> > +        goto fail;
> > +    }
> > +
> > +    /*
> > +     * send rest of state - note things that are doing postcopy
> > +     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> > +     * wrap their state up here
> > +     */
> > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> > +    /* Ping just for debugging, helps line traces up */
> > +    qemu_savevm_send_ping(ms->file, 2);
> > +
> > +    /*
> > +     * We need to leave the fd free for page transfers during the
> > +     * loading of the device state, so wrap all the remaining
> > +     * commands and state into a package that gets sent in one go
> > +     */
> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
> > +    if (!fb) {
> > +        error_report("Failed to create buffered file");
> > +        goto fail;
> > +    }
> > +
> > +    qemu_savevm_state_complete(fb);
> > +    qemu_savevm_send_ping(fb, 3);
> > +
> > +    qemu_savevm_send_postcopy_run(fb);
> > +
> > +    /* <><> end of stuff going into the package */
> > +    qsb = qemu_buf_get(fb);
> > +
> > +    /* Now send that blob */
> > +    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> > +        error_report("postcopy_start: Unreasonably large packaged state: %lu",
> > +                     (unsigned long)(qsb_get_length(qsb)));
> > +        goto fail_closefb;
> > +    }
> > +    qemu_savevm_send_packaged(ms->file, qsb);
> > +    qemu_fclose(fb);
> > +    ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> > +
> > +    qemu_mutex_unlock_iothread();
> > +
> > +    /*
> > +     * Although this ping is just for debug, it could potentially be
> > +     * used for getting a better measurement of downtime at the source.
> > +     */
> > +    qemu_savevm_send_ping(ms->file, 4);
> > +
> > +    ret = qemu_file_get_error(ms->file);
> > +    if (ret) {
> > +        error_report("postcopy_start: Migration stream errored");
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +    }
> > +
> > +    return ret;
> > +
> > +fail_closefb:
> > +    qemu_fclose(fb);
> > +fail:
> > +    migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +    qemu_mutex_unlock_iothread();
> > +    return -1;
> > +}
> > +
> > +/*
> >   * Master migration thread on the source VM.
> >   * It drives the migration and pumps the data down the outgoing channel.
> >   */
> >  static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> > +    /* Used by the bandwidth calcs, updated later */
> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      int64_t initial_bytes = 0;
> >      int64_t max_size = 0;
> >      int64_t start_time = initial_time;
> >      bool old_vm_running = false;
> > +    bool entered_postcopy = false;
> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> >
> >      qemu_savevm_state_header(s->file);
> > +
> > +    if (migrate_postcopy_ram()) {
> > +        /* Now tell the dest that it should open its end so it can reply */
> > +        qemu_savevm_send_open_return_path(s->file);
> > +
> > +        /* And do a ping that will make stuff easier to debug */
> > +        qemu_savevm_send_ping(s->file, 1);
> > +
> > +        /*
> > +         * Tell the destination that we *might* want to do postcopy later;
> > +         * if the other end can't do postcopy it should fail now, nice and
> > +         * early.
> > +         */
> > +        qemu_savevm_send_postcopy_advise(s->file);
> > +    }
> > +
> >      qemu_savevm_state_begin(s->file, &s->params);
> >
> >      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    current_active_type = MIG_STATE_ACTIVE;
> >      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
> >
> >      trace_migration_thread_setup_complete();
> > @@ -946,6 +1066,22 @@ static void *migration_thread(void *opaque)
> >              trace_migrate_pending(pending_size, max_size,
> >                                    pend_post, pend_nonpost);
> >              if (pending_size && pending_size >= max_size) {
> > +                /* Still a significant amount to transfer */
> > +
> > +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +                if (migrate_postcopy_ram() &&
> > +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> > +                    pend_nonpost <= max_size &&
> > +                    atomic_read(&s->start_postcopy)) {
> > +
> > +                    if (!postcopy_start(s, &old_vm_running)) {
> > +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> > +                        entered_postcopy = true;

> Do you need entered_postcopy, or could you just use the existing
> MIG_STATE variable?

I need the separate flag, because it is used at the end of migration (when
the existing state is MIG_STATE_COMPLETED) to know that there has been a
postcopy stage, and it stops the recalculation of the 'downtime' which was
previously incorrect.  See below.
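To make that concrete, the flag/state interaction can be sketched standalone (a minimal sketch with hypothetical names, not the QEMU code itself): both the precopy and postcopy paths end with the state collapsed to COMPLETED, so the state alone cannot say whether a postcopy phase happened, and only the flag prevents the final accounting from overwriting the downtime that postcopy entry already recorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of the logic discussed above. */
enum Phase { PH_ACTIVE, PH_POSTCOPY_ACTIVE, PH_COMPLETED };

struct Mig {
    enum Phase state;
    bool entered_postcopy;  /* survives the collapse to PH_COMPLETED */
    long downtime;
};

/* Entering postcopy records the downtime itself and sets the flag
 * (as postcopy_start() does in the patch). */
static void enter_postcopy(struct Mig *m, long now, long stop_time)
{
    m->state = PH_POSTCOPY_ACTIVE;
    m->entered_postcopy = true;
    m->downtime = now - stop_time;
}

/* End of migration: both precopy and postcopy arrive here in
 * PH_COMPLETED, so only the flag stops the precopy downtime
 * calculation from clobbering the recorded value. */
static void finish(struct Mig *m, long end_time, long start_time)
{
    m->state = PH_COMPLETED;
    if (!m->entered_postcopy) {
        m->downtime = end_time - start_time;
    }
}
```

A postcopy run that stopped the VM at t=100, packaged state until t=110, and completed at t=500 keeps downtime=10 rather than recomputing 500 at the end.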
> > +                    }
> > +
> > +                    continue;
> > +                }
> > +                /* Just another iteration step */
> >                  qemu_savevm_state_iterate(s->file);
> >              } else {
> >                  int ret;
> > @@ -975,7 +1111,8 @@ static void *migration_thread(void *opaque)
> >          }
> >
> >          if (qemu_file_get_error(s->file)) {
> > -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> > +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> > +            trace_migration_thread_file_err();
> >              break;
> >          }
> >          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > @@ -1006,12 +1143,15 @@ static void *migration_thread(void *opaque)
> >          }
> >      }
> >
> > +    trace_migration_thread_after_loop();
> >      qemu_mutex_lock_iothread();
> >      if (s->state == MIG_STATE_COMPLETED) {
> >          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >          uint64_t transferred_bytes = qemu_ftell(s->file);
> >          s->total_time = end_time - s->total_time;
> > -        s->downtime = end_time - start_time;
> > +        if (!entered_postcopy) {
> > +            s->downtime = end_time - start_time;
> > +        }

Here's the use of entered_postcopy, and you can see that s->state is
always MIG_STATE_COMPLETED here.

Dave

> >          if (s->total_time) {
> >              s->mbps = (((double) transferred_bytes * 8.0) /
> >                         ((double) s->total_time)) / 1000;
> > @@ -1043,8 +1183,21 @@ void migrate_fd_connect(MigrationState *s)
> >      /* Notify before starting migration thread */
> >      notifier_list_notify(&migration_state_notifiers, s);
> >
> > +    /* Open the return path; currently for postcopy but other things might
> > +     * also want it.
> > +     */
> > +    if (migrate_postcopy_ram()) {
> > +        if (open_outgoing_return_path(s)) {
> > +            error_report("Unable to open return-path for postcopy");
> > +            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
> > +            migrate_fd_cleanup(s);
> > +            return;
> > +        }
> > +    }
> > +
> >      qemu_thread_create(&s->thread, "migration", migration_thread, s,
> >                         QEMU_THREAD_JOINABLE);
> > +    s->started_migration_thread = true;
> >  }
> >
> >  PostcopyState postcopy_state_get(MigrationIncomingState *mis)
> > diff --git a/trace-events b/trace-events
> > index 59dea4c..ed8bbe2 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1404,9 +1404,13 @@ migrate_fd_error(void) ""
> >  migrate_fd_cancel(void) ""
> >  migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
> >  migrate_send_rp_message(int cmd, uint16_t len) "cmd=%d, len=%d"
> > +migration_thread_after_loop(void) ""
> > +migration_thread_file_err(void) ""
> >  migration_thread_setup_complete(void) ""
> >  open_outgoing_return_path(void) ""
> >  open_outgoing_return_path_continue(void) ""
> > +postcopy_start(void) ""
> > +postcopy_start_set_run(void) ""
> >  source_return_path_thread_bad_end(void) ""
> >  source_return_path_bad_header_com(void) ""
> >  source_return_path_thread_end(void) ""
> > --

> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
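[Editorial note: the started_migration_thread guard added in migrate_fd_cleanup() above follows a general pattern worth spelling out. The sketch below uses hypothetical names and plain pthreads, not the QEMU code: joining a thread object that was never created is undefined behaviour, so cleanup only joins when the flag says a thread exists, and clears the flag so a repeated cleanup call is a safe no-op.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified version of the guard discussed above. */
struct State {
    pthread_t thread;
    bool started_thread;  /* set only once the thread really exists */
};

static void *worker(void *opaque)
{
    (void)opaque;
    return NULL;
}

static int state_start(struct State *s)
{
    int ret = pthread_create(&s->thread, NULL, worker, s);
    if (ret == 0) {
        s->started_thread = true;
    }
    return ret;
}

static void state_cleanup(struct State *s)
{
    /* Joining a never-created pthread_t is undefined behaviour, hence
     * the flag check; clearing it makes repeated cleanup harmless
     * (mirroring the migrate_fd_cleanup() hunk in the patch). */
    if (s->started_thread) {
        pthread_join(s->thread, NULL);
        s->started_thread = false;
    }
}
```

The same shape applies whenever a setup path can fail before thread creation (as migrate_fd_connect() can when opening the return path fails) but shares one cleanup routine with the success path.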