From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34634) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YbLir-00068d-Sy for qemu-devel@nongnu.org; Fri, 27 Mar 2015 00:18:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YbLif-0002O7-CN for qemu-devel@nongnu.org; Fri, 27 Mar 2015 00:18:17 -0400 Received: from ozlabs.org ([2401:3900:2:1::2]:59222) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YbLie-0002Ns-NK for qemu-devel@nongnu.org; Fri, 27 Mar 2015 00:18:05 -0400 Date: Fri, 27 Mar 2015 15:13:53 +1100 From: David Gibson Message-ID: <20150327041353.GD2900@voom.fritz.box> References: <1424883128-9841-1-git-send-email-dgilbert@redhat.com> <1424883128-9841-18-git-send-email-dgilbert@redhat.com> <20150312093049.GW11973@voom.redhat.com> <20150326163327.GG2370@work-vm> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="vni90+aGYgRvsTuO" Content-Disposition: inline In-Reply-To: <20150326163327.GG2370@work-vm> Subject: Re: [Qemu-devel] [PATCH v5 17/45] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages. List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, yanghy@cn.fujitsu.com --vni90+aGYgRvsTuO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 26, 2015 at 04:33:28PM +0000, Dr. David Alan Gilbert wrote: > (Only replying to some of the items in this mail - the others I'll get > to another time). >=20 > * David Gibson (david@gibson.dropbear.id.au) wrote: > > On Wed, Feb 25, 2015 at 04:51:40PM +0000, Dr. David Alan Gilbert (git) = wrote: > > > From: "Dr. David Alan Gilbert" > > >=20 > > > The state of the postcopy process is managed via a series of messages; > > > * Add wrappers and handlers for sending/receiving these messages > > > * Add state variable that track the current state of postcopy > > >=20 > > > Signed-off-by: Dr. David Alan Gilbert > > > --- > > > include/migration/migration.h | 15 ++ > > > include/sysemu/sysemu.h | 23 +++ > > > migration/migration.c | 13 ++ > > > savevm.c | 325 ++++++++++++++++++++++++++++++++= ++++++++++ > > > trace-events | 11 ++ > > > 5 files changed, 387 insertions(+) > > >=20 > > > diff --git a/include/migration/migration.h b/include/migration/migrat= ion.h > > > index f94af5b..81cd1f2 100644 > > > --- a/include/migration/migration.h > > > +++ b/include/migration/migration.h > > > @@ -52,6 +52,14 @@ typedef struct MigrationState MigrationState; > > > =20 > > > typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head; > > > =20 > > > +typedef enum { > > > + POSTCOPY_INCOMING_NONE =3D 0, /* Initial state - no postcopy */ > > > + POSTCOPY_INCOMING_ADVISE, > > > + POSTCOPY_INCOMING_LISTENING, > > > + POSTCOPY_INCOMING_RUNNING, > > > + POSTCOPY_INCOMING_END > > > +} PostcopyState; > > > + > > > /* State for the incoming migration */ > > > struct MigrationIncomingState { > > > QEMUFile *file; > > > @@ -59,6 +67,8 @@ struct MigrationIncomingState { > > > /* See savevm.c */ > > > LoadStateEntry_Head loadvm_handlers; > > > =20 > > > + PostcopyState postcopy_state; > > > + > > > QEMUFile *return_path; > > > QemuMutex rp_mutex; /* We send replies from multiple thr= eads */ > > > }; > > > @@ -219,4 +229,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_add= r_t block_offset, > > > ram_addr_t offset, size_t size, > > > int *bytes_sent); > > > =20 > > > +PostcopyState postcopy_state_get(MigrationIncomingState *mis); > > > + > > > +/* Set the state and return the old state */ > > > +PostcopyState postcopy_state_set(MigrationIncomingState *mis, > > > + PostcopyState new_state); > > > #endif > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h > > > index 8da879f..d6a6d51 100644 > > > --- a/include/sysemu/sysemu.h > > > +++ b/include/sysemu/sysemu.h > > > @@ -87,6 +87,18 @@ enum qemu_vm_cmd { > > > MIG_CMD_INVALID =3D 0, /* Must be 0 */ > > > MIG_CMD_OPEN_RETURN_PATH, /* Tell the dest to open the Return p= ath */ > > > MIG_CMD_PING, /* Request a PONG on the RP */ > > > + > > > + MIG_CMD_POSTCOPY_ADVISE =3D 20, /* Prior to any page transfers,= just > > > + warn we might want to do PC */ > > > + MIG_CMD_POSTCOPY_LISTEN, /* Start listening for incoming > > > + pages as it's running. */ > > > + MIG_CMD_POSTCOPY_RUN, /* Start execution */ > > > + MIG_CMD_POSTCOPY_END, /* Postcopy is finished. */ > > > + > > > + MIG_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard that > > > + were previously sent during > > > + precopy but are dirty. */ > > > + > > > }; > > > =20 > > > bool qemu_savevm_state_blocked(Error **errp); > > > @@ -101,6 +113,17 @@ void qemu_savevm_command_send(QEMUFile *f, enum = qemu_vm_cmd command, > > > uint16_t len, uint8_t *data); > > > void qemu_savevm_send_ping(QEMUFile *f, uint32_t value); > > > void qemu_savevm_send_open_return_path(QEMUFile *f); > > > +void qemu_savevm_send_postcopy_advise(QEMUFile *f); > > > +void qemu_savevm_send_postcopy_listen(QEMUFile *f); > > > +void qemu_savevm_send_postcopy_run(QEMUFile *f); > > > +void qemu_savevm_send_postcopy_end(QEMUFile *f, uint8_t status); > > > + > > > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *= name, > > > + uint16_t len, uint8_t off= set, > > > + uint64_t *addrlist, > > > + uint32_t *masklist); > > > + > > > + > > > int qemu_loadvm_state(QEMUFile *f); > > > =20 > > > /* SLIRP */ > > > diff --git a/migration/migration.c b/migration/migration.c > > > index 434864a..957115a 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -971,3 +971,16 @@ void migrate_fd_connect(MigrationState *s) > > > qemu_thread_create(&s->thread, "migration", migration_thread, s, > > > QEMU_THREAD_JOINABLE); > > > } > > > + > > > +PostcopyState postcopy_state_get(MigrationIncomingState *mis) > > > +{ > > > + return atomic_fetch_add(&mis->postcopy_state, 0); > > > +} > > > + > > > +/* Set the state and return the old state */ > > > +PostcopyState postcopy_state_set(MigrationIncomingState *mis, > > > + PostcopyState new_state) > > > +{ > > > + return atomic_xchg(&mis->postcopy_state, new_state); > >=20 > > Is there anything explaining what the overall atomicity requirements > > are for this state variable? It's a bit hard to tell if an atomic > > xchg is necessary or sufficient without a description of what the > > overall concurrency scheme is with regards to this variable. >=20 > Can you tell me how to define the requirements? Well, that is always the tricky question. > It's a state variable tested and changed by at least two threads and > it's got to go through a correct sequence of states. > So generally you're doing a 'I expect to be in .... now change to ....' > so the exchange works well for that. So.. in this case. It seems what might make sense as a basic atomicity option here is a cmpxchg. So your state_set function takes old_state and new_state. It will atomically update old_state to new_state, returning an error if somebody else changed the state away =66rom old_state in the meantime. Then you can have a blurb next to the state_set helper explaining why you need this atomic op for updates to the state variable from multiple threads. > > > + * n x > > > + * be64 Page addresses for start of an invalidation range > > > + * be32 mask of 32 pages, '1' to discard' > >=20 > > Is the extra compactness from this semi-sparse bitmap encoding > > actually worth it? A simple list of page addresses, or address ranges > > to discard would be substantially simpler to get one's head around, > > and also seems like it might be more robust against future > > implementation changes as a wire format. >=20 > As previously discussed I really think it is; what I'm tending to > see when I've been looking at these in debug is something that's > sparse but tends to be blobby with sets of pages discarded near > by. However you do this you're going to have to walk this > bitmap and format out some sort of set of messages. I'd really need to see some numbers to be convinced. The semi-sparse bitmap introduces a huge amount of code and complexity at both ends. Remember, using ranges, you can still coalesce adjacent discarded pages. Even using some compat differential encodings of the range endpoints might work out simpler than the bitmap fragments. Plus, ranges handle the host versus target page size distinctions very naturally. [snip] > > DISCARD will typically immediately precede LISTEN, won't it? Is there > > a reason not to put the discard data into the LISTEN command? >=20 > Discard data can be quite large, so I potentially send multiple discard > commands. > (Also as you can tell generally I've got a preference for one message doe= s one > thing, and thus I have tried to keep them separate). So, I like the idea of one message per action in principle - but only if that action really is well-defined without reference to what operations come before and after it. If there are hidden dependencies about what actions have to come in what order, I'd rather bake that into the command structure. > > > +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis, > > > + uint64_t remote_hps, > > > + uint64_t remote_tps) > > > +{ > > > + PostcopyState ps =3D postcopy_state_get(mis); > > > + trace_loadvm_postcopy_handle_advise(); > > > + if (ps !=3D POSTCOPY_INCOMING_NONE) { > > > + error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%= d)", ps); > > > + return -1; > > > + } > > > + > > > + if (remote_hps !=3D getpagesize()) { > > > + /* > > > + * Some combinations of mismatch are probably possible but i= t gets > > > + * a bit more complicated. In particular we need to place w= hole > > > + * host pages on the dest at once, and we need to ensure tha= t we > > > + * handle dirtying to make sure we never end up sending part= of > > > + * a hostpage on it's own. > > > + */ > > > + error_report("Postcopy needs matching host page sizes (s=3D%= d d=3D%d)", > > > + (int)remote_hps, getpagesize()); > > > + return -1; > > > + } > > > + > > > + if (remote_tps !=3D (1ul << qemu_target_page_bits())) { > > > + /* > > > + * Again, some differences could be dealt with, but for now = keep it > > > + * simple. > > > + */ > > > + error_report("Postcopy needs matching target page sizes (s= =3D%d d=3D%d)", > > > + (int)remote_tps, 1 << qemu_target_page_bits()); > > > + return -1; > > > + } > > > + > > > + postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE); > >=20 > > Should you be checking the return value here to make sure it's still > > POSTCOPY_INCOMING_NONE? Atomic xchgs seem overkill if you still have > > a race between the fetch at the top and the set here. > >=20 > > Or, in fact, should you be just doing an atomic exchange-then-check at > > the top, rather than checking at the top, then changing at the bottom. >=20 > There's no race at this point yet; going from None->advise we still only > have one thread. The check at the top is a check against protocol > violations (e.g. getting two advise or something like that). Yeah.. and having to have intimate knowledge of the thread structure at each point in order to analyze the atomic operation correctness is exactly what bothers me about it. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --vni90+aGYgRvsTuO Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVFNkBAAoJEGw4ysog2bOSIw8QAKjMN+Ga7ZtHY0JV6qFs/5vF I3ElXtpBw7RzEvn6i58d2Q+dnpG221YwTBEHyQB7qf7IvdxsfUPppNloOyjUNVnB hqHXTrusrC6H9bGJ6GASi3DuA45zFFB9WPLCVGTdRCznYWDYHilrTYwQfErWFZkj HhV7NShIajQI18ZECCZCvySyCjN2SpA+VrfhWid0rtHajFiL8WMvrIihtdlv1T4X 74A7CCT5ep2xYcmFLU+Wf3WyR6hFVmgMohuEKEwqab0OtD38VgUZzJHn8CvJRpfw owXIX2ShSw9fU+hPZ2D7YcAHBJ1uvWexg4M1mjbZKVDLgMPIgjuOh9lQbkSUPonQ wJW/zf8AR6Pe0ud+y3peZUokU1SCA2f5SIPdqirSl9qYemOwo7lQbtAtSyrbZIPf AEQOLQpi3oiDEcInwl50ALkp73fFVApFSbcuE0b1J7DTEMwFY61UXL3gmEpL/5zY QtTMsNbwX1191s8whNIbNUWnGbdKP9EXx7Mwb6OtrDOpwh1k5Isu3Eo7fWU6SFM9 5mwmqgmASIi5kSzy1lKRQLhe71M1rol8IT3LfA9Efrw/VJ8OiU1nvBkUeKRgQ9go 9H6P+JQ9hE9JSA2ei/sI+bh5M2Do/2Jjvlr9HXwmIOLZ85K5eO14FZdNjJmboMkR 80ENbcG1ZYowwmDtVKZH =oLAg -----END PGP SIGNATURE----- --vni90+aGYgRvsTuO--