From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37193) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YWHzm-0000QB-Uq for qemu-devel@nongnu.org; Fri, 13 Mar 2015 01:18:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YWHzk-0005PL-KY for qemu-devel@nongnu.org; Fri, 13 Mar 2015 01:18:50 -0400 Received: from ozlabs.org ([2401:3900:2:1::2]:49163) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YWHzj-0005NZ-Nx for qemu-devel@nongnu.org; Fri, 13 Mar 2015 01:18:48 -0400 Date: Fri, 13 Mar 2015 16:19:20 +1100 From: David Gibson Message-ID: <20150313051920.GE11973@voom.redhat.com> References: <1424883128-9841-1-git-send-email-dgilbert@redhat.com> <1424883128-9841-27-git-send-email-dgilbert@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NpJxosW6I2mTcxQ7" Content-Disposition: inline In-Reply-To: <1424883128-9841-27-git-send-email-dgilbert@redhat.com> Subject: Re: [Qemu-devel] [PATCH v5 26/45] Postcopy page-map-incoming (PMI) structure List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert (git)" Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, yanghy@cn.fujitsu.com --NpJxosW6I2mTcxQ7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Feb 25, 2015 at 04:51:49PM +0000, Dr. David Alan Gilbert (git) wrot= e: > From: "Dr. David Alan Gilbert" >=20 > The PMI holds the state of each page on the incoming side, > so that we can tell if the page is missing, already received > or there is a request outstanding for it. >=20 > Signed-off-by: Dr. David Alan Gilbert > --- > include/migration/migration.h | 18 ++++ > include/migration/postcopy-ram.h | 12 +++ > include/qemu/typedefs.h | 1 + > migration/postcopy-ram.c | 223 +++++++++++++++++++++++++++++++++= ++++++ > 4 files changed, 254 insertions(+) >=20 > diff --git a/include/migration/migration.h b/include/migration/migration.h > index b44b9b2..86200b9 100644 > --- a/include/migration/migration.h > +++ b/include/migration/migration.h > @@ -48,6 +48,23 @@ enum mig_rpcomm_cmd { > MIG_RP_CMD_PONG, /* Response to a PING; data (seq: be32 ) */ > }; > =20 > +/* Postcopy page-map-incoming - data about each page on the inbound side= */ > +typedef enum { > + POSTCOPY_PMI_MISSING =3D 0, /* page hasn't yet been received */ This appears to be a 3 space indent instead of the usual 4. > + POSTCOPY_PMI_REQUESTED =3D 1, /* Kernel asked for a page, not yet go= t it */ > + POSTCOPY_PMI_RECEIVED =3D 2, /* We've got the page */ > +} PostcopyPMIState; TBH, I'm not sure this enum actually helps anything. I wonder if things might actually be cleared if you simply treat the received and requested bitmaps separately. > +struct PostcopyPMI { > + QemuMutex mutex; > + unsigned long *state0; /* Together with state1 form a */ > + unsigned long *state1; /* PostcopyPMIState */ The comments on the lines above don't appear to shed any light on anything. > + unsigned long host_mask; /* A mask with enough bits set to cove= r one > + host page in the PMI */ > + unsigned long host_bits; /* The number of bits in the map repre= senting > + one host page */ I find the host_bits name fairly confusing. Maybe "tp_per_hp"? > +}; > + > typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head; > =20 > typedef enum { > @@ -69,6 +86,7 @@ struct MigrationIncomingState { > =20 > QEMUFile *return_path; > QemuMutex rp_mutex; /* We send replies from multiple threads= */ > + PostcopyPMI postcopy_pmi; > }; > =20 > MigrationIncomingState *migration_incoming_get_current(void); > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcop= y-ram.h > index d81934f..e93ee8a 100644 > --- a/include/migration/postcopy-ram.h > +++ b/include/migration/postcopy-ram.h > @@ -13,7 +13,19 @@ > #ifndef QEMU_POSTCOPY_RAM_H > #define QEMU_POSTCOPY_RAM_H > =20 > +#include "migration/migration.h" > + > /* Return true if the host supports everything we need to do postcopy-ra= m */ > bool postcopy_ram_supported_by_host(void); > =20 > +/* > + * In 'advise' mode record that a page has been received. > + */ > +void postcopy_hook_early_receive(MigrationIncomingState *mis, > + size_t bitmap_index); > + > +void postcopy_pmi_destroy(MigrationIncomingState *mis); > +void postcopy_pmi_discard_range(MigrationIncomingState *mis, > + size_t start, size_t npages); > +void postcopy_pmi_dump(MigrationIncomingState *mis); > #endif > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h > index 611db46..924eeb6 100644 > --- a/include/qemu/typedefs.h > +++ b/include/qemu/typedefs.h > @@ -61,6 +61,7 @@ typedef struct PCIExpressHost PCIExpressHost; > typedef struct PCIHostState PCIHostState; > typedef struct PCMCIACardState PCMCIACardState; > typedef struct PixelFormat PixelFormat; > +typedef struct PostcopyPMI PostcopyPMI; > typedef struct PropertyInfo PropertyInfo; > typedef struct Property Property; > typedef struct QEMUBH QEMUBH; > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > index a0e20b2..4f29055 100644 > --- a/migration/postcopy-ram.c > +++ b/migration/postcopy-ram.c > @@ -24,6 +24,7 @@ > #include "migration/migration.h" > #include "migration/postcopy-ram.h" > #include "sysemu/sysemu.h" > +#include "qemu/bitmap.h" > #include "qemu/error-report.h" > #include "trace.h" > =20 > @@ -49,6 +50,220 @@ > =20 > #if defined(__linux__) && defined(__NR_userfaultfd) > =20 > +/* ---------------------------------------------------------------------= - */ > +/* Postcopy pagemap-inbound (pmi) - data structures that record the = */ > +/* state of each page used by the inbound postcopy = */ > +/* It's a pair of bitmaps (of the same structure as the migration bitmap= s)*/ > +/* holding one bit per target-page, although most operations work on hos= t */ > +/* pages, the exception being a hook that receives incoming pages off th= e */ > +/* migration stream which come in a TP at a time, although the source = */ > +/* _should_ guarantee it sends a sequence of TPs representing HPs during= */ > +/* the postcopy phase, there is no such guarantee during precopy. We = */ > +/* could boil this down to only holding one bit per-host page, but we lo= se*/ > +/* sanity checking that we really do get whole host-pages from the sourc= e.*/ > +__attribute__ (( unused )) /* Until later in patch series */ > +static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pa= ges) > +{ > + unsigned int tpb =3D qemu_target_page_bits(); > + unsigned long host_bits; > + > + qemu_mutex_init(&mis->postcopy_pmi.mutex); > + mis->postcopy_pmi.state0 =3D bitmap_new(ram_pages); > + mis->postcopy_pmi.state1 =3D bitmap_new(ram_pages); > + bitmap_clear(mis->postcopy_pmi.state0, 0, ram_pages); > + bitmap_clear(mis->postcopy_pmi.state1, 0, ram_pages); > + /* > + * Each bit in the map represents one 'target page' which is no bigg= er > + * than a host page but can be smaller. It's useful to have some > + * convenience masks for later > + */ > + > + /* > + * The number of bits one host page takes up in the bitmap > + * e.g. on a 64k host page, 4k Target page, host_bits=3D64/4=3D16 > + */ > + host_bits =3D getpagesize() / (1ul << tpb); That's equivalent to getpagesize() >> tpb, isn't it? > + assert(is_power_of_2(host_bits)); > + > + mis->postcopy_pmi.host_bits =3D host_bits; > + > + if (host_bits < BITS_PER_LONG) { > + /* A mask starting at bit 0 containing host_bits continuous set = bits */ > + mis->postcopy_pmi.host_mask =3D (1ul << host_bits) - 1; > + } else { > + /* > + * This is a host where the ratio between host and target pages = is > + * bigger than the size of our longs, so we can't make a mask > + * but we are only losing sanity checking if we just check one l= ong's > + * worth of bits. > + */ > + mis->postcopy_pmi.host_mask =3D ~0l; > + } > + > + > + assert((ram_pages % host_bits) =3D=3D 0); > +} > + > +void postcopy_pmi_destroy(MigrationIncomingState *mis) > +{ > + g_free(mis->postcopy_pmi.state0); > + mis->postcopy_pmi.state0 =3D NULL; > + g_free(mis->postcopy_pmi.state1); > + mis->postcopy_pmi.state1 =3D NULL; > + qemu_mutex_destroy(&mis->postcopy_pmi.mutex); > +} > + > +/* > + * Mark a set of pages in the PMI as being clear; this is used by the di= scard > + * at the start of postcopy, and before the postcopy stream starts. > + */ > +void postcopy_pmi_discard_range(MigrationIncomingState *mis, > + size_t start, size_t npages) > +{ > + /* Clear to state 0 =3D missing */ > + bitmap_clear(mis->postcopy_pmi.state0, start, npages); > + bitmap_clear(mis->postcopy_pmi.state1, start, npages); > +} > + > +/* > + * Test a host-page worth of bits in the map starting at bitmap_index > + * The bits should all be consistent > + */ > +static bool test_hpbits(MigrationIncomingState *mis, > + size_t bitmap_index, unsigned long *map) > +{ > + long masked; > + > + assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) =3D=3D 0); > + > + masked =3D (map[BIT_WORD(bitmap_index)] >> > + (bitmap_index % BITS_PER_LONG)) & > + mis->postcopy_pmi.host_mask; > + > + assert((masked =3D=3D 0) || (masked =3D=3D mis->postcopy_pmi.host_ma= sk)); > + return !!masked; > +} > + > +/* > + * Set host-page worth of bits in the map starting at bitmap_index > + * to the given state > + */ > +static void set_hp(MigrationIncomingState *mis, > + size_t bitmap_index, PostcopyPMIState state) > +{ > + long shifted_mask =3D mis->postcopy_pmi.host_mask << > + (bitmap_index % BITS_PER_LONG); > + > + assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) =3D=3D 0); assert(state !=3D 0)? > + > + if (state & 1) { Using the symbolic constants for PostcopyPMIState values here might be better. > + mis->postcopy_pmi.state0[BIT_WORD(bitmap_index)] |=3D shifted_ma= sk; > + } else { > + mis->postcopy_pmi.state0[BIT_WORD(bitmap_index)] &=3D ~shifted_m= ask; > + } > + if (state & 2) { > + mis->postcopy_pmi.state1[BIT_WORD(bitmap_index)] |=3D shifted_ma= sk; > + } else { > + mis->postcopy_pmi.state1[BIT_WORD(bitmap_index)] &=3D ~shifted_m= ask; > + } > +} > + > +/* > + * Retrieve the state of the given page > + * Note: This version for use by callers already holding the lock > + */ > +static PostcopyPMIState postcopy_pmi_get_state_nolock( > + MigrationIncomingState *mis, > + size_t bitmap_index) > +{ > + bool b0, b1; > + > + b0 =3D test_hpbits(mis, bitmap_index, mis->postcopy_pmi.state0); > + b1 =3D test_hpbits(mis, bitmap_index, mis->postcopy_pmi.state1); > + > + return (b0 ? 1 : 0) + (b1 ? 2 : 0); Ugh.. this is a hidden dependency on the PostcopyPMIState enum elements never changing value. Safer to code it as: if (!b0 && !b1) { return POSTCOPY_PMI_MISSING; } else if (...) ... and let gcc sort it out. > +} > + > +/* Retrieve the state of the given page */ > +__attribute__ (( unused )) /* Until later in patch series */ > +static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *m= is, > + size_t bitmap_index) > +{ > + PostcopyPMIState ret; > + qemu_mutex_lock(&mis->postcopy_pmi.mutex); > + ret =3D postcopy_pmi_get_state_nolock(mis, bitmap_index); > + qemu_mutex_unlock(&mis->postcopy_pmi.mutex); > + > + return ret; > +} > + > +/* > + * Set the page state to the given state if the previous state was as ex= pected > + * Return the actual previous state. > + */ > +__attribute__ (( unused )) /* Until later in patch series */ > +static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState= *mis, > + size_t bitmap_index, > + PostcopyPMIState expected_sta= te, > + PostcopyPMIState new_state) > +{ > + PostcopyPMIState old_state; > + > + qemu_mutex_lock(&mis->postcopy_pmi.mutex); > + old_state =3D postcopy_pmi_get_state_nolock(mis, bitmap_index); > + > + if (old_state =3D=3D expected_state) { > + switch (new_state) { > + case POSTCOPY_PMI_MISSING: > + assert(0); /* This shouldn't happen - use discard_range */ > + break; > + > + case POSTCOPY_PMI_REQUESTED: > + assert(old_state =3D=3D POSTCOPY_PMI_MISSING); > + /* missing -> requested */ > + set_hp(mis, bitmap_index, POSTCOPY_PMI_REQUESTED); > + break; > + > + case POSTCOPY_PMI_RECEIVED: > + assert(old_state =3D=3D POSTCOPY_PMI_MISSING || > + old_state =3D=3D POSTCOPY_PMI_REQUESTED); > + /* -> received */ > + set_hp(mis, bitmap_index, POSTCOPY_PMI_RECEIVED); > + break; > + } > + } > + > + qemu_mutex_unlock(&mis->postcopy_pmi.mutex); > + return old_state; > +} > + > +/* > + * Useful when debugging postcopy, although if it failed early the > + * received map can be quite sparse and thus big when dumped. > + */ > +void postcopy_pmi_dump(MigrationIncomingState *mis) > +{ > + fprintf(stderr, "postcopy_pmi_dump: bit 0\n"); > + ram_debug_dump_bitmap(mis->postcopy_pmi.state0, false); > + fprintf(stderr, "postcopy_pmi_dump: bit 1\n"); > + ram_debug_dump_bitmap(mis->postcopy_pmi.state1, true); > + fprintf(stderr, "postcopy_pmi_dump: end\n"); > +} > + > +/* Called by ram_load prior to mapping the page */ > +void postcopy_hook_early_receive(MigrationIncomingState *mis, > + size_t bitmap_index) > +{ > + if (mis->postcopy_state =3D=3D POSTCOPY_INCOMING_ADVISE) { > + /* > + * If we're in precopy-advise mode we need to track received pag= es even > + * though we don't need to place pages atomically yet. > + * In advise mode there's only a single thread, so don't need lo= cks > + */ > + set_bit(bitmap_index, mis->postcopy_pmi.state1); /* 2=3Dreceived= */ Yeah.. so this bypasses postcopy_pmi_{get,change}_state, which again makes me wonder whether the enum serves any purpose. > + } > +} > + > static bool ufd_version_check(int ufd) > { > struct uffdio_api api_struct; > @@ -71,6 +286,7 @@ static bool ufd_version_check(int ufd) > return true; > } > =20 > + Extraneous whitespace change. > bool postcopy_ram_supported_by_host(void) > { > long pagesize =3D getpagesize(); > @@ -157,5 +373,12 @@ bool postcopy_ram_supported_by_host(void) > return false; > } > =20 > +/* Called by ram_load prior to mapping the page */ > +void postcopy_hook_early_receive(MigrationIncomingState *mis, > + size_t bitmap_index) > +{ > + /* We don't support postcopy so don't care */ > +} > + > #endif > =20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --NpJxosW6I2mTcxQ7 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVAnNYAAoJEGw4ysog2bOSuHMQALGlwShd+yY2l9ZdTdTo6OdC pwHfWzgOll5xtk98iVLOUmOBlrtt6CNJkt5KhcJP+G7cmR9zcdldnEWocmc4Z6GW Q7T8g92RmOL8VY+TYOWJkGJ7DvG3SikIpPAN1Cy53XYOQzoycwoodqpbdlny0UJd PFrMEpG53SI+zkCBPM6TIfXNwWrfZQ+X7G4eAVOFEN3yTsGQfA1+3VagAMgzlQXQ t343RhWNLPHwfE7fLbt/5mLc8hJyZVyI4/ynwXrpdpxYgqksEqgPeS88NeIdlQhu IAUyZHLgx4jxYy9bliPJ/D62445BfWf5Z84O6okA1cZLly+NZ5gzWMaWV3BKXCjG dMvIEdSCNtpGgHRIPlYA6mLlUuaxNq1Mb1gSd9XhbdcSgu960AzoxzsRJmtKZqrv bHDYJcgU/5B6Ts8NEUW8k0+D8yzFkGsVY6JQU1pKQStNqx+JNToTnGRz6MFI5EHn /aMF2LmiTbj4RsardSkMkFpdrY4uFHwK0x1pQWutzj5p3haq6gQGU3DdLuSOnmCa RQKnMyZJHn3E3/UD+VJxkuUbPyPftnLOlvDnKW2q5WjGhO+rtNWNefC6QkzVl7/O 5oCIHiiClNV9borPWdOMHx+uBawlCnufvVFUjbunP0HuoAQMHNXgDaK8pOFTV/Cg +ZKk9AQu6NDR4Nk54r4J =a8BZ -----END PGP SIGNATURE----- --NpJxosW6I2mTcxQ7--