From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: tamura.yoshiaki@gmail.com
In-Reply-To: <73163FA4-194C-483B-A20B-3AF6843C6BC7@irisa.fr>
References: <1295449188-17877-1-git-send-email-Pierre.Riteau@irisa.fr>
 <04350B7C-9933-4A70-8FA9-B5B409D1E10A@irisa.fr>
 <43211019-BF0D-405A-99B7-54C9B3BBE58E@irisa.fr>
 <4D397C8E.7080703@redhat.com>
 <292A277F-FDB6-4842-9133-8CAC22F08453@irisa.fr>
 <35F6C0D9-8888-4C79-A0EE-6CAF3E154520@irisa.fr>
 <73163FA4-194C-483B-A20B-3AF6843C6BC7@irisa.fr>
Date: Fri, 21 Jan 2011 23:30:45 +0900
Subject: Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
From: Yoshiaki Tamura
List-Id: qemu-devel.nongnu.org
To: Pierre Riteau
Cc: Kevin Wolf, "qemu-devel@nongnu.org"

2011/1/21 Pierre Riteau:
> On 21 Jan 2011, at 15:21, Yoshiaki Tamura wrote:
>
>> 2011/1/21 Pierre Riteau:
>>> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>>>
>>>> 2011/1/21 Pierre Riteau:
>>>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>>>
>>>>>> 2011/1/21 Kevin Wolf:
>>>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>>>> 2011/1/21 Pierre Riteau:
>>>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura wrote:
>>>>>>>>>
>>>>>>>>>> 2011/1/20 Pierre Riteau:
>>>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 2011/1/19 Pierre Riteau:
>>>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>>      int64_t addr;
>>>>>>>>>>>>>      BlockDriverState *bs;
>>>>>>>>>>>>>      uint8_t *buf;
>>>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>>>
>>>>>>>>>>>>>      do {
>>>>>>>>>>>>>          addr = qemu_get_be64(f);
>>>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>>                  return -EINVAL;
>>>>>>>>>>>>>              }
>>>>>>>>>>>>>
>>>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>>>> +            }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>>>> +            } else {
>>>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>>>> +            }
>>>>>>>>>>>>> +
>>>>>>>>>>>>>              buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>>>
>>>>>>>>>>>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>>>
>>>>>>>>>>>>>              qemu_free(buf);
>>>>>>>>>>>>>              if (ret < 0) {
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the fix above is correct.  If you have a file which
>>>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>>>> patch.  However, the receiver doesn't know how many sectors the
>>>>>>>>>>>> sender wants written, so the guest may fail after migration
>>>>>>>>>>>> because some data may not be written.  IIUC, although changing
>>>>>>>>>>>> the bytestream should be avoided as much as possible, we should
>>>>>>>>>>>> save/load total_sectors to check that an appropriate file is
>>>>>>>>>>>> allocated on the receiver side.
>>>>>>>>>>>
>>>>>>>>>>> Isn't the guest supposed to be started using a file with the correct size?
>>>>>>>>>>
>>>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>>>> Can't we expand the image on the fly?  We can just abort if expanding
>>>>>>>>>> failed anyway.
>>>>>>>>>
>>>>>>>>> At first I thought your expansion idea was best, but now I think
>>>>>>>>> there are valid scenarios where it fails.
>>>>>>>>>
>>>>>>>>> Imagine both sides are not using a file but a disk partition as
>>>>>>>>> storage. If the partition size is not rounded to 1 MB, the last
>>>>>>>>> write will fail with the current code, and there is no way we can
>>>>>>>>> expand the partition.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Right.  But in the case of a partition, doesn't the check in the
>>>>>>>> patch below return an error?  Does bdrv_getlength return the size
>>>>>>>> correctly?
>>>>>>>
>>>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>>>
>>>>>> Sorry for the noise.  I just learned it's returning the value of lseek
>>>>>> in the case of raw-posix.
>>>>>
>>>>>
>>>>> And it does an ioctl call on platforms other than Linux.
>>>>
>>>> Thanks.  Just a quick question regarding total_sectors.
>>>> BlockDriverState seems to contain total_sectors.  Can we avoid
>>>> calling bdrv_getlength() if bs->total_sectors is already there?
>>>
>>> From a comment in bdrv_getlength():
>>>
>>> Fixed size devices use the total_sectors value for speed instead of
>>> issuing a length query (like lseek) on each call.  Also, legacy block
>>> drivers don't provide a bdrv_getlength function and must use
>>> total_sectors.
>>>
>>> So using bdrv_getlength will protect against devices being resized
>>> during migration, but as far as I can see, the sender side doesn't
>>> support it: the value of total_sectors is cached for the whole block
>>> migration.
>>
>> Even if the sender supports it, as long as total_sectors isn't
>> sent to the receiver, can we follow the resize on the receiver?
>
>
> I was referring to the complex, and probably unrealistic scenario,
> where a user allocates a file of the correct size on the receiving
> side, starts block migration, and during migration grows the size of
> the disk on both the sender and receiver side.

I thought supporting resize while block-migration would be a good
feature because Kemari is live migrating again and again :)

Yoshi

>
> --
> Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
> http://perso.univ-rennes1.fr/pierre.riteau/
>
>
>
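
For reference, the clamping that the quoted patch applies to the final
chunk can be shown in isolation.  What follows is a minimal standalone
sketch, not QEMU code: chunk_sectors() is a hypothetical helper, and
SECTOR_BITS and SECTORS_PER_CHUNK are assumed values (512-byte sectors,
1 MB chunks) standing in for BDRV_SECTOR_BITS and
BDRV_SECTORS_PER_DIRTY_CHUNK.  Unlike the patch, it also rejects a start
sector at or past the end of the device.

```c
#include <stdint.h>

/* Assumed values for illustration: 512-byte sectors, 1 MB chunks. */
#define SECTOR_BITS        9
#define SECTORS_PER_CHUNK  2048   /* 1 MB / 512 bytes */

/*
 * Hypothetical helper: number of sectors to write for the chunk that
 * starts at sector `addr` on a device of `device_bytes` bytes.
 * Returns -1 if the device length is not positive, if it is smaller
 * than one sector, or if `addr` lies outside the device.
 */
int chunk_sectors(int64_t device_bytes, int64_t addr)
{
    int64_t total_sectors;

    if (device_bytes <= 0) {
        return -1;
    }
    total_sectors = device_bytes >> SECTOR_BITS;
    if (total_sectors <= 0 || addr < 0 || addr >= total_sectors) {
        return -1;
    }
    if (total_sectors - addr < SECTORS_PER_CHUNK) {
        /* Short final chunk: only write what is left on the device. */
        return (int)(total_sectors - addr);
    }
    return SECTORS_PER_CHUNK;
}
```

With a 1.5 MB device (3072 sectors), chunk_sectors(1572864, 0) yields
2048 and chunk_sectors(1572864, 2048) yields the remaining 1024, which
is the short final write the patch is after.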