From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=43701 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PgHek-0003rC-3g for qemu-devel@nongnu.org; Fri, 21 Jan 2011 09:08:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PgHef-0000fa-K4 for qemu-devel@nongnu.org; Fri, 21 Jan 2011 09:08:01 -0500 Received: from mx1.redhat.com ([209.132.183.28]:10070) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PgHef-0000eu-C4 for qemu-devel@nongnu.org; Fri, 21 Jan 2011 09:07:57 -0500 Message-ID: <4D399393.7030506@redhat.com> Date: Fri, 21 Jan 2011 15:09:23 +0100 From: Kevin Wolf MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB References: <1295449188-17877-1-git-send-email-Pierre.Riteau@irisa.fr> <04350B7C-9933-4A70-8FA9-B5B409D1E10A@irisa.fr> <43211019-BF0D-405A-99B7-54C9B3BBE58E@irisa.fr> <4D397C8E.7080703@redhat.com> <292A277F-FDB6-4842-9133-8CAC22F08453@irisa.fr> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Yoshiaki Tamura Cc: "qemu-devel@nongnu.org" , Pierre Riteau Am 21.01.2011 14:59, schrieb Yoshiaki Tamura: > 2011/1/21 Pierre Riteau : >> On 21 janv. 2011, at 13:36, Yoshiaki Tamura wrote: >> >>> 2011/1/21 Kevin Wolf : >>>> Am 21.01.2011 13:15, schrieb Yoshiaki Tamura: >>>>> 2011/1/21 Pierre Riteau : >>>>>> Le 20 janv. 2011 =E0 17:18, Yoshiaki Tamura a =E9crit : >>>>>> >>>>>>> 2011/1/20 Pierre Riteau : >>>>>>>> On 20 janv. 2011, at 03:06, Yoshiaki Tamura wrote: >>>>>>>> >>>>>>>>> 2011/1/19 Pierre Riteau : >>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the = return >>>>>>>>>> value of bdrv_write and aborts migration when it fails. Howeve= r, if the >>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK= _SIZE >>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO. >>>>>>>>>> >>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last = block. >>>>>>>>>> --- >>>>>>>>>> block-migration.c | 16 +++++++++++++++- >>>>>>>>>> 1 files changed, 15 insertions(+), 1 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/block-migration.c b/block-migration.c >>>>>>>>>> index 1475325..eeb9c62 100644 >>>>>>>>>> --- a/block-migration.c >>>>>>>>>> +++ b/block-migration.c >>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *o= paque, int version_id) >>>>>>>>>> int64_t addr; >>>>>>>>>> BlockDriverState *bs; >>>>>>>>>> uint8_t *buf; >>>>>>>>>> + int64_t total_sectors; >>>>>>>>>> + int nr_sectors; >>>>>>>>>> >>>>>>>>>> do { >>>>>>>>>> addr =3D qemu_get_be64(f); >>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void = *opaque, int version_id) >>>>>>>>>> return -EINVAL; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> + total_sectors =3D bdrv_getlength(bs) >> BDRV_SECT= OR_BITS; >>>>>>>>>> + if (total_sectors <=3D 0) { >>>>>>>>>> + fprintf(stderr, "Error getting length of bloc= k device %s\n", device_name); >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY= _CHUNK) { >>>>>>>>>> + nr_sectors =3D total_sectors - addr; >>>>>>>>>> + } else { >>>>>>>>>> + nr_sectors =3D BDRV_SECTORS_PER_DIRTY_CHUNK; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> buf =3D qemu_malloc(BLOCK_SIZE); >>>>>>>>>> >>>>>>>>>> qemu_get_buffer(f, buf, BLOCK_SIZE); >>>>>>>>>> - ret =3D bdrv_write(bs, addr, buf, BDRV_SECTORS_PE= R_DIRTY_CHUNK); >>>>>>>>>> + ret =3D bdrv_write(bs, addr, buf, nr_sectors); >>>>>>>>>> >>>>>>>>>> qemu_free(buf); >>>>>>>>>> if (ret < 0) { >>>>>>>>>> -- >>>>>>>>>> 1.7.3.5 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Pierre, >>>>>>>>> >>>>>>>>> I don't think the fix above is correct. If you have a file whi= ch >>>>>>>>> isn't aliened with BLOCK_SIZE, you won't get an error with the >>>>>>>>> patch. However, the receiver doesn't know how much sectors whi= ch >>>>>>>>> the sender wants to be written, so the guest may fail after >>>>>>>>> migration because some data may not be written. IIUC, although >>>>>>>>> changing bytestream should be prevented as much as possible, we >>>>>>>>> should save/load total_sectors to check appropriate file is >>>>>>>>> allocated on the receiver side. >>>>>>>> >>>>>>>> Isn't the guest supposed to be started using a file with the cor= rect size? >>>>>>> >>>>>>> I personally don't like that; It's insisting too much to the user. >>>>>>> Can't we expand the image on the fly? We can just abort if expan= ding >>>>>>> failed anyway. >>>>>> >>>>>> At first I thought your expansion idea was best, but now I think t= here are valid scenarios where it fails. >>>>>> >>>>>> Imagine both sides are not using a file but a disk partition as st= orage. If the partition size is not rounded to 1 MB, the last write will = fail with the current code, and there is no way we can expand the partiti= on. >>>>>> >>>>> >>>>> Right. But in case of partition doesn't the check in the patch bel= ow >>>>> return error? Does bdrv_getlength return the size correctly? >>>> >>>> I'm pretty sure that it does. We would have problems in other places= if >>>> it didn't (e.g. we're checking if I/O requests are within the disk s= ize). >>> >>> Sorry for the noise. I just learned it's returning the value of lsee= k >>> in case of raw-posix. >> >> >> And it does a ioctl call on other platforms than Linux. >=20 > Thanks. Just a quick question regarding total_sectors. > BlockDriverState seems to contain total_sectors. Can we avoid > calling bdrv_getlength() if bs->total_sectors were already there? I'd need to check the details, but I think it may not be correct with growable files. Kevin