From: Pierre Riteau <Pierre.Riteau@irisa.fr>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
Date: Fri, 21 Jan 2011 15:23:03 +0100
Message-ID: <73163FA4-194C-483B-A20B-3AF6843C6BC7@irisa.fr>
In-Reply-To: <AANLkTiknxuDBfQiN4Ct-9qS5s2onGS8YY2wtdaOv7ohr@mail.gmail.com>

On 21 Jan 2011, at 15:21, Yoshiaki Tamura wrote:

> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>> 
>>> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>> 
>>>>> 2011/1/21 Kevin Wolf <kwolf@redhat.com>:
>>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>>> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp> wrote:
>>>>>>>> 
>>>>>>>>> 2011/1/20 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>> 
>>>>>>>>>>> 2011/1/19 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>> 
>>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>>> ---
>>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>     int64_t addr;
>>>>>>>>>>>>     BlockDriverState *bs;
>>>>>>>>>>>>     uint8_t *buf;
>>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>> 
>>>>>>>>>>>>     do {
>>>>>>>>>>>>         addr = qemu_get_be64(f);
>>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>                 return -EINVAL;
>>>>>>>>>>>>             }
>>>>>>>>>>>> 
>>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +
>>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>>> +            } else {
>>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +
>>>>>>>>>>>>             buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>> 
>>>>>>>>>>>>             qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>> 
>>>>>>>>>>>>             qemu_free(buf);
>>>>>>>>>>>>             if (ret < 0) {
>>>>>>>>>>>> --
>>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>> 
>>>>>>>>>>> I don't think the fix above is correct.  If you have a file which
>>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>>> patch.  However, the receiver doesn't know how many sectors the
>>>>>>>>>>> sender wants to be written, so the guest may fail after
>>>>>>>>>>> migration because some data may not be written.  IIUC, although
>>>>>>>>>>> changing the bytestream should be avoided as much as possible, we
>>>>>>>>>>> should save/load total_sectors to check that an appropriate file is
>>>>>>>>>>> allocated on the receiver side.
>>>>>>>>>> 
>>>>>>>>>> Isn't the guest supposed to be started using a file with the correct size?
>>>>>>>>> 
>>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>>> Can't we expand the image on the fly?  We can just abort if expanding
>>>>>>>>> fails anyway.
>>>>>>>> 
>>>>>>>> At first I thought your expansion idea was best, but now I think there are valid scenarios where it fails.
>>>>>>>> 
>>>>>>>> Imagine both sides are not using a file but a disk partition as storage. If the partition size is not rounded to 1 MB, the last write will fail with the current code, and there is no way we can expand the partition.
>>>>>>>> 
>>>>>>> 
>>>>>>> Right.  But in the case of a partition, doesn't the check in the patch
>>>>>>> below return an error?  Does bdrv_getlength return the size correctly?
>>>>>> 
>>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>> 
>>>>> Sorry for the noise.  I just learned it's returning the value of lseek
>>>>> in case of raw-posix.
>>>> 
>>>> 
>>>> And it does an ioctl call on platforms other than Linux.
>>> 
>>> Thanks.  Just a quick question regarding total_sectors.
>>> BlockDriverState seems to contain total_sectors.  Can we avoid
>>> calling bdrv_getlength() if bs->total_sectors were already there?
>> 
>> From a comment in bdrv_getlength():
>> 
>> Fixed size devices use the total_sectors value for speed instead of
>> issuing a length query (like lseek) on each call.  Also, legacy block
>> drivers don't provide a bdrv_getlength function and must use
>> total_sectors.
>> 
>> So using bdrv_getlength will protect against devices being resized during migration, but as far as I can see, the sender side doesn't support it: the value of total_sectors is cached for the whole block migration.
> 
> Even if the sender supports it, as long as total_sectors isn't
> sent to the receiver, can we follow the resize on the receiver?


I was referring to the complex, and probably unrealistic, scenario where a user allocates a file of the correct size on the receiving side, starts block migration, and during migration grows the disk on both the sender and receiver sides.
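
To make the clamping of the last chunk in the patch concrete, here is a minimal standalone sketch. It is not the QEMU code: chunk_sectors() and the three constants are hypothetical stand-ins, and it assumes 512-byte sectors together with the 1 MB BLOCK_SIZE mentioned in the commit message. It also rejects an out-of-range start sector, which the patch itself does not do.

```c
#include <stdint.h>

/* Assumed values for illustration only: 512-byte sectors, 1 MB chunks. */
#define SECTOR_BITS        9
#define CHUNK_BYTES        (1 << 20)
#define SECTORS_PER_CHUNK  (CHUNK_BYTES >> SECTOR_BITS)   /* 2048 */

/*
 * Return how many sectors may be written for the chunk starting at
 * sector `addr` on a device of `total_sectors` sectors.
 * A full chunk is returned unless fewer sectors remain before the end
 * of the device, in which case only the remainder is returned.
 * Returns -1 if `addr` lies outside the device.
 */
static int chunk_sectors(int64_t total_sectors, int64_t addr)
{
    if (total_sectors <= 0 || addr < 0 || addr >= total_sectors) {
        return -1;
    }
    if (total_sectors - addr < SECTORS_PER_CHUNK) {
        return (int)(total_sectors - addr);
    }
    return SECTORS_PER_CHUNK;
}
```

For a device of 5000 sectors, the chunks starting at sectors 0 and 2048 are full (2048 sectors each), and the chunk starting at sector 4096 is clamped to the remaining 904 sectors.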

-- 
Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
http://perso.univ-rennes1.fr/pierre.riteau/
