Re: [Qemu-devel] [PATCH 1/6 v10] docs: spec for add-cow file format

From: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
To: Eric Blake <eblake@redhat.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/6 v10] docs: spec for add-cow file format
Date: Thu, 14 Jun 2012 11:06:40 +0800	[thread overview]
Message-ID: <CAGrFBsjdXYTTdda4JGOFEfpEZ_E3qcs0DEWCv_pPnTYX_4CE7g@mail.gmail.com> (raw)
In-Reply-To: <4FD8AD4C.6020708@redhat.com>

On Wed, Jun 13, 2012 at 11:10 PM, Eric Blake <eblake@redhat.com> wrote:
> On 06/13/2012 08:36 AM, Dong Xu Wang wrote:
>> Introduce a new file format:add-cow. The usage can be found at this patch.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 87 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..e077fc2
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,87 @@
>> +== General ==
>> +
>> +Raw file format does not support backing_file and copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file.  Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +While using add-cow, procedures may like this:
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>
> Make sure we also support a raw file larger than the backing file.
>

Okay, in this version, I just truncate raw file with the same size of
backing_file,
I will correct this in v11.

> Does it matter whether the raw file starts life sparse?

No, it can work with sparse raw file.

>
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>
> How does this interact with live snapshots/live disk mirroring?  Is this
> something where I have to call 'block-stream' to pull data into the new
> raw file on-demand?  I take it that test.add-cow is required until the
> block-stream completes?  Is there a way, while qemu is still running,
> but after the block-stream is complete, to reassociate the drive with
> the actual raw file instead of the add-cow, or does the add-cow have to
> remain around as long as the qemu process is still running?
>

I did not touch snapshots/live disk mirroring code much, so have to call
"block_stream".

Now, add-cow has to remain until qemu process is still running, I did not
implement a way to use raw file directly after the block-stream is complete.

>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>
> Okay, but different than network byte order, which means we can't use
> htonl() and friends to convert host bytes into the proper format.
> <endian.h> is not (yet) standardized, although there is talk of adding
> it to the next version of POSIX, in which case htole32() and friends
> would be guaranteed.
>
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff")
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now)
>> +
>> +            12 - 15:    backing_filename_offset
>> +                        Offset in the add-cow file at which the backing file name
>> +                        is stored (NB: The string is not null terminated). 0 if the
>> +                        image doesn't have a backing file.
>
> Mention that if this is not 0, then it must be between 36 and 4094 (a
> file name must be at least 1 byte).  What are the semantics if the
> filename is relative?

relative filename is ok, I tested it just now.

>
>> +
>> +            16 - 19:    backing_filename_size
>> +                        Length of the backing file name in bytes. Undefined if the
>> +                        image doesn't have a backing file.
>
> Better to require 0 if backing_filename_offset is 0, than to leave this
> field undefined; also if backing_filename_offset is non-zero, then this
> must be non-zero.  Must be less than 4096-36 to fit in the reserved part
> of the header.
>

Okay.

>> +
>> +            20 - 23:    image_filename_offset
>> +                        Offset in the add-cow file at which the image_file name
>> +                        is stored (NB: The string is not null terminated).
>
> Mention that this must be between 36 and 4094 (a file name must be at
> least 1 byte). What are the semantics if the filename is relative?

relative filename is ok, I tested it just now.

>
>> +
>> +            24 - 27:    image_filename_size
>> +                        Length of the image_file name in bytes.
>
> If backing_filename_offset is non-zero, then this must be non-zero.
> Must be less than 4096-36 to fit in the reserved part of the header.
>

Yes,

> May image_filename and backing_filename overlap (possible if one is a
> suffix of the other)?  Are there any constraints to prevent infinite
> loops, such as forbidding backing_filename and image_filename from
> resolving either to the same file or to the add-cow file?
>

Sorry, I should add the code that judge if image_file is valid. will fix.

>> +
>> +            28 - 35:    features
>> +                        Currently only 2 feature bits are used:
>> +                        Feature bits:
>> +                            The image uses a backing file:
>> +                            * ADD_COW_F_BACKING_FILE = 0x01.
>> +                            The backing file's format is raw:
>> +                            * ADD_COW_F_BACKING_FORMAT_NO_PROBE = 0x02.
>
> Should this follow the qcow2v3 proposal of splitting into mandatory vs.
> optional feature bits?
>
> I agree that ADD_COW_F_BACKING_FORMAT_NO_PROBE is sufficient to avoid
> security implications, but do we want the extra flexibility of
> specifying the backing format file format rather than just requiring
> probes on all but raw?

Kevin, or Stefan, can you give some comments for this? thanks.

>
>> +
>> +== Reserved ==
>> +
>> +    Byte    36 - 4095:  Reserved field:
>> +                        It is used to make sure COW bitmap field starts at the
>> +                        4096th byte, backing_file name and image_file name will
>> +                        be stored here.
>
> Do we want to keep a fixed-size header, or should we be planning on the
> possibility of future extensions requiring enough other header
> extensions that a variable-sized header would be wiser?  That is, I'm
> fine with requiring that the header be a multiple of 4k, but maybe it
> would make sense to have a mandatory header field that states how many
> header pages are present before the COW bitmap begins.  In the first
> round of implementation, this header field can be required to be 1 (that
> is, for now, we require exactly 4k header), but having the field would
> let us change in the future to a design with an 8k header to hold more
> metadata as needed.
>

Okay.

>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at the 4096th byte, stores a bitmap related to
>> +backing_file and image_file. The bitmap will track whether the sector in
>> +backing_file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes, So the size of bitmap is
>> +calculated according to virtual size of image_file. In each byte, bit 0 to 7
>> +will track the 1st to 7th cluster in sequence, bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image_file, data
>> +should be loaded from backing_file while reading; if the bit is 1,  indicates the
>> +related sector has been dirty, should be loaded from image_file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>
> So basically an add-cow image is thin as long as at least one bit is 0,
> and the add-cow wrapper can only be discarded when all bits are 1.
>
> How do you handle the case where the raw image is not an even multiple
> of cluster bytes?  That is, do bits that correspond to bytes beyond the
> raw file size have to be in a certain state?
>

Now, bits correspond to bytes beyond the raw file size are set to 0.

Really thanks for your reviewing, Eric.

> --
> Eric Blake   eblake@redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>