On 06/13/2012 08:36 AM, Dong Xu Wang wrote:
> Introduce a new file format:add-cow. The usage can be found at this patch.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  docs/specs/add-cow.txt |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 87 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..e077fc2
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,87 @@
> +== General ==
> +
> +Raw file format does not support backing_file and copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file.  Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +While using add-cow, procedures may like this:
> +(ubuntu.img is a disk image which has been installed OS.)
> +    1)  Create a raw image with the same size of ubuntu.img
> +            qemu-img create -f raw test.raw 8G

Make sure we also support a raw file larger than the backing file.

Does it matter whether the raw file starts life sparse?

> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow

How does this interact with live snapshots/live disk mirroring?  Is this
something where I have to call 'block-stream' to pull data into the new
raw file on-demand?  I take it that test.add-cow is required until the
block-stream completes?  Is there a way, while qemu is still running,
but after the block-stream is complete, to reassociate the drive with
the actual raw file instead of the add-cow, or does the add-cow have to
remain around as long as the qemu process is still running?

> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + |     Header    |   Reserved  |    COW bitmap   |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.

Okay, but different than network byte order, which means we can't use
htonl() and friends to convert host bytes into the proper format.
<endian.h> is not (yet) standardized, although there is talk of adding
it to the next version of POSIX, in which case htole32() and friends
would be guaranteed.

> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +
> +    Byte    0 -  7:     magic
> +                        add-cow magic string ("ADD_COW\xff")
> +
> +            8 -  11:    version
> +                        Version number (only valid value is 1 now)
> +
> +            12 - 15:    backing_filename_offset
> +                        Offset in the add-cow file at which the backing file name
> +                        is stored (NB: The string is not null terminated). 0 if the
> +                        image doesn't have a backing file.

Mention that if this is not 0, then it must be between 36 and 4094 (a
file name must be at least 1 byte).  What are the semantics if the
filename is relative?

> +
> +            16 - 19:    backing_filename_size
> +                        Length of the backing file name in bytes. Undefined if the
> +                        image doesn't have a backing file.

Better to require 0 if backing_filename_offset is 0, than to leave this
field undefined; also if backing_filename_offset is non-zero, then this
must be non-zero.  Must be less than 4096-36 to fit in the reserved part
of the header.

> +
> +            20 - 23:    image_filename_offset
> +                        Offset in the add-cow file at which the image_file name
> +                        is stored (NB: The string is not null terminated).

Mention that this must be between 36 and 4094 (a file name must be at
least 1 byte). What are the semantics if the filename is relative?

> +
> +            24 - 27:    image_filename_size
> +                        Length of the image_file name in bytes.

If backing_filename_offset is non-zero, then this must be non-zero.
Must be less than 4096-36 to fit in the reserved part of the header.

May image_filename and backing_filename overlap (possible if one is a
suffix of the other)?  Are there any constraints to prevent infinite
loops, such as forbidding backing_filename and image_filename from
resolving either to the same file or to the add-cow file?

> +
> +            28 - 35:    features
> +                        Currently only 2 feature bits are used:
> +                        Feature bits:
> +                            The image uses a backing file:
> +                            * ADD_COW_F_BACKING_FILE = 0x01.
> +                            The backing file's format is raw:
> +                            * ADD_COW_F_BACKING_FORMAT_NO_PROBE = 0x02.

Should this follow the qcow2v3 proposal of splitting into mandatory vs.
optional feature bits?

I agree that ADD_COW_F_BACKING_FORMAT_NO_PROBE is sufficient to avoid
security implications, but do we want the extra flexibility of
specifying the backing format file format rather than just requiring
probes on all but raw?

> +
> +== Reserved ==
> +
> +    Byte    36 - 4095:  Reserved field:
> +                        It is used to make sure COW bitmap field starts at the
> +                        4096th byte, backing_file name and image_file name will
> +                        be stored here.

Do we want to keep a fixed-size header, or should we be planning on the
possibility of future extensions requiring enough other header
extensions that a variable-sized header would be wiser?  That is, I'm
fine with requiring that the header be a multiple of 4k, but maybe it
would make sense to have a mandatory header field that states how many
header pages are present before the COW bitmap begins.  In the first
round of implementation, this header field can be required to be 1 (that
is, for now, we require exactly 4k header), but having the field would
let us change in the future to a design with an 8k header to hold more
metadata as needed.

> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at the 4096th byte, stores a bitmap related to
> +backing_file and image_file. The bitmap will track whether the sector in
> +backing_file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes, So the size of bitmap is
> +calculated according to virtual size of image_file. In each byte, bit 0 to 7
> +will track the 1st to 7th cluster in sequence, bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image_file, data
> +should be loaded from backing_file while reading; if the bit is 1,  indicates the
> +related sector has been dirty, should be loaded from image_file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.

So basically an add-cow image is thin as long as at least one bit is 0,
and the add-cow wrapper can only be discarded when all bits are 1.

How do you handle the case where the raw image is not an even multiple
of cluster bytes?  That is, do bits that correspond to bytes beyond the
raw file size have to be in a certain state?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org