From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46497) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aMCiu-0001YG-AM for qemu-devel@nongnu.org; Thu, 21 Jan 2016 05:44:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aMCip-0005kF-7Y for qemu-devel@nongnu.org; Thu, 21 Jan 2016 05:44:16 -0500 Received: from mx2.parallels.com ([199.115.105.18]:39085) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aMCio-0005k3-UV for qemu-devel@nongnu.org; Thu, 21 Jan 2016 05:44:11 -0500 Message-ID: <56A0B672.2050907@virtuozzo.com> Date: Thu, 21 Jan 2016 13:44:02 +0300 From: Vladimir Sementsov-Ogievskiy MIME-Version: 1.0 References: <1452517517-3953-1-git-send-email-vsementsov@virtuozzo.com> <20160119174849.GH4579@noname.redhat.com> <569F7EE2.6080402@virtuozzo.com> <569FFA90.9040401@redhat.com> <56A09559.7040805@virtuozzo.com> <20160121095335.GB4490@noname.redhat.com> In-Reply-To: <20160121095335.GB4490@noname.redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: famz@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org, John Snow On 21.01.2016 12:53, Kevin Wolf wrote: > Am 21.01.2016 um 09:22 hat Vladimir Sementsov-Ogievskiy geschrieben: >> On 21.01.2016 00:22, John Snow wrote: >>> On 01/20/2016 07:34 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> On 19.01.2016 20:48, Kevin Wolf wrote: >>>>> Am 11.01.2016 um 14:05 hat Vladimir Sementsov-Ogievskiy geschrieben: >>>>>> The new feature for qcow2: storing bitmaps. >>>>>> >>>>>> This patch adds new header extension to qcow2 - Bitmaps Extension. It >>>>>> provides an ability to store virtual disk related bitmaps in a qcow2 >>>>>> image. For now there is only one type of such bitmaps: Dirty Tracking >>>>>> Bitmap, which just tracks virtual disk changes from some moment. >>>>>> >>>>>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file, >>>>>> should be stored in this qcow2 file. The size of each bitmap >>>>>> (considering its granularity) is equal to virtual disk size. >>>>>> >>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>>>>> --- >>>>>> >>>>>> v7: >>>>>> >>>>>> - Rewordings, grammar. >>>>>> Max, Eric, John, thank you very much. >>>>>> >>>>>> - add last paragraph: remaining bits in bitmap data clusters must be >>>>>> zero. >>>>>> >>>>>> - s/Bitmap Directory/bitmap directory/ and other names like this at >>>>>> the request of Max. >>>>>> >>>>>> v6: >>>>>> >>>>>> - reword bitmap_directory_size description >>>>>> - bitmap type: make 0 reserved >>>>>> - extra_data_size: resize to 4bytes >>>>>> Also, I've marked this field as "must be zero". We can always change >>>>>> it, if we decide allowing managing app to specify any extra data, by >>>>>> defining some magic value as a top of user extra data.. So, for now >>>>>> non zeor extra_data_size should be considered as an error. >>>>>> - swap name and extra_data to give good alignment to extra_data. >>>>>> >>>>>> >>>>>> v5: >>>>>> >>>>>> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of >>>>>> bitmaps. >>>>>> - rewordings >>>>>> - move upper bounds to "Notes about Qemu limits" >>>>>> - s/should/must somewhere. (but not everywhere) >>>>>> - move name_size field closer to name itself in bitmap header >>>>>> - add extra data area to bitmap header >>>>>> - move bitmap data description to separate section >>>>>> >>>>>> docs/specs/qcow2.txt | 172 >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++- >>>>>> 1 file changed, 171 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt >>>>>> index 121dfc8..997239d 100644 >>>>>> --- a/docs/specs/qcow2.txt >>>>>> +++ b/docs/specs/qcow2.txt >>>>>> @@ -103,7 +103,18 @@ in the description of a field. >>>>>> write to an image with unknown auto-clear >>>>>> features if it >>>>>> clears the respective bits from this field first. >>>>>> - Bits 0-63: Reserved (set to 0) >>>>>> + Bit 0: Bitmaps extension bit >>>>>> + This bit indicates consistency for >>>>>> the bitmaps >>>>>> + extension data. >>>>>> + >>>>>> + It is an error if this bit is set >>>>>> without the >>>>>> + bitmaps extension present. >>>>>> + >>>>>> + If the bitmaps extension is present >>>>>> but this >>>>>> + bit is unset, the bitmaps extension >>>>>> data is >>>>>> + inconsistent. >>>>> It may as well be consistent, but we don't know. >>>>> >>>>> Perhaps something like "must be considered inconsistent" or "is >>>>> potentially inconsistent". >>>>> >>>>>> + >>>>>> + Bits 1-63: Reserved (set to 0) >>>>>> 96 - 99: refcount_order >>>>>> Describes the width of a reference count block >>>>>> entry (width >>>>>> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like >>>>>> the following: >>>>>> 0x00000000 - End of the header extension area >>>>>> 0xE2792ACA - Backing file format name >>>>>> 0x6803f857 - Feature name table >>>>>> + 0x23852875 - Bitmaps extension >>>>>> other - Unknown header extension, can >>>>>> be safely >>>>>> ignored >>>>>> @@ -166,6 +178,34 @@ the header extension data. Each entry look >>>>>> like this: >>>>>> terminated if it has full length) >>>>>> +== Bitmaps extension == >>>>>> + >>>>>> +The bitmaps extension is an optional header extension. It provides >>>>>> the ability >>>>>> +to store bitmaps related to a virtual disk. For now, there is only >>>>>> one bitmap >>>>>> +type: the dirty tracking bitmap, which tracks virtual disk changes >>>>> >from some >>>>>> +point in time. >>>>> I have one major problem with this patch, and it starts here. >>>>> >>>>> The spec talks about dirty tracking bitmaps all the way, but it never >>>>> really defines what a dirty tracking bitmap even contains. It has a few >>>>> hints here and there, but they aren't consistent. >>>>> >>>>> Here's the first hint: They track "virtual disk changes", which implies >>>>> they track guest clusters rather than host clusters. >>>>> >>>>>> +The data of the extension should be considered consistent only if the >>>>>> +corresponding auto-clear feature bit is set, see autoclear_features >>>>>> above. >>>>>> + >>>>>> +The fields of the bitmaps extension are: >>>>>> + >>>>>> + 0 - 3: nb_bitmaps >>>>>> + The number of bitmaps contained in the image. >>>>>> Must be >>>>>> + greater than or equal to 1. >>>>>> + >>>>>> + Note: Qemu currently only supports up to 65535 >>>>>> bitmaps per >>>>>> + image. >>>>>> + >>>>>> + 4 - 7: bitmap_directory_size >>>>>> + Size of the bitmap directory in bytes. It is the >>>>>> cumulative >>>>>> + size of all (nb_bitmaps) bitmap headers. >>>>>> + >>>>>> + 8 - 15: bitmap_directory_offset >>>>>> + Offset into the image file at which the bitmap >>>>>> directory >>>>>> + starts. Must be aligned to a cluster boundary. >>>>>> + >>>>>> + >>>>>> == Host cluster management == >>>>>> qcow2 manages the allocation of host clusters by maintaining a >>>>>> reference count >>>>>> @@ -360,3 +400,133 @@ Snapshot table entry: >>>>>> variable: Padding to round up the snapshot table entry >>>>>> size to the >>>>>> next multiple of 8. >>>>>> + >>>>>> + >>>>>> +== Bitmaps == >>>>>> + >>>>>> +As mentioned above, the bitmaps extension provides the ability to >>>>>> store bitmaps >>>>>> +related a virtual disk. This section describes how these bitmaps are >>>>>> stored. >>>>> s/related/related to/ >>>>> >>>>>> +Note: all bitmaps are related to the virtual disk stored in this image. >>>>>> + >>>>>> +=== Bitmap directory === >>>>>> + >>>>>> +Each bitmap saved in the image is described in a bitmap directory >>>>>> entry. The >>>>>> +bitmap directory is a contiguous area in the image file, whose >>>>>> starting offset >>>>>> +and length are given by the header extension fields >>>>>> bitmap_directory_offset and >>>>>> +bitmap_directory_size. The entries of the bitmap directory have >>>>>> variable >>>>>> +length, depending on the length of the bitmap name and extra data. >>>>>> These >>>>>> +entries are also called bitmap headers. >>>>>> + >>>>>> +Structure of a bitmap directory entry: >>>>>> + >>>>>> + Byte 0 - 7: bitmap_table_offset >>>>>> + Offset into the image file at which the bitmap >>>>>> table >>>>>> + (described below) for the bitmap starts. Must be >>>>>> aligned to >>>>>> + a cluster boundary. >>>>>> + >>>>>> + 8 - 11: bitmap_table_size >>>>>> + Number of entries in the bitmap table of the >>>>>> bitmap. >>>>>> + >>>>>> + 12 - 15: flags >>>>>> + Bit >>>>>> + 0: in_use >>>>>> + The bitmap was not saved correctly and may be >>>>>> + inconsistent. >>>>>> + >>>>>> + 1: auto >>>>>> + The bitmap must reflect all changes of the >>>>>> virtual >>>>>> + disk by any application that would write to >>>>>> this qcow2 >>>>>> + file (including writes, snapshot switching, >>>>>> etc.). The >>>>>> + type of this bitmap must be 'dirty tracking >>>>>> bitmap'. >>>>> This suggests that we can represent snapshot switching in a dirty >>>>> tracking bitmap. Which is almost impossible if we track guest clusters, >>>>> except if loading a snapshot should mean that all bits in the bitmap are >>>>> set. However, that feels a bit useless and dirty tracking across >>>>> snapshots doesn't seem to make a whole lot of sense anyway. >>>>> >>>>> Maybe what would make more sense is that a bitmap is tied to a specific >>>>> snapshot or bitmaps are included in a snapshot, so that you can revert >>>>> to the dirty status as it was when the snapshot was taken. >>>>> >>>>> Or, if you really meant, that the tracking words on a host cluster >>>>> level, what clusters would be included in the bitmap? Would only the >>>>> virtual disk be included? VM state? Any metadata? >>>>> >>>>> It seems we definitely need a new section on dirty tracking bitmaps that >>>>> describes what the bitmap actually means and how it's supposed to work >>>>> with snapshots. I guess we could also talk about how it works with other >>>>> changes to the image like resizing. >>>> Bitmaps track guest clusters. >>>> Backup is not related to snapshots, so, I think set all bits in the >>>> dirty bitmap on snapshot switch is ok. Of course we can't just ignore >>>> snapshot switch, as it changes how user (and backup) sees the virtual >>>> disk. Deleting the bitmap on snapshot switch is not good option too I >>>> think. >>>> >>>> Bitmap size is defined to be equal to virtual disk size. So on resize, >>>> the one who resize must resize the bitmap too, to maintain accordance of >>>> the image file and the spec. >>>> >>>> I'll think about such section, ok. >>>> >>>>>> + Bits 2 - 31 are reserved and must be 0. >>>>>> + >>>>>> + 16: type >>>>>> + This field describes the sort of the bitmap. >>>>>> + Values: >>>>>> + 1: Dirty tracking bitmap >>>>>> + >>>>>> + Values 0, 2 - 255 are reserved. >>>>>> + >>>>>> + 17: granularity_bits >>>>>> + Granularity bits. Valid values: 0 - 63. >>>>>> + >>>>>> + Note: Qemu currently doesn't support >>>>>> granularity_bits >>>>>> + greater than 31. >>>>>> + >>>>>> + Granularity is calculated as >>>>>> + granularity = 1 << granularity_bits >>>>>> + >>>>>> + A bitmap's granularity is how many bytes of the >>>>>> image >>>>>> + accounts for one bit of the bitmap. >>>>>> + >>>>>> + 18 - 19: name_size >>>>>> + Size of the bitmap name. Must be non-zero. >>>>>> + >>>>>> + Note: Qemu currently doesn't support values >>>>>> greater than >>>>>> + 1023. >>>>>> + >>>>>> + 20 - 23: extra_data_size >>>>>> + Size of type-specific extra data. >>>>>> + >>>>>> + For now, as no extra data is defined, >>>>>> extra_data_size is >>>>>> + reserved and must be zero. >>>>>> + >>>>>> + variable: Type-specific extra data for the bitmap. >>>>> We talked about this in the other subthread. >>>>> >>>>>> + variable: The name of the bitmap (not null terminated). >>>>>> Must be >>>>>> + unique among all bitmap names within the bitmaps >>>>>> extension. >>>>>> + >>>>>> + variable: Padding to round up the bitmap directory entry >>>>>> size to the >>>>>> + next multiple of 8. >>>>>> + >>>>>> +=== Bitmap table === >>>>>> + >>>>>> +Bitmaps are stored using a one-level structure (as opposed to two-level >>>>>> +structure like for refcounts and guest clusters mapping) for the >>>>>> mapping of >>>>>> +bitmap data to host clusters. This structure is called the bitmap >>>>>> table. >>>>>> + >>>>>> +Each bitmap table has a variable size (stored in the bitmap >>>>>> directory Entry) >>>>>> +and may use multiple clusters, however, it must be contiguous in the >>>>>> image >>>>>> +file. >>>>>> + >>>>>> +Structure of a bitmap table entry: >>>>>> + >>>>>> + Bit 0: Reserved and must be zero if bits 9 - 55 are >>>>>> non-zero. >>>>>> + If bits 9 - 55 are zero: >>>>>> + 0: Cluster should be read as all zeros. >>>>>> + 1: Cluster should be read as all ones. >>>>>> + >>>>>> + 1 - 8: Reserved and must be zero. >>>>>> + >>>>>> + 9 - 55: Bits 9 - 55 of the host cluster offset. Must be >>>>>> aligned to >>>>>> + a cluster boundary. If the offset is 0, the >>>>>> cluster is >>>>>> + unallocated; in that case, bit 0 determines how >>>>>> this >>>>>> + cluster should be treated when read from. >>>>>> + >>>>>> + 56 - 63: Reserved and must be zero. >>>>>> + >>>>>> +=== Bitmap data === >>>>>> + >>>>>> +As noted above, bitmap data is stored in separate clusters, >>>>>> described by the >>>>>> +bitmap table. Given an offset (in bytes) into the bitmap data, the >>>>>> offset into >>>>>> +the image file can be obtained as follows: >>>>>> + >>>>>> + image_offset = >>>>>> + bitmap_table[bitmap_data_offset / cluster_size] + >>>>>> + (bitmap_data_offset % cluster_size) >>>>>> + >>>>>> +This offset is not defined if bits 9 - 55 of bitmap table entry are >>>>>> zero (see >>>>>> +above). >>>>>> + >>>>>> +Given an offset byte_nr into the virtual disk and the bitmap's >>>>>> granularity, the >>>>>> +bit offset into the bitmap can be calculated like this: >>>>>> + >>>>>> + bit_offset = >>>>>> + image_offset(byte_nr / granularity / 8) * 8 + >>>>>> + (byte_nr / granularity) % 8 >>>>>> + >>>>>> +If the size of the bitmap data is not a multiply of cluster size >>>>>> then the last >>>>>> +cluster of the bitmap data contains some unused tail bits. These >>>>>> bits must be >>>>>> +zero. >>>>> In which order are the bits stored in the bitmap? >>>> What do you mean? >>> He means BE or LE. You can intuit it from the formula, but it's not >>> explicitly stated, I think >> byte ordering? But it make sense when we deal with integers, which >> occupies more then 1 byte. Here is plain byte sequence. Bitmap. Like >> a string.. What should I write here additionally? > I meant sub-byte ordering. Whether bit 0 is the LSB or the MSB. I assume > you intend to use the LSB for bit 0, but it's not written anywhere. > > Actually, that's an assumption that is made for bitmaps in the whole > spec (including e.g. feature flags), even though it is only explicitly > spelt out for the refcount blocks. I guess we should move it to the > 'General' section. Understand now. Then it should be separate patch. > > Kevin -- Best regards, Vladimir * now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.