All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V18 0/6] add-cow file format
@ 2013-04-10  8:11 Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

It will introduce a new file format: add-cow.

The add-cow file format makes it possible to perform copy-on-write on top of
a raw disk image.  When we know that no backing file clusters remain visible
(e.g. we have streamed the entire image and copied all data from the backing
file), then it is possible to discard the add-cow file and use the raw image
file directly.

This feature adds the copy-on-write feature to raw files (which cannot support
it natively) while allowing us to get full performance again later when we no
longer need copy-on-write.

add-cow can benefit from other available functions, such as path_has_protocol
and qed_read_string, so we will make them public.

snapshot_blkdev are not supported now for add-cow. Will add it in futher patches.

These patches are using QemuOpts parser, former patches could be found here:
http://patchwork.ozlabs.org/patch/235300/


v17 -> v18:
1) remove version field.
2) header size is maximum value and cluster size value.
3) fix type.
4) move struct to source file.
5) cluster_size->table_size.
6) use error_report, not fprintf.
7) remove version field from header.
8) header_size is MAX(cluster_size, 4096).
9) introduce s->cluster_sectors.
10) use BLKDBG_L2_LOAD/UPDATE.
11) add 037 and 038 tests.

v16->v17):
1) Use stringify.

v15->v16):
1) Rebased on QEMU upstream source tree.
2) Judge if opts is null in add_cow_create function.

v14->v15:
1) Fix typo and make some sentences more readable in docs.
2) Introduce STRINGIZER macro.

v13->v14:
1) Make some sentences more clear in docs.
2) Make MAGIC from 8 bytes to 4 bytes.
3) Fix some bugs.

v12->v13:
1) Use QemuOpts, not QEMUOptionParameter
2) cluster_size configuable
3) Refactor block-cache.c
4) Correct qemu-iotests script.
5) Other bug fix.

v11->v12:
1) Removed un-used feature bit.
2) Share cache code with qcow2.c.
3) Remove snapshot_blkdev support, will add it in another patch.
5) COW Bitmap field in add-cow file will be multiple of 65536.
6) fix grammer and typo.

Dong Xu Wang (6):
  docs: document for add-cow file format
  make path_has_protocol non static
  qed_read_string to bdrv_read_string
  rename qcow2-cache.c to block-cache.c
  add-cow file format core code.
  qemu-iotests: add add-cow iotests support

 block.c                      |  29 +-
 block/Makefile.objs          |   4 +-
 block/add-cow.c              | 741 +++++++++++++++++++++++++++++++++++++++++++
 block/block-cache.c          | 339 ++++++++++++++++++++
 block/qcow2-cache.c          | 323 -------------------
 block/qcow2-cluster.c        |  52 +--
 block/qcow2-refcount.c       |  62 ++--
 block/qcow2.c                |  21 +-
 block/qcow2.h                |  25 +-
 block/qed.c                  |  34 +-
 docs/specs/add-cow.txt       | 165 ++++++++++
 include/block/block-cache.h  |  60 ++++
 include/block/block.h        |   3 +
 include/block/block_int.h    |   2 +
 tests/qemu-iotests/017       |   2 +-
 tests/qemu-iotests/020       |   2 +-
 tests/qemu-iotests/037       |   2 +-
 tests/qemu-iotests/038       |   2 +-
 tests/qemu-iotests/common    |   6 +
 tests/qemu-iotests/common.rc |  15 +-
 trace-events                 |  13 +-
 21 files changed, 1452 insertions(+), 450 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/block-cache.c
 delete mode 100644 block/qcow2-cache.c
 create mode 100644 docs/specs/add-cow.txt
 create mode 100644 include/block/block-cache.h

-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 1/6] docs: document for add-cow file format
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-18  8:30   ` Stefan Hajnoczi
  2013-04-26 22:45   ` Eric Blake
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 2/6] make path_has_protocol non static Dong Xu Wang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

Document for add-cow format, the usage and spec of add-cow are
introduced.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
V17->V18:
1) remove version field.
2) header size is maximum value and cluster size value.
3) fix type.
 docs/specs/add-cow.txt | 165 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)
 create mode 100644 docs/specs/add-cow.txt

diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 0000000..151028b
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,165 @@
+== General ==
+
+The raw file format does not support backing files or copy on write
+feature. The add-cow image format makes it possible to use backing
+files with a image by keeping a separate .add-cow metadata file.
+Once all clusters have been written into the image it is safe to
+discard the .add-cow and backing files, then we can use the image
+directly.
+
+An example usage of add-cow would look like:
+(ubuntu.img is a disk image which has an installed OS.)
+    1)  Create a image, such as raw format, with the same size of
+        ubuntu.img:
+            qemu-img create -f raw test.raw 8G
+    2)  Create an add-cow image which will store dirty bitmap
+            qemu-img create -f add-cow test.add-cow \
+                -o backing_file=ubuntu.img,image_file=test.raw
+    3)  Run qemu with add-cow image
+            qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img, in that case, the size of
+test.add-cow will be calculated from the size of test.raw.
+
+image_fmt can be omitted, in that case image_fmt is assumed to be
+"raw". backing_fmt can also be omitted, add-cow should do a probe
+operation and determine what the backing file's format is.
+
+=Specification=
+
+The file format looks like this:
+
+ +---------------+-------------------------------+
+ |     Header    |           COW bitmap          |
+ +---------------+-------------------------------+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(HEADER_SIZE is defined in 40-43 bytes.)
+    Byte    0  -  3:    magic
+                        add-cow magic string ("ACOW").
+
+            4  -  7:    backing file name offset
+                        Offset in the add-cow file at which the backing
+                        file name is stored (NB: The string is not
+                        lNUL-terminated).
+                        If backing file name does NOT exist, this field
+                        will be 0. Must be between 76 and [HEADER_SIZE
+                        - 2](a file name must be at least 1 byte).
+
+            8  - 11:    backing file name size
+                        Length of the backing file name in bytes. It
+                        will be 0 if the backing file name offset is
+                        0. If backing file name offset is non-zero,
+                        then it must be non-zero. Must be less than
+                        [HEADER_SIZE - 76] to fit in the reserved
+                        part of the header. Backing file name offset
+                        + size must be no more than HEADER_SIZE.
+
+            12 - 15:    image file name offset
+                        Offset in the add-cow file at which the image
+                        file name is stored (NB: The string is not
+                        NUL-terminated). It must be between 76 and
+                        [HEADER_SIZE - 2]. Image file name size + offset
+                        must be no more than HEADER_SIZE.
+
+            16 - 19:    image file name size
+                        Length of the image file name in bytes.
+                        Must be less than [HEADER_SIZE - 76] to fit in
+                        the reserved part of the header.
+
+            20 - 23:    cluster bits
+                        Number of bits that are used for addressing an
+                        offset within a cluster (1 << cluster_bits is
+                        the cluster size). Must not be less than 9
+                        (i.e. 512 byte clusters).
+
+                        Note: qemu as of today has an implementation
+                        limit of 2 MB as the maximum cluster size and
+                        won't be able to open images with larger cluster
+                        sizes.
+
+            24 - 31:    features
+                        Bitmask of features. If a feature bit is set
+                        but not recognized, the add-cow file should be
+                        dropped. They are not used in now.
+
+                        Bits 0-63:  Reserved (set to 0)
+
+            32 - 39:    compatible features
+                        Bitmask of compatible features. An implementation
+                        can safely ignore any unknown bits that are set.
+                        Bit 0:      All allocated bit.  If this bit is
+                                    set then backing file and COW bitmap
+                                    will not be used, and can read from
+                                    or write to image file directly.
+
+                        Bits 1-63:  Reserved (set to 0)
+
+            40 - 43:    HEADER_SIZE
+                        The header field is variable-sized. This field
+                        indicates how many bytes will be used to store
+                        add-cow header. By default, it is maximum value
+                        of 4096 and cluster size value.
+
+            44 - 59:    backing file format
+                        Format of backing file. It will be filled with
+                        0 if backing file name offset is 0. If backing
+                        file name offset is non-empty, it must be
+                        non-empty. It is coded in free-form ASCII, and
+                        is not NUL-terminated. Zero padded on the right.
+
+            60 - 75:    image file format
+                        Format of image file. It must be non-empty. It
+                        is coded in free-form ASCII, and is not
+                        NUL-terminated. Zero padded on the right.
+
+            76 - [HEADER_SIZE - 1]:
+                        It is used to make sure COW bitmap field starts
+                        at the HEADER_SIZE byte, backing file name and
+                        image file name will be stored here. The bytes
+                        that are not pointing to backing file and image
+                        file names must be set to 0.
+
+== COW bitmap ==
+
+The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap
+related to backing file and image file.  It is tracking whether the
+cluster in image file is allocated or not.
+
+Each bit in the bitmap tracks one cluster's status. For example, if
+cluster bit is 16, then each bit tracks one cluster, (1 << 16) = 65536
+bytes. The image file size is rounded up to cluster size (where any
+bytes in the last cluster that do not fit in the image are ignored),
+then if the number of clusters is not a multiple of 8, then remaining
+bits in the bitmap will be set to 0.
+
+The size of bitmap is calculated according to virtual size of image
+file, and the size of bitmap should be multiple of add-cow file's
+cluster size, the bits not used will be set to 0. Within each byte,
+the least significant bit covers the first cluster. Bit orders in one
+byte look like:
+ +----+----+----+----+----+----+----+----+
+ | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+ +----+----+----+----+----+----+----+----+
+
+If the bit is 0, it indicates the cluster has not been allocated in
+image file, data should be loaded from backing file while reading; if
+the bit is 1, it indicates the related cluster has been dirty, should
+be loaded from image file while reading. Writing to a cluster causes
+the corresponding bit to be set to 1. If there is no backing file, or
+if the image file is larger than the backing file and the offset is
+beyond the end of the backing file, then the data should be read as
+all zero bytes instead.
+
+If image file is not an even multiple of cluster bytes, bits that
+correspond to bytes beyond the image file size in add-cow must be written
+as 0 and must be ignored when reading.
+
+Image file name and backing file name must NOT be the same, we prevent
+this while creating add-cow files via qemu-img. If image file name and
+backing file name are the same, the add-cow image must be treated as
+invalid.
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 2/6] make path_has_protocol non static
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

We will use path_has_protocol outside block.c, so just make it public.

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c               | 2 +-
 include/block/block.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 45e1d96..1e7dab5 100644
--- a/block.c
+++ b/block.c
@@ -193,7 +193,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
     const char *p;
 
diff --git a/include/block/block.h b/include/block/block.h
index 6e73277..7d34aa2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -343,6 +343,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
                   const char *base_path,
                   const char *filename);
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 3/6] qed_read_string to bdrv_read_string
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 2/6] make path_has_protocol non static Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

Make qed_read_string function to a common interface, so move it to
block.c.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c               | 27 +++++++++++++++++++++++++++
 block/qed.c           | 34 ++++------------------------------
 include/block/block.h |  2 ++
 3 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index 1e7dab5..6aad45b 100644
--- a/block.c
+++ b/block.c
@@ -210,6 +210,33 @@ int path_has_protocol(const char *path)
     return *p == ':';
 }
 
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs:         Image file
+ * @offset:     File offset to start of string, in bytes
+ * @n:          String length in bytes
+ * @buf:        Destination buffer
+ * @buflen:     Destination buffer length in bytes
+ * @ret:        0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+                     char *buf, size_t buflen)
+{
+    int ret;
+    if (n >= buflen) {
+        return -EINVAL;
+    }
+    ret = bdrv_pread(bs, offset, buf, n);
+    if (ret < 0) {
+        return ret;
+    }
+    buf[n] = '\0';
+    return 0;
+}
+
 int path_is_absolute(const char *path)
 {
 #ifdef _WIN32
diff --git a/block/qed.c b/block/qed.c
index acffc2d..d647792 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
 }
 
 /**
- * Read a string of known length from the image file
- *
- * @file:       Image file
- * @offset:     File offset to start of string, in bytes
- * @n:          String length in bytes
- * @buf:        Destination buffer
- * @buflen:     Destination buffer length in bytes
- * @ret:        0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
-                           char *buf, size_t buflen)
-{
-    int ret;
-    if (n >= buflen) {
-        return -EINVAL;
-    }
-    ret = bdrv_pread(file, offset, buf, n);
-    if (ret < 0) {
-        return ret;
-    }
-    buf[n] = '\0';
-    return 0;
-}
-
-/**
  * Allocate new clusters
  *
  * @s:          QED state
@@ -437,9 +410,10 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags)
             return -EINVAL;
         }
 
-        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
-                              s->header.backing_filename_size, bs->backing_file,
-                              sizeof(bs->backing_file));
+        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+                               s->header.backing_filename_size,
+                               bs->backing_file,
+                               sizeof(bs->backing_file));
         if (ret < 0) {
             return ret;
         }
diff --git a/include/block/block.h b/include/block/block.h
index 7d34aa2..e737c9c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -173,6 +173,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
     const void *buf, int count);
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+                     char *buf, size_t buflen);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
                   ` (2 preceding siblings ...)
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-18  8:54   ` Stefan Hajnoczi
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 5/6] add-cow file format core code Dong Xu Wang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

We will re-use qcow2-cache as block layer common cache code,
so change its name and made some changes, define a struct named
BlockTableType, pass BlockTableType and table size parameters to
block cache initialization function.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
v17-v18:
1) move struct to source file.
2) cluster_size->table_size.
 block/Makefile.objs         |   3 +-
 block/block-cache.c         | 339 ++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-cache.c         | 323 -----------------------------------------
 block/qcow2-cluster.c       |  52 +++----
 block/qcow2-refcount.c      |  62 ++++----
 block/qcow2.c               |  21 +--
 block/qcow2.h               |  23 +--
 include/block/block-cache.h |  60 ++++++++
 trace-events                |  13 +-
 9 files changed, 483 insertions(+), 413 deletions(-)
 create mode 100644 block/block-cache.c
 delete mode 100644 block/qcow2-cache.c
 create mode 100644 include/block/block-cache.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..4fe81dc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += block-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/block-cache.c b/block/block-cache.c
new file mode 100644
index 0000000..3544691
--- /dev/null
+++ b/block/block-cache.c
@@ -0,0 +1,339 @@
+/*
+ * QEMU Block Layer Cache
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This file is based on qcow2-cache.c, see its copyrights below:
+ *
+ * L2/refcount table cache for the QCOW2 format
+ *
+ * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "trace.h"
+#include "block/block-cache.h"
+
+typedef struct BlockCachedTable {
+    void    *table;
+    int64_t offset;
+    bool    dirty;
+    int     cache_hits;
+    int     ref;
+} BlockCachedTable;
+
+struct BlockCache {
+    BlockCachedTable    *entries;
+    struct BlockCache   *depends;
+    int                 size;
+    size_t              table_size;
+    BlockTableType      table_type;
+    bool                depends_on_flush;
+};
+
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables,
+                               size_t table_size, BlockTableType type)
+{
+    BlockCache *c;
+    int i;
+
+    c = g_malloc0(sizeof(*c));
+    c->size = num_tables;
+    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+    c->table_type = type;
+    c->table_size = table_size;
+
+    for (i = 0; i < c->size; i++) {
+        c->entries[i].table = qemu_blockalign(bs, table_size);
+    }
+
+    return c;
+}
+
+int block_cache_destroy(BlockDriverState *bs, BlockCache *c)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        assert(c->entries[i].ref == 0);
+        qemu_vfree(c->entries[i].table);
+    }
+
+    g_free(c->entries);
+    g_free(c);
+
+    return 0;
+}
+
+static int block_cache_flush_dependency(BlockDriverState *bs, BlockCache *c)
+{
+    int ret;
+
+    ret = block_cache_flush(bs, c->depends);
+    if (ret < 0) {
+        return ret;
+    }
+
+    c->depends = NULL;
+    c->depends_on_flush = false;
+
+    return 0;
+}
+
+static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
+{
+    int ret = 0;
+
+    if (!c->entries[i].dirty || !c->entries[i].offset) {
+        return 0;
+    }
+
+    trace_block_cache_entry_flush(qemu_coroutine_self(), c->table_type, i);
+
+    if (c->depends) {
+        ret = block_cache_flush_dependency(bs, c);
+    } else if (c->depends_on_flush) {
+        ret = bdrv_flush(bs->file);
+        if (ret >= 0) {
+            c->depends_on_flush = false;
+        }
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (c->table_type == BLOCK_TABLE_REF) {
+        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
+    } else if (c->table_type == BLOCK_TABLE_L2) {
+        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
+    } else if (c->table_type == BLOCK_TABLE_BITMAP) {
+        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
+    }
+
+    ret = bdrv_pwrite(bs->file, c->entries[i].offset,
+                      c->entries[i].table, c->table_size);
+    if (ret < 0) {
+        return ret;
+    }
+
+    c->entries[i].dirty = false;
+
+    return 0;
+}
+
+int block_cache_flush(BlockDriverState *bs, BlockCache *c)
+{
+    int result = 0;
+    int ret;
+    int i;
+
+    trace_block_cache_flush(qemu_coroutine_self(), c->table_type);
+
+    for (i = 0; i < c->size; i++) {
+        ret = block_cache_entry_flush(bs, c, i);
+        if (ret < 0 && result != -ENOSPC) {
+            result = ret;
+        }
+    }
+
+    if (result == 0) {
+        ret = bdrv_flush(bs->file);
+        if (ret < 0) {
+            result = ret;
+        }
+    }
+
+    return result;
+}
+
+int block_cache_set_dependency(BlockDriverState *bs,
+                               BlockCache *c,
+                               BlockCache *dependency)
+{
+    int ret;
+
+    if (dependency->depends) {
+        ret = block_cache_flush_dependency(bs, dependency);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (c->depends && (c->depends != dependency)) {
+        ret = block_cache_flush_dependency(bs, c);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    c->depends = dependency;
+    return 0;
+}
+
+void block_cache_depends_on_flush(BlockCache *c)
+{
+    c->depends_on_flush = true;
+}
+
+static int block_cache_find_entry_to_replace(BlockCache *c)
+{
+    int i;
+    int min_count = INT_MAX;
+    int min_index = -1;
+
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].ref) {
+            continue;
+        }
+
+        if (c->entries[i].cache_hits < min_count) {
+            min_index = i;
+            min_count = c->entries[i].cache_hits;
+        }
+
+        /* Give newer hits priority */
+        /* TODO Check how to optimize the replacement strategy */
+        c->entries[i].cache_hits /= 2;
+    }
+
+    if (min_index == -1) {
+        /* This can't happen in current synchronous code, but leave the check
+         * here as a reminder for whoever starts using AIO with the cache */
+        abort();
+    }
+    return min_index;
+}
+
+static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
+                              uint64_t offset, void **table,
+                              bool read_from_disk)
+{
+    int i;
+    int ret;
+
+    trace_block_cache_get(qemu_coroutine_self(), c->table_type,
+                          offset, read_from_disk);
+
+    /* Check if the table is already cached */
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].offset == offset) {
+            goto found;
+        }
+    }
+
+    /* If not, write a table back and replace it */
+    i = block_cache_find_entry_to_replace(c);
+    trace_block_cache_get_replace_entry(qemu_coroutine_self(),
+                                        c->table_type, i);
+    if (i < 0) {
+        return i;
+    }
+
+    ret = block_cache_entry_flush(bs, c, i);
+    if (ret < 0) {
+        return ret;
+    }
+
+    trace_block_cache_get_read(qemu_coroutine_self(),
+                               c->table_type, i);
+    c->entries[i].offset = 0;
+    if (read_from_disk) {
+        if (c->table_type == BLOCK_TABLE_L2) {
+            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
+        } else if (c->table_type == BLOCK_TABLE_BITMAP) {
+            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
+        }
+
+        ret = bdrv_pread(bs->file, offset, c->entries[i].table,
+                         c->table_size);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    /* Give the table some hits for the start so that it won't be replaced
+     * immediately. The number 32 is completely arbitrary. */
+    c->entries[i].cache_hits = 32;
+    c->entries[i].offset = offset;
+
+    /* And return the right table */
+found:
+    c->entries[i].cache_hits++;
+    c->entries[i].ref++;
+    *table = c->entries[i].table;
+
+    trace_block_cache_get_done(qemu_coroutine_self(),
+                               c->table_type, i);
+
+    return 0;
+}
+
+int block_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+                    void **table)
+{
+    return block_cache_do_get(bs, c, offset, table, true);
+}
+
+int block_cache_get_empty(BlockDriverState *bs, BlockCache *c,
+                          uint64_t offset, void **table)
+{
+    return block_cache_do_get(bs, c, offset, table, false);
+}
+
+int block_cache_put(BlockDriverState *bs, BlockCache *c, void **table)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].table == *table) {
+            goto found;
+        }
+    }
+    return -ENOENT;
+
+found:
+    c->entries[i].ref--;
+    *table = NULL;
+
+    assert(c->entries[i].ref >= 0);
+    return 0;
+}
+
+void block_cache_entry_mark_dirty(BlockCache *c, void *table)
+{
+    int i;
+
+    for (i = 0; i < c->size; i++) {
+        if (c->entries[i].table == table) {
+            goto found;
+        }
+    }
+    abort();
+
+found:
+    c->entries[i].dirty = true;
+}
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
deleted file mode 100644
index 2f3114e..0000000
--- a/block/qcow2-cache.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * L2/refcount table cache for the QCOW2 format
- *
- * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "block/block_int.h"
-#include "qemu-common.h"
-#include "qcow2.h"
-#include "trace.h"
-
-typedef struct Qcow2CachedTable {
-    void*   table;
-    int64_t offset;
-    bool    dirty;
-    int     cache_hits;
-    int     ref;
-} Qcow2CachedTable;
-
-struct Qcow2Cache {
-    Qcow2CachedTable*       entries;
-    struct Qcow2Cache*      depends;
-    int                     size;
-    bool                    depends_on_flush;
-};
-
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
-{
-    BDRVQcowState *s = bs->opaque;
-    Qcow2Cache *c;
-    int i;
-
-    c = g_malloc0(sizeof(*c));
-    c->size = num_tables;
-    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
-
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
-    }
-
-    return c;
-}
-
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
-    }
-
-    g_free(c->entries);
-    g_free(c);
-
-    return 0;
-}
-
-static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
-{
-    int ret;
-
-    ret = qcow2_cache_flush(bs, c->depends);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->depends = NULL;
-    c->depends_on_flush = false;
-
-    return 0;
-}
-
-static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
-{
-    BDRVQcowState *s = bs->opaque;
-    int ret = 0;
-
-    if (!c->entries[i].dirty || !c->entries[i].offset) {
-        return 0;
-    }
-
-    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
-                                  c == s->l2_table_cache, i);
-
-    if (c->depends) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-    } else if (c->depends_on_flush) {
-        ret = bdrv_flush(bs->file);
-        if (ret >= 0) {
-            c->depends_on_flush = false;
-        }
-    }
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (c == s->refcount_block_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
-    } else if (c == s->l2_table_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
-    }
-
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->entries[i].dirty = false;
-
-    return 0;
-}
-
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
-{
-    BDRVQcowState *s = bs->opaque;
-    int result = 0;
-    int ret;
-    int i;
-
-    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
-
-    for (i = 0; i < c->size; i++) {
-        ret = qcow2_cache_entry_flush(bs, c, i);
-        if (ret < 0 && result != -ENOSPC) {
-            result = ret;
-        }
-    }
-
-    if (result == 0) {
-        ret = bdrv_flush(bs->file);
-        if (ret < 0) {
-            result = ret;
-        }
-    }
-
-    return result;
-}
-
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency)
-{
-    int ret;
-
-    if (dependency->depends) {
-        ret = qcow2_cache_flush_dependency(bs, dependency);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    if (c->depends && (c->depends != dependency)) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    c->depends = dependency;
-    return 0;
-}
-
-void qcow2_cache_depends_on_flush(Qcow2Cache *c)
-{
-    c->depends_on_flush = true;
-}
-
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
-    int i;
-    int min_count = INT_MAX;
-    int min_index = -1;
-
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].ref) {
-            continue;
-        }
-
-        if (c->entries[i].cache_hits < min_count) {
-            min_index = i;
-            min_count = c->entries[i].cache_hits;
-        }
-
-        /* Give newer hits priority */
-        /* TODO Check how to optimize the replacement strategy */
-        c->entries[i].cache_hits /= 2;
-    }
-
-    if (min_index == -1) {
-        /* This can't happen in current synchronous code, but leave the check
-         * here as a reminder for whoever starts using AIO with the cache */
-        abort();
-    }
-    return min_index;
-}
-
-static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
-    uint64_t offset, void **table, bool read_from_disk)
-{
-    BDRVQcowState *s = bs->opaque;
-    int i;
-    int ret;
-
-    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
-                          offset, read_from_disk);
-
-    /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].offset == offset) {
-            goto found;
-        }
-    }
-
-    /* If not, write a table back and replace it */
-    i = qcow2_cache_find_entry_to_replace(c);
-    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
-                                        c == s->l2_table_cache, i);
-    if (i < 0) {
-        return i;
-    }
-
-    ret = qcow2_cache_entry_flush(bs, c, i);
-    if (ret < 0) {
-        return ret;
-    }
-
-    trace_qcow2_cache_get_read(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-    c->entries[i].offset = 0;
-    if (read_from_disk) {
-        if (c == s->l2_table_cache) {
-            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
-        }
-
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    /* Give the table some hits for the start so that it won't be replaced
-     * immediately. The number 32 is completely arbitrary. */
-    c->entries[i].cache_hits = 32;
-    c->entries[i].offset = offset;
-
-    /* And return the right table */
-found:
-    c->entries[i].cache_hits++;
-    c->entries[i].ref++;
-    *table = c->entries[i].table;
-
-    trace_qcow2_cache_get_done(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-
-    return 0;
-}
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, true);
-}
-
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, false);
-}
-
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
-            goto found;
-        }
-    }
-    return -ENOENT;
-
-found:
-    c->entries[i].ref--;
-    *table = NULL;
-
-    assert(c->entries[i].ref >= 0);
-    return 0;
-}
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
-            goto found;
-        }
-    }
-    abort();
-
-found:
-    c->entries[i].dirty = true;
-}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c71470a..c238ee2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -69,7 +69,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
         return new_l1_table_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache);
     if (ret < 0) {
         goto fail;
     }
@@ -119,7 +119,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
+    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+                          (void **) l2_table);
 
     return ret;
 }
@@ -180,7 +181,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
         return l2_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache);
     if (ret < 0) {
         goto fail;
     }
@@ -188,7 +189,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     /* allocate a new entry in the l2 cache */
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
-    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
+    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
+                                (void **) table);
     if (ret < 0) {
         return ret;
     }
@@ -203,16 +205,16 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
         /* if there was an old l2 table, read it from the disk */
         BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
-        ret = qcow2_cache_get(bs, s->l2_table_cache,
+        ret = block_cache_get(bs, s->l2_table_cache,
             old_l2_offset & L1E_OFFSET_MASK,
-            (void**) &old_table);
+            (void **) &old_table);
         if (ret < 0) {
             goto fail;
         }
 
         memcpy(l2_table, old_table, s->cluster_size);
 
-        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
+        ret = block_cache_put(bs, s->l2_table_cache, (void **) &old_table);
         if (ret < 0) {
             goto fail;
         }
@@ -222,8 +224,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
 
     trace_qcow2_l2_allocate_write_l2(bs, l1_index);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    ret = block_cache_flush(bs, s->l2_table_cache);
     if (ret < 0) {
         goto fail;
     }
@@ -242,7 +244,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
 fail:
     trace_qcow2_l2_allocate_done(bs, l1_index, ret);
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
+    block_cache_put(bs, s->l2_table_cache, (void **) table);
     s->l1_table[l1_index] = old_l2_offset;
     return ret;
 }
@@ -478,7 +480,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         abort();
     }
 
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
 
     nb_available = (c * s->cluster_sectors);
 
@@ -587,13 +589,13 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
      * allocated. */
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
     if (cluster_offset & L2E_OFFSET_MASK) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
         return 0;
     }
 
     cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
     if (cluster_offset < 0) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
         return 0;
     }
 
@@ -608,9 +610,9 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     /* compressed clusters never have the copied flag */
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
     l2_table[l2_index] = cpu_to_be64(cluster_offset);
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (ret < 0) {
         return 0;
     }
@@ -642,7 +644,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
      * need to be sure that the refcounts have been increased and COW was
      * handled.
      */
-    qcow2_cache_depends_on_flush(s->l2_table_cache);
+    block_cache_depends_on_flush(s->l2_table_cache);
 
     return 0;
 }
@@ -675,7 +677,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         qcow2_mark_dirty(bs);
     }
     if (qcow2_need_accurate_refcounts(s)) {
-        qcow2_cache_set_dependency(bs, s->l2_table_cache,
+        block_cache_set_dependency(bs, s->l2_table_cache,
                                    s->refcount_block_cache);
     }
 
@@ -683,7 +685,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     if (ret < 0) {
         goto err;
     }
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
     for (i = 0; i < m->nb_clusters; i++) {
         /* if two concurrent writes happen to the same unallocated cluster
@@ -700,7 +702,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      }
 
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (ret < 0) {
         goto err;
     }
@@ -909,7 +911,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     /* Cleanup */
 out:
-    pret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    pret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (pret < 0) {
         return pret;
     }
@@ -1037,7 +1039,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
      * wrong with our code. */
     assert(nb_clusters > 0);
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (ret < 0) {
         return ret;
     }
@@ -1328,14 +1330,14 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
         }
 
         /* First remove L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         l2_table[l2_index + i] = cpu_to_be64(0);
 
         /* Then decrease the refcount */
         qcow2_free_any_clusters(bs, old_offset, 1);
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (ret < 0) {
         return ret;
     }
@@ -1405,7 +1407,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         old_offset = be64_to_cpu(l2_table[l2_index + i]);
 
         /* Update L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         if (old_offset & QCOW_OFLAG_COMPRESSED) {
             l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1);
@@ -1414,7 +1416,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         }
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b32738f..c7427e8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -71,7 +71,7 @@ static int load_refcount_block(BlockDriverState *bs,
     int ret;
 
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
         refcount_block);
 
     return ret;
@@ -98,8 +98,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
     if (!refcount_block_offset)
         return 0;
 
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        (void**) &refcount_block);
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+        (void **) &refcount_block);
     if (ret < 0) {
         return ret;
     }
@@ -108,8 +108,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
         ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
     refcount = be16_to_cpu(refcount_block[block_index]);
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache,
-        (void**) &refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+        (void **) &refcount_block);
     if (ret < 0) {
         return ret;
     }
@@ -172,7 +172,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         /* If it's already there, we're done */
         if (refcount_block_offset) {
              return load_refcount_block(bs, refcount_block_offset,
-                 (void**) refcount_block);
+                 (void **) refcount_block);
         }
     }
 
@@ -201,7 +201,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     *refcount_block = NULL;
 
     /* We write to the refcount table, so we might depend on L2 tables */
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    ret = block_cache_flush(bs, s->l2_table_cache);
     if (ret < 0) {
         return ret;
     }
@@ -220,8 +220,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
         /* Zero the new refcount block before updating it */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block);
         if (ret < 0) {
             goto fail_block;
         }
@@ -240,15 +240,15 @@ static int alloc_refcount_block(BlockDriverState *bs,
             goto fail_block;
         }
 
-        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+        ret = block_cache_flush(bs, s->refcount_block_cache);
         if (ret < 0) {
             goto fail_block;
         }
 
         /* Initialize the new refcount block only after updating its refcount,
          * update_refcount uses the refcount cache itself */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block);
         if (ret < 0) {
             goto fail_block;
         }
@@ -258,8 +258,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     /* Now the new refcount block needs to be written to disk */
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
-    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+    ret = block_cache_flush(bs, s->refcount_block_cache);
     if (ret < 0) {
         goto fail_block;
     }
@@ -279,7 +279,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
         return 0;
     }
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+                          (void **) refcount_block);
     if (ret < 0) {
         goto fail_block;
     }
@@ -402,7 +403,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t));
     s->free_cluster_index = old_free_cluster_index;
 
-    ret = load_refcount_block(bs, new_block, (void**) refcount_block);
+    ret = load_refcount_block(bs, new_block, (void **) refcount_block);
     if (ret < 0) {
         return ret;
     }
@@ -413,7 +414,7 @@ fail_table:
     g_free(new_table);
 fail_block:
     if (*refcount_block != NULL) {
-        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+        block_cache_put(bs, s->refcount_block_cache, (void **) refcount_block);
     }
     return ret;
 }
@@ -439,7 +440,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     }
 
     if (addend < 0) {
-        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
+        block_cache_set_dependency(bs, s->refcount_block_cache,
             s->l2_table_cache);
     }
 
@@ -456,8 +457,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         /* Load the refcount block and allocate it if needed */
         if (table_index != old_table_index) {
             if (refcount_block) {
-                ret = qcow2_cache_put(bs, s->refcount_block_cache,
-                    (void**) &refcount_block);
+                ret = block_cache_put(bs, s->refcount_block_cache,
+                    (void **) &refcount_block);
                 if (ret < 0) {
                     goto fail;
                 }
@@ -470,7 +471,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         }
         old_table_index = table_index;
 
-        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
 
         /* we can update the count and save it */
         block_index = cluster_index &
@@ -493,8 +494,8 @@ fail:
     /* Write last changed block to disk */
     if (refcount_block) {
         int wret;
-        wret = qcow2_cache_put(bs, s->refcount_block_cache,
-            (void**) &refcount_block);
+        wret = block_cache_put(bs, s->refcount_block_cache,
+            (void **) &refcount_block);
         if (wret < 0) {
             return ret < 0 ? ret : wret;
         }
@@ -671,7 +672,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
      * or explicitly by update_cluster_refcount().  Refcount blocks must be
      * flushed before the caller's L2 table updates.
      */
-    qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
+    block_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
     return offset;
 }
 
@@ -767,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             old_l2_offset = l2_offset;
             l2_offset &= L1E_OFFSET_MASK;
 
-            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
-                (void**) &l2_table);
+            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+                (void **) &l2_table);
             if (ret < 0) {
                 goto fail;
             }
@@ -811,16 +812,17 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }
                     if (offset != old_offset) {
                         if (addend > 0) {
-                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
+                            block_cache_set_dependency(bs, s->l2_table_cache,
                                 s->refcount_block_cache);
                         }
                         l2_table[j] = cpu_to_be64(offset);
-                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+                        block_cache_entry_mark_dirty(s->l2_table_cache,
+                                                     l2_table);
                     }
                 }
             }
 
-            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+            ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
             if (ret < 0) {
                 goto fail;
             }
@@ -847,7 +849,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     ret = bdrv_flush(bs);
 fail:
     if (l2_table) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
     }
 
     /* Update L1 only if it isn't deleted anyway (addend = -1) */
diff --git a/block/qcow2.c b/block/qcow2.c
index 5871192..3b1c416 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -446,8 +446,11 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE,
+                                           s->cluster_size, BLOCK_TABLE_L2);
+    s->refcount_block_cache = block_cache_create(bs, REFCOUNT_CACHE_SIZE,
+                                                 s->cluster_size,
+                                                 BLOCK_TABLE_REF);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
@@ -548,7 +551,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
     qcow2_refcount_close(bs);
     g_free(s->l1_table);
     if (s->l2_table_cache) {
-        qcow2_cache_destroy(bs, s->l2_table_cache);
+        block_cache_destroy(bs, s->l2_table_cache);
     }
     g_free(s->cluster_cache);
     qemu_vfree(s->cluster_data);
@@ -913,13 +916,13 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
-    qcow2_cache_flush(bs, s->l2_table_cache);
-    qcow2_cache_flush(bs, s->refcount_block_cache);
+    block_cache_flush(bs, s->l2_table_cache);
+    block_cache_flush(bs, s->refcount_block_cache);
 
     qcow2_mark_clean(bs);
 
-    qcow2_cache_destroy(bs, s->l2_table_cache);
-    qcow2_cache_destroy(bs, s->refcount_block_cache);
+    block_cache_destroy(bs, s->l2_table_cache);
+    block_cache_destroy(bs, s->refcount_block_cache);
 
     g_free(s->unknown_header_fields);
     cleanup_unknown_header_ext(bs);
@@ -1596,14 +1599,14 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    ret = block_cache_flush(bs, s->l2_table_cache);
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         return ret;
     }
 
     if (qcow2_need_accurate_refcounts(s)) {
-        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+        ret = block_cache_flush(bs, s->refcount_block_cache);
         if (ret < 0) {
             qemu_co_mutex_unlock(&s->lock);
             return ret;
diff --git a/block/qcow2.h b/block/qcow2.h
index bf8db2a..e7f6aec 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
 
 #include "block/aes.h"
 #include "block/coroutine.h"
+#include "block/block-cache.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -97,9 +98,6 @@ typedef struct QCowSnapshot {
     uint64_t vm_clock_nsec;
 } QCowSnapshot;
 
-struct Qcow2Cache;
-typedef struct Qcow2Cache Qcow2Cache;
-
 typedef struct Qcow2UnknownHeaderExtension {
     uint32_t magic;
     uint32_t len;
@@ -149,8 +147,8 @@ typedef struct BDRVQcowState {
     uint64_t l1_table_offset;
     uint64_t *l1_table;
 
-    Qcow2Cache* l2_table_cache;
-    Qcow2Cache* refcount_block_cache;
+    BlockCache  *l2_table_cache;
+    BlockCache  *refcount_block_cache;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -392,19 +390,6 @@ void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
 /* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency);
-void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+
 
 #endif
diff --git a/include/block/block-cache.h b/include/block/block-cache.h
new file mode 100644
index 0000000..0b9cb3f
--- /dev/null
+++ b/include/block/block-cache.h
@@ -0,0 +1,60 @@
+/*
+ * QEMU Block Layer Cache
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This file is based on qcow2-cache.c, see its copyrights below:
+ *
+ * L2/refcount table cache for the QCOW2 format
+ *
+ * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_CACHE_H
+#define BLOCK_CACHE_H
+
+typedef enum {
+    BLOCK_TABLE_REF,
+    BLOCK_TABLE_L2,
+    BLOCK_TABLE_BITMAP,
+} BlockTableType;
+
+struct BlockCache;
+typedef struct BlockCache BlockCache;
+
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables,
+                               size_t table_size, BlockTableType type);
+int block_cache_destroy(BlockDriverState *bs, BlockCache *c);
+int block_cache_flush(BlockDriverState *bs, BlockCache *c);
+int block_cache_set_dependency(BlockDriverState *bs,
+                               BlockCache *c,
+                               BlockCache *dependency);
+void block_cache_depends_on_flush(BlockCache *c);
+int block_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+                    void **table);
+int block_cache_get_empty(BlockDriverState *bs, BlockCache *c,
+                          uint64_t offset, void **table);
+int block_cache_put(BlockDriverState *bs, BlockCache *c, void **table);
+void block_cache_entry_mark_dirty(BlockCache *c, void *table);
+#endif /* BLOCK_CACHE_H */
diff --git a/trace-events b/trace-events
index 412f7e4..65124c1 100644
--- a/trace-events
+++ b/trace-events
@@ -497,12 +497,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
 
-qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
-qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
-qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+# block/block-cache.c
+block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
+block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
+block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
 
 # block/qed-l2-cache.c
 qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
                   ` (3 preceding siblings ...)
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-18 10:03   ` Stefan Hajnoczi
  2013-05-13 15:07   ` Jeff Cody
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang
  2013-04-18 10:06 ` [Qemu-devel] [PATCH V18 0/6] add-cow file format Stefan Hajnoczi
  6 siblings, 2 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

add-cow file format core code. It use block-cache.c as cache code.
It lacks of snapshot_blkdev support.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
v17-v18:
1) use error_report, not fprintf.
2) remove version field from header.
3) header_size is MAX(cluster_size, 4096).
4) introduce s->cluster_sectors.
5) use BLKDBG_L2_LOAD/UPDATE.

v16->v17:
1) Use stringify.

v15->v16:
1) Judge if opts is null in add_cow_create function.
 block/Makefile.objs       |   1 +
 block/add-cow.c           | 741 ++++++++++++++++++++++++++++++++++++++++++++++
 block/block-cache.c       |   4 +-
 block/qcow2.h             |   6 +-
 include/block/block_int.h |   2 +
 5 files changed, 749 insertions(+), 5 deletions(-)
 create mode 100644 block/add-cow.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 4fe81dc..69cbb09 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,7 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += block-cache.o
+block-obj-y += add-cow.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 0000000..904b008
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,741 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "qemu/module.h"
+#include "block/block-cache.h"
+
+#define ACOW_CLUSTER_SIZE 65536
+enum {
+    /* compat_features bit */
+    ACOW_F_ALL_ALLOCATED    = 0X01,
+
+    /* none feature bit used now */
+    ACOW_FEATURE_MASK       = 0,
+
+    ACOW_MAGIC              = 'A' | 'C' << 8 | 'O' << 16 | 'W' << 24,
+    ACOW_CACHE_SIZE         = 16,
+    DEFAULT_HEADER_SIZE    = 4096,
+    MIN_CLUSTER_BITS            = 9,
+    MAX_CLUSTER_BITS            = 21,
+};
+
+typedef struct AddCowHeader {
+    uint32_t    magic;
+
+    uint32_t    backing_offset;
+    uint32_t    backing_size;
+    uint32_t    image_offset;
+    uint32_t    image_size;
+
+    uint32_t    cluster_bits;
+    uint64_t    features;
+    uint64_t    compat_features;
+    uint32_t    header_size;
+
+    char        backing_fmt[16];
+    char        image_fmt[16];
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+    BlockDriverState    *image_hd;
+    CoMutex             lock;
+    int                 cluster_size;
+    int                 cluster_sectors;
+    BlockCache          *bitmap_cache;
+    uint64_t            bitmap_size;
+    AddCowHeader        header;
+    char                backing_fmt[16];
+    char                image_fmt[16];
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num,
+                                       int64_t cluster_sectors)
+{
+    int64_t cluster_num = sector_num / cluster_sectors;
+    return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num,
+                                   int64_t cluster_sectors)
+{
+    return sector_num % cluster_sectors == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num,
+                                   int64_t cluster_sectors)
+{
+    return (sector_num + 1) % cluster_sectors == 0;
+}
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+    cpu->magic              = le32_to_cpu(le->magic);
+
+    cpu->backing_offset     = le32_to_cpu(le->backing_offset);
+    cpu->backing_size       = le32_to_cpu(le->backing_size);
+    cpu->image_offset       = le32_to_cpu(le->image_offset);
+    cpu->image_size         = le32_to_cpu(le->image_size);
+
+    cpu->cluster_bits       = le32_to_cpu(le->cluster_bits);
+    cpu->features           = le64_to_cpu(le->features);
+    cpu->compat_features    = le64_to_cpu(le->compat_features);
+    cpu->header_size        = le32_to_cpu(le->header_size);
+
+    memcpy(cpu->backing_fmt, le->backing_fmt, sizeof(cpu->backing_fmt));
+    memcpy(cpu->image_fmt, le->image_fmt, sizeof(cpu->image_fmt));
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+    le->magic               = cpu_to_le32(cpu->magic);
+
+    le->backing_offset      = cpu_to_le32(cpu->backing_offset);
+    le->backing_size        = cpu_to_le32(cpu->backing_size);
+    le->image_offset        = cpu_to_le32(cpu->image_offset);
+    le->image_size          = cpu_to_le32(cpu->image_size);
+
+    le->cluster_bits        = cpu_to_le32(cpu->cluster_bits);
+    le->features            = cpu_to_le64(cpu->features);
+    le->compat_features     = cpu_to_le64(cpu->compat_features);
+    le->header_size         = cpu_to_le32(cpu->header_size);
+    memcpy(le->backing_fmt, cpu->backing_fmt, sizeof(le->backing_fmt));
+    memcpy(le->image_fmt, cpu->image_fmt, sizeof(le->image_fmt));
+}
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const AddCowHeader *header = (const AddCowHeader *)buf;
+
+    if (buf_size < sizeof(*header)) {
+        return 0;
+    }
+    if (le32_to_cpu(header->magic) == ACOW_MAGIC) {
+        return 100;
+    }
+    return 0;
+}
+
+static int add_cow_create(const char *filename, QemuOpts *opts)
+{
+    AddCowHeader header = {
+        .magic      = ACOW_MAGIC,
+        .features   = 0,
+        .compat_features = 0,
+    };
+    AddCowHeader le_header;
+    int64_t image_len = 0;
+    const char *backing_filename = NULL, *backing_fmt = NULL;
+    const char *image_filename = NULL, *image_format = NULL;
+    size_t cluster_size = ACOW_CLUSTER_SIZE;
+    BlockDriverState *bs, *backing_bs;
+    BlockDriver *drv = bdrv_find_format("add-cow");
+    int ret;
+
+    image_len = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
+    backing_filename = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
+    backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
+    image_filename = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FILE);
+    image_format = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FMT);
+    cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
+                                     ACOW_CLUSTER_SIZE);
+
+    header.cluster_bits = ffs(cluster_size) - 1;
+    if (header.cluster_bits < MIN_CLUSTER_BITS ||
+        header.cluster_bits > MAX_CLUSTER_BITS ||
+        (1 << header.cluster_bits) != cluster_size) {
+        error_report(
+            "Cluster size must be a power of two between %d and %dk",
+            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
+        return -EINVAL;
+    }
+
+   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);
+    if (backing_filename) {
+        header.backing_offset = sizeof(header);
+        header.backing_size = strlen(backing_filename);
+
+        if (!backing_fmt) {
+            backing_bs = bdrv_new("image");
+            ret = bdrv_open(backing_bs, backing_filename, NULL,
+                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+            backing_fmt = bdrv_get_format_name(backing_bs);
+            bdrv_delete(backing_bs);
+        }
+    } else {
+        header.compat_features |= ACOW_F_ALL_ALLOCATED;
+    }
+
+    if (image_filename) {
+        header.image_offset = sizeof(header) + header.backing_size;
+        header.image_size = strlen(image_filename);
+    } else {
+        error_report("Error: image_file should be given.");
+        return -EINVAL;
+    }
+
+    if (backing_filename && !strcmp(backing_filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same backing file name as the image file name");
+        return -EINVAL;
+    }
+
+    if (!strcmp(filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same filename as the image file name");
+        return -EINVAL;
+    }
+
+    if (header.image_offset + header.image_size > header.header_size) {
+        error_report("image_file name or backing_file name too long.");
+        return -ENOSPC;
+    }
+
+    ret = bdrv_file_open(&bs, image_filename, NULL, 0);
+    if (ret < 0) {
+        return ret;
+    }
+    bdrv_close(bs);
+
+    ret = bdrv_create_file(filename, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
+    if (ret < 0) {
+        return ret;
+    }
+    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
+             backing_fmt ? backing_fmt : "");
+    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
+             image_format ? image_format : "raw");
+    add_cow_header_cpu_to_le(&header, &le_header);
+    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_pwrite(bs, header.image_offset, image_filename,
+                      header.image_size);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    if (backing_filename) {
+        ret = bdrv_pwrite(bs, header.backing_offset, backing_filename,
+                          header.backing_size);
+        if (ret < 0) {
+            bdrv_delete(bs);
+            return ret;
+        }
+    }
+    bdrv_close(bs);
+
+    ret = bdrv_open(bs, filename, NULL, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_truncate(bs, image_len);
+    bdrv_delete(bs);
+    return ret;
+}
+
+static int add_cow_open(BlockDriverState *bs, QDict *options, int flags)
+{
+    char                image_filename[PATH_MAX];
+    char                tmp_name[PATH_MAX];
+    int                 ret;
+    int                 sector_per_byte;
+    BDRVAddCowState     *s = bs->opaque;
+    AddCowHeader        le_header;
+    char backing_filename[PATH_MAX];
+
+    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    add_cow_header_le_to_cpu(&le_header, &s->header);
+
+    if (s->header.magic != ACOW_MAGIC) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    if (s->header.features & ~ACOW_FEATURE_MASK) {
+        char buf[64];
+        snprintf(buf, sizeof(buf), "Feature Flags: %" PRIx64,
+                 s->header.features & ~ACOW_FEATURE_MASK);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+                      bs->device_name, "add-cow", buf);
+        return -ENOTSUP;
+    }
+
+    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
+        snprintf(bs->backing_format, sizeof(bs->backing_format),
+                 "%s", s->header.backing_fmt);
+    }
+
+    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
+        s->header.cluster_bits > MAX_CLUSTER_BITS) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->cluster_size = 1 << s->header.cluster_bits;
+    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
+        char buf[64];
+        snprintf(buf, sizeof(buf), "Header size: %d",
+                 s->header.header_size);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+                      bs->device_name, "add-cow", buf);
+        return -ENOTSUP;
+    }
+
+    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
+        ret = bdrv_read_string(bs->file, s->header.backing_offset,
+                               s->header.backing_size,
+                               bs->backing_file,
+                               sizeof(bs->backing_file));
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = bdrv_read_string(bs->file, s->header.image_offset,
+                           s->header.image_size, tmp_name,
+                           sizeof(tmp_name));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if (path_has_protocol(tmp_name)) {
+        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
+    } else {
+        path_combine(image_filename, sizeof(image_filename),
+                     bs->filename, tmp_name);
+    }
+    bdrv_get_full_backing_filename(bs, backing_filename,
+                                   sizeof(backing_filename));
+    if (!strcmp(backing_filename, image_filename)) {
+        error_report("Error: backing file and image file are same: %s\n",
+                     backing_filename);
+        ret = EINVAL;
+        goto fail;
+    }
+    s->image_hd = bdrv_new("");
+    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
+                    bdrv_find_format(s->header.image_fmt));
+    if (ret < 0) {
+        bdrv_delete(s->image_hd);
+        goto fail;
+    }
+
+    bs->total_sectors = bdrv_getlength(s->image_hd) / BDRV_SECTOR_SIZE;
+    s->cluster_sectors = s->cluster_size / BDRV_SECTOR_SIZE;
+    sector_per_byte = s->cluster_sectors * 8;
+    s->bitmap_size =
+        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
+    s->bitmap_cache =  block_cache_create(bs, ACOW_CACHE_SIZE,
+                                          s->cluster_size,
+                                          BLOCK_TABLE_BITMAP);
+    qemu_co_mutex_init(&s->lock);
+    return 0;
+fail:
+    return ret;
+}
+
+static void add_cow_close(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    if (s->bitmap_cache) {
+        block_cache_destroy(bs, s->bitmap_cache);
+    }
+    bdrv_delete(s->image_hd);
+}
+
+static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    int64_t cluster_num = sector_num / s->cluster_sectors;
+    uint8_t *table      = NULL;
+    bool val = false;
+    int ret;
+
+    uint64_t offset = s->header.header_size +
+        (offset_in_bitmap(sector_num, s->cluster_sectors)
+            & (~(s->cluster_size - 1)));
+    ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
+    if (ret < 0) {
+        return ret;
+    }
+
+    val = table[cluster_num / 8 % s->cluster_size] & (1 << (cluster_num % 8));
+    ret = block_cache_put(bs, s->bitmap_cache, (void **)&table);
+    if (ret < 0) {
+        return ret;
+    }
+    return val;
+}
+
+static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *num_same)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int changed;
+
+    if (nb_sectors == 0) {
+        *num_same = 0;
+        return 0;
+    }
+
+    if (s->header.compat_features & ACOW_F_ALL_ALLOCATED) {
+        *num_same = nb_sectors;
+        return 1;
+    }
+    changed = is_allocated(bs, sector_num);
+
+    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
+        if (is_allocated(bs, sector_num + *num_same) != changed) {
+            break;
+        }
+    }
+    return changed;
+}
+
+static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
+                                int64_t sector_num, int nb_sectors)
+{
+    int n1;
+    if ((sector_num + nb_sectors) <= bs->total_sectors) {
+        return nb_sectors;
+    }
+    if (sector_num >= bs->total_sectors) {
+        n1 = 0;
+    } else {
+        n1 = bs->total_sectors - sector_num;
+    }
+
+    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
+                      0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
+
+    return n1;
+}
+
+static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
+                                         int64_t sector_num,
+                                         int remaining_sectors,
+                                         QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    int cur_nr_sectors;
+    uint64_t bytes_done = 0;
+    QEMUIOVector hd_qiov;
+    int n1, ret = 0;
+
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    qemu_co_mutex_lock(&s->lock);
+    while (remaining_sectors != 0) {
+        cur_nr_sectors = remaining_sectors;
+        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors,
+                                 &cur_nr_sectors)) {
+            qemu_iovec_reset(&hd_qiov);
+            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                              cur_nr_sectors * BDRV_SECTOR_SIZE);
+            qemu_co_mutex_unlock(&s->lock);
+            ret = bdrv_co_readv(s->image_hd, sector_num,
+                                cur_nr_sectors, &hd_qiov);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
+        } else {
+            if (bs->backing_hd) {
+                qemu_iovec_reset(&hd_qiov);
+                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                                  cur_nr_sectors * BDRV_SECTOR_SIZE);
+                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
+                                          sector_num, cur_nr_sectors);
+                if (n1 > 0) {
+                    qemu_co_mutex_unlock(&s->lock);
+                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
+                                        cur_nr_sectors, &hd_qiov);
+                    qemu_co_mutex_lock(&s->lock);
+                    if (ret < 0) {
+                        goto fail;
+                    }
+                }
+            } else {
+                qemu_iovec_memset(&hd_qiov, 0, 0,
+                                  BDRV_SECTOR_SIZE * cur_nr_sectors);
+            }
+        }
+        remaining_sectors -= cur_nr_sectors;
+        sector_num += cur_nr_sectors;
+        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
+    }
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int coroutine_fn copy_sectors(BlockDriverState *bs,
+                                     int n_start, int n_end)
+{
+    BDRVAddCowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int n, ret;
+
+    n = n_end - n_start;
+    if (n <= 0) {
+        return 0;
+    }
+
+    iov.iov_len = n * BDRV_SECTOR_SIZE;
+    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = 0;
+out:
+    qemu_vfree(iov.iov_base);
+    return ret;
+}
+
+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          int remaining_sectors,
+                                          QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int ret = 0, i;
+    QEMUIOVector hd_qiov;
+    uint8_t *table;
+    uint64_t offset;
+    int mask = s->cluster_sectors - 1;
+    int cluster_mask = s->cluster_size - 1;
+
+    qemu_co_mutex_lock(&s->lock);
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    ret = bdrv_co_writev(s->image_hd, sector_num,
+                         remaining_sectors, qiov);
+
+    if (ret < 0) {
+        goto fail;
+    }
+    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
+        /* Copy content of unmodified sectors */
+        if (!is_cluster_head(sector_num, s->cluster_sectors)
+            && !is_allocated(bs, sector_num)) {
+            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
+                             s->cluster_sectors)
+            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+            ret = copy_sectors(bs, sector_num + remaining_sectors,
+                               ((sector_num + remaining_sectors) | mask) + 1);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        for (i = sector_num / s->cluster_sectors;
+            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
+            i++) {
+            offset = s->header.header_size
+                + (offset_in_bitmap(i * s->cluster_sectors,
+                s->cluster_sectors) & (~cluster_mask));
+            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
+            if (ret < 0) {
+                goto fail;
+            }
+            if ((table[i / 8] & (1 << (i % 8))) == 0) {
+                table[i / 8] |= (1 << (i % 8));
+                block_cache_entry_mark_dirty(s->bitmap_cache, table);
+            }
+
+            ret = block_cache_put(bs, s->bitmap_cache, (void **) &table);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+    }
+    ret = 0;
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int sector_per_byte = s->cluster_sectors * 8;
+    int ret;
+    int64_t bitmap_size =
+        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
+    bitmap_size = (bitmap_size + s->cluster_size - 1)
+        & (~(s->cluster_size - 1));
+
+    ret = bdrv_truncate(bs->file, s->header.header_size + bitmap_size);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_truncate(s->image_hd, size);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
+static int add_cow_reopen_prepare(BDRVReopenState *state,
+                                  BlockReopenQueue *queue, Error **errp)
+{
+    BDRVAddCowState *s;
+    int ret = -1;
+
+    assert(state != NULL);
+    assert(state->bs != NULL);
+
+    if (queue == NULL) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                  "No reopen queue for add-cow");
+        goto exit;
+    }
+
+    s = state->bs->opaque;
+
+    assert(s != NULL);
+
+
+    bdrv_reopen_queue(queue, s->image_hd, state->flags);
+    ret = 0;
+
+exit:
+    return ret;
+}
+
+
+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&s->lock);
+    if (s->bitmap_cache) {
+        ret = block_cache_flush(bs, s->bitmap_cache);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    ret = bdrv_flush(s->image_hd);
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
+}
+
+static int add_cow_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVAddCowState *s = bs->opaque;
+    bdi->cluster_size = s->cluster_size;
+    return 0;
+}
+
+static QemuOptsList add_cow_create_opts = {
+    .name = "add-cow-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(add_cow_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        {
+            .name = BLOCK_OPT_BACKING_FILE,
+            .type = QEMU_OPT_STRING,
+            .help = "File name of a base image"
+        },
+        {
+            .name = BLOCK_OPT_BACKING_FMT,
+            .type = QEMU_OPT_STRING,
+            .help = "Image format of the base image"
+        },
+        {
+            .name = BLOCK_OPT_IMAGE_FILE,
+            .type = QEMU_OPT_STRING,
+            .help = "File name of a image file"
+        },
+        {
+            .name = BLOCK_OPT_IMAGE_FMT,
+            .type = QEMU_OPT_STRING,
+            .help = "Image format of the image file"
+        },
+        {
+            .name = BLOCK_OPT_CLUSTER_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "add-cow cluster size",
+            .def_value_str = stringify(ACOW_CLUSTER_SIZE)
+        },
+        { /* end of list */ }
+    }
+};
+
+static BlockDriver bdrv_add_cow = {
+    .format_name                = "add-cow",
+    .instance_size              = sizeof(BDRVAddCowState),
+    .bdrv_probe                 = add_cow_probe,
+    .bdrv_open                  = add_cow_open,
+    .bdrv_close                 = add_cow_close,
+    .bdrv_create                = add_cow_create,
+    .bdrv_co_readv              = add_cow_co_readv,
+    .bdrv_co_writev             = add_cow_co_writev,
+    .bdrv_truncate              = bdrv_add_cow_truncate,
+    .bdrv_co_is_allocated       = add_cow_is_allocated,
+    .bdrv_reopen_prepare        = add_cow_reopen_prepare,
+    .bdrv_get_info              = add_cow_get_info,
+
+    .bdrv_create_opts           = &add_cow_create_opts,
+    .bdrv_co_flush_to_os        = add_cow_co_flush,
+};
+
+static void bdrv_add_cow_init(void)
+{
+    bdrv_register(&bdrv_add_cow);
+}
+
+block_init(bdrv_add_cow_init);
diff --git a/block/block-cache.c b/block/block-cache.c
index 3544691..4824632 100644
--- a/block/block-cache.c
+++ b/block/block-cache.c
@@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
     } else if (c->table_type == BLOCK_TABLE_L2) {
         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
     } else if (c->table_type == BLOCK_TABLE_BITMAP) {
-        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
+        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
     }
 
     ret = bdrv_pwrite(bs->file, c->entries[i].offset,
@@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
         if (c->table_type == BLOCK_TABLE_L2) {
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         } else if (c->table_type == BLOCK_TABLE_BITMAP) {
-            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
+            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
         ret = bdrv_pread(bs->file, offset, c->entries[i].table,
diff --git a/block/qcow2.h b/block/qcow2.h
index e7f6aec..a4e514b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
     uint64_t vm_clock_nsec;
 } QCowSnapshot;
 
+struct BlockCache;
+typedef struct BlockCache BlockCache;
+
 typedef struct Qcow2UnknownHeaderExtension {
     uint32_t magic;
     uint32_t len;
@@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
 void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
-/* qcow2-cache.c functions */
-
-
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a44a9ac..b9eb9b8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,8 @@
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
 #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
+#define BLOCK_OPT_IMAGE_FILE        "image_file"
+#define BLOCK_OPT_IMAGE_FMT         "image_fmt"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V18 6/6] qemu-iotests: add add-cow iotests support
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
                   ` (4 preceding siblings ...)
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 5/6] add-cow file format core code Dong Xu Wang
@ 2013-04-10  8:11 ` Dong Xu Wang
  2013-04-18 10:06 ` [Qemu-devel] [PATCH V18 0/6] add-cow file format Stefan Hajnoczi
  6 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-10  8:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dong Xu Wang, kwol, wdongxu, stefanha

This patch will use qemu-iotests to test add-cow file format.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
v17-v18:
1) add 037 and 038 tests.

 tests/qemu-iotests/017       |  2 +-
 tests/qemu-iotests/020       |  2 +-
 tests/qemu-iotests/037       |  2 +-
 tests/qemu-iotests/038       |  2 +-
 tests/qemu-iotests/common    |  6 ++++++
 tests/qemu-iotests/common.rc | 15 ++++++++++++++-
 6 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 66951eb..d31432f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/037 b/tests/qemu-iotests/037
index c11460b..683e73d 100755
--- a/tests/qemu-iotests/037
+++ b/tests/qemu-iotests/037
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/038 b/tests/qemu-iotests/038
index 36125ea..10bacfb 100755
--- a/tests/qemu-iotests/038
+++ b/tests/qemu-iotests/038
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index b3aad89..9f651bf 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
 check options
     -raw                test raw (default)
     -cow                test cow
+    -add-cow            test add-cow
     -qcow               test qcow
     -qcow2              test qcow2
     -qed                test qed
@@ -164,6 +165,11 @@ testlist options
 	    xpand=false
 	    ;;
 
+    -add-cow)
+        IMGFMT=add-cow
+        xpand=false
+        ;;
+
 	-qcow)
 	    IMGFMT=qcow
 	    xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e522d61..45e676c 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -107,6 +107,16 @@ _make_test_img()
     fi
     if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
         optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
+    elif [ "$IMGFMT" = "add-cow" ]; then
+        local IMG="$TEST_IMG"".raw"
+        if [ "$1" = "-b" ]; then
+            IMG="$IMG"".b"
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            extra_img_options="-o image_file=$IMG $extra_img_options"
+        else
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            extra_img_options="-o image_file=$IMG"
+        fi
     fi
 
     if [ -n "$optstr" ]; then
@@ -124,7 +134,8 @@ _make_test_img()
             -e "s# compat='[^']*'##g" \
             -e "s# compat6=\\(on\\|off\\)##g" \
             -e "s# static=\\(on\\|off\\)##g" \
-            -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+            -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+            -e "s# image_file='[^']*'##g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
@@ -146,6 +157,8 @@ _cleanup_test_img()
             rm -f $TEST_DIR/t.$IMGFMT
             rm -f $TEST_DIR/t.$IMGFMT.orig
             rm -f $TEST_DIR/t.$IMGFMT.base
+            rm -f $TEST_DIR/t.$IMGFMT.raw
+            rm -f $TEST_DIR/t.$IMGFMT.raw.b
             ;;
 
         rbd)
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 1/6] docs: document for add-cow file format
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
@ 2013-04-18  8:30   ` Stefan Hajnoczi
  2013-04-23  1:45     ` Dong Xu Wang
  2013-04-26 22:45   ` Eric Blake
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2013-04-18  8:30 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwol, wdongxu, qemu-devel, stefanha

On Wed, Apr 10, 2013 at 04:11:48PM +0800, Dong Xu Wang wrote:
> +The Header is included in the first bytes:
> +(HEADER_SIZE is defined in 40-43 bytes.)
> +    Byte    0  -  3:    magic
> +                        add-cow magic string ("ACOW").
> +
> +            4  -  7:    backing file name offset
> +                        Offset in the add-cow file at which the backing
> +                        file name is stored (NB: The string is not
> +                        lNUL-terminated).

s/lNUL/NUL/

> +            24 - 31:    features
> +                        Bitmask of features. If a feature bit is set
> +                        but not recognized, the add-cow file should be
> +                        dropped. They are not used in now.

"If a feature bit is set but not recognized, the opening the add-cow file must fail.  No features bits are currently defined."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2013-04-18  8:54   ` Stefan Hajnoczi
  2013-04-23  1:47     ` Dong Xu Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2013-04-18  8:54 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwol, wdongxu, qemu-devel, stefanha

On Wed, Apr 10, 2013 at 04:11:51PM +0800, Dong Xu Wang wrote:

This patch does two things:
1. Rename and move qcow2-cache to block-cache.
2. Add a type enum to parameterize BLKDEBUG_EVENT() for L2, refcount,
   and bitmaps.

It's hard to review #2 since all lines are changed in the diff by #1.
In the future, please split patches that move code into two.

The first patch should simply move and rename - it should not include
code changes.

The second patch should contain code changes (i.e. adding the cache type
enum and parameterizing BLKDEBUG_EVENT()).

BTW, when splitting the patch into two for easy reviewing it also
becomes questionable whether to reassign authorship since 90+% of the
code is unchanged and simply moved.

> +    if (c->table_type == BLOCK_TABLE_REF) {
> +        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
> +    } else if (c->table_type == BLOCK_TABLE_L2) {
> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
> +    } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> +        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);

BLKDBG_COW_WRITE is for writing out sectors from the backing file.  It
is not for updating the allocation bitmap.

Please pick an appropriate constant or define your own if none exists.

> +    if (read_from_disk) {
> +        if (c->table_type == BLOCK_TABLE_L2) {
> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
> +        } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> +            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);

Same here.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 5/6] add-cow file format core code Dong Xu Wang
@ 2013-04-18 10:03   ` Stefan Hajnoczi
  2013-04-23  1:54     ` Dong Xu Wang
  2013-05-09  6:24     ` Dong Xu Wang
  2013-05-13 15:07   ` Jeff Cody
  1 sibling, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2013-04-18 10:03 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwol, wdongxu, qemu-devel, stefanha

On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
> +    header.cluster_bits = ffs(cluster_size) - 1;
> +    if (header.cluster_bits < MIN_CLUSTER_BITS ||
> +        header.cluster_bits > MAX_CLUSTER_BITS ||
> +        (1 << header.cluster_bits) != cluster_size) {
> +        error_report(
> +            "Cluster size must be a power of two between %d and %dk",
> +            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
> +        return -EINVAL;
> +    }
> +
> +   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);

Indentation.

> +    if (backing_filename) {
> +        header.backing_offset = sizeof(header);
> +        header.backing_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, NULL,
> +                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;

backing_bs is leaked.

> +    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
> +             backing_fmt ? backing_fmt : "");
> +    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
> +             image_format ? image_format : "raw");

snprintf() doesn't have the semantics in the add-cow specification:

" 44 - 59:    backing file format
              Format of backing file. It will be filled with
              0 if backing file name offset is 0. If backing
              file name offset is non-empty, it must be
              non-empty. It is coded in free-form ASCII, and
              is not NUL-terminated. Zero padded on the right.

  60 - 75:    image file format
              Format of image file. It must be non-empty. It
              is coded in free-form ASCII, and is not
              NUL-terminated. Zero padded on the right."

strncpy() does the zero padding and doesn't NUL-terminate if the max buffer
size is used.

> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
> +        snprintf(bs->backing_format, sizeof(bs->backing_format),
> +                 "%s", s->header.backing_fmt);

s->header.backing_fmt is not NUL-terminated so using snprintf() is
inappropriate (could it read beyond the end of .backing_fmt?).

> +    }
> +
> +    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
> +        s->header.cluster_bits > MAX_CLUSTER_BITS) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    s->cluster_size = 1 << s->header.cluster_bits;
> +    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "Header size: %d",

%u or PRIu32 since header_size is uint32_t.  This avoids compiler or
code scanner warnings.

> +    s->image_hd = bdrv_new("");
> +    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
> +                    bdrv_find_format(s->header.image_fmt));

Cannot use image_fmt as a string since it is not NUL-terminated.

> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +                                          int64_t sector_num,
> +                                          int remaining_sectors,
> +                                          QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +    int mask = s->cluster_sectors - 1;
> +    int cluster_mask = s->cluster_size - 1;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd, sector_num,
> +                         remaining_sectors, qiov);

All writes are serialized.  This means write performance will be very
poor for multi-threaded workloads.

qcow2 tracks allocating writes and allows them to execute at the same
time if they do not overlap clusters.

> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num, s->cluster_sectors)
> +            && !is_allocated(bs, sector_num)) {
> +            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
> +                             s->cluster_sectors)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                               ((sector_num + remaining_sectors) | mask) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }

This trashes the cluster when remaining_sectors = 0, sector_num =
cluster_sectors, and sector cluster_sectors - 1 is unallocated.

Probably best to return early when remaining_sectors == 0.

> +
> +        for (i = sector_num / s->cluster_sectors;
> +            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
> +            i++) {
> +            offset = s->header.header_size
> +                + (offset_in_bitmap(i * s->cluster_sectors,
> +                s->cluster_sectors) & (~cluster_mask));
> +            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));

i is based on sector_num while table[] starts at offset, not sector 0.
The index expression i / 8 leads to out-of-bounds accesses.

I think you forgot to & (s->cluster_sectors - 1).

> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    if (s->bitmap_cache) {
> +        ret = block_cache_flush(bs, s->bitmap_cache);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +    ret = bdrv_flush(s->image_hd);

This is the wrong way around.  We must flush image_hd first so that
valid data is on disk.  Then we can flush bitmap_cache to mark the
clusters allocated.

Beyond explicit flushes you also need to make sure that image_hd is
flushed *before* bitmap_cache tables are written out (e.g. cache
eviction when the cache becomes full).  It seems this code is missing.

Also please use bdrv_co_flush() instead of bdrv_flush() in
add_cow_co_flush() since this is a coroutine function.

> diff --git a/block/block-cache.c b/block/block-cache.c
> index 3544691..4824632 100644
> --- a/block/block-cache.c
> +++ b/block/block-cache.c
> @@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
>      } else if (c->table_type == BLOCK_TABLE_L2) {
>          BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>      } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>      }
>  
>      ret = bdrv_pwrite(bs->file, c->entries[i].offset,
> @@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
>          if (c->table_type == BLOCK_TABLE_L2) {
>              BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>          } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> -            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>          }
>  
>          ret = bdrv_pread(bs->file, offset, c->entries[i].table,

I commented on this in the previous patch.  Please squash this fix into
the previous patch.

> diff --git a/block/qcow2.h b/block/qcow2.h
> index e7f6aec..a4e514b 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
>  
> +struct BlockCache;
> +typedef struct BlockCache BlockCache;
> +
>  typedef struct Qcow2UnknownHeaderExtension {
>      uint32_t magic;
>      uint32_t len;
> @@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
>  
> -/* qcow2-cache.c functions */
> -
> -

More qcow2-cache.c move cleanups?  Please squash into the previous
patch.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 0/6] add-cow file format
  2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
                   ` (5 preceding siblings ...)
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang
@ 2013-04-18 10:06 ` Stefan Hajnoczi
  6 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2013-04-18 10:06 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwol, wdongxu, qemu-devel, stefanha

On Wed, Apr 10, 2013 at 04:11:47PM +0800, Dong Xu Wang wrote:
> It will introduce a new file format: add-cow.
> 
> The add-cow file format makes it possible to perform copy-on-write on top of
> a raw disk image.  When we know that no backing file clusters remain visible
> (e.g. we have streamed the entire image and copied all data from the backing
> file), then it is possible to discard the add-cow file and use the raw image
> file directly.
> 
> This feature adds the copy-on-write feature to raw files (which cannot support
> it natively) while allowing us to get full performance again later when we no
> longer need copy-on-write.
> 
> add-cow can benefit from other available functions, such as path_has_protocol
> and qed_read_string, so we will make them public.
> 
> snapshot_blkdev are not supported now for add-cow. Will add it in futher patches.
> 
> These patches are using QemuOpts parser, former patches could be found here:
> http://patchwork.ozlabs.org/patch/235300/
> 
> 
> v17 -> v18:
> 1) remove version field.
> 2) header size is maximum value and cluster size value.
> 3) fix type.
> 4) move struct to source file.
> 5) cluster_size->table_size.
> 6) use error_report, not fprintf.
> 7) remove version field from header.
> 8) header_size is MAX(cluster_size, 4096).
> 9) introduce s->cluster_sectors.
> 10) use BLKDBG_L2_LOAD/UPDATE.
> 11) add 037 and 038 tests.

Left a few comments but the series is close.

The biggest practical issue is serialized allocating writes.
Installation or multi-threaded write workloads may be quite slow since
only one read request is processed at a time.  This can be solved later.

Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 1/6] docs: document for add-cow file format
  2013-04-18  8:30   ` Stefan Hajnoczi
@ 2013-04-23  1:45     ` Dong Xu Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-23  1:45 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwol, wdongxu, qemu-devel, stefanha

On 2013/4/18 16:30, Stefan Hajnoczi wrote:
> On Wed, Apr 10, 2013 at 04:11:48PM +0800, Dong Xu Wang wrote:
>> +The Header is included in the first bytes:
>> +(HEADER_SIZE is defined in 40-43 bytes.)
>> +    Byte    0  -  3:    magic
>> +                        add-cow magic string ("ACOW").
>> +
>> +            4  -  7:    backing file name offset
>> +                        Offset in the add-cow file at which the backing
>> +                        file name is stored (NB: The string is not
>> +                        lNUL-terminated).
>
> s/lNUL/NUL/
Okay.
>
>> +            24 - 31:    features
>> +                        Bitmask of features. If a feature bit is set
>> +                        but not recognized, the add-cow file should be
>> +                        dropped. They are not used in now.
>
> "If a feature bit is set but not recognized, the opening the add-cow file must fail.  No features bits are currently defined."
>
Okay.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c
  2013-04-18  8:54   ` Stefan Hajnoczi
@ 2013-04-23  1:47     ` Dong Xu Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-23  1:47 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwol, wdongxu, qemu-devel, stefanha

On 2013/4/18 16:54, Stefan Hajnoczi wrote:
> On Wed, Apr 10, 2013 at 04:11:51PM +0800, Dong Xu Wang wrote:
>
> This patch does two things:
> 1. Rename and move qcow2-cache to block-cache.
> 2. Add a type enum to parameterize BLKDEBUG_EVENT() for L2, refcount,
>     and bitmaps.
>
> It's hard to review #2 since all lines are changed in the diff by #1.
> In the future, please split patches that move code into two.
>
> The first patch should simply move and rename - it should not include
> code changes.
>
> The second patch should contain code changes (i.e. adding the cache type
> enum and parameterizing BLKDEBUG_EVENT()).
>
> BTW, when splitting the patch into two for easy reviewing it also
> becomes questionable whether to reassign authorship since 90+% of the
> code is unchanged and simply moved.
>
Okay.
>> +    if (c->table_type == BLOCK_TABLE_REF) {
>> +        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
>> +    } else if (c->table_type == BLOCK_TABLE_L2) {
>> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>> +    } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> +        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
>
> BLKDBG_COW_WRITE is for writing out sectors from the backing file.  It
> is not for updating the allocation bitmap.
>
Okay.
> Please pick an appropriate constant or define your own if none exists.
>
>> +    if (read_from_disk) {
>> +        if (c->table_type == BLOCK_TABLE_L2) {
>> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>> +        } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> +            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>
> Same here.
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-04-18 10:03   ` Stefan Hajnoczi
@ 2013-04-23  1:54     ` Dong Xu Wang
  2013-05-09  6:24     ` Dong Xu Wang
  1 sibling, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-04-23  1:54 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwol, wdongxu, qemu-devel, stefanha

On 2013/4/18 18:03, Stefan Hajnoczi wrote:
> On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
>> +    header.cluster_bits = ffs(cluster_size) - 1;
>> +    if (header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        header.cluster_bits > MAX_CLUSTER_BITS ||
>> +        (1 << header.cluster_bits) != cluster_size) {
>> +        error_report(
>> +            "Cluster size must be a power of two between %d and %dk",
>> +            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
>> +        return -EINVAL;
>> +    }
>> +
>> +   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);
>
> Indentation.
>
okay.
>> +    if (backing_filename) {
>> +        header.backing_offset = sizeof(header);
>> +        header.backing_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, NULL,
>> +                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>
> backing_bs is leaked.
Okay. will fix.
>
>> +    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
>> +             backing_fmt ? backing_fmt : "");
>> +    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
>> +             image_format ? image_format : "raw");
>
> snprintf() doesn't have the semantics in the add-cow specification:
>
> " 44 - 59:    backing file format
>                Format of backing file. It will be filled with
>                0 if backing file name offset is 0. If backing
>                file name offset is non-empty, it must be
>                non-empty. It is coded in free-form ASCII, and
>                is not NUL-terminated. Zero padded on the right.
>
>    60 - 75:    image file format
>                Format of image file. It must be non-empty. It
>                is coded in free-form ASCII, and is not
>                NUL-terminated. Zero padded on the right."
>
> strncpy() does the zero padding and doesn't NUL-terminate if the max buffer
> size is used.
>
Okay.
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        snprintf(bs->backing_format, sizeof(bs->backing_format),
>> +                 "%s", s->header.backing_fmt);
>
> s->header.backing_fmt is not NUL-terminated so using snprintf() is
> inappropriate (could it read beyond the end of .backing_fmt?).
>
Okay.
>> +    }
>> +
>> +    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        s->header.cluster_bits > MAX_CLUSTER_BITS) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    s->cluster_size = 1 << s->header.cluster_bits;
>> +    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "Header size: %d",
>
> %u or PRIu32 since header_size is uint32_t.  This avoids compiler or
> code scanner warnings.
>
Okay.
>> +    s->image_hd = bdrv_new("");
>> +    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
>> +                    bdrv_find_format(s->header.image_fmt));
>
> Cannot use image_fmt as a string since it is not NUL-terminated.
>
Okay.
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +                                          int64_t sector_num,
>> +                                          int remaining_sectors,
>> +                                          QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +    int mask = s->cluster_sectors - 1;
>> +    int cluster_mask = s->cluster_size - 1;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd, sector_num,
>> +                         remaining_sectors, qiov);
>
> All writes are serialized.  This means write performance will be very
> poor for multi-threaded workloads.
>
> qcow2 tracks allocating writes and allows them to execute at the same
> time if they do not overlap clusters.
>
Okay, will refer qcow2 related code.
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num, s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num)) {
>> +            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
>> +                             s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                               ((sector_num + remaining_sectors) | mask) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>
> This trashes the cluster when remaining_sectors = 0, sector_num =
> cluster_sectors, and sector cluster_sectors - 1 is unallocated.
>
> Probably best to return early when remaining_sectors == 0.
>
Okay.
>> +
>> +        for (i = sector_num / s->cluster_sectors;
>> +            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
>> +            i++) {
>> +            offset = s->header.header_size
>> +                + (offset_in_bitmap(i * s->cluster_sectors,
>> +                s->cluster_sectors) & (~cluster_mask));
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>
> i is based on sector_num while table[] starts at offset, not sector 0.
> The index expression i / 8 leads to out-of-bounds accesses.
>
> I think you forgot to & (s->cluster_sectors - 1).
>
Okay.
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    if (s->bitmap_cache) {
>> +        ret = block_cache_flush(bs, s->bitmap_cache);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +    ret = bdrv_flush(s->image_hd);
>
> This is the wrong way around.  We must flush image_hd first so that
> valid data is on disk.  Then we can flush bitmap_cache to mark the
> clusters allocated.
>
> Beyond explicit flushes you also need to make sure that image_hd is
> flushed *before* bitmap_cache tables are written out (e.g. cache
> eviction when the cache becomes full).  It seems this code is missing.
>
> Also please use bdrv_co_flush() instead of bdrv_flush() in
> add_cow_co_flush() since this is a coroutine function.
>
Okay.
>> diff --git a/block/block-cache.c b/block/block-cache.c
>> index 3544691..4824632 100644
>> --- a/block/block-cache.c
>> +++ b/block/block-cache.c
>> @@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
>>       } else if (c->table_type == BLOCK_TABLE_L2) {
>>           BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>>       } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
>> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>       }
>>
>>       ret = bdrv_pwrite(bs->file, c->entries[i].offset,
>> @@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
>>           if (c->table_type == BLOCK_TABLE_L2) {
>>               BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>           } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>           }
>>
>>           ret = bdrv_pread(bs->file, offset, c->entries[i].table,
>
> I commented on this in the previous patch.  Please squash this fix into
> the previous patch.
>
Okay.
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index e7f6aec..a4e514b 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
>>       uint64_t vm_clock_nsec;
>>   } QCowSnapshot;
>>
>> +struct BlockCache;
>> +typedef struct BlockCache BlockCache;
>> +
>>   typedef struct Qcow2UnknownHeaderExtension {
>>       uint32_t magic;
>>       uint32_t len;
>> @@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>   void qcow2_free_snapshots(BlockDriverState *bs);
>>   int qcow2_read_snapshots(BlockDriverState *bs);
>>
>> -/* qcow2-cache.c functions */
>> -
>> -
>
> More qcow2-cache.c move cleanups?  Please squash into the previous
> patch.
Okay.
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 1/6] docs: document for add-cow file format
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
  2013-04-18  8:30   ` Stefan Hajnoczi
@ 2013-04-26 22:45   ` Eric Blake
  2013-05-02  5:44     ` Dong Xu Wang
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Blake @ 2013-04-26 22:45 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: wdongxu, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 5178 bytes --]

On 04/10/2013 02:11 AM, Dong Xu Wang wrote:
> Document for add-cow format, the usage and spec of add-cow are
> introduced.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> V17->V18:
> 1) remove version field.
> 2) header size is maximum value and cluster size value.
> 3) fix type.
>  docs/specs/add-cow.txt | 165 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 165 insertions(+)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..151028b
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,165 @@
> +== General ==

No copyright notice?  Not necessarily your fault, since many other files
in this directory suffer from the same problem.

> +
> +The raw file format does not support backing files or copy on write
> +feature. The add-cow image format makes it possible to use backing
> +files with a image by keeping a separate .add-cow metadata file.
> +Once all clusters have been written into the image it is safe to
> +discard the .add-cow and backing files, then we can use the image
> +directly.
> +
> +An example usage of add-cow would look like:
> +(ubuntu.img is a disk image which has an installed OS.)
> +    1)  Create a image, such as raw format, with the same size of
> +        ubuntu.img:
> +            qemu-img create -f raw test.raw 8G
> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of
> +test.add-cow will be calculated from the size of test.raw.
> +
> +image_fmt can be omitted, in that case image_fmt is assumed to be
> +"raw". backing_fmt can also be omitted, add-cow should do a probe
> +operation and determine what the backing file's format is.

In general, probing a raw file is a security hole (we just plugged a CVE
with NBD probing); you probably ought to mention that it is recommended
to always specify the format for any raw file, so that probing doesn't
misinterpret the contents of the file as some other format.

> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------------------------+
> + |     Header    |           COW bitmap          |
> + +---------------+-------------------------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(HEADER_SIZE is defined in 40-43 bytes.)
> +    Byte    0  -  3:    magic
> +                        add-cow magic string ("ACOW").

Probably ought to mention that this magic string is in the ASCII
encoding (those characters map to different bytes on EBCDIC, although I
doubt qemu will ever really been ported to EBCDIC)

> +
> +            4  -  7:    backing file name offset
> +                        Offset in the add-cow file at which the backing
> +                        file name is stored (NB: The string is not
> +                        lNUL-terminated).

s/lNUL/NUL/

> +                        If backing file name does NOT exist, this field
> +                        will be 0. Must be between 76 and [HEADER_SIZE
> +                        - 2](a file name must be at least 1 byte).
> +

> +
> +            40 - 43:    HEADER_SIZE
> +                        The header field is variable-sized. This field
> +                        indicates how many bytes will be used to store
> +                        add-cow header. By default, it is maximum value
> +                        of 4096 and cluster size value.

Should it be required to be a multiple of 4096, for efficient alignment
of clusters?

> +
> +            44 - 59:    backing file format
> +                        Format of backing file. It will be filled with
> +                        0 if backing file name offset is 0. If backing
> +                        file name offset is non-empty, it must be
> +                        non-empty. It is coded in free-form ASCII, and
> +                        is not NUL-terminated. Zero padded on the right.

Requiring this to be non-empty if a backing file is named contradicts
your earlier statement that backing format is probed (I actually like
mandating the backing format, though).

> +
> +            60 - 75:    image file format
> +                        Format of image file. It must be non-empty. It
> +                        is coded in free-form ASCII, and is not
> +                        NUL-terminated. Zero padded on the right.

Again, requiring a format contradicts the earlier statement about format
probing.

How does this compare with Paolo's efforts to design a persistent bitmap
for drive-mirror/block-backup use?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 1/6] docs: document for add-cow file format
  2013-04-26 22:45   ` Eric Blake
@ 2013-05-02  5:44     ` Dong Xu Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-05-02  5:44 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, wdongxu, qemu-devel, stefanha

On 2013/4/27 6:45, Eric Blake wrote:
> On 04/10/2013 02:11 AM, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are
>> introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> V17->V18:
>> 1) remove version field.
>> 2) header size is maximum value and cluster size value.
>> 3) fix type.
>>   docs/specs/add-cow.txt | 165 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 165 insertions(+)
>>   create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..151028b
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,165 @@
>> +== General ==
>
> No copyright notice?  Not necessarily your fault, since many other files
> in this directory suffer from the same problem.
>
Yep, all documents in docs/specs have no copyright, so I omit it.
>> +
>> +The raw file format does not support backing files or copy on write
>> +feature. The add-cow image format makes it possible to use backing
>> +files with a image by keeping a separate .add-cow metadata file.
>> +Once all clusters have been written into the image it is safe to
>> +discard the .add-cow and backing files, then we can use the image
>> +directly.
>> +
>> +An example usage of add-cow would look like:
>> +(ubuntu.img is a disk image which has an installed OS.)
>> +    1)  Create a image, such as raw format, with the same size of
>> +        ubuntu.img:
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of
>> +test.add-cow will be calculated from the size of test.raw.
>> +
>> +image_fmt can be omitted, in that case image_fmt is assumed to be
>> +"raw". backing_fmt can also be omitted, add-cow should do a probe
>> +operation and determine what the backing file's format is.
>
> In general, probing a raw file is a security hole (we just plugged a CVE
> with NBD probing); you probably ought to mention that it is recommended
> to always specify the format for any raw file, so that probing doesn't
> misinterpret the contents of the file as some other format.
>
Okay, will mention.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------------------------+
>> + |     Header    |           COW bitmap          |
>> + +---------------+-------------------------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(HEADER_SIZE is defined in 40-43 bytes.)
>> +    Byte    0  -  3:    magic
>> +                        add-cow magic string ("ACOW").
>
> Probably ought to mention that this magic string is in the ASCII
> encoding (those characters map to different bytes on EBCDIC, although I
> doubt qemu will ever really been ported to EBCDIC)
>
Okay.
>> +
>> +            4  -  7:    backing file name offset
>> +                        Offset in the add-cow file at which the backing
>> +                        file name is stored (NB: The string is not
>> +                        lNUL-terminated).
>
> s/lNUL/NUL/
>
Okay.
>> +                        If backing file name does NOT exist, this field
>> +                        will be 0. Must be between 76 and [HEADER_SIZE
>> +                        - 2](a file name must be at least 1 byte).
>> +
>
>> +
>> +            40 - 43:    HEADER_SIZE
>> +                        The header field is variable-sized. This field
>> +                        indicates how many bytes will be used to store
>> +                        add-cow header. By default, it is maximum value
>> +                        of 4096 and cluster size value.
>
> Should it be required to be a multiple of 4096, for efficient alignment
> of clusters?
>
I think I can make cluster size be multiple of 4096..
>> +
>> +            44 - 59:    backing file format
>> +                        Format of backing file. It will be filled with
>> +                        0 if backing file name offset is 0. If backing
>> +                        file name offset is non-empty, it must be
>> +                        non-empty. It is coded in free-form ASCII, and
>> +                        is not NUL-terminated. Zero padded on the right.
>
> Requiring this to be non-empty if a backing file is named contradicts
> your earlier statement that backing format is probed (I actually like
> mandating the backing format, though).

I think logic can be:
1) if "-o backing_fmt = raw" then do not probe.
2) else even if "-o backing_fmt = $fmt" is used, also perform a probe 
operation and write backing_fmt in add_cow headers.
>
>> +
>> +            60 - 75:    image file format
>> +                        Format of image file. It must be non-empty. It
>> +                        is coded in free-form ASCII, and is not
>> +                        NUL-terminated. Zero padded on the right.
>
> Again, requiring a format contradicts the earlier statement about format
> probing.
>
> How does this compare with Paolo's efforts to design a persistent bitmap
> for drive-mirror/block-backup use?
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-04-18 10:03   ` Stefan Hajnoczi
  2013-04-23  1:54     ` Dong Xu Wang
@ 2013-05-09  6:24     ` Dong Xu Wang
  2013-05-09 15:07       ` Stefan Hajnoczi
  1 sibling, 1 reply; 20+ messages in thread
From: Dong Xu Wang @ 2013-05-09  6:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Thu, Apr 18, 2013 at 6:03 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
>> +    header.cluster_bits = ffs(cluster_size) - 1;
>> +    if (header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        header.cluster_bits > MAX_CLUSTER_BITS ||
>> +        (1 << header.cluster_bits) != cluster_size) {
>> +        error_report(
>> +            "Cluster size must be a power of two between %d and %dk",
>> +            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
>> +        return -EINVAL;
>> +    }
>> +
>> +   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);
>
> Indentation.
>
>> +    if (backing_filename) {
>> +        header.backing_offset = sizeof(header);
>> +        header.backing_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, NULL,
>> +                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>
> backing_bs is leaked.
>
>> +    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
>> +             backing_fmt ? backing_fmt : "");
>> +    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
>> +             image_format ? image_format : "raw");
>
> snprintf() doesn't have the semantics in the add-cow specification:
>
> " 44 - 59:    backing file format
>               Format of backing file. It will be filled with
>               0 if backing file name offset is 0. If backing
>               file name offset is non-empty, it must be
>               non-empty. It is coded in free-form ASCII, and
>               is not NUL-terminated. Zero padded on the right.
>
>   60 - 75:    image file format
>               Format of image file. It must be non-empty. It
>               is coded in free-form ASCII, and is not
>               NUL-terminated. Zero padded on the right."
>
> strncpy() does the zero padding and doesn't NUL-terminate if the max buffer
> size is used.
>
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        snprintf(bs->backing_format, sizeof(bs->backing_format),
>> +                 "%s", s->header.backing_fmt);
>
> s->header.backing_fmt is not NUL-terminated so using snprintf() is
> inappropriate (could it read beyond the end of .backing_fmt?).
>
>> +    }
>> +
>> +    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        s->header.cluster_bits > MAX_CLUSTER_BITS) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    s->cluster_size = 1 << s->header.cluster_bits;
>> +    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "Header size: %d",
>
> %u or PRIu32 since header_size is uint32_t.  This avoids compiler or
> code scanner warnings.
>
>> +    s->image_hd = bdrv_new("");
>> +    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
>> +                    bdrv_find_format(s->header.image_fmt));
>
> Cannot use image_fmt as a string since it is not NUL-terminated.
>
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +                                          int64_t sector_num,
>> +                                          int remaining_sectors,
>> +                                          QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +    int mask = s->cluster_sectors - 1;
>> +    int cluster_mask = s->cluster_size - 1;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd, sector_num,
>> +                         remaining_sectors, qiov);
>
> All writes are serialized.  This means write performance will be very
> poor for multi-threaded workloads.
>
> qcow2 tracks allocating writes and allows them to execute at the same
> time if they do not overlap clusters.
>
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num, s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num)) {
>> +            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
>> +                             s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                               ((sector_num + remaining_sectors) | mask) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>
> This trashes the cluster when remaining_sectors = 0, sector_num =
> cluster_sectors, and sector cluster_sectors - 1 is unallocated.
>
> Probably best to return early when remaining_sectors == 0.
>
>> +
>> +        for (i = sector_num / s->cluster_sectors;
>> +            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
>> +            i++) {
>> +            offset = s->header.header_size
>> +                + (offset_in_bitmap(i * s->cluster_sectors,
>> +                s->cluster_sectors) & (~cluster_mask));
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>
> i is based on sector_num while table[] starts at offset, not sector 0.
> The index expression i / 8 leads to out-of-bounds accesses.
>
> I think you forgot to & (s->cluster_sectors - 1).
>
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    if (s->bitmap_cache) {
>> +        ret = block_cache_flush(bs, s->bitmap_cache);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +    ret = bdrv_flush(s->image_hd);
>
> This is the wrong way around.  We must flush image_hd first so that
> valid data is on disk.  Then we can flush bitmap_cache to mark the
> clusters allocated.
>
> Beyond explicit flushes you also need to make sure that image_hd is
> flushed *before* bitmap_cache tables are written out (e.g. cache
> eviction when the cache becomes full).  It seems this code is missing.
>
Hi  Stefan, how can I make sure image_hd "flush" operation completed before
I write bitmap_cache tables? Is there any functions I can use?

Thank you.

> Also please use bdrv_co_flush() instead of bdrv_flush() in
> add_cow_co_flush() since this is a coroutine function.
>
>> diff --git a/block/block-cache.c b/block/block-cache.c
>> index 3544691..4824632 100644
>> --- a/block/block-cache.c
>> +++ b/block/block-cache.c
>> @@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
>>      } else if (c->table_type == BLOCK_TABLE_L2) {
>>          BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>>      } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
>> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>      }
>>
>>      ret = bdrv_pwrite(bs->file, c->entries[i].offset,
>> @@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
>>          if (c->table_type == BLOCK_TABLE_L2) {
>>              BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>          } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>          }
>>
>>          ret = bdrv_pread(bs->file, offset, c->entries[i].table,
>
> I commented on this in the previous patch.  Please squash this fix into
> the previous patch.
>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index e7f6aec..a4e514b 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
>>      uint64_t vm_clock_nsec;
>>  } QCowSnapshot;
>>
>> +struct BlockCache;
>> +typedef struct BlockCache BlockCache;
>> +
>>  typedef struct Qcow2UnknownHeaderExtension {
>>      uint32_t magic;
>>      uint32_t len;
>> @@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>  void qcow2_free_snapshots(BlockDriverState *bs);
>>  int qcow2_read_snapshots(BlockDriverState *bs);
>>
>> -/* qcow2-cache.c functions */
>> -
>> -
>
> More qcow2-cache.c move cleanups?  Please squash into the previous
> patch.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-05-09  6:24     ` Dong Xu Wang
@ 2013-05-09 15:07       ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2013-05-09 15:07 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel

On Thu, May 09, 2013 at 02:24:20PM +0800, Dong Xu Wang wrote:
> On Thu, Apr 18, 2013 at 6:03 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
> >> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> >> +{
> >> +    BDRVAddCowState *s = bs->opaque;
> >> +    int ret;
> >> +
> >> +    qemu_co_mutex_lock(&s->lock);
> >> +    if (s->bitmap_cache) {
> >> +        ret = block_cache_flush(bs, s->bitmap_cache);
> >> +        if (ret < 0) {
> >> +            return ret;
> >> +        }
> >> +    }
> >> +    ret = bdrv_flush(s->image_hd);
> >
> > This is the wrong way around.  We must flush image_hd first so that
> > valid data is on disk.  Then we can flush bitmap_cache to mark the
> > clusters allocated.
> >
> > Beyond explicit flushes you also need to make sure that image_hd is
> > flushed *before* bitmap_cache tables are written out (e.g. cache
> > eviction when the cache becomes full).  It seems this code is missing.
> >
> Hi  Stefan, how can I make sure image_hd "flush" operation completed before
> I write bitmap_cache tables? Is there any functions I can use?

You could extend the block cache like this:

block_cache_add_dependency_func(BlockCache *c,
                                int (*dependency_fn)(BlockCache *c, void *opaque),
				void *opaque);

Then add-cow would register a custom dependency function that flushes
image_hd.

A hackier way of achieving the same thing is to instantiate an
image_hd_cache and set a dependency from the bitmap_cache on the
image_hd_cache.  The image_hd_cache actually has no cache entries, but
it uses image_hd and will flush it.

Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 5/6] add-cow file format core code Dong Xu Wang
  2013-04-18 10:03   ` Stefan Hajnoczi
@ 2013-05-13 15:07   ` Jeff Cody
  2013-05-14  2:15     ` Dong Xu Wang
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Cody @ 2013-05-13 15:07 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, wdongxu, qemu-devel, stefanha

On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
> add-cow file format core code. It use block-cache.c as cache code.
> It lacks of snapshot_blkdev support.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> v17-v18:
> 1) use error_report, not fprintf.
> 2) remove version field from header.
> 3) header_size is MAX(cluster_size, 4096).
> 4) introduce s->cluster_sectors.
> 5) use BLKDBG_L2_LOAD/UPDATE.
> 
> v16->v17:
> 1) Use stringify.
> 
> v15->v16:
> 1) Judge if opts is null in add_cow_create function.
>  block/Makefile.objs       |   1 +
>  block/add-cow.c           | 741 ++++++++++++++++++++++++++++++++++++++++++++++
>  block/block-cache.c       |   4 +-
>  block/qcow2.h             |   6 +-
>  include/block/block_int.h |   2 +
>  5 files changed, 749 insertions(+), 5 deletions(-)
>  create mode 100644 block/add-cow.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 4fe81dc..69cbb09 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,6 +1,7 @@
>  block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += block-cache.o
> +block-obj-y += add-cow.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-y += parallels.o blkdebug.o blkverify.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..904b008
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,741 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +#include "block/block-cache.h"
> +
> +#define ACOW_CLUSTER_SIZE 65536
> +enum {
> +    /* compat_features bit */
> +    ACOW_F_ALL_ALLOCATED    = 0X01,
> +
> +    /* none feature bit used now */
> +    ACOW_FEATURE_MASK       = 0,
> +
> +    ACOW_MAGIC              = 'A' | 'C' << 8 | 'O' << 16 | 'W' << 24,
> +    ACOW_CACHE_SIZE         = 16,
> +    DEFAULT_HEADER_SIZE    = 4096,
> +    MIN_CLUSTER_BITS            = 9,
> +    MAX_CLUSTER_BITS            = 21,
> +};
> +
> +typedef struct AddCowHeader {
> +    uint32_t    magic;
> +
> +    uint32_t    backing_offset;
> +    uint32_t    backing_size;
> +    uint32_t    image_offset;
> +    uint32_t    image_size;
> +
> +    uint32_t    cluster_bits;
> +    uint64_t    features;
> +    uint64_t    compat_features;
> +    uint32_t    header_size;
> +
> +    char        backing_fmt[16];
> +    char        image_fmt[16];
> +} QEMU_PACKED AddCowHeader;
> +
> +typedef struct BDRVAddCowState {
> +    BlockDriverState    *image_hd;
> +    CoMutex             lock;
> +    int                 cluster_size;
> +    int                 cluster_sectors;
> +    BlockCache          *bitmap_cache;
> +    uint64_t            bitmap_size;
> +    AddCowHeader        header;
> +    char                backing_fmt[16];
> +    char                image_fmt[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num,
> +                                       int64_t cluster_sectors)
> +{
> +    int64_t cluster_num = sector_num / cluster_sectors;
> +    return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num,
> +                                   int64_t cluster_sectors)
> +{
> +    return sector_num % cluster_sectors == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num,
> +                                   int64_t cluster_sectors)
> +{
> +    return (sector_num + 1) % cluster_sectors == 0;
> +}
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> +    cpu->magic              = le32_to_cpu(le->magic);
> +
> +    cpu->backing_offset     = le32_to_cpu(le->backing_offset);
> +    cpu->backing_size       = le32_to_cpu(le->backing_size);
> +    cpu->image_offset       = le32_to_cpu(le->image_offset);
> +    cpu->image_size         = le32_to_cpu(le->image_size);
> +
> +    cpu->cluster_bits       = le32_to_cpu(le->cluster_bits);
> +    cpu->features           = le64_to_cpu(le->features);
> +    cpu->compat_features    = le64_to_cpu(le->compat_features);
> +    cpu->header_size        = le32_to_cpu(le->header_size);
> +
> +    memcpy(cpu->backing_fmt, le->backing_fmt, sizeof(cpu->backing_fmt));
> +    memcpy(cpu->image_fmt, le->image_fmt, sizeof(cpu->image_fmt));
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> +    le->magic               = cpu_to_le32(cpu->magic);
> +
> +    le->backing_offset      = cpu_to_le32(cpu->backing_offset);
> +    le->backing_size        = cpu_to_le32(cpu->backing_size);
> +    le->image_offset        = cpu_to_le32(cpu->image_offset);
> +    le->image_size          = cpu_to_le32(cpu->image_size);
> +
> +    le->cluster_bits        = cpu_to_le32(cpu->cluster_bits);
> +    le->features            = cpu_to_le64(cpu->features);
> +    le->compat_features     = cpu_to_le64(cpu->compat_features);
> +    le->header_size         = cpu_to_le32(cpu->header_size);
> +    memcpy(le->backing_fmt, cpu->backing_fmt, sizeof(le->backing_fmt));
> +    memcpy(le->image_fmt, cpu->image_fmt, sizeof(le->image_fmt));
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> +    if (buf_size < sizeof(*header)) {
> +        return 0;
> +    }
> +    if (le32_to_cpu(header->magic) == ACOW_MAGIC) {
> +        return 100;
> +    }
> +    return 0;
> +}
> +
> +static int add_cow_create(const char *filename, QemuOpts *opts)
> +{
> +    AddCowHeader header = {
> +        .magic      = ACOW_MAGIC,
> +        .features   = 0,
> +        .compat_features = 0,
> +    };
> +    AddCowHeader le_header;
> +    int64_t image_len = 0;
> +    const char *backing_filename = NULL, *backing_fmt = NULL;
> +    const char *image_filename = NULL, *image_format = NULL;
> +    size_t cluster_size = ACOW_CLUSTER_SIZE;
> +    BlockDriverState *bs, *backing_bs;
> +    BlockDriver *drv = bdrv_find_format("add-cow");
> +    int ret;
> +
> +    image_len = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
> +    backing_filename = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
> +    backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
> +    image_filename = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FILE);
> +    image_format = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FMT);
> +    cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
> +                                     ACOW_CLUSTER_SIZE);
> +
> +    header.cluster_bits = ffs(cluster_size) - 1;
> +    if (header.cluster_bits < MIN_CLUSTER_BITS ||
> +        header.cluster_bits > MAX_CLUSTER_BITS ||
> +        (1 << header.cluster_bits) != cluster_size) {
> +        error_report(
> +            "Cluster size must be a power of two between %d and %dk",
> +            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
> +        return -EINVAL;
> +    }
> +
> +   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);
> +    if (backing_filename) {
> +        header.backing_offset = sizeof(header);
> +        header.backing_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, NULL,
> +                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            backing_fmt = bdrv_get_format_name(backing_bs);
> +            bdrv_delete(backing_bs);
> +        }
> +    } else {
> +        header.compat_features |= ACOW_F_ALL_ALLOCATED;
> +    }
> +
> +    if (image_filename) {
> +        header.image_offset = sizeof(header) + header.backing_size;
> +        header.image_size = strlen(image_filename);
> +    } else {
> +        error_report("Error: image_file should be given.");
> +        return -EINVAL;
> +    }
> +
> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same backing file name as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (!strcmp(filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same filename as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (header.image_offset + header.image_size > header.header_size) {
> +        error_report("image_file name or backing_file name too long.");
> +        return -ENOSPC;
> +    }
> +
> +    ret = bdrv_file_open(&bs, image_filename, NULL, 0);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    bdrv_close(bs);
> +
> +    ret = bdrv_create_file(filename, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
> +             backing_fmt ? backing_fmt : "");
> +    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
> +             image_format ? image_format : "raw");
> +    add_cow_header_cpu_to_le(&header, &le_header);
> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, header.image_offset, image_filename,
> +                      header.image_size);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    if (backing_filename) {
> +        ret = bdrv_pwrite(bs, header.backing_offset, backing_filename,
> +                          header.backing_size);
> +        if (ret < 0) {
> +            bdrv_delete(bs);
> +            return ret;
> +        }
> +    }
> +    bdrv_close(bs);
> +
> +    ret = bdrv_open(bs, filename, NULL, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(bs, image_len);
> +    bdrv_delete(bs);
> +    return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, QDict *options, int flags)
> +{
> +    char                image_filename[PATH_MAX];
> +    char                tmp_name[PATH_MAX];
> +    int                 ret;
> +    int                 sector_per_byte;
> +    BDRVAddCowState     *s = bs->opaque;
> +    AddCowHeader        le_header;
> +    char backing_filename[PATH_MAX];
> +
> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> +    if (s->header.magic != ACOW_MAGIC) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if (s->header.features & ~ACOW_FEATURE_MASK) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "Feature Flags: %" PRIx64,
> +                 s->header.features & ~ACOW_FEATURE_MASK);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +                      bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
> +        snprintf(bs->backing_format, sizeof(bs->backing_format),
> +                 "%s", s->header.backing_fmt);
> +    }
> +
> +    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
> +        s->header.cluster_bits > MAX_CLUSTER_BITS) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    s->cluster_size = 1 << s->header.cluster_bits;
> +    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "Header size: %d",
> +                 s->header.header_size);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +                      bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, s->header.backing_offset,
> +                               s->header.backing_size,
> +                               bs->backing_file,
> +                               sizeof(bs->backing_file));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file, s->header.image_offset,
> +                           s->header.image_size, tmp_name,
> +                           sizeof(tmp_name));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    if (path_has_protocol(tmp_name)) {
> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, tmp_name);
> +    }
> +    bdrv_get_full_backing_filename(bs, backing_filename,
> +                                   sizeof(backing_filename));
> +    if (!strcmp(backing_filename, image_filename)) {
> +        error_report("Error: backing file and image file are same: %s\n",
> +                     backing_filename);
> +        ret = EINVAL;
> +        goto fail;
> +    }
> +    s->image_hd = bdrv_new("");
> +    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
> +                    bdrv_find_format(s->header.image_fmt));
> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = bdrv_getlength(s->image_hd) / BDRV_SECTOR_SIZE;
> +    s->cluster_sectors = s->cluster_size / BDRV_SECTOR_SIZE;
> +    sector_per_byte = s->cluster_sectors * 8;
> +    s->bitmap_size =
> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> +    s->bitmap_cache =  block_cache_create(bs, ACOW_CACHE_SIZE,
> +                                          s->cluster_size,
> +                                          BLOCK_TABLE_BITMAP);
> +    qemu_co_mutex_init(&s->lock);
> +    return 0;
> +fail:
> +    return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    if (s->bitmap_cache) {
> +        block_cache_destroy(bs, s->bitmap_cache);
> +    }
> +    bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int64_t cluster_num = sector_num / s->cluster_sectors;
> +    uint8_t *table      = NULL;
> +    bool val = false;
> +    int ret;
> +
> +    uint64_t offset = s->header.header_size +
> +        (offset_in_bitmap(sector_num, s->cluster_sectors)
> +            & (~(s->cluster_size - 1)));
> +    ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    val = table[cluster_num / 8 % s->cluster_size] & (1 << (cluster_num % 8));
> +    ret = block_cache_put(bs, s->bitmap_cache, (void **)&table);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return val;
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, int *num_same)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int changed;
> +
> +    if (nb_sectors == 0) {
> +        *num_same = 0;
> +        return 0;
> +    }
> +
> +    if (s->header.compat_features & ACOW_F_ALL_ALLOCATED) {
> +        *num_same = nb_sectors;
> +        return 1;
> +    }
> +    changed = is_allocated(bs, sector_num);
> +
> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
> +            break;
> +        }
> +    }
> +    return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> +                                int64_t sector_num, int nb_sectors)
> +{
> +    int n1;
> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
> +        return nb_sectors;
> +    }
> +    if (sector_num >= bs->total_sectors) {
> +        n1 = 0;
> +    } else {
> +        n1 = bs->total_sectors - sector_num;
> +    }
> +
> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> +                      0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> +    return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> +                                         int64_t sector_num,
> +                                         int remaining_sectors,
> +                                         QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int cur_nr_sectors;
> +    uint64_t bytes_done = 0;
> +    QEMUIOVector hd_qiov;
> +    int n1, ret = 0;
> +
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    qemu_co_mutex_lock(&s->lock);
> +    while (remaining_sectors != 0) {
> +        cur_nr_sectors = remaining_sectors;
> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors,
> +                                 &cur_nr_sectors)) {
> +            qemu_iovec_reset(&hd_qiov);
> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                              cur_nr_sectors * BDRV_SECTOR_SIZE);
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = bdrv_co_readv(s->image_hd, sector_num,
> +                                cur_nr_sectors, &hd_qiov);
> +            qemu_co_mutex_lock(&s->lock);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        } else {
> +            if (bs->backing_hd) {
> +                qemu_iovec_reset(&hd_qiov);
> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                                  cur_nr_sectors * BDRV_SECTOR_SIZE);
> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> +                                          sector_num, cur_nr_sectors);
> +                if (n1 > 0) {
> +                    qemu_co_mutex_unlock(&s->lock);
> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
> +                                        cur_nr_sectors, &hd_qiov);
> +                    qemu_co_mutex_lock(&s->lock);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                }
> +            } else {
> +                qemu_iovec_memset(&hd_qiov, 0, 0,
> +                                  BDRV_SECTOR_SIZE * cur_nr_sectors);
> +            }
> +        }
> +        remaining_sectors -= cur_nr_sectors;
> +        sector_num += cur_nr_sectors;
> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> +    }
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> +                                     int n_start, int n_end)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    QEMUIOVector qiov;
> +    struct iovec iov;
> +    int n, ret;
> +
> +    n = n_end - n_start;
> +    if (n <= 0) {
> +        return 0;
> +    }
> +
> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = 0;
> +out:
> +    qemu_vfree(iov.iov_base);
> +    return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +                                          int64_t sector_num,
> +                                          int remaining_sectors,
> +                                          QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +    int mask = s->cluster_sectors - 1;
> +    int cluster_mask = s->cluster_size - 1;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd, sector_num,
> +                         remaining_sectors, qiov);
> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num, s->cluster_sectors)
> +            && !is_allocated(bs, sector_num)) {
> +            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
> +                             s->cluster_sectors)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                               ((sector_num + remaining_sectors) | mask) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        for (i = sector_num / s->cluster_sectors;
> +            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
> +            i++) {
> +            offset = s->header.header_size
> +                + (offset_in_bitmap(i * s->cluster_sectors,
> +                s->cluster_sectors) & (~cluster_mask));
> +            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));
> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
> +            }
> +
> +            ret = block_cache_put(bs, s->bitmap_cache, (void **) &table);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +    }
> +    ret = 0;
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int sector_per_byte = s->cluster_sectors * 8;
> +    int ret;
> +    int64_t bitmap_size =
> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +    bitmap_size = (bitmap_size + s->cluster_size - 1)
> +        & (~(s->cluster_size - 1));
> +
> +    ret = bdrv_truncate(bs->file, s->header.header_size + bitmap_size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(s->image_hd, size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return 0;
> +}
> +
> +static int add_cow_reopen_prepare(BDRVReopenState *state,
> +                                  BlockReopenQueue *queue, Error **errp)
> +{
> +    BDRVAddCowState *s;
> +    int ret = -1;
> +
> +    assert(state != NULL);
> +    assert(state->bs != NULL);
> +
> +    if (queue == NULL) {
> +        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
> +                  "No reopen queue for add-cow");
> +        goto exit;
> +    }
> +
> +    s = state->bs->opaque;
> +
> +    assert(s != NULL);
> +
> +
> +    bdrv_reopen_queue(queue, s->image_hd, state->flags);
> +    ret = 0;
> +
> +exit:
> +    return ret;
> +}
> +
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    if (s->bitmap_cache) {
> +        ret = block_cache_flush(bs, s->bitmap_cache);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +    ret = bdrv_flush(s->image_hd);
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
> +}
> +
> +static int add_cow_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    bdi->cluster_size = s->cluster_size;
> +    return 0;
> +}
> +
> +static QemuOptsList add_cow_create_opts = {
> +    .name = "add-cow-create-opts",
> +    .head = QTAILQ_HEAD_INITIALIZER(add_cow_create_opts.head),
> +    .desc = {
> +        {
> +            .name = BLOCK_OPT_SIZE,
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Virtual disk size"
> +        },
> +        {
> +            .name = BLOCK_OPT_BACKING_FILE,
> +            .type = QEMU_OPT_STRING,
> +            .help = "File name of a base image"
> +        },
> +        {
> +            .name = BLOCK_OPT_BACKING_FMT,
> +            .type = QEMU_OPT_STRING,
> +            .help = "Image format of the base image"
> +        },
> +        {
> +            .name = BLOCK_OPT_IMAGE_FILE,
> +            .type = QEMU_OPT_STRING,
> +            .help = "File name of a image file"
> +        },
> +        {
> +            .name = BLOCK_OPT_IMAGE_FMT,
> +            .type = QEMU_OPT_STRING,
> +            .help = "Image format of the image file"
> +        },
> +        {
> +            .name = BLOCK_OPT_CLUSTER_SIZE,
> +            .type = QEMU_OPT_SIZE,
> +            .help = "add-cow cluster size",
> +            .def_value_str = stringify(ACOW_CLUSTER_SIZE)

Hi Dong,

.def_value_str is undefined - it looks like this series is dependent
on:

[PATCH V14 1/6] add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print

In that 6-part series, patches 2,4,5 are also dependent on the above
patch.  Should these series be combined?

You did mention in the header that this uses QemuOpts, but until I
applied and tried compiling, it wasn't obvious (at least to me) the
external patch dependency above.

Thanks,
Jeff


> +        },
> +        { /* end of list */ }
> +    }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +    .format_name                = "add-cow",
> +    .instance_size              = sizeof(BDRVAddCowState),
> +    .bdrv_probe                 = add_cow_probe,
> +    .bdrv_open                  = add_cow_open,
> +    .bdrv_close                 = add_cow_close,
> +    .bdrv_create                = add_cow_create,
> +    .bdrv_co_readv              = add_cow_co_readv,
> +    .bdrv_co_writev             = add_cow_co_writev,
> +    .bdrv_truncate              = bdrv_add_cow_truncate,
> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
> +    .bdrv_reopen_prepare        = add_cow_reopen_prepare,
> +    .bdrv_get_info              = add_cow_get_info,
> +
> +    .bdrv_create_opts           = &add_cow_create_opts,
> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> +    bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/block-cache.c b/block/block-cache.c
> index 3544691..4824632 100644
> --- a/block/block-cache.c
> +++ b/block/block-cache.c
> @@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
>      } else if (c->table_type == BLOCK_TABLE_L2) {
>          BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>      } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>      }
>  
>      ret = bdrv_pwrite(bs->file, c->entries[i].offset,
> @@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
>          if (c->table_type == BLOCK_TABLE_L2) {
>              BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>          } else if (c->table_type == BLOCK_TABLE_BITMAP) {
> -            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>          }
>  
>          ret = bdrv_pread(bs->file, offset, c->entries[i].table,
> diff --git a/block/qcow2.h b/block/qcow2.h
> index e7f6aec..a4e514b 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
>  
> +struct BlockCache;
> +typedef struct BlockCache BlockCache;
> +
>  typedef struct Qcow2UnknownHeaderExtension {
>      uint32_t magic;
>      uint32_t len;
> @@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
>  
> -/* qcow2-cache.c functions */
> -
> -
>  #endif
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index a44a9ac..b9eb9b8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -57,6 +57,8 @@
>  #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
>  #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
> +#define BLOCK_OPT_IMAGE_FMT         "image_fmt"
>  
>  typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>  
> -- 
> 1.7.11.7
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
  2013-05-13 15:07   ` Jeff Cody
@ 2013-05-14  2:15     ` Dong Xu Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Dong Xu Wang @ 2013-05-14  2:15 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, wdongxu, qemu-devel, stefanha

On 2013/5/13 23:07, Jeff Cody wrote:
> On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:
>> add-cow file format core code. It use block-cache.c as cache code.
>> It lacks of snapshot_blkdev support.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> v17-v18:
>> 1) use error_report, not fprintf.
>> 2) remove version field from header.
>> 3) header_size is MAX(cluster_size, 4096).
>> 4) introduce s->cluster_sectors.
>> 5) use BLKDBG_L2_LOAD/UPDATE.
>>
>> v16->v17:
>> 1) Use stringify.
>>
>> v15->v16:
>> 1) Judge if opts is null in add_cow_create function.
>>   block/Makefile.objs       |   1 +
>>   block/add-cow.c           | 741 ++++++++++++++++++++++++++++++++++++++++++++++
>>   block/block-cache.c       |   4 +-
>>   block/qcow2.h             |   6 +-
>>   include/block/block_int.h |   2 +
>>   5 files changed, 749 insertions(+), 5 deletions(-)
>>   create mode 100644 block/add-cow.c
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 4fe81dc..69cbb09 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -1,6 +1,7 @@
>>   block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>>   block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>   block-obj-y += block-cache.o
>> +block-obj-y += add-cow.o
>>   block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>   block-obj-y += qed-check.o
>>   block-obj-y += parallels.o blkdebug.o blkverify.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..904b008
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,741 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block/block_int.h"
>> +#include "qemu/module.h"
>> +#include "block/block-cache.h"
>> +
>> +#define ACOW_CLUSTER_SIZE 65536
>> +enum {
>> +    /* compat_features bit */
>> +    ACOW_F_ALL_ALLOCATED    = 0X01,
>> +
>> +    /* none feature bit used now */
>> +    ACOW_FEATURE_MASK       = 0,
>> +
>> +    ACOW_MAGIC              = 'A' | 'C' << 8 | 'O' << 16 | 'W' << 24,
>> +    ACOW_CACHE_SIZE         = 16,
>> +    DEFAULT_HEADER_SIZE    = 4096,
>> +    MIN_CLUSTER_BITS            = 9,
>> +    MAX_CLUSTER_BITS            = 21,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> +    uint32_t    magic;
>> +
>> +    uint32_t    backing_offset;
>> +    uint32_t    backing_size;
>> +    uint32_t    image_offset;
>> +    uint32_t    image_size;
>> +
>> +    uint32_t    cluster_bits;
>> +    uint64_t    features;
>> +    uint64_t    compat_features;
>> +    uint32_t    header_size;
>> +
>> +    char        backing_fmt[16];
>> +    char        image_fmt[16];
>> +} QEMU_PACKED AddCowHeader;
>> +
>> +typedef struct BDRVAddCowState {
>> +    BlockDriverState    *image_hd;
>> +    CoMutex             lock;
>> +    int                 cluster_size;
>> +    int                 cluster_sectors;
>> +    BlockCache          *bitmap_cache;
>> +    uint64_t            bitmap_size;
>> +    AddCowHeader        header;
>> +    char                backing_fmt[16];
>> +    char                image_fmt[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num,
>> +                                       int64_t cluster_sectors)
>> +{
>> +    int64_t cluster_num = sector_num / cluster_sectors;
>> +    return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num,
>> +                                   int64_t cluster_sectors)
>> +{
>> +    return sector_num % cluster_sectors == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num,
>> +                                   int64_t cluster_sectors)
>> +{
>> +    return (sector_num + 1) % cluster_sectors == 0;
>> +}
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> +    cpu->magic              = le32_to_cpu(le->magic);
>> +
>> +    cpu->backing_offset     = le32_to_cpu(le->backing_offset);
>> +    cpu->backing_size       = le32_to_cpu(le->backing_size);
>> +    cpu->image_offset       = le32_to_cpu(le->image_offset);
>> +    cpu->image_size         = le32_to_cpu(le->image_size);
>> +
>> +    cpu->cluster_bits       = le32_to_cpu(le->cluster_bits);
>> +    cpu->features           = le64_to_cpu(le->features);
>> +    cpu->compat_features    = le64_to_cpu(le->compat_features);
>> +    cpu->header_size        = le32_to_cpu(le->header_size);
>> +
>> +    memcpy(cpu->backing_fmt, le->backing_fmt, sizeof(cpu->backing_fmt));
>> +    memcpy(cpu->image_fmt, le->image_fmt, sizeof(cpu->image_fmt));
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> +    le->magic               = cpu_to_le32(cpu->magic);
>> +
>> +    le->backing_offset      = cpu_to_le32(cpu->backing_offset);
>> +    le->backing_size        = cpu_to_le32(cpu->backing_size);
>> +    le->image_offset        = cpu_to_le32(cpu->image_offset);
>> +    le->image_size          = cpu_to_le32(cpu->image_size);
>> +
>> +    le->cluster_bits        = cpu_to_le32(cpu->cluster_bits);
>> +    le->features            = cpu_to_le64(cpu->features);
>> +    le->compat_features     = cpu_to_le64(cpu->compat_features);
>> +    le->header_size         = cpu_to_le32(cpu->header_size);
>> +    memcpy(le->backing_fmt, cpu->backing_fmt, sizeof(le->backing_fmt));
>> +    memcpy(le->image_fmt, cpu->image_fmt, sizeof(le->image_fmt));
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +    if (buf_size < sizeof(*header)) {
>> +        return 0;
>> +    }
>> +    if (le32_to_cpu(header->magic) == ACOW_MAGIC) {
>> +        return 100;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int add_cow_create(const char *filename, QemuOpts *opts)
>> +{
>> +    AddCowHeader header = {
>> +        .magic      = ACOW_MAGIC,
>> +        .features   = 0,
>> +        .compat_features = 0,
>> +    };
>> +    AddCowHeader le_header;
>> +    int64_t image_len = 0;
>> +    const char *backing_filename = NULL, *backing_fmt = NULL;
>> +    const char *image_filename = NULL, *image_format = NULL;
>> +    size_t cluster_size = ACOW_CLUSTER_SIZE;
>> +    BlockDriverState *bs, *backing_bs;
>> +    BlockDriver *drv = bdrv_find_format("add-cow");
>> +    int ret;
>> +
>> +    image_len = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
>> +    backing_filename = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
>> +    backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
>> +    image_filename = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FILE);
>> +    image_format = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FMT);
>> +    cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
>> +                                     ACOW_CLUSTER_SIZE);
>> +
>> +    header.cluster_bits = ffs(cluster_size) - 1;
>> +    if (header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        header.cluster_bits > MAX_CLUSTER_BITS ||
>> +        (1 << header.cluster_bits) != cluster_size) {
>> +        error_report(
>> +            "Cluster size must be a power of two between %d and %dk",
>> +            1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
>> +        return -EINVAL;
>> +    }
>> +
>> +   header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);
>> +    if (backing_filename) {
>> +        header.backing_offset = sizeof(header);
>> +        header.backing_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, NULL,
>> +                            BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            backing_fmt = bdrv_get_format_name(backing_bs);
>> +            bdrv_delete(backing_bs);
>> +        }
>> +    } else {
>> +        header.compat_features |= ACOW_F_ALL_ALLOCATED;
>> +    }
>> +
>> +    if (image_filename) {
>> +        header.image_offset = sizeof(header) + header.backing_size;
>> +        header.image_size = strlen(image_filename);
>> +    } else {
>> +        error_report("Error: image_file should be given.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same backing file name as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!strcmp(filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same filename as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (header.image_offset + header.image_size > header.header_size) {
>> +        error_report("image_file name or backing_file name too long.");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, image_filename, NULL, 0);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    bdrv_close(bs);
>> +
>> +    ret = bdrv_create_file(filename, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
>> +             backing_fmt ? backing_fmt : "");
>> +    snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
>> +             image_format ? image_format : "raw");
>> +    add_cow_header_cpu_to_le(&header, &le_header);
>> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, header.image_offset, image_filename,
>> +                      header.image_size);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        ret = bdrv_pwrite(bs, header.backing_offset, backing_filename,
>> +                          header.backing_size);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs);
>> +            return ret;
>> +        }
>> +    }
>> +    bdrv_close(bs);
>> +
>> +    ret = bdrv_open(bs, filename, NULL, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(bs, image_len);
>> +    bdrv_delete(bs);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, QDict *options, int flags)
>> +{
>> +    char                image_filename[PATH_MAX];
>> +    char                tmp_name[PATH_MAX];
>> +    int                 ret;
>> +    int                 sector_per_byte;
>> +    BDRVAddCowState     *s = bs->opaque;
>> +    AddCowHeader        le_header;
>> +    char backing_filename[PATH_MAX];
>> +
>> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> +    if (s->header.magic != ACOW_MAGIC) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.features & ~ACOW_FEATURE_MASK) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "Feature Flags: %" PRIx64,
>> +                 s->header.features & ~ACOW_FEATURE_MASK);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +                      bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        snprintf(bs->backing_format, sizeof(bs->backing_format),
>> +                 "%s", s->header.backing_fmt);
>> +    }
>> +
>> +    if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
>> +        s->header.cluster_bits > MAX_CLUSTER_BITS) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    s->cluster_size = 1 << s->header.cluster_bits;
>> +    if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "Header size: %d",
>> +                 s->header.header_size);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +                      bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, s->header.backing_offset,
>> +                               s->header.backing_size,
>> +                               bs->backing_file,
>> +                               sizeof(bs->backing_file));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file, s->header.image_offset,
>> +                           s->header.image_size, tmp_name,
>> +                           sizeof(tmp_name));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    if (path_has_protocol(tmp_name)) {
>> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> +    } else {
>> +        path_combine(image_filename, sizeof(image_filename),
>> +                     bs->filename, tmp_name);
>> +    }
>> +    bdrv_get_full_backing_filename(bs, backing_filename,
>> +                                   sizeof(backing_filename));
>> +    if (!strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: backing file and image file are same: %s\n",
>> +                     backing_filename);
>> +        ret = EINVAL;
>> +        goto fail;
>> +    }
>> +    s->image_hd = bdrv_new("");
>> +    ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
>> +                    bdrv_find_format(s->header.image_fmt));
>> +    if (ret < 0) {
>> +        bdrv_delete(s->image_hd);
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = bdrv_getlength(s->image_hd) / BDRV_SECTOR_SIZE;
>> +    s->cluster_sectors = s->cluster_size / BDRV_SECTOR_SIZE;
>> +    sector_per_byte = s->cluster_sectors * 8;
>> +    s->bitmap_size =
>> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> +    s->bitmap_cache =  block_cache_create(bs, ACOW_CACHE_SIZE,
>> +                                          s->cluster_size,
>> +                                          BLOCK_TABLE_BITMAP);
>> +    qemu_co_mutex_init(&s->lock);
>> +    return 0;
>> +fail:
>> +    return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    if (s->bitmap_cache) {
>> +        block_cache_destroy(bs, s->bitmap_cache);
>> +    }
>> +    bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int64_t cluster_num = sector_num / s->cluster_sectors;
>> +    uint8_t *table      = NULL;
>> +    bool val = false;
>> +    int ret;
>> +
>> +    uint64_t offset = s->header.header_size +
>> +        (offset_in_bitmap(sector_num, s->cluster_sectors)
>> +            & (~(s->cluster_size - 1)));
>> +    ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    val = table[cluster_num / 8 % s->cluster_size] & (1 << (cluster_num % 8));
>> +    ret = block_cache_put(bs, s->bitmap_cache, (void **)&table);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return val;
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int changed;
>> +
>> +    if (nb_sectors == 0) {
>> +        *num_same = 0;
>> +        return 0;
>> +    }
>> +
>> +    if (s->header.compat_features & ACOW_F_ALL_ALLOCATED) {
>> +        *num_same = nb_sectors;
>> +        return 1;
>> +    }
>> +    changed = is_allocated(bs, sector_num);
>> +
>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>> +            break;
>> +        }
>> +    }
>> +    return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> +                                int64_t sector_num, int nb_sectors)
>> +{
>> +    int n1;
>> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> +        return nb_sectors;
>> +    }
>> +    if (sector_num >= bs->total_sectors) {
>> +        n1 = 0;
>> +    } else {
>> +        n1 = bs->total_sectors - sector_num;
>> +    }
>> +
>> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> +                      0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> +    return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> +                                         int64_t sector_num,
>> +                                         int remaining_sectors,
>> +                                         QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int cur_nr_sectors;
>> +    uint64_t bytes_done = 0;
>> +    QEMUIOVector hd_qiov;
>> +    int n1, ret = 0;
>> +
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    qemu_co_mutex_lock(&s->lock);
>> +    while (remaining_sectors != 0) {
>> +        cur_nr_sectors = remaining_sectors;
>> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors,
>> +                                 &cur_nr_sectors)) {
>> +            qemu_iovec_reset(&hd_qiov);
>> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                              cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +            qemu_co_mutex_unlock(&s->lock);
>> +            ret = bdrv_co_readv(s->image_hd, sector_num,
>> +                                cur_nr_sectors, &hd_qiov);
>> +            qemu_co_mutex_lock(&s->lock);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        } else {
>> +            if (bs->backing_hd) {
>> +                qemu_iovec_reset(&hd_qiov);
>> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                                  cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> +                                          sector_num, cur_nr_sectors);
>> +                if (n1 > 0) {
>> +                    qemu_co_mutex_unlock(&s->lock);
>> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> +                                        cur_nr_sectors, &hd_qiov);
>> +                    qemu_co_mutex_lock(&s->lock);
>> +                    if (ret < 0) {
>> +                        goto fail;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_iovec_memset(&hd_qiov, 0, 0,
>> +                                  BDRV_SECTOR_SIZE * cur_nr_sectors);
>> +            }
>> +        }
>> +        remaining_sectors -= cur_nr_sectors;
>> +        sector_num += cur_nr_sectors;
>> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> +    }
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> +                                     int n_start, int n_end)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    QEMUIOVector qiov;
>> +    struct iovec iov;
>> +    int n, ret;
>> +
>> +    n = n_end - n_start;
>> +    if (n <= 0) {
>> +        return 0;
>> +    }
>> +
>> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = 0;
>> +out:
>> +    qemu_vfree(iov.iov_base);
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +                                          int64_t sector_num,
>> +                                          int remaining_sectors,
>> +                                          QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +    int mask = s->cluster_sectors - 1;
>> +    int cluster_mask = s->cluster_size - 1;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd, sector_num,
>> +                         remaining_sectors, qiov);
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num, s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num)) {
>> +            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
>> +                             s->cluster_sectors)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                               ((sector_num + remaining_sectors) | mask) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        for (i = sector_num / s->cluster_sectors;
>> +            i <= (sector_num + remaining_sectors - 1) / s->cluster_sectors;
>> +            i++) {
>> +            offset = s->header.header_size
>> +                + (offset_in_bitmap(i * s->cluster_sectors,
>> +                s->cluster_sectors) & (~cluster_mask));
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> +            }
>> +
>> +            ret = block_cache_put(bs, s->bitmap_cache, (void **) &table);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +    }
>> +    ret = 0;
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int sector_per_byte = s->cluster_sectors * 8;
>> +    int ret;
>> +    int64_t bitmap_size =
>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> +    bitmap_size = (bitmap_size + s->cluster_size - 1)
>> +        & (~(s->cluster_size - 1));
>> +
>> +    ret = bdrv_truncate(bs->file, s->header.header_size + bitmap_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(s->image_hd, size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int add_cow_reopen_prepare(BDRVReopenState *state,
>> +                                  BlockReopenQueue *queue, Error **errp)
>> +{
>> +    BDRVAddCowState *s;
>> +    int ret = -1;
>> +
>> +    assert(state != NULL);
>> +    assert(state->bs != NULL);
>> +
>> +    if (queue == NULL) {
>> +        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
>> +                  "No reopen queue for add-cow");
>> +        goto exit;
>> +    }
>> +
>> +    s = state->bs->opaque;
>> +
>> +    assert(s != NULL);
>> +
>> +
>> +    bdrv_reopen_queue(queue, s->image_hd, state->flags);
>> +    ret = 0;
>> +
>> +exit:
>> +    return ret;
>> +}
>> +
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    if (s->bitmap_cache) {
>> +        ret = block_cache_flush(bs, s->bitmap_cache);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +    ret = bdrv_flush(s->image_hd);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    bdi->cluster_size = s->cluster_size;
>> +    return 0;
>> +}
>> +
>> +static QemuOptsList add_cow_create_opts = {
>> +    .name = "add-cow-create-opts",
>> +    .head = QTAILQ_HEAD_INITIALIZER(add_cow_create_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = BLOCK_OPT_SIZE,
>> +            .type = QEMU_OPT_SIZE,
>> +            .help = "Virtual disk size"
>> +        },
>> +        {
>> +            .name = BLOCK_OPT_BACKING_FILE,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "File name of a base image"
>> +        },
>> +        {
>> +            .name = BLOCK_OPT_BACKING_FMT,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "Image format of the base image"
>> +        },
>> +        {
>> +            .name = BLOCK_OPT_IMAGE_FILE,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "File name of a image file"
>> +        },
>> +        {
>> +            .name = BLOCK_OPT_IMAGE_FMT,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "Image format of the image file"
>> +        },
>> +        {
>> +            .name = BLOCK_OPT_CLUSTER_SIZE,
>> +            .type = QEMU_OPT_SIZE,
>> +            .help = "add-cow cluster size",
>> +            .def_value_str = stringify(ACOW_CLUSTER_SIZE)
>
> Hi Dong,
>
> .def_value_str is undefined - it looks like this series is dependent
> on:
>
> [PATCH V14 1/6] add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print
>
> In that 6-part series, patches 2,4,5 are also dependent on the above
> patch.  Should these series be combined?
>
> You did mention in the header that this uses QemuOpts, but until I
> applied and tried compiling, it wasn't obvious (at least to me) the
> external patch dependency above.
>
Ah, yes, I wish QemuOpts patches will be merged first, so add-cow is 
based on them, so ..
> Thanks,
> Jeff
>
>
>> +        },
>> +        { /* end of list */ }
>> +    }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> +    .format_name                = "add-cow",
>> +    .instance_size              = sizeof(BDRVAddCowState),
>> +    .bdrv_probe                 = add_cow_probe,
>> +    .bdrv_open                  = add_cow_open,
>> +    .bdrv_close                 = add_cow_close,
>> +    .bdrv_create                = add_cow_create,
>> +    .bdrv_co_readv              = add_cow_co_readv,
>> +    .bdrv_co_writev             = add_cow_co_writev,
>> +    .bdrv_truncate              = bdrv_add_cow_truncate,
>> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
>> +    .bdrv_reopen_prepare        = add_cow_reopen_prepare,
>> +    .bdrv_get_info              = add_cow_get_info,
>> +
>> +    .bdrv_create_opts           = &add_cow_create_opts,
>> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> +    bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/block-cache.c b/block/block-cache.c
>> index 3544691..4824632 100644
>> --- a/block/block-cache.c
>> +++ b/block/block-cache.c
>> @@ -130,7 +130,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
>>       } else if (c->table_type == BLOCK_TABLE_L2) {
>>           BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>>       } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
>> +        BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>       }
>>
>>       ret = bdrv_pwrite(bs->file, c->entries[i].offset,
>> @@ -265,7 +265,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
>>           if (c->table_type == BLOCK_TABLE_L2) {
>>               BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>           } else if (c->table_type == BLOCK_TABLE_BITMAP) {
>> -            BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>> +            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>>           }
>>
>>           ret = bdrv_pread(bs->file, offset, c->entries[i].table,
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index e7f6aec..a4e514b 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -98,6 +98,9 @@ typedef struct QCowSnapshot {
>>       uint64_t vm_clock_nsec;
>>   } QCowSnapshot;
>>
>> +struct BlockCache;
>> +typedef struct BlockCache BlockCache;
>> +
>>   typedef struct Qcow2UnknownHeaderExtension {
>>       uint32_t magic;
>>       uint32_t len;
>> @@ -389,7 +392,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>   void qcow2_free_snapshots(BlockDriverState *bs);
>>   int qcow2_read_snapshots(BlockDriverState *bs);
>>
>> -/* qcow2-cache.c functions */
>> -
>> -
>>   #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index a44a9ac..b9eb9b8 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -57,6 +57,8 @@
>>   #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>>   #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
>>   #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
>> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
>> +#define BLOCK_OPT_IMAGE_FMT         "image_fmt"
>>
>>   typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>>
>> --
>> 1.7.11.7
>>
>>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-05-14  2:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-10  8:11 [Qemu-devel] [PATCH V18 0/6] add-cow file format Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 1/6] docs: document for " Dong Xu Wang
2013-04-18  8:30   ` Stefan Hajnoczi
2013-04-23  1:45     ` Dong Xu Wang
2013-04-26 22:45   ` Eric Blake
2013-05-02  5:44     ` Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 2/6] make path_has_protocol non static Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2013-04-18  8:54   ` Stefan Hajnoczi
2013-04-23  1:47     ` Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 5/6] add-cow file format core code Dong Xu Wang
2013-04-18 10:03   ` Stefan Hajnoczi
2013-04-23  1:54     ` Dong Xu Wang
2013-05-09  6:24     ` Dong Xu Wang
2013-05-09 15:07       ` Stefan Hajnoczi
2013-05-13 15:07   ` Jeff Cody
2013-05-14  2:15     ` Dong Xu Wang
2013-04-10  8:11 ` [Qemu-devel] [PATCH V18 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang
2013-04-18 10:06 ` [Qemu-devel] [PATCH V18 0/6] add-cow file format Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.