All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V12 0/6] add-cow file format
@ 2012-08-10 15:39 Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

This will introduce a new file format: add-cow. 

add-cow can benefit from other available functions, such as path_has_protocol and
qed_read_string, so we will make them public. 

Now add-cow is still using QEMUOptionParameter, not QemuOpts,  I will send a
separate patch series to convert.

snapshot_blkdev are not supported now for add-cow, after converting QEMUOptionParameter
to QemuOpts, will add related code.


v11->v12:
1) Removed un-used feature bit.
2) Share cache code with qcow2.c.
3) Remove snapshot_blkdev support, will add it in another patch.
5) COW Bitmap field in add-cow file will be multiple of 65536.
6) fix grammer and typo.

Dong Xu Wang (6):
  docs: document for add cow file format
  make path_has_protocol non-static
  qed_read_string to bdrv_read_string
  rename qcow2-cache.c to block-cache.c
  add-cow file format
  qemu-iotests

 block.c                      |   29 ++-
 block.h                      |    6 +
 block/Makefile.objs          |    4 +-
 block/add-cow.c              |  613 ++++++++++++++++++++++++++++++++++++++++++
 block/add-cow.h              |   85 ++++++
 block/qcow2-cache.c          |  323 ----------------------
 block/qcow2-cluster.c        |   66 +++--
 block/qcow2-refcount.c       |   66 +++--
 block/qcow2.c                |   36 ++--
 block/qcow2.h                |   24 +--
 block/qed.c                  |   29 +--
 block_int.h                  |    2 +
 docs/specs/add-cow.txt       |  123 +++++++++
 tests/qemu-iotests/017       |    2 +-
 tests/qemu-iotests/020       |    2 +-
 tests/qemu-iotests/check     |    4 +-
 tests/qemu-iotests/common    |    6 +
 tests/qemu-iotests/common.rc |   19 ++
 trace-events                 |   13 +-
 19 files changed, 994 insertions(+), 458 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/add-cow.h
 delete mode 100644 block/qcow2-cache.c
 create mode 100644 docs/specs/add-cow.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
  2012-09-10 15:23   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Document for add-cow format, the usage and spec of add-cow are introduced.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/add-cow.txt

diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 0000000..d5a7a68
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,123 @@
+== General ==
+
+The raw file format does not support backing files or copy on write feature.
+The add-cow image format makes it possible to use backing files with raw
+image by keeping a separate .add-cow metadata file. Once all sectors
+have been written into the raw image it is safe to discard the .add-cow
+and backing files, then we can use the raw image directly.
+
+An example usage of add-cow would look like::
+(ubuntu.img is a disk image which has been installed OS.)
+    1)  Create a raw image with the same size of ubuntu.img
+            qemu-img create -f raw test.raw 8G
+    2)  Create an add-cow image which will store dirty bitmap
+            qemu-img create -f add-cow test.add-cow \
+                -o backing_file=ubuntu.img,image_file=test.raw
+    3)  Run qemu with add-cow image
+            qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
+will be calculated from the size of test.raw.
+
+=Specification=
+
+The file format looks like this:
+
+ +---------------+-------------+-----------------+
+ |     Header    |   Reserved  |    COW bitmap   |
+ +---------------+-------------+-----------------+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(#define HEADER_SIZE (4096 * header_pages_size))
+    Byte    0 -  7:     magic
+                        add-cow magic string ("ADD_COW\xff").
+
+            8 -  11:    version
+                        Version number (only valid value is 1 now).
+
+            12 - 15:    backing file name offset
+                        Offset in the add-cow file at which the backing file
+                        name is stored (NB: The string is not nul-terminated).
+                        If backing file name does NOT exist, this field will be
+                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
+                        must be at least 1 byte).
+
+            16 - 19:    backing file name size
+                        Length of the backing file name in bytes. It will be 0
+                        if the backing file name offset is 0. If backing file
+                        name offset is non-zero, then it must be non-zero. Must
+                        be less than [HEADER_SIZE - 80] to fit in the reserved
+                        part of the header.
+
+            20 - 23:    image file name offset
+                        Offset in the add-cow file at which the image file name
+                        is stored (NB: The string is not null terminated). It
+                        must be between 80 and [HEADER_SIZE - 2].
+
+            24 - 27:    image file name size
+                        Length of the image file name in bytes.
+                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
+                        part of the header.
+
+            28 - 35:    features
+                        Currently only 1 feature bit is used:
+                        Feature bits:
+                            * ADD_COW_F_All_ALLOCATED   = 0x01.
+
+            36 - 43:    optional features
+                        Not used now. Reserved for future use. It must be set to 0.
+
+            44 - 47:    header pages size
+                        The header field is variable-sized. This field indicates
+                        how many pages(4k) will be used to store add-cow header.
+                        In add-cow v1, it is fixed to 1, so the header size will
+                        be 4k * 1 = 4096 bytes.
+
+            48 - 63:    backing file format
+                        format of backing file. It will be filled with 0 if
+                        backing file name offset is 0. If backing file name
+                        offset is non-zero, it must be non-zero. It is coded
+                        in free-form ASCII, and is not NUL-terminated.
+
+            64 - 79:    image file format
+                        format of image file. It must be non-zero. It is coded
+                        in free-form ASCII, and is not NUL-terminated.
+
+            80 - [HEADER_SIZE - 1]:
+                        It is used to make sure COW bitmap field starts at the
+                        HEADER_SIZE byte, backing file name and image file name
+                        will be stored here. The bytes that is not pointing to
+                        backing file and image file names will bet set to 0.
+
+== COW bitmap ==
+
+The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
+backing file and image file. The bitmap will track whether the sector in
+backing file is dirty or not.
+
+Each bit in the bitmap indicates one cluster's status. One cluster includes 128
+sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
+calculated according to virtual size of image file, and it also should be multipe
+of 65536, the bits not used will be set to 0. Within each byte, the least
+significant bit covers the first cluster. Bit orders in one byte look like:
+ +----+----+----+----+----+----+----+----+
+ | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+ +----+----+----+----+----+----+----+----+
+
+If the bit is 0, indicates the sector has not been allocated in image file, data
+should be loaded from backing file while reading; if the bit is 1, indicates the
+related sector has been dirty, should be loaded from image file while reading.
+Writing to a sector causes the corresponding bit to be set to 1.
+
+If raw image is not an even multiple of cluster bytes, bits that correspond to
+bytes beyond the raw file size in add-cow will be 0.
+
+Image file name and backing file name must NOT be the same, we prevent this
+while creating add-cow files.
+
+Image file and backing file are interpreted relative to the qcow2 file, not
+to the current working directory of the process that opened the qcow2 file.
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

We will use path_has_protocol outside block.c, so just make it public.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c |    2 +-
 block.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 24323c1..c13d803 100644
--- a/block.c
+++ b/block.c
@@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
     const char *p;
 
diff --git a/block.h b/block.h
index 650d872..54e61c9 100644
--- a/block.h
+++ b/block.h
@@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
                   const char *base_path,
                   const char *filename);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:32   ` Michael Roth
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Make qed_read_string function to a common interface, so move it to block.c.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c     |   27 +++++++++++++++++++++++++++
 block.h     |    2 ++
 block/qed.c |   29 +----------------------------
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index c13d803..d906b35 100644
--- a/block.c
+++ b/block.c
@@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
     return *p == ':';
 }
 
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs:         Image file
+ * @offset:     File offset to start of string, in bytes
+ * @n:          String length in bytes
+ * @buf:        Destination buffer
+ * @buflen:     Destination buffer length in bytes
+ * @ret:        0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+                           char *buf, size_t buflen)
+{
+    int ret;
+    if (n >= buflen) {
+        return -EINVAL;
+    }
+    ret = bdrv_pread(bs, offset, buf, n);
+    if (ret < 0) {
+        return ret;
+    }
+    buf[n] = '\0';
+    return 0;
+}
+
 int path_is_absolute(const char *path)
 {
 #ifdef _WIN32
diff --git a/block.h b/block.h
index 54e61c9..e5dfcd7 100644
--- a/block.h
+++ b/block.h
@@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
     const void *buf, int count);
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+    char *buf, size_t buflen);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/qed.c b/block/qed.c
index 5f3eefa..311c589 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
 }
 
 /**
- * Read a string of known length from the image file
- *
- * @file:       Image file
- * @offset:     File offset to start of string, in bytes
- * @n:          String length in bytes
- * @buf:        Destination buffer
- * @buflen:     Destination buffer length in bytes
- * @ret:        0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
-                           char *buf, size_t buflen)
-{
-    int ret;
-    if (n >= buflen) {
-        return -EINVAL;
-    }
-    ret = bdrv_pread(file, offset, buf, n);
-    if (ret < 0) {
-        return ret;
-    }
-    buf[n] = '\0';
-    return 0;
-}
-
-/**
  * Allocate new clusters
  *
  * @s:          QED state
@@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
             return -EINVAL;
         }
 
-        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
+        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
                               s->header.backing_filename_size, bs->backing_file,
                               sizeof(bs->backing_file));
         if (ret < 0) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (2 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:52   ` Michael Roth
  2012-09-11  8:41   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

add-cow and qcow2 file format will share the same cache code, so rename
block-cache.c to block-cache.c. And related structure and qcow2 code also
are changed.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.h                |    3 +
 block/Makefile.objs    |    3 +-
 block/qcow2-cache.c    |  323 ------------------------------------------------
 block/qcow2-cluster.c  |   66 ++++++----
 block/qcow2-refcount.c |   66 ++++++-----
 block/qcow2.c          |   36 +++---
 block/qcow2.h          |   24 +---
 trace-events           |   13 +-
 8 files changed, 109 insertions(+), 425 deletions(-)
 delete mode 100644 block/qcow2-cache.c

diff --git a/block.h b/block.h
index e5dfcd7..c325661 100644
--- a/block.h
+++ b/block.h
@@ -401,6 +401,9 @@ typedef enum {
     BLKDBG_CLUSTER_ALLOC_BYTES,
     BLKDBG_CLUSTER_FREE,
 
+    BLKDBG_ADD_COW_UPDATE,
+    BLKDBG_ADD_COW_LOAD,
+
     BLKDBG_EVENT_MAX,
 } BlkDebugEvent;
 
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..23bdfc8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,7 +1,8 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
+block-obj-y += block-cache.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
deleted file mode 100644
index 2d4322a..0000000
--- a/block/qcow2-cache.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * L2/refcount table cache for the QCOW2 format
- *
- * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "block_int.h"
-#include "qemu-common.h"
-#include "qcow2.h"
-#include "trace.h"
-
-typedef struct Qcow2CachedTable {
-    void*   table;
-    int64_t offset;
-    bool    dirty;
-    int     cache_hits;
-    int     ref;
-} Qcow2CachedTable;
-
-struct Qcow2Cache {
-    Qcow2CachedTable*       entries;
-    struct Qcow2Cache*      depends;
-    int                     size;
-    bool                    depends_on_flush;
-};
-
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
-{
-    BDRVQcowState *s = bs->opaque;
-    Qcow2Cache *c;
-    int i;
-
-    c = g_malloc0(sizeof(*c));
-    c->size = num_tables;
-    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
-
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
-    }
-
-    return c;
-}
-
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
-    }
-
-    g_free(c->entries);
-    g_free(c);
-
-    return 0;
-}
-
-static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
-{
-    int ret;
-
-    ret = qcow2_cache_flush(bs, c->depends);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->depends = NULL;
-    c->depends_on_flush = false;
-
-    return 0;
-}
-
-static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
-{
-    BDRVQcowState *s = bs->opaque;
-    int ret = 0;
-
-    if (!c->entries[i].dirty || !c->entries[i].offset) {
-        return 0;
-    }
-
-    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
-                                  c == s->l2_table_cache, i);
-
-    if (c->depends) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-    } else if (c->depends_on_flush) {
-        ret = bdrv_flush(bs->file);
-        if (ret >= 0) {
-            c->depends_on_flush = false;
-        }
-    }
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (c == s->refcount_block_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
-    } else if (c == s->l2_table_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
-    }
-
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->entries[i].dirty = false;
-
-    return 0;
-}
-
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
-{
-    BDRVQcowState *s = bs->opaque;
-    int result = 0;
-    int ret;
-    int i;
-
-    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
-
-    for (i = 0; i < c->size; i++) {
-        ret = qcow2_cache_entry_flush(bs, c, i);
-        if (ret < 0 && result != -ENOSPC) {
-            result = ret;
-        }
-    }
-
-    if (result == 0) {
-        ret = bdrv_flush(bs->file);
-        if (ret < 0) {
-            result = ret;
-        }
-    }
-
-    return result;
-}
-
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency)
-{
-    int ret;
-
-    if (dependency->depends) {
-        ret = qcow2_cache_flush_dependency(bs, dependency);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    if (c->depends && (c->depends != dependency)) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    c->depends = dependency;
-    return 0;
-}
-
-void qcow2_cache_depends_on_flush(Qcow2Cache *c)
-{
-    c->depends_on_flush = true;
-}
-
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
-    int i;
-    int min_count = INT_MAX;
-    int min_index = -1;
-
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].ref) {
-            continue;
-        }
-
-        if (c->entries[i].cache_hits < min_count) {
-            min_index = i;
-            min_count = c->entries[i].cache_hits;
-        }
-
-        /* Give newer hits priority */
-        /* TODO Check how to optimize the replacement strategy */
-        c->entries[i].cache_hits /= 2;
-    }
-
-    if (min_index == -1) {
-        /* This can't happen in current synchronous code, but leave the check
-         * here as a reminder for whoever starts using AIO with the cache */
-        abort();
-    }
-    return min_index;
-}
-
-static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
-    uint64_t offset, void **table, bool read_from_disk)
-{
-    BDRVQcowState *s = bs->opaque;
-    int i;
-    int ret;
-
-    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
-                          offset, read_from_disk);
-
-    /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].offset == offset) {
-            goto found;
-        }
-    }
-
-    /* If not, write a table back and replace it */
-    i = qcow2_cache_find_entry_to_replace(c);
-    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
-                                        c == s->l2_table_cache, i);
-    if (i < 0) {
-        return i;
-    }
-
-    ret = qcow2_cache_entry_flush(bs, c, i);
-    if (ret < 0) {
-        return ret;
-    }
-
-    trace_qcow2_cache_get_read(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-    c->entries[i].offset = 0;
-    if (read_from_disk) {
-        if (c == s->l2_table_cache) {
-            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
-        }
-
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    /* Give the table some hits for the start so that it won't be replaced
-     * immediately. The number 32 is completely arbitrary. */
-    c->entries[i].cache_hits = 32;
-    c->entries[i].offset = offset;
-
-    /* And return the right table */
-found:
-    c->entries[i].cache_hits++;
-    c->entries[i].ref++;
-    *table = c->entries[i].table;
-
-    trace_qcow2_cache_get_done(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-
-    return 0;
-}
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, true);
-}
-
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, false);
-}
-
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
-            goto found;
-        }
-    }
-    return -ENOENT;
-
-found:
-    c->entries[i].ref--;
-    *table = NULL;
-
-    assert(c->entries[i].ref >= 0);
-    return 0;
-}
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
-            goto found;
-        }
-    }
-    abort();
-
-found:
-    c->entries[i].dirty = true;
-}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e179211..335dc7a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "block/qcow2.h"
 #include "trace.h"
+#include "block-cache.h"
 
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
 {
@@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
         return new_l1_table_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
+    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
 
     return ret;
 }
@@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
         return l2_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     /* allocate a new entry in the l2 cache */
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
-    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
+    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
+        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
         /* if there was an old l2 table, read it from the disk */
         BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
-        ret = qcow2_cache_get(bs, s->l2_table_cache,
+        ret = block_cache_get(bs, s->l2_table_cache,
             old_l2_offset & L1E_OFFSET_MASK,
-            (void**) &old_table);
+            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
         if (ret < 0) {
             goto fail;
         }
 
         memcpy(l2_table, old_table, s->cluster_size);
 
-        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
+        ret = block_cache_put(bs, s->l2_table_cache,
+            (void **) &old_table, BLOCK_TABLE_L2);
         if (ret < 0) {
             goto fail;
         }
@@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
 
     trace_qcow2_l2_allocate_write_l2(bs, l1_index);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    ret = block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
 fail:
     trace_qcow2_l2_allocate_done(bs, l1_index, ret);
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
+    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
     s->l1_table[l1_index] = old_l2_offset;
     return ret;
 }
@@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         abort();
     }
 
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
 
     nb_available = (c * s->cluster_sectors);
 
@@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
      * allocated. */
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
     if (cluster_offset & L2E_OFFSET_MASK) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
         return 0;
     }
 
     cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
     if (cluster_offset < 0) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
         return 0;
     }
 
@@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     /* compressed clusters never have the copied flag */
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
     l2_table[l2_index] = cpu_to_be64(cluster_offset);
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return 0;
     }
@@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * handled.
      */
     if (cow) {
-        qcow2_cache_depends_on_flush(s->l2_table_cache);
+        block_cache_depends_on_flush(s->l2_table_cache);
     }
 
-    if (qcow2_need_accurate_refcounts(s)) {
-        qcow2_cache_set_dependency(bs, s->l2_table_cache,
-                                   s->refcount_block_cache);
-    }
+    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
+        s->refcount_block_cache, s->cluster_size);
     ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
     if (ret < 0) {
         goto err;
     }
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
     for (i = 0; i < m->nb_clusters; i++) {
         /* if two concurrent writes happen to the same unallocated cluster
@@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      }
 
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         goto err;
     }
@@ -913,7 +922,8 @@ again:
      * request to complete. If we still had the reference, we could use up the
      * whole cache with sleeping requests.
      */
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
@@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
         }
 
         /* First remove L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         l2_table[l2_index + i] = cpu_to_be64(0);
 
         /* Then decrease the refcount */
         qcow2_free_any_clusters(bs, old_offset, 1);
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
@@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         old_offset = be64_to_cpu(l2_table[l2_index + i]);
 
         /* Update L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         if (old_offset & QCOW_OFLAG_COMPRESSED) {
             l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1);
@@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         }
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 5e3f915..728bfc1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -25,6 +25,7 @@
 #include "qemu-common.h"
 #include "block_int.h"
 #include "block/qcow2.h"
+#include "block-cache.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
     int ret;
 
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        refcount_block);
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
 
     return ret;
 }
@@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
     if (!refcount_block_offset)
         return 0;
 
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        (void**) &refcount_block);
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
         ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
     refcount = be16_to_cpu(refcount_block[block_index]);
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache,
-        (void**) &refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+        (void **) &refcount_block, BLOCK_TABLE_REF);
     if (ret < 0) {
         return ret;
     }
@@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
     *refcount_block = NULL;
 
     /* We write to the refcount table, so we might depend on L2 tables */
-    qcow2_cache_flush(bs, s->l2_table_cache);
+    block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
 
     /* Allocate the refcount block itself and mark it as used */
     int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
@@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
         /* Zero the new refcount block before updating it */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
         if (ret < 0) {
             goto fail_block;
         }
@@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
         /* Initialize the new refcount block only after updating its refcount,
          * update_refcount uses the refcount cache itself */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
         if (ret < 0) {
             goto fail_block;
         }
@@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     /* Now the new refcount block needs to be written to disk */
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
-    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail_block;
     }
@@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
         return 0;
     }
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+        (void **) refcount_block, BLOCK_TABLE_REF);
     if (ret < 0) {
         goto fail_block;
     }
@@ -406,7 +410,8 @@ fail_table:
     g_free(new_table);
 fail_block:
     if (*refcount_block != NULL) {
-        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+        block_cache_put(bs, s->refcount_block_cache,
+            (void **) refcount_block, BLOCK_TABLE_REF);
     }
     return ret;
 }
@@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     }
 
     if (addend < 0) {
-        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
-            s->l2_table_cache);
+        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
+            s->l2_table_cache, s->cluster_size);
     }
 
     start = offset & ~(s->cluster_size - 1);
@@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         /* Load the refcount block and allocate it if needed */
         if (table_index != old_table_index) {
             if (refcount_block) {
-                ret = qcow2_cache_put(bs, s->refcount_block_cache,
-                    (void**) &refcount_block);
+                ret = block_cache_put(bs, s->refcount_block_cache,
+                    (void **) &refcount_block, BLOCK_TABLE_REF);
                 if (ret < 0) {
                     goto fail;
                 }
@@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         }
         old_table_index = table_index;
 
-        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
 
         /* we can update the count and save it */
         block_index = cluster_index &
@@ -486,8 +491,8 @@ fail:
     /* Write last changed block to disk */
     if (refcount_block) {
         int wret;
-        wret = qcow2_cache_put(bs, s->refcount_block_cache,
-            (void**) &refcount_block);
+        wret = block_cache_put(bs, s->refcount_block_cache,
+            (void **) &refcount_block, BLOCK_TABLE_REF);
         if (wret < 0) {
             return ret < 0 ? ret : wret;
         }
@@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             old_l2_offset = l2_offset;
             l2_offset &= L1E_OFFSET_MASK;
 
-            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
-                (void**) &l2_table);
+            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
             if (ret < 0) {
                 goto fail;
             }
@@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }
                     if (offset != old_offset) {
                         if (addend > 0) {
-                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
-                                s->refcount_block_cache);
+                            block_cache_set_dependency(bs, s->l2_table_cache,
+                                BLOCK_TABLE_L2, s->refcount_block_cache,
+                                s->cluster_size);
                         }
                         l2_table[j] = cpu_to_be64(offset);
-                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
                     }
                 }
             }
 
-            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+            ret = block_cache_put(bs, s->l2_table_cache,
+                (void **) &l2_table, BLOCK_TABLE_L2);
             if (ret < 0) {
                 goto fail;
             }
@@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     ret = 0;
 fail:
     if (l2_table) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
     }
 
     /* Update L1 only if it isn't deleted anyway (addend = -1) */
diff --git a/block/qcow2.c b/block/qcow2.c
index fd5e214..b89d312 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -30,6 +30,7 @@
 #include "qemu-error.h"
 #include "qerror.h"
 #include "trace.h"
+#include "block-cache.h"
 
 /*
   Differences with QCOW:
@@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->refcount_block_cache =
+        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
@@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     qcow2_refcount_close(bs);
     g_free(s->l1_table);
     if (s->l2_table_cache) {
-        qcow2_cache_destroy(bs, s->l2_table_cache);
+        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
     }
     g_free(s->cluster_cache);
     qemu_vfree(s->cluster_data);
@@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
-    qcow2_cache_flush(bs, s->l2_table_cache);
-    qcow2_cache_flush(bs, s->refcount_block_cache);
-
+    block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
+    block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     qcow2_mark_clean(bs);
-
-    qcow2_cache_destroy(bs, s->l2_table_cache);
-    qcow2_cache_destroy(bs, s->refcount_block_cache);
+    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
+    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
 
     g_free(s->unknown_header_fields);
     cleanup_unknown_header_ext(bs);
@@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
                     options->value.s);
                 return -EINVAL;
             }
-        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
-            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
         }
         options++;
     }
@@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    ret = block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         return ret;
     }
 
-    if (qcow2_need_accurate_refcounts(s)) {
-        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
-        if (ret < 0) {
-            qemu_co_mutex_unlock(&s->lock);
-            return ret;
-        }
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
+    if (ret < 0) {
+        qemu_co_mutex_unlock(&s->lock);
+        return ret;
     }
     qemu_co_mutex_unlock(&s->lock);
 
diff --git a/block/qcow2.h b/block/qcow2.h
index b4eb654..cb6fd7a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
 
 #include "aes.h"
 #include "qemu-coroutine.h"
+#include "block-cache.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
     uint64_t vm_clock_nsec;
 } QCowSnapshot;
 
-struct Qcow2Cache;
-typedef struct Qcow2Cache Qcow2Cache;
 
 typedef struct Qcow2UnknownHeaderExtension {
     uint32_t magic;
@@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
     uint64_t l1_table_offset;
     uint64_t *l1_table;
 
-    Qcow2Cache* l2_table_cache;
-    Qcow2Cache* refcount_block_cache;
+    BlockCache *l2_table_cache;
+    BlockCache *refcount_block_cache;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
 
 void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
-
-/* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency);
-void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
-
 #endif
diff --git a/trace-events b/trace-events
index 6b12f83..52b6438 100644
--- a/trace-events
+++ b/trace-events
@@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
 
-qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
-qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
-qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+# block/block-cache.c
+block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
+block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
+block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
 
 # block/qed-l2-cache.c
 qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (3 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 20:19   ` Michael Roth
  2012-09-11  9:40   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
  2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

add-cow file format core code. It use block-cache.c as cache code.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block/Makefile.objs |    1 +
 block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/add-cow.h     |   85 +++++++
 block_int.h         |    2 +
 4 files changed, 701 insertions(+), 0 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/add-cow.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 23bdfc8..7ed5051 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
+block-obj-y += add-cow.o
 block-obj-y += block-cache.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 0000000..d4711d5
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,613 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+#include "add-cow.h"
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+    cpu->magic                      = le64_to_cpu(le->magic);
+    cpu->version                    = le32_to_cpu(le->version);
+
+    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
+    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
+
+    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
+    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
+
+    cpu->features                   = le64_to_cpu(le->features);
+    cpu->optional_features          = le64_to_cpu(le->optional_features);
+    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+    le->magic                       = cpu_to_le64(cpu->magic);
+    le->version                     = cpu_to_le32(cpu->version);
+
+    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
+    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
+
+    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
+    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
+
+    le->features                    = cpu_to_le64(cpu->features);
+    le->optional_features           = cpu_to_le64(cpu->optional_features);
+    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
+}
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const AddCowHeader *header = (const AddCowHeader *)buf;
+
+    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+        le32_to_cpu(header->version) == ADD_COW_VERSION) {
+        return 100;
+    } else {
+        return 0;
+    }
+}
+
+static int add_cow_create(const char *filename, QEMUOptionParameter *options)
+{
+    AddCowHeader header = {
+        .magic = ADD_COW_MAGIC,
+        .version = ADD_COW_VERSION,
+        .features = 0,
+        .optional_features = 0,
+        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
+    };
+    AddCowHeader le_header;
+    int64_t image_len = 0;
+    const char *backing_filename = NULL;
+    const char *backing_fmt = NULL;
+    const char *image_filename = NULL;
+    const char *image_format = NULL;
+    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
+    BlockDriver *drv = bdrv_find_format("add-cow");
+    BDRVAddCowState s;
+    int ret;
+
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            image_len = options->value.n;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+            backing_filename = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+            backing_fmt = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
+            image_filename = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
+            image_format = options->value.s;
+        }
+        options++;
+    }
+
+    if (backing_filename) {
+        header.backing_filename_offset = sizeof(header)
+            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
+        header.backing_filename_size = strlen(backing_filename);
+
+        if (!backing_fmt) {
+            backing_bs = bdrv_new("image");
+            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
+                    | BDRV_O_CACHE_WB, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+            backing_fmt = bdrv_get_format_name(backing_bs);
+            bdrv_delete(backing_bs);
+        }
+    } else {
+        header.features |= ADD_COW_F_All_ALLOCATED;
+    }
+
+    if (image_filename) {
+        header.image_filename_offset =
+            sizeof(header) + sizeof(s.backing_file_format)
+                + sizeof(s.image_file_format) + header.backing_filename_size;
+        header.image_filename_size = strlen(image_filename);
+    } else {
+        error_report("Error: image_file should be given.");
+        return -EINVAL;
+    }
+
+    if (backing_filename && !strcmp(backing_filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same backing file name as the image file name");
+        return -EINVAL;
+    }
+
+    if (!strcmp(filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same filename as the image file name");
+        return -EINVAL;
+    }
+
+    if (header.image_filename_offset + header.image_filename_size
+            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
+        error_report("image_file name or backing_file name too long.");
+        return -ENOSPC;
+    }
+
+    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
+    if (ret < 0) {
+        return ret;
+    }
+    bdrv_delete(image_bs);
+
+    ret = bdrv_create_file(filename, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
+    if (ret < 0) {
+        return ret;
+    }
+    add_cow_header_cpu_to_le(&header, &le_header);
+    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
+        backing_fmt ? strlen(backing_fmt) : 0);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
+        image_format ? image_format : "raw",
+        image_format ? strlen(image_format) : sizeof("raw"));
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    if (backing_filename) {
+        ret = bdrv_pwrite(bs, header.backing_filename_offset,
+            backing_filename, header.backing_filename_size);
+        if (ret < 0) {
+            bdrv_delete(bs);
+            return ret;
+        }
+    }
+
+    ret = bdrv_pwrite(bs, header.image_filename_offset,
+        image_filename, header.image_filename_size);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_truncate(bs, image_len);
+    bdrv_delete(bs);
+    return ret;
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+    char                image_filename[ADD_COW_FILE_LEN];
+    char                tmp_name[ADD_COW_FILE_LEN];
+    BlockDriver         *image_drv = NULL;
+    int                 ret;
+    int                 sector_per_byte;
+    BDRVAddCowState     *s = bs->opaque;
+    AddCowHeader        le_header;
+
+    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+    if (ret != sizeof(s->header)) {
+        goto fail;
+    }
+
+    add_cow_header_le_to_cpu(&le_header, &s->header);
+
+    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    if (s->header.version != ADD_COW_VERSION) {
+        char version[64];
+        snprintf(version, sizeof(version), "ADD-COW version %d",
+            s->header.version);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+            bs->device_name, "add-cow", version);
+        ret = -ENOTSUP;
+        goto fail;
+    }
+
+    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
+        char buf[64];
+        snprintf(buf, sizeof(buf), "%" PRIx64,
+            s->header.features & ~ADD_COW_FEATURE_MASK);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+            bs->device_name, "add-cow", buf);
+        return -ENOTSUP;
+    }
+
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        ret = bdrv_read_string(bs->file, sizeof(s->header),
+            sizeof(s->backing_file_format) - 1, s->backing_file_format,
+            sizeof(s->backing_file_format));
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = bdrv_read_string(bs->file,
+            sizeof(s->header) + sizeof(s->image_file_format),
+            sizeof(s->image_file_format) - 1, s->image_file_format,
+            sizeof(s->image_file_format));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+                          s->header.backing_filename_size, bs->backing_file,
+                          sizeof(bs->backing_file));
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
+                      s->header.image_filename_size, tmp_name,
+                      sizeof(tmp_name));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    s->image_hd = bdrv_new("");
+    if (path_has_protocol(image_filename)) {
+        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
+    } else {
+        path_combine(image_filename, sizeof(image_filename),
+                     bs->filename, tmp_name);
+    }
+
+    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
+    if (ret < 0) {
+        bdrv_delete(s->image_hd);
+        goto fail;
+    }
+
+    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
+    s->cluster_size = ADD_COW_CLUSTER_SIZE;
+    sector_per_byte = SECTORS_PER_CLUSTER * 8;
+    s->bitmap_size =
+        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
+    s->bitmap_cache =
+        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
+
+    qemu_co_mutex_init(&s->lock);
+    return 0;
+fail:
+    if (s->bitmap_cache) {
+        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+    }
+    return ret;
+}
+
+static void add_cow_close(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+    bdrv_delete(s->image_hd);
+}
+
+static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    BlockCache *c = s->bitmap_cache;
+    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+    uint8_t *table      = NULL;
+    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
+    int ret = block_cache_get(bs, s->bitmap_cache, offset,
+        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+
+    if (ret < 0) {
+        return ret;
+    }
+    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
+        & (1 << (cluster_num % 8));
+}
+
+static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *num_same)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int changed;
+
+    if (nb_sectors == 0) {
+        *num_same = 0;
+        return 0;
+    }
+
+    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
+        *num_same = nb_sectors - 1;
+        return 1;
+    }
+    changed = is_allocated(bs, sector_num);
+
+    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
+        if (is_allocated(bs, sector_num + *num_same) != changed) {
+            break;
+        }
+    }
+    return changed;
+}
+
+static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
+                  int64_t sector_num, int nb_sectors)
+{
+    int n1;
+    if ((sector_num + nb_sectors) <= bs->total_sectors) {
+        return nb_sectors;
+    }
+    if (sector_num >= bs->total_sectors) {
+        n1 = 0;
+    } else {
+        n1 = bs->total_sectors - sector_num;
+    }
+
+    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
+        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
+
+    return n1;
+}
+
+static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
+    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    int cur_nr_sectors;
+    uint64_t bytes_done = 0;
+    QEMUIOVector hd_qiov;
+    int n, n1, ret = 0;
+
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    qemu_co_mutex_lock(&s->lock);
+    while (remaining_sectors != 0) {
+        cur_nr_sectors = remaining_sectors;
+        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
+            cur_nr_sectors = n;
+            qemu_iovec_reset(&hd_qiov);
+            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                            cur_nr_sectors * BDRV_SECTOR_SIZE);
+            qemu_co_mutex_unlock(&s->lock);
+            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
+        } else {
+            cur_nr_sectors = n;
+            if (bs->backing_hd) {
+                qemu_iovec_reset(&hd_qiov);
+                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                            cur_nr_sectors * BDRV_SECTOR_SIZE);
+                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
+                    sector_num, cur_nr_sectors);
+                if (n1 > 0) {
+                    qemu_co_mutex_unlock(&s->lock);
+                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
+                                        n, &hd_qiov);
+                    qemu_co_mutex_lock(&s->lock);
+                    if (ret < 0) {
+                        goto fail;
+                    }
+                }
+            } else {
+                qemu_iovec_memset(&hd_qiov, 0, 0,
+                    BDRV_SECTOR_SIZE * cur_nr_sectors);
+            }
+        }
+        remaining_sectors -= cur_nr_sectors;
+        sector_num += cur_nr_sectors;
+        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
+    }
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int coroutine_fn copy_sectors(BlockDriverState *bs,
+                                     int n_start, int n_end)
+{
+    BDRVAddCowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int n, ret;
+
+    n = n_end - n_start;
+    if (n <= 0) {
+        return 0;
+    }
+
+    iov.iov_len = n * BDRV_SECTOR_SIZE;
+    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = 0;
+out:
+    qemu_vfree(iov.iov_base);
+    return ret;
+}
+
+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s = bs->opaque;
+    BlockCache *c = s->bitmap_cache;
+    int ret = 0, i;
+    QEMUIOVector hd_qiov;
+    uint8_t *table;
+    uint64_t offset;
+
+    qemu_co_mutex_lock(&s->lock);
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    ret = bdrv_co_writev(s->image_hd,
+                     sector_num,
+                     remaining_sectors, qiov);
+
+    if (ret < 0) {
+        goto fail;
+    }
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        /* Copy content of unmodified sectors */
+        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
+            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
+                sector_num);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
+            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+            ret = copy_sectors(bs, sector_num + remaining_sectors,
+                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        for (i = sector_num / SECTORS_PER_CLUSTER;
+            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
+            i++) {
+            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
+            ret = block_cache_get(bs, s->bitmap_cache, offset,
+                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+            if (ret < 0) {
+                goto fail;
+            }
+            if ((table[i / 8] & (1 << (i % 8))) == 0) {
+                table[i / 8] |= (1 << (i % 8));
+                block_cache_entry_mark_dirty(s->bitmap_cache, table);
+            }
+        }
+    }
+    ret = 0;
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
+    int ret;
+    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
+    int64_t bitmap_size =
+        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
+    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
+        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
+
+    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&s->lock);
+    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
+        ADD_COW_CACHE_ENTRY_SIZE);
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
+}
+
+static QEMUOptionParameter add_cow_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FILE,
+        .type = OPT_STRING,
+        .help = "File name of a base image"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FMT,
+        .type = OPT_STRING,
+        .help = "Image format of the base image"
+    },
+    {
+        .name = BLOCK_OPT_IMAGE_FILE,
+        .type = OPT_STRING,
+        .help = "File name of a image file"
+    },
+    {
+        .name = BLOCK_OPT_IMAGE_FORMAT,
+        .type = OPT_STRING,
+        .help = "Image format of the image file"
+    },
+    { NULL }
+};
+
+static BlockDriver bdrv_add_cow = {
+    .format_name                = "add-cow",
+    .instance_size              = sizeof(BDRVAddCowState),
+    .bdrv_probe                 = add_cow_probe,
+    .bdrv_open                  = add_cow_open,
+    .bdrv_close                 = add_cow_close,
+    .bdrv_create                = add_cow_create,
+    .bdrv_co_readv              = add_cow_co_readv,
+    .bdrv_co_writev             = add_cow_co_writev,
+    .bdrv_truncate              = bdrv_add_cow_truncate,
+    .bdrv_co_is_allocated       = add_cow_is_allocated,
+
+    .create_options             = add_cow_create_options,
+    .bdrv_co_flush_to_os        = add_cow_co_flush,
+};
+
+static void bdrv_add_cow_init(void)
+{
+    bdrv_register(&bdrv_add_cow);
+}
+
+block_init(bdrv_add_cow_init);
diff --git a/block/add-cow.h b/block/add-cow.h
new file mode 100644
index 0000000..f058376
--- /dev/null
+++ b/block/add-cow.h
@@ -0,0 +1,85 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_ADD_COW_H
+#define BLOCK_ADD_COW_H
+#include "block-cache.h"
+
+enum {
+    ADD_COW_F_All_ALLOCATED     = 0X01,
+    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
+
+    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
+                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
+                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
+                    ((uint64_t)'W' << 8) | 0xFF),
+    ADD_COW_VERSION             = 1,
+    ADD_COW_FILE_LEN            = 1024,
+    ADD_COW_CACHE_SIZE          = 16,
+    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
+    ADD_COW_CLUSTER_SIZE        = 65536,
+    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
+    ADD_COW_PAGE_SIZE           = 4096,
+    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
+};
+
+typedef struct AddCowHeader {
+    uint64_t        magic;
+    uint32_t        version;
+
+    uint32_t        backing_filename_offset;
+    uint32_t        backing_filename_size;
+
+    uint32_t        image_filename_offset;
+    uint32_t        image_filename_size;
+
+    uint64_t        features;
+    uint64_t        optional_features;
+    uint32_t        header_pages_size;
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+    BlockDriverState    *image_hd;
+    CoMutex             lock;
+    int                 cluster_size;
+    BlockCache         *bitmap_cache;
+    uint64_t            bitmap_size;
+    AddCowHeader        header;
+    char                backing_file_format[16];
+    char                image_file_format[16];
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num)
+{
+    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+    return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num)
+{
+    return sector_num % SECTORS_PER_CLUSTER == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num)
+{
+    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
+}
+
+BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
+int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
+void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
+int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+    void **table);
+int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
+#endif
diff --git a/block_int.h b/block_int.h
index 6c1d9ca..67954ec 100644
--- a/block_int.h
+++ b/block_int.h
@@ -53,6 +53,8 @@
 #define BLOCK_OPT_SUBFMT            "subformat"
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
+#define BLOCK_OPT_IMAGE_FILE        "image_file"
+#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (4 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-11  9:55   ` Kevin Wolf
  2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Add qemu-iotests support for add-cow.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 tests/qemu-iotests/017       |    2 +-
 tests/qemu-iotests/020       |    2 +-
 tests/qemu-iotests/check     |    4 ++--
 tests/qemu-iotests/common    |    6 ++++++
 tests/qemu-iotests/common.rc |   19 +++++++++++++++++++
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 66951eb..d31432f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..122267b 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -243,7 +243,7 @@ do
 		echo " - no qualified output"
 		err=true
 	    else
-		if diff -w $seq.out $tmp.out >/dev/null 2>&1
+        if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
 		then
 		    echo ""
 		    if $err
@@ -255,7 +255,7 @@ do
 		else
 		    echo " - output mismatch (see $seq.out.bad)"
 		    mv $tmp.out $seq.out.bad
-		    $diff -w $seq.out $seq.out.bad
+            $diff -w -I "^Formatting" $seq.out $seq.out.bad
 		    err=true
 		fi
 	    fi
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 1f6fdf5..1c81b09 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
 check options
     -raw                test raw (default)
     -cow                test cow
+    -add-cow            test add-cow
     -qcow               test qcow
     -qcow2              test qcow2
     -qed                test qed
@@ -163,6 +164,11 @@ testlist options
 	    xpand=false
 	    ;;
 
+    -add-cow)
+        IMGFMT=add-cow
+        xpand=false
+        ;;
+
 	-qcow)
 	    IMGFMT=qcow
 	    xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 7782808..ec5afd7 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -97,6 +97,18 @@ _make_test_img()
     fi
     if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
         optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
+    elif [ "$IMGFMT" = "add-cow" ]; then
+        local BACKING="$TEST_IMG"".qcow2"
+        local IMG="$TEST_IMG"".raw"
+        if [ "$1" = "-b" ]; then
+            IMG="$IMG"".b"
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            extra_img_options="-o image_file=$IMG $extra_img_options"
+        else
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
+            extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
+        fi
     fi
 
     if [ -n "$optstr" ]; then
@@ -125,6 +137,13 @@ _cleanup_test_img()
             rm -f $TEST_DIR/t.$IMGFMT
             rm -f $TEST_DIR/t.$IMGFMT.orig
             rm -f $TEST_DIR/t.$IMGFMT.base
+            if [ "$IMGFMT" = "add-cow" ]; then
+                rm -f $TEST_DIR/t.$IMGFMT.qcow2
+                rm -f $TEST_DIR/t.$IMGFMT.raw
+                rm -f $TEST_DIR/t.$IMGFMT.raw.b
+                rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
+                rm -f $TEST_DIR/t.$IMGFMT.ct.raw
+            fi
             ;;
 
         rbd)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 0/6] add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (5 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-08-23  5:34 ` Dong Xu Wang
  6 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-23  5:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Anyone can give me some comments? That will be very grateful..

On Fri, Aug 10, 2012 at 11:39 PM, Dong Xu Wang
<wdongxu@linux.vnet.ibm.com> wrote:
> This will introduce a new file format: add-cow.
>
> add-cow can benefit from other available functions, such as path_has_protocol and
> qed_read_string, so we will make them public.
>
> Now add-cow is still using QEMUOptionParameter, not QemuOpts,  I will send a
> separate patch series to convert.
>
> snapshot_blkdev are not supported now for add-cow, after converting QEMUOptionParameter
> to QemuOpts, will add related code.
>
>
> v11->v12:
> 1) Removed un-used feature bit.
> 2) Share cache code with qcow2.c.
> 3) Remove snapshot_blkdev support, will add it in another patch.
> 5) COW Bitmap field in add-cow file will be multiple of 65536.
> 6) fix grammer and typo.
>
> Dong Xu Wang (6):
>   docs: document for add cow file format
>   make path_has_protocol non-static
>   qed_read_string to bdrv_read_string
>   rename qcow2-cache.c to block-cache.c
>   add-cow file format
>   qemu-iotests
>
>  block.c                      |   29 ++-
>  block.h                      |    6 +
>  block/Makefile.objs          |    4 +-
>  block/add-cow.c              |  613 ++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h              |   85 ++++++
>  block/qcow2-cache.c          |  323 ----------------------
>  block/qcow2-cluster.c        |   66 +++--
>  block/qcow2-refcount.c       |   66 +++--
>  block/qcow2.c                |   36 ++--
>  block/qcow2.h                |   24 +--
>  block/qed.c                  |   29 +--
>  block_int.h                  |    2 +
>  docs/specs/add-cow.txt       |  123 +++++++++
>  tests/qemu-iotests/017       |    2 +-
>  tests/qemu-iotests/020       |    2 +-
>  tests/qemu-iotests/check     |    4 +-
>  tests/qemu-iotests/common    |    6 +
>  tests/qemu-iotests/common.rc |   19 ++
>  trace-events                 |   13 +-
>  19 files changed, 994 insertions(+), 458 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
>  delete mode 100644 block/qcow2-cache.c
>  create mode 100644 docs/specs/add-cow.txt
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-09-06 17:27   ` Michael Roth
  2012-09-10  1:48     ` Dong Xu Wang
  2012-09-10 15:23   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
> Document for add-cow format, the usage and spec of add-cow are introduced.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 123 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
> +    1)  Create a raw image with the same size of ubuntu.img
> +            qemu-img create -f raw test.raw 8G
> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + |     Header    |   Reserved  |    COW bitmap   |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> +    Byte    0 -  7:     magic
> +                        add-cow magic string ("ADD_COW\xff").
> +
> +            8 -  11:    version
> +                        Version number (only valid value is 1 now).
> +
> +            12 - 15:    backing file name offset
> +                        Offset in the add-cow file at which the backing file
> +                        name is stored (NB: The string is not nul-terminated).
> +                        If backing file name does NOT exist, this field will be
> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
> +                        must be at least 1 byte).
> +
> +            16 - 19:    backing file name size
> +                        Length of the backing file name in bytes. It will be 0
> +                        if the backing file name offset is 0. If backing file
> +                        name offset is non-zero, then it must be non-zero. Must
> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            20 - 23:    image file name offset
> +                        Offset in the add-cow file at which the image file name
> +                        is stored (NB: The string is not null terminated). It
> +                        must be between 80 and [HEADER_SIZE - 2].
> +
> +            24 - 27:    image file name size
> +                        Length of the image file name in bytes.
> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            28 - 35:    features
> +                        Currently only 1 feature bit is used:
> +                        Feature bits:
> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.
> +
> +            36 - 43:    optional features
> +                        Not used now. Reserved for future use. It must be set to 0.
> +
> +            44 - 47:    header pages size
> +                        The header field is variable-sized. This field indicates
> +                        how many pages(4k) will be used to store add-cow header.
> +                        In add-cow v1, it is fixed to 1, so the header size will
> +                        be 4k * 1 = 4096 bytes.
> +
> +            48 - 63:    backing file format
> +                        format of backing file. It will be filled with 0 if
> +                        backing file name offset is 0. If backing file name
> +                        offset is non-zero, it must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.
> +
> +            64 - 79:    image file format
> +                        format of image file. It must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.
> +
> +            80 - [HEADER_SIZE - 1]:
> +                        It is used to make sure COW bitmap field starts at the
> +                        HEADER_SIZE byte, backing file name and image file name
> +                        will be stored here. The bytes that is not pointing to
> +                        backing file and image file names will bet set to 0.
> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
> +calculated according to virtual size of image file, and it also should be multipe
> +of 65536, the bits not used will be set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.
> +
> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.
> +
> +Image file and backing file are interpreted relative to the qcow2 file, not

Relative to the add-cow file?

> +to the current working directory of the process that opened the qcow2 file.
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-09-06 17:27   ` Michael Roth
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:41PM +0800, Dong Xu Wang wrote:
> We will use path_has_protocol outside block.c, so just make it public.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

> ---
>  block.c |    2 +-
>  block.h |    1 +
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 24323c1..c13d803 100644
> --- a/block.c
> +++ b/block.c
> @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
>  }
> 
>  /* check if the path starts with "<protocol>:" */
> -static int path_has_protocol(const char *path)
> +int path_has_protocol(const char *path)
>  {
>      const char *p;
> 
> diff --git a/block.h b/block.h
> index 650d872..54e61c9 100644
> --- a/block.h
> +++ b/block.h
> @@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
> 
>  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
>  int path_is_absolute(const char *path);
> +int path_has_protocol(const char *path);
>  void path_combine(char *dest, int dest_size,
>                    const char *base_path,
>                    const char *filename);
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-09-06 17:32   ` Michael Roth
  2012-09-10  1:49     ` Dong Xu Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:32 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
> Make qed_read_string function to a common interface, so move it to block.c.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.c     |   27 +++++++++++++++++++++++++++
>  block.h     |    2 ++
>  block/qed.c |   29 +----------------------------
>  3 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c13d803..d906b35 100644
> --- a/block.c
> +++ b/block.c
> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
>      return *p == ':';
>  }
> 
> +/**
> + * Read a string of known length from the image file
> + *
> + * @bs:         Image file
> + * @offset:     File offset to start of string, in bytes
> + * @n:          String length in bytes
> + * @buf:        Destination buffer
> + * @buflen:     Destination buffer length in bytes
> + * @ret:        0 on success, -errno on failure
> + *
> + * The string is NUL-terminated.
> + */
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> +                           char *buf, size_t buflen)

Small alignment issue   ^

> +{
> +    int ret;
> +    if (n >= buflen) {
> +        return -EINVAL;
> +    }
> +    ret = bdrv_pread(bs, offset, buf, n);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    buf[n] = '\0';
> +    return 0;
> +}
> +
>  int path_is_absolute(const char *path)
>  {
>  #ifdef _WIN32
> diff --git a/block.h b/block.h
> index 54e61c9..e5dfcd7 100644
> --- a/block.h
> +++ b/block.h
> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>      const void *buf, int count);
>  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
>      int nb_sectors, QEMUIOVector *qiov);
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> +    char *buf, size_t buflen);

Another one here        ^

>  int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>  int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/qed.c b/block/qed.c
> index 5f3eefa..311c589 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
>  }
> 
>  /**
> - * Read a string of known length from the image file
> - *
> - * @file:       Image file
> - * @offset:     File offset to start of string, in bytes
> - * @n:          String length in bytes
> - * @buf:        Destination buffer
> - * @buflen:     Destination buffer length in bytes
> - * @ret:        0 on success, -errno on failure
> - *
> - * The string is NUL-terminated.
> - */
> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
> -                           char *buf, size_t buflen)
> -{
> -    int ret;
> -    if (n >= buflen) {
> -        return -EINVAL;
> -    }
> -    ret = bdrv_pread(file, offset, buf, n);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -    buf[n] = '\0';
> -    return 0;
> -}
> -
> -/**
>   * Allocate new clusters
>   *
>   * @s:          QED state
> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>              return -EINVAL;
>          }
> 
> -        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>                                s->header.backing_filename_size, bs->backing_file,
>                                sizeof(bs->backing_file));

Here too                          ^

Looks good otherwise.

>          if (ret < 0) {
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-09-06 17:52   ` Michael Roth
  2012-09-10  2:14     ` Dong Xu Wang
  2012-09-11  8:41   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:52 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also

"qcow2-cache.c to block-cache.c"

But I've scanned through the rest of your patches and can't seem to find
where block-cache.c gets introduced. Did you forget to git add it?

> are changed.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.h                |    3 +
>  block/Makefile.objs    |    3 +-
>  block/qcow2-cache.c    |  323 ------------------------------------------------
>  block/qcow2-cluster.c  |   66 ++++++----
>  block/qcow2-refcount.c |   66 ++++++-----
>  block/qcow2.c          |   36 +++---
>  block/qcow2.h          |   24 +---
>  trace-events           |   13 +-
>  8 files changed, 109 insertions(+), 425 deletions(-)
>  delete mode 100644 block/qcow2-cache.c
> 
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
>      BLKDBG_CLUSTER_ALLOC_BYTES,
>      BLKDBG_CLUSTER_FREE,
> 
> +    BLKDBG_ADD_COW_UPDATE,
> +    BLKDBG_ADD_COW_LOAD,
> +
>      BLKDBG_EVENT_MAX,
>  } BlkDebugEvent;
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..23bdfc8 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,7 +1,8 @@
>  block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> deleted file mode 100644
> index 2d4322a..0000000
> --- a/block/qcow2-cache.c
> +++ /dev/null
> @@ -1,323 +0,0 @@
> -/*
> - * L2/refcount table cache for the QCOW2 format
> - *
> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a copy
> - * of this software and associated documentation files (the "Software"), to deal
> - * in the Software without restriction, including without limitation the rights
> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - * copies of the Software, and to permit persons to whom the Software is
> - * furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - * THE SOFTWARE.
> - */
> -
> -#include "block_int.h"
> -#include "qemu-common.h"
> -#include "qcow2.h"
> -#include "trace.h"
> -
> -typedef struct Qcow2CachedTable {
> -    void*   table;
> -    int64_t offset;
> -    bool    dirty;
> -    int     cache_hits;
> -    int     ref;
> -} Qcow2CachedTable;
> -
> -struct Qcow2Cache {
> -    Qcow2CachedTable*       entries;
> -    struct Qcow2Cache*      depends;
> -    int                     size;
> -    bool                    depends_on_flush;
> -};
> -
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    Qcow2Cache *c;
> -    int i;
> -
> -    c = g_malloc0(sizeof(*c));
> -    c->size = num_tables;
> -    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
> -
> -    for (i = 0; i < c->size; i++) {
> -        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
> -    }
> -
> -    return c;
> -}
> -
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        assert(c->entries[i].ref == 0);
> -        qemu_vfree(c->entries[i].table);
> -    }
> -
> -    g_free(c->entries);
> -    g_free(c);
> -
> -    return 0;
> -}
> -
> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
> -{
> -    int ret;
> -
> -    ret = qcow2_cache_flush(bs, c->depends);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    c->depends = NULL;
> -    c->depends_on_flush = false;
> -
> -    return 0;
> -}
> -
> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int ret = 0;
> -
> -    if (!c->entries[i].dirty || !c->entries[i].offset) {
> -        return 0;
> -    }
> -
> -    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
> -                                  c == s->l2_table_cache, i);
> -
> -    if (c->depends) {
> -        ret = qcow2_cache_flush_dependency(bs, c);
> -    } else if (c->depends_on_flush) {
> -        ret = bdrv_flush(bs->file);
> -        if (ret >= 0) {
> -            c->depends_on_flush = false;
> -        }
> -    }
> -
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    if (c == s->refcount_block_cache) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
> -    } else if (c == s->l2_table_cache) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
> -    }
> -
> -    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
> -        s->cluster_size);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    c->entries[i].dirty = false;
> -
> -    return 0;
> -}
> -
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int result = 0;
> -    int ret;
> -    int i;
> -
> -    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
> -
> -    for (i = 0; i < c->size; i++) {
> -        ret = qcow2_cache_entry_flush(bs, c, i);
> -        if (ret < 0 && result != -ENOSPC) {
> -            result = ret;
> -        }
> -    }
> -
> -    if (result == 0) {
> -        ret = bdrv_flush(bs->file);
> -        if (ret < 0) {
> -            result = ret;
> -        }
> -    }
> -
> -    return result;
> -}
> -
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> -    Qcow2Cache *dependency)
> -{
> -    int ret;
> -
> -    if (dependency->depends) {
> -        ret = qcow2_cache_flush_dependency(bs, dependency);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    if (c->depends && (c->depends != dependency)) {
> -        ret = qcow2_cache_flush_dependency(bs, c);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    c->depends = dependency;
> -    return 0;
> -}
> -
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
> -{
> -    c->depends_on_flush = true;
> -}
> -
> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
> -{
> -    int i;
> -    int min_count = INT_MAX;
> -    int min_index = -1;
> -
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].ref) {
> -            continue;
> -        }
> -
> -        if (c->entries[i].cache_hits < min_count) {
> -            min_index = i;
> -            min_count = c->entries[i].cache_hits;
> -        }
> -
> -        /* Give newer hits priority */
> -        /* TODO Check how to optimize the replacement strategy */
> -        c->entries[i].cache_hits /= 2;
> -    }
> -
> -    if (min_index == -1) {
> -        /* This can't happen in current synchronous code, but leave the check
> -         * here as a reminder for whoever starts using AIO with the cache */
> -        abort();
> -    }
> -    return min_index;
> -}
> -
> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> -    uint64_t offset, void **table, bool read_from_disk)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int i;
> -    int ret;
> -
> -    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
> -                          offset, read_from_disk);
> -
> -    /* Check if the table is already cached */
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].offset == offset) {
> -            goto found;
> -        }
> -    }
> -
> -    /* If not, write a table back and replace it */
> -    i = qcow2_cache_find_entry_to_replace(c);
> -    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
> -                                        c == s->l2_table_cache, i);
> -    if (i < 0) {
> -        return i;
> -    }
> -
> -    ret = qcow2_cache_entry_flush(bs, c, i);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    trace_qcow2_cache_get_read(qemu_coroutine_self(),
> -                               c == s->l2_table_cache, i);
> -    c->entries[i].offset = 0;
> -    if (read_from_disk) {
> -        if (c == s->l2_table_cache) {
> -            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
> -        }
> -
> -        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    /* Give the table some hits for the start so that it won't be replaced
> -     * immediately. The number 32 is completely arbitrary. */
> -    c->entries[i].cache_hits = 32;
> -    c->entries[i].offset = offset;
> -
> -    /* And return the right table */
> -found:
> -    c->entries[i].cache_hits++;
> -    c->entries[i].ref++;
> -    *table = c->entries[i].table;
> -
> -    trace_qcow2_cache_get_done(qemu_coroutine_self(),
> -                               c == s->l2_table_cache, i);
> -
> -    return 0;
> -}
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table)
> -{
> -    return qcow2_cache_do_get(bs, c, offset, table, true);
> -}
> -
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table)
> -{
> -    return qcow2_cache_do_get(bs, c, offset, table, false);
> -}
> -
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].table == *table) {
> -            goto found;
> -        }
> -    }
> -    return -ENOENT;
> -
> -found:
> -    c->entries[i].ref--;
> -    *table = NULL;
> -
> -    assert(c->entries[i].ref >= 0);
> -    return 0;
> -}
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].table == table) {
> -            goto found;
> -        }
> -    }
> -    abort();
> -
> -found:
> -    c->entries[i].dirty = true;
> -}
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  #include "trace.h"
> +#include "block-cache.h"
> 
>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>  {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>          return new_l1_table_offset;
>      }
> 
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
>      BDRVQcowState *s = bs->opaque;
>      int ret;
> 
> -    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
> +    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> +        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
> 
>      return ret;
>  }
> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>          return l2_offset;
>      }
> 
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>      /* allocate a new entry in the l2 cache */
> 
>      trace_qcow2_l2_allocate_get_empty(bs, l1_index);
> -    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
> +    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
> +        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> 
>          /* if there was an old l2 table, read it from the disk */
>          BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
> -        ret = qcow2_cache_get(bs, s->l2_table_cache,
> +        ret = block_cache_get(bs, s->l2_table_cache,
>              old_l2_offset & L1E_OFFSET_MASK,
> -            (void**) &old_table);
> +            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
>          if (ret < 0) {
>              goto fail;
>          }
> 
>          memcpy(l2_table, old_table, s->cluster_size);
> 
> -        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
> +        ret = block_cache_put(bs, s->l2_table_cache,
> +            (void **) &old_table, BLOCK_TABLE_L2);
>          if (ret < 0) {
>              goto fail;
>          }
> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>      BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
> 
>      trace_qcow2_l2_allocate_write_l2(bs, l1_index);
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    ret = block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> 
>  fail:
>      trace_qcow2_l2_allocate_done(bs, l1_index, ret);
> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
> +    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
>      s->l1_table[l1_index] = old_l2_offset;
>      return ret;
>  }
> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>          abort();
>      }
> 
> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
> 
>      nb_available = (c * s->cluster_sectors);
> 
> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>       * allocated. */
>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>      if (cluster_offset & L2E_OFFSET_MASK) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>          return 0;
>      }
> 
>      cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
>      if (cluster_offset < 0) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>          return 0;
>      }
> 
> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>      /* compressed clusters never have the copied flag */
> 
>      BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>      l2_table[l2_index] = cpu_to_be64(cluster_offset);
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return 0;
>      }
> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       * handled.
>       */
>      if (cow) {
> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
> +        block_cache_depends_on_flush(s->l2_table_cache);
>      }
> 
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                   s->refcount_block_cache);
> -    }
> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> +        s->refcount_block_cache, s->cluster_size);
>      ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>      if (ret < 0) {
>          goto err;
>      }
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> 
>      for (i = 0; i < m->nb_clusters; i++) {
>          /* if two concurrent writes happen to the same unallocated cluster
> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       }
> 
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          goto err;
>      }
> @@ -913,7 +922,8 @@ again:
>       * request to complete. If we still had the reference, we could use up the
>       * whole cache with sleeping requests.
>       */
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>          }
> 
>          /* First remove L2 entries */
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>          l2_table[l2_index + i] = cpu_to_be64(0);
> 
>          /* Then decrease the refcount */
>          qcow2_free_any_clusters(bs, old_offset, 1);
>      }
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
>          /* Update L2 entries */
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>          if (old_offset & QCOW_OFLAG_COMPRESSED) {
>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>              qcow2_free_any_clusters(bs, old_offset, 1);
> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>          }
>      }
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 5e3f915..728bfc1 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -25,6 +25,7 @@
>  #include "qemu-common.h"
>  #include "block_int.h"
>  #include "block/qcow2.h"
> +#include "block-cache.h"
> 
>  static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
>      int ret;
> 
>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> -        refcount_block);
> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> +        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
> 
>      return ret;
>  }
> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>      if (!refcount_block_offset)
>          return 0;
> 
> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> -        (void**) &refcount_block);
> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> +        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>          ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
>      refcount = be16_to_cpu(refcount_block[block_index]);
> 
> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
> -        (void**) &refcount_block);
> +    ret = block_cache_put(bs, s->refcount_block_cache,
> +        (void **) &refcount_block, BLOCK_TABLE_REF);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>      *refcount_block = NULL;
> 
>      /* We write to the refcount table, so we might depend on L2 tables */
> -    qcow2_cache_flush(bs, s->l2_table_cache);
> +    block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
> 
>      /* Allocate the refcount block itself and mark it as used */
>      int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>      if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
>          /* Zero the new refcount block before updating it */
> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> -            (void**) refcount_block);
> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>          if (ret < 0) {
>              goto fail_block;
>          }
> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>          /* Initialize the new refcount block only after updating its refcount,
>           * update_refcount uses the refcount cache itself */
> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> -            (void**) refcount_block);
> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>          if (ret < 0) {
>              goto fail_block;
>          }
> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>      /* Now the new refcount block needs to be written to disk */
>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
> -    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail_block;
>      }
> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>          return 0;
>      }
> 
> -    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> +    ret = block_cache_put(bs, s->refcount_block_cache,
> +        (void **) refcount_block, BLOCK_TABLE_REF);
>      if (ret < 0) {
>          goto fail_block;
>      }
> @@ -406,7 +410,8 @@ fail_table:
>      g_free(new_table);
>  fail_block:
>      if (*refcount_block != NULL) {
> -        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> +        block_cache_put(bs, s->refcount_block_cache,
> +            (void **) refcount_block, BLOCK_TABLE_REF);
>      }
>      return ret;
>  }
> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>      }
> 
>      if (addend < 0) {
> -        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
> -            s->l2_table_cache);
> +        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
> +            s->l2_table_cache, s->cluster_size);
>      }
> 
>      start = offset & ~(s->cluster_size - 1);
> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>          /* Load the refcount block and allocate it if needed */
>          if (table_index != old_table_index) {
>              if (refcount_block) {
> -                ret = qcow2_cache_put(bs, s->refcount_block_cache,
> -                    (void**) &refcount_block);
> +                ret = block_cache_put(bs, s->refcount_block_cache,
> +                    (void **) &refcount_block, BLOCK_TABLE_REF);
>                  if (ret < 0) {
>                      goto fail;
>                  }
> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>          }
>          old_table_index = table_index;
> 
> -        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
> +        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
> 
>          /* we can update the count and save it */
>          block_index = cluster_index &
> @@ -486,8 +491,8 @@ fail:
>      /* Write last changed block to disk */
>      if (refcount_block) {
>          int wret;
> -        wret = qcow2_cache_put(bs, s->refcount_block_cache,
> -            (void**) &refcount_block);
> +        wret = block_cache_put(bs, s->refcount_block_cache,
> +            (void **) &refcount_block, BLOCK_TABLE_REF);
>          if (wret < 0) {
>              return ret < 0 ? ret : wret;
>          }
> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>              old_l2_offset = l2_offset;
>              l2_offset &= L1E_OFFSET_MASK;
> 
> -            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
> -                (void**) &l2_table);
> +            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> +                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
>              if (ret < 0) {
>                  goto fail;
>              }
> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>                      }
>                      if (offset != old_offset) {
>                          if (addend > 0) {
> -                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                s->refcount_block_cache);
> +                            block_cache_set_dependency(bs, s->l2_table_cache,
> +                                BLOCK_TABLE_L2, s->refcount_block_cache,
> +                                s->cluster_size);
>                          }
>                          l2_table[j] = cpu_to_be64(offset);
> -                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>                      }
>                  }
>              }
> 
> -            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +            ret = block_cache_put(bs, s->l2_table_cache,
> +                (void **) &l2_table, BLOCK_TABLE_L2);
>              if (ret < 0) {
>                  goto fail;
>              }
> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>      ret = 0;
>  fail:
>      if (l2_table) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>      }
> 
>      /* Update L1 only if it isn't deleted anyway (addend = -1) */
> diff --git a/block/qcow2.c b/block/qcow2.c
> index fd5e214..b89d312 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -30,6 +30,7 @@
>  #include "qemu-error.h"
>  #include "qerror.h"
>  #include "trace.h"
> +#include "block-cache.h"
> 
>  /*
>    Differences with QCOW:
> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>      }
> 
>      /* alloc L2 table/refcount block cache */
> -    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
> -    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
> +    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
> +    s->refcount_block_cache =
> +        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
> 
>      s->cluster_cache = g_malloc(s->cluster_size);
>      /* one more sector for decompressed data alignment */
> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>      qcow2_refcount_close(bs);
>      g_free(s->l1_table);
>      if (s->l2_table_cache) {
> -        qcow2_cache_destroy(bs, s->l2_table_cache);
> +        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>      }
>      g_free(s->cluster_cache);
>      qemu_vfree(s->cluster_data);
> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
>      BDRVQcowState *s = bs->opaque;
>      g_free(s->l1_table);
> 
> -    qcow2_cache_flush(bs, s->l2_table_cache);
> -    qcow2_cache_flush(bs, s->refcount_block_cache);
> -
> +    block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
> +    block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      qcow2_mark_clean(bs);
> -
> -    qcow2_cache_destroy(bs, s->l2_table_cache);
> -    qcow2_cache_destroy(bs, s->refcount_block_cache);
> +    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
> +    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
> 
>      g_free(s->unknown_header_fields);
>      cleanup_unknown_header_ext(bs);
> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
>                      options->value.s);
>                  return -EINVAL;
>              }
> -        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
> -            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
>          }
>          options++;
>      }
> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>      int ret;
> 
>      qemu_co_mutex_lock(&s->lock);
> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
> +    ret = block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          qemu_co_mutex_unlock(&s->lock);
>          return ret;
>      }
> 
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> -        if (ret < 0) {
> -            qemu_co_mutex_unlock(&s->lock);
> -            return ret;
> -        }
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
> +    if (ret < 0) {
> +        qemu_co_mutex_unlock(&s->lock);
> +        return ret;
>      }
>      qemu_co_mutex_unlock(&s->lock);
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b4eb654..cb6fd7a 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -27,6 +27,7 @@
> 
>  #include "aes.h"
>  #include "qemu-coroutine.h"
> +#include "block-cache.h"
> 
>  //#define DEBUG_ALLOC
>  //#define DEBUG_ALLOC2
> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
> 
> -struct Qcow2Cache;
> -typedef struct Qcow2Cache Qcow2Cache;
> 
>  typedef struct Qcow2UnknownHeaderExtension {
>      uint32_t magic;
> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
>      uint64_t l1_table_offset;
>      uint64_t *l1_table;
> 
> -    Qcow2Cache* l2_table_cache;
> -    Qcow2Cache* refcount_block_cache;
> +    BlockCache *l2_table_cache;
> +    BlockCache *refcount_block_cache;
> 
>      uint8_t *cluster_cache;
>      uint8_t *cluster_data;
> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
> 
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
> -
> -/* qcow2-cache.c functions */
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> -    Qcow2Cache *dependency);
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table);
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table);
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
> -
>  #endif
> diff --git a/trace-events b/trace-events
> index 6b12f83..52b6438 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
>  qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
>  qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
> 
> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +# block/block-cache.c
> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> 
>  # block/qed-l2-cache.c
>  qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-09-06 20:19   ` Michael Roth
  2012-09-10  2:25     ` Dong Xu Wang
  2012-09-11  9:40   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 20:19 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
> add-cow file format core code. It use block-cache.c as cache code.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block/Makefile.objs |    1 +
>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h     |   85 +++++++
>  block_int.h         |    2 +
>  4 files changed, 701 insertions(+), 0 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
>  block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> +    cpu->magic                      = le64_to_cpu(le->magic);
> +    cpu->version                    = le32_to_cpu(le->version);
> +
> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
> +
> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
> +
> +    cpu->features                   = le64_to_cpu(le->features);
> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> +    le->magic                       = cpu_to_le64(cpu->magic);
> +    le->version                     = cpu_to_le32(cpu->version);
> +
> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
> +
> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
> +
> +    le->features                    = cpu_to_le64(cpu->features);
> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
> +        return 100;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> +    AddCowHeader header = {
> +        .magic = ADD_COW_MAGIC,
> +        .version = ADD_COW_VERSION,
> +        .features = 0,
> +        .optional_features = 0,
> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> +    };
> +    AddCowHeader le_header;
> +    int64_t image_len = 0;
> +    const char *backing_filename = NULL;
> +    const char *backing_fmt = NULL;
> +    const char *image_filename = NULL;
> +    const char *image_format = NULL;
> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> +    BlockDriver *drv = bdrv_find_format("add-cow");
> +    BDRVAddCowState s;
> +    int ret;
> +
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            image_len = options->value.n;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> +            backing_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> +            backing_fmt = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> +            image_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> +            image_format = options->value.s;
> +        }
> +        options++;
> +    }
> +
> +    if (backing_filename) {
> +        header.backing_filename_offset = sizeof(header)
> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> +        header.backing_filename_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> +                    | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            backing_fmt = bdrv_get_format_name(backing_bs);
> +            bdrv_delete(backing_bs);
> +        }
> +    } else {
> +        header.features |= ADD_COW_F_All_ALLOCATED;
> +    }
> +
> +    if (image_filename) {
> +        header.image_filename_offset =
> +            sizeof(header) + sizeof(s.backing_file_format)
> +                + sizeof(s.image_file_format) + header.backing_filename_size;
> +        header.image_filename_size = strlen(image_filename);
> +    } else {
> +        error_report("Error: image_file should be given.");
> +        return -EINVAL;
> +    }
> +
> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same backing file name as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (!strcmp(filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same filename as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (header.image_filename_offset + header.image_filename_size
> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> +        error_report("image_file name or backing_file name too long.");
> +        return -ENOSPC;
> +    }
> +
> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    bdrv_delete(image_bs);
> +
> +    ret = bdrv_create_file(filename, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    add_cow_header_cpu_to_le(&header, &le_header);
> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> +        backing_fmt ? strlen(backing_fmt) : 0);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> +        image_format ? image_format : "raw",
> +        image_format ? strlen(image_format) : sizeof("raw"));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    if (backing_filename) {
> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
> +            backing_filename, header.backing_filename_size);
> +        if (ret < 0) {
> +            bdrv_delete(bs);
> +            return ret;
> +        }
> +    }
> +
> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
> +        image_filename, header.image_filename_size);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(bs, image_len);
> +    bdrv_delete(bs);
> +    return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> +    char                image_filename[ADD_COW_FILE_LEN];
> +    char                tmp_name[ADD_COW_FILE_LEN];
> +    BlockDriver         *image_drv = NULL;
> +    int                 ret;
> +    int                 sector_per_byte;
> +    BDRVAddCowState     *s = bs->opaque;
> +    AddCowHeader        le_header;
> +
> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> +    if (ret != sizeof(s->header)) {
> +        goto fail;
> +    }
> +
> +    add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if (s->header.version != ADD_COW_VERSION) {
> +        char version[64];
> +        snprintf(version, sizeof(version), "ADD-COW version %d",
> +            s->header.version);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", version);
> +        ret = -ENOTSUP;
> +        goto fail;
> +    }
> +
> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "%" PRIx64,
> +            s->header.features & ~ADD_COW_FEATURE_MASK);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
> +            sizeof(s->backing_file_format));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file,
> +            sizeof(s->header) + sizeof(s->image_file_format),
> +            sizeof(s->image_file_format) - 1, s->image_file_format,
> +            sizeof(s->image_file_format));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> +                          s->header.backing_filename_size, bs->backing_file,
> +                          sizeof(bs->backing_file));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> +                      s->header.image_filename_size, tmp_name,
> +                      sizeof(tmp_name));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->image_hd = bdrv_new("");
> +    if (path_has_protocol(image_filename)) {
> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, tmp_name);
> +    }
> +
> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    s->bitmap_size =
> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> +    s->bitmap_cache =
> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    qemu_co_mutex_init(&s->lock);
> +    return 0;
> +fail:
> +    if (s->bitmap_cache) {
> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    }
> +    return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    uint8_t *table      = NULL;
> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> +        & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, int *num_same)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int changed;
> +
> +    if (nb_sectors == 0) {
> +        *num_same = 0;
> +        return 0;
> +    }
> +
> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> +        *num_same = nb_sectors - 1;
> +        return 1;
> +    }
> +    changed = is_allocated(bs, sector_num);
> +
> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
> +            break;
> +        }
> +    }
> +    return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> +                  int64_t sector_num, int nb_sectors)
> +{
> +    int n1;
> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
> +        return nb_sectors;
> +    }
> +    if (sector_num >= bs->total_sectors) {
> +        n1 = 0;
> +    } else {
> +        n1 = bs->total_sectors - sector_num;
> +    }
> +
> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> +    return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int cur_nr_sectors;
> +    uint64_t bytes_done = 0;
> +    QEMUIOVector hd_qiov;
> +    int n, n1, ret = 0;
> +
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    qemu_co_mutex_lock(&s->lock);
> +    while (remaining_sectors != 0) {
> +        cur_nr_sectors = remaining_sectors;
> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> +            cur_nr_sectors = n;
> +            qemu_iovec_reset(&hd_qiov);
> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> +            qemu_co_mutex_lock(&s->lock);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        } else {
> +            cur_nr_sectors = n;
> +            if (bs->backing_hd) {
> +                qemu_iovec_reset(&hd_qiov);
> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> +                    sector_num, cur_nr_sectors);
> +                if (n1 > 0) {
> +                    qemu_co_mutex_unlock(&s->lock);
> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
> +                                        n, &hd_qiov);
> +                    qemu_co_mutex_lock(&s->lock);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                }
> +            } else {
> +                qemu_iovec_memset(&hd_qiov, 0, 0,
> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
> +            }
> +        }
> +        remaining_sectors -= cur_nr_sectors;
> +        sector_num += cur_nr_sectors;
> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> +    }
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> +                                     int n_start, int n_end)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    QEMUIOVector qiov;
> +    struct iovec iov;
> +    int n, ret;
> +
> +    n = n_end - n_start;
> +    if (n <= 0) {
> +        return 0;
> +    }
> +
> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = 0;
> +out:
> +    qemu_vfree(iov.iov_base);
> +    return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd,
> +                     sector_num,
> +                     remaining_sectors, qiov);

alignment                   ^

or even at ^ if you prefer and have done in some places, just need to be
consistent about it for better readability.

> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {

Why do we avoid a COW when writing to the first sector of a cluster?

> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> +                sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        for (i = sector_num / SECTORS_PER_CLUSTER;
> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> +            i++) {
> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));
> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
> +            }
> +        }
> +    }
> +    ret = 0;
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    int ret;
> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> +    int64_t bitmap_size =
> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return 0;
> +}
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> +        ADD_COW_CACHE_ENTRY_SIZE);
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
> +}
> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FMT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a image file"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FORMAT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the image file"
> +    },
> +    { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +    .format_name                = "add-cow",
> +    .instance_size              = sizeof(BDRVAddCowState),
> +    .bdrv_probe                 = add_cow_probe,
> +    .bdrv_open                  = add_cow_open,
> +    .bdrv_close                 = add_cow_close,
> +    .bdrv_create                = add_cow_create,
> +    .bdrv_co_readv              = add_cow_co_readv,
> +    .bdrv_co_writev             = add_cow_co_writev,
> +    .bdrv_truncate              = bdrv_add_cow_truncate,
> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
> +
> +    .create_options             = add_cow_create_options,
> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> +    bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> +    ADD_COW_F_All_ALLOCATED     = 0X01,

Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)

was searching your patch for how this was used and was scratching my
head when I wasn't seeing any matches :)

> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
> +
> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> +                    ((uint64_t)'W' << 8) | 0xFF),
> +    ADD_COW_VERSION             = 1,
> +    ADD_COW_FILE_LEN            = 1024,
> +    ADD_COW_CACHE_SIZE          = 16,
> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
> +    ADD_COW_CLUSTER_SIZE        = 65536,
> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> +    ADD_COW_PAGE_SIZE           = 4096,
> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
> +};
> +
> +typedef struct AddCowHeader {
> +    uint64_t        magic;
> +    uint32_t        version;
> +
> +    uint32_t        backing_filename_offset;
> +    uint32_t        backing_filename_size;
> +
> +    uint32_t        image_filename_offset;
> +    uint32_t        image_filename_size;
> +
> +    uint64_t        features;
> +    uint64_t        optional_features;
> +    uint32_t        header_pages_size;
> +} QEMU_PACKED AddCowHeader;

You should avoid using packed structures for image format headers.
Instead, I would either:

a) re-order the fields so that 32/64-bit fields, respectively, fall on
32/64-bit boundaries (in your case, for instance, moving header_pages_size
above features) like qed/qcow2 do, or

b) read/write the fields individually rather than reading/writing directly
into/from the header struct.

The safest route is b). Adds a few lines of code, but you won't have to
re-work things (or worry about introducing bugs) later if you were to add,
say, a 32-bit value, and then a 64-bit value later.

> +
> +typedef struct BDRVAddCowState {
> +    BlockDriverState    *image_hd;
> +    CoMutex             lock;
> +    int                 cluster_size;
> +    BlockCache         *bitmap_cache;
> +    uint64_t            bitmap_size;
> +    AddCowHeader        header;
> +    char                backing_file_format[16];
> +    char                image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> +    return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> +    void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
> +#endif
> diff --git a/block_int.h b/block_int.h
> index 6c1d9ca..67954ec 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -53,6 +53,8 @@
>  #define BLOCK_OPT_SUBFMT            "subformat"
>  #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
> +#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
> 
>  typedef struct BdrvTrackedRequest BdrvTrackedRequest;
> 
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-09-06 17:27   ` Michael Roth
@ 2012-09-10  1:48     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  1:48 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:27 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 123 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff").
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now).
>> +
>> +            12 - 15:    backing file name offset
>> +                        Offset in the add-cow file at which the backing file
>> +                        name is stored (NB: The string is not nul-terminated).
>> +                        If backing file name does NOT exist, this field will be
>> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> +                        must be at least 1 byte).
>> +
>> +            16 - 19:    backing file name size
>> +                        Length of the backing file name in bytes. It will be 0
>> +                        if the backing file name offset is 0. If backing file
>> +                        name offset is non-zero, then it must be non-zero. Must
>> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            20 - 23:    image file name offset
>> +                        Offset in the add-cow file at which the image file name
>> +                        is stored (NB: The string is not null terminated). It
>> +                        must be between 80 and [HEADER_SIZE - 2].
>> +
>> +            24 - 27:    image file name size
>> +                        Length of the image file name in bytes.
>> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            28 - 35:    features
>> +                        Currently only 1 feature bit is used:
>> +                        Feature bits:
>> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.
>> +
>> +            36 - 43:    optional features
>> +                        Not used now. Reserved for future use. It must be set to 0.
>> +
>> +            44 - 47:    header pages size
>> +                        The header field is variable-sized. This field indicates
>> +                        how many pages(4k) will be used to store add-cow header.
>> +                        In add-cow v1, it is fixed to 1, so the header size will
>> +                        be 4k * 1 = 4096 bytes.
>> +
>> +            48 - 63:    backing file format
>> +                        format of backing file. It will be filled with 0 if
>> +                        backing file name offset is 0. If backing file name
>> +                        offset is non-zero, it must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>> +
>> +            64 - 79:    image file format
>> +                        format of image file. It must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>> +
>> +            80 - [HEADER_SIZE - 1]:
>> +                        It is used to make sure COW bitmap field starts at the
>> +                        HEADER_SIZE byte, backing file name and image file name
>> +                        will be stored here. The bytes that is not pointing to
>> +                        backing file and image file names will bet set to 0.
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
>> +calculated according to virtual size of image file, and it also should be multipe
>> +of 65536, the bits not used will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. Bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow will be 0.
>> +
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>> +
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>
> Relative to the add-cow file?
Ah, yes..
>
>> +to the current working directory of the process that opened the qcow2 file.

>> --
>> 1.7.1
>>
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-09-06 17:32   ` Michael Roth
@ 2012-09-10  1:49     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  1:49 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:32 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
>> Make qed_read_string function to a common interface, so move it to block.c.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block.c     |   27 +++++++++++++++++++++++++++
>>  block.h     |    2 ++
>>  block/qed.c |   29 +----------------------------
>>  3 files changed, 30 insertions(+), 28 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index c13d803..d906b35 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
>>      return *p == ':';
>>  }
>>
>> +/**
>> + * Read a string of known length from the image file
>> + *
>> + * @bs:         Image file
>> + * @offset:     File offset to start of string, in bytes
>> + * @n:          String length in bytes
>> + * @buf:        Destination buffer
>> + * @buflen:     Destination buffer length in bytes
>> + * @ret:        0 on success, -errno on failure
>> + *
>> + * The string is NUL-terminated.
>> + */
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> +                           char *buf, size_t buflen)
>
> Small alignment issue   ^
>
>> +{
>> +    int ret;
>> +    if (n >= buflen) {
>> +        return -EINVAL;
>> +    }
>> +    ret = bdrv_pread(bs, offset, buf, n);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    buf[n] = '\0';
>> +    return 0;
>> +}
>> +
>>  int path_is_absolute(const char *path)
>>  {
>>  #ifdef _WIN32
>> diff --git a/block.h b/block.h
>> index 54e61c9..e5dfcd7 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>>      const void *buf, int count);
>>  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
>>      int nb_sectors, QEMUIOVector *qiov);
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> +    char *buf, size_t buflen);
>
> Another one here        ^
>
>>  int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>>  int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
>> diff --git a/block/qed.c b/block/qed.c
>> index 5f3eefa..311c589 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
>>  }
>>
>>  /**
>> - * Read a string of known length from the image file
>> - *
>> - * @file:       Image file
>> - * @offset:     File offset to start of string, in bytes
>> - * @n:          String length in bytes
>> - * @buf:        Destination buffer
>> - * @buflen:     Destination buffer length in bytes
>> - * @ret:        0 on success, -errno on failure
>> - *
>> - * The string is NUL-terminated.
>> - */
>> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
>> -                           char *buf, size_t buflen)
>> -{
>> -    int ret;
>> -    if (n >= buflen) {
>> -        return -EINVAL;
>> -    }
>> -    ret = bdrv_pread(file, offset, buf, n);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -    buf[n] = '\0';
>> -    return 0;
>> -}
>> -
>> -/**
>>   * Allocate new clusters
>>   *
>>   * @s:          QED state
>> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>>              return -EINVAL;
>>          }
>>
>> -        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>>                                s->header.backing_filename_size, bs->backing_file,
>>                                sizeof(bs->backing_file));
>
> Here too                          ^
>
> Looks good otherwise.
>
>>          if (ret < 0) {
>> --
>> 1.7.1
>>
>>
>
Thank you Michael .

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-09-06 17:52   ` Michael Roth
@ 2012-09-10  2:14     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  2:14 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:52 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
>> add-cow and qcow2 file format will share the same cache code, so rename
>> block-cache.c to block-cache.c. And related structure and qcow2 code also
>
> "qcow2-cache.c to block-cache.c"
>
> But I've scanned through the rest of your patches and can't seem to find
> where block-cache.c gets introduced. Did you forget to git add it?

Really sorry for that, I forget to add the block-cache.c, will add it in v13.
>
>> are changed.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block.h                |    3 +
>>  block/Makefile.objs    |    3 +-
>>  block/qcow2-cache.c    |  323 ------------------------------------------------
>>  block/qcow2-cluster.c  |   66 ++++++----
>>  block/qcow2-refcount.c |   66 ++++++-----
>>  block/qcow2.c          |   36 +++---
>>  block/qcow2.h          |   24 +---
>>  trace-events           |   13 +-
>>  8 files changed, 109 insertions(+), 425 deletions(-)
>>  delete mode 100644 block/qcow2-cache.c
>>
>> diff --git a/block.h b/block.h
>> index e5dfcd7..c325661 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -401,6 +401,9 @@ typedef enum {
>>      BLKDBG_CLUSTER_ALLOC_BYTES,
>>      BLKDBG_CLUSTER_FREE,
>>
>> +    BLKDBG_ADD_COW_UPDATE,
>> +    BLKDBG_ADD_COW_LOAD,
>> +
>>      BLKDBG_EVENT_MAX,
>>  } BlkDebugEvent;
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index b5754d3..23bdfc8 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -1,7 +1,8 @@
>>  block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>>  block-obj-$(CONFIG_WIN32) += raw-win32.o
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> deleted file mode 100644
>> index 2d4322a..0000000
>> --- a/block/qcow2-cache.c
>> +++ /dev/null
>> @@ -1,323 +0,0 @@
>> -/*
>> - * L2/refcount table cache for the QCOW2 format
>> - *
>> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
>> - *
>> - * Permission is hereby granted, free of charge, to any person obtaining a copy
>> - * of this software and associated documentation files (the "Software"), to deal
>> - * in the Software without restriction, including without limitation the rights
>> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> - * copies of the Software, and to permit persons to whom the Software is
>> - * furnished to do so, subject to the following conditions:
>> - *
>> - * The above copyright notice and this permission notice shall be included in
>> - * all copies or substantial portions of the Software.
>> - *
>> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> - * THE SOFTWARE.
>> - */
>> -
>> -#include "block_int.h"
>> -#include "qemu-common.h"
>> -#include "qcow2.h"
>> -#include "trace.h"
>> -
>> -typedef struct Qcow2CachedTable {
>> -    void*   table;
>> -    int64_t offset;
>> -    bool    dirty;
>> -    int     cache_hits;
>> -    int     ref;
>> -} Qcow2CachedTable;
>> -
>> -struct Qcow2Cache {
>> -    Qcow2CachedTable*       entries;
>> -    struct Qcow2Cache*      depends;
>> -    int                     size;
>> -    bool                    depends_on_flush;
>> -};
>> -
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    Qcow2Cache *c;
>> -    int i;
>> -
>> -    c = g_malloc0(sizeof(*c));
>> -    c->size = num_tables;
>> -    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>> -    }
>> -
>> -    return c;
>> -}
>> -
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        assert(c->entries[i].ref == 0);
>> -        qemu_vfree(c->entries[i].table);
>> -    }
>> -
>> -    g_free(c->entries);
>> -    g_free(c);
>> -
>> -    return 0;
>> -}
>> -
>> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> -    int ret;
>> -
>> -    ret = qcow2_cache_flush(bs, c->depends);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    c->depends = NULL;
>> -    c->depends_on_flush = false;
>> -
>> -    return 0;
>> -}
>> -
>> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int ret = 0;
>> -
>> -    if (!c->entries[i].dirty || !c->entries[i].offset) {
>> -        return 0;
>> -    }
>> -
>> -    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
>> -                                  c == s->l2_table_cache, i);
>> -
>> -    if (c->depends) {
>> -        ret = qcow2_cache_flush_dependency(bs, c);
>> -    } else if (c->depends_on_flush) {
>> -        ret = bdrv_flush(bs->file);
>> -        if (ret >= 0) {
>> -            c->depends_on_flush = false;
>> -        }
>> -    }
>> -
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    if (c == s->refcount_block_cache) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
>> -    } else if (c == s->l2_table_cache) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>> -    }
>> -
>> -    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
>> -        s->cluster_size);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    c->entries[i].dirty = false;
>> -
>> -    return 0;
>> -}
>> -
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int result = 0;
>> -    int ret;
>> -    int i;
>> -
>> -    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        ret = qcow2_cache_entry_flush(bs, c, i);
>> -        if (ret < 0 && result != -ENOSPC) {
>> -            result = ret;
>> -        }
>> -    }
>> -
>> -    if (result == 0) {
>> -        ret = bdrv_flush(bs->file);
>> -        if (ret < 0) {
>> -            result = ret;
>> -        }
>> -    }
>> -
>> -    return result;
>> -}
>> -
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> -    Qcow2Cache *dependency)
>> -{
>> -    int ret;
>> -
>> -    if (dependency->depends) {
>> -        ret = qcow2_cache_flush_dependency(bs, dependency);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    if (c->depends && (c->depends != dependency)) {
>> -        ret = qcow2_cache_flush_dependency(bs, c);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    c->depends = dependency;
>> -    return 0;
>> -}
>> -
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
>> -{
>> -    c->depends_on_flush = true;
>> -}
>> -
>> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
>> -{
>> -    int i;
>> -    int min_count = INT_MAX;
>> -    int min_index = -1;
>> -
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].ref) {
>> -            continue;
>> -        }
>> -
>> -        if (c->entries[i].cache_hits < min_count) {
>> -            min_index = i;
>> -            min_count = c->entries[i].cache_hits;
>> -        }
>> -
>> -        /* Give newer hits priority */
>> -        /* TODO Check how to optimize the replacement strategy */
>> -        c->entries[i].cache_hits /= 2;
>> -    }
>> -
>> -    if (min_index == -1) {
>> -        /* This can't happen in current synchronous code, but leave the check
>> -         * here as a reminder for whoever starts using AIO with the cache */
>> -        abort();
>> -    }
>> -    return min_index;
>> -}
>> -
>> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>> -    uint64_t offset, void **table, bool read_from_disk)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int i;
>> -    int ret;
>> -
>> -    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
>> -                          offset, read_from_disk);
>> -
>> -    /* Check if the table is already cached */
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].offset == offset) {
>> -            goto found;
>> -        }
>> -    }
>> -
>> -    /* If not, write a table back and replace it */
>> -    i = qcow2_cache_find_entry_to_replace(c);
>> -    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
>> -                                        c == s->l2_table_cache, i);
>> -    if (i < 0) {
>> -        return i;
>> -    }
>> -
>> -    ret = qcow2_cache_entry_flush(bs, c, i);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    trace_qcow2_cache_get_read(qemu_coroutine_self(),
>> -                               c == s->l2_table_cache, i);
>> -    c->entries[i].offset = 0;
>> -    if (read_from_disk) {
>> -        if (c == s->l2_table_cache) {
>> -            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>> -        }
>> -
>> -        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    /* Give the table some hits for the start so that it won't be replaced
>> -     * immediately. The number 32 is completely arbitrary. */
>> -    c->entries[i].cache_hits = 32;
>> -    c->entries[i].offset = offset;
>> -
>> -    /* And return the right table */
>> -found:
>> -    c->entries[i].cache_hits++;
>> -    c->entries[i].ref++;
>> -    *table = c->entries[i].table;
>> -
>> -    trace_qcow2_cache_get_done(qemu_coroutine_self(),
>> -                               c == s->l2_table_cache, i);
>> -
>> -    return 0;
>> -}
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table)
>> -{
>> -    return qcow2_cache_do_get(bs, c, offset, table, true);
>> -}
>> -
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table)
>> -{
>> -    return qcow2_cache_do_get(bs, c, offset, table, false);
>> -}
>> -
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].table == *table) {
>> -            goto found;
>> -        }
>> -    }
>> -    return -ENOENT;
>> -
>> -found:
>> -    c->entries[i].ref--;
>> -    *table = NULL;
>> -
>> -    assert(c->entries[i].ref >= 0);
>> -    return 0;
>> -}
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].table == table) {
>> -            goto found;
>> -        }
>> -    }
>> -    abort();
>> -
>> -found:
>> -    c->entries[i].dirty = true;
>> -}
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index e179211..335dc7a 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "block/qcow2.h"
>>  #include "trace.h"
>> +#include "block-cache.h"
>>
>>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>>  {
>> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>>          return new_l1_table_offset;
>>      }
>>
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
>>      BDRVQcowState *s = bs->opaque;
>>      int ret;
>>
>> -    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
>> +    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> +        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
>>
>>      return ret;
>>  }
>> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>          return l2_offset;
>>      }
>>
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>      /* allocate a new entry in the l2 cache */
>>
>>      trace_qcow2_l2_allocate_get_empty(bs, l1_index);
>> -    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
>> +    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
>> +        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>>          /* if there was an old l2 table, read it from the disk */
>>          BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
>> -        ret = qcow2_cache_get(bs, s->l2_table_cache,
>> +        ret = block_cache_get(bs, s->l2_table_cache,
>>              old_l2_offset & L1E_OFFSET_MASK,
>> -            (void**) &old_table);
>> +            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail;
>>          }
>>
>>          memcpy(l2_table, old_table, s->cluster_size);
>>
>> -        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
>> +        ret = block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &old_table, BLOCK_TABLE_L2);
>>          if (ret < 0) {
>>              goto fail;
>>          }
>> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>      BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
>>
>>      trace_qcow2_l2_allocate_write_l2(bs, l1_index);
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    ret = block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>>  fail:
>>      trace_qcow2_l2_allocate_done(bs, l1_index, ret);
>> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
>> +    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
>>      s->l1_table[l1_index] = old_l2_offset;
>>      return ret;
>>  }
>> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>          abort();
>>      }
>>
>> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
>>
>>      nb_available = (c * s->cluster_sectors);
>>
>> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>       * allocated. */
>>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>>      if (cluster_offset & L2E_OFFSET_MASK) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>          return 0;
>>      }
>>
>>      cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
>>      if (cluster_offset < 0) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>          return 0;
>>      }
>>
>> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>      /* compressed clusters never have the copied flag */
>>
>>      BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>      l2_table[l2_index] = cpu_to_be64(cluster_offset);
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return 0;
>>      }
>> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>       * handled.
>>       */
>>      if (cow) {
>> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
>> +        block_cache_depends_on_flush(s->l2_table_cache);
>>      }
>>
>> -    if (qcow2_need_accurate_refcounts(s)) {
>> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> -                                   s->refcount_block_cache);
>> -    }
>> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
>> +        s->refcount_block_cache, s->cluster_size);
>>      ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>>      if (ret < 0) {
>>          goto err;
>>      }
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>
>>      for (i = 0; i < m->nb_clusters; i++) {
>>          /* if two concurrent writes happen to the same unallocated cluster
>> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>       }
>>
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          goto err;
>>      }
>> @@ -913,7 +922,8 @@ again:
>>       * request to complete. If we still had the reference, we could use up the
>>       * whole cache with sleeping requests.
>>       */
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>>          }
>>
>>          /* First remove L2 entries */
>> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>          l2_table[l2_index + i] = cpu_to_be64(0);
>>
>>          /* Then decrease the refcount */
>>          qcow2_free_any_clusters(bs, old_offset, 1);
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
>>
>>          /* Update L2 entries */
>> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>          if (old_offset & QCOW_OFLAG_COMPRESSED) {
>>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>>              qcow2_free_any_clusters(bs, old_offset, 1);
>> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>>          }
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 5e3f915..728bfc1 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -25,6 +25,7 @@
>>  #include "qemu-common.h"
>>  #include "block_int.h"
>>  #include "block/qcow2.h"
>> +#include "block-cache.h"
>>
>>  static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
>>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
>>      int ret;
>>
>>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
>> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        refcount_block);
>> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> +        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>
>>      return ret;
>>  }
>> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>      if (!refcount_block_offset)
>>          return 0;
>>
>> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        (void**) &refcount_block);
>> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> +        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>          ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
>>      refcount = be16_to_cpu(refcount_block[block_index]);
>>
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -        (void**) &refcount_block);
>> +    ret = block_cache_put(bs, s->refcount_block_cache,
>> +        (void **) &refcount_block, BLOCK_TABLE_REF);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>      *refcount_block = NULL;
>>
>>      /* We write to the refcount table, so we might depend on L2 tables */
>> -    qcow2_cache_flush(bs, s->l2_table_cache);
>> +    block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>
>>      /* Allocate the refcount block itself and mark it as used */
>>      int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
>> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>      if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
>>          /* Zero the new refcount block before updating it */
>> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> -            (void**) refcount_block);
>> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail_block;
>>          }
>> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>          /* Initialize the new refcount block only after updating its refcount,
>>           * update_refcount uses the refcount cache itself */
>> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> -            (void**) refcount_block);
>> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail_block;
>>          }
>> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>      /* Now the new refcount block needs to be written to disk */
>>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
>> -    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail_block;
>>      }
>> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>          return 0;
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> +    ret = block_cache_put(bs, s->refcount_block_cache,
>> +        (void **) refcount_block, BLOCK_TABLE_REF);
>>      if (ret < 0) {
>>          goto fail_block;
>>      }
>> @@ -406,7 +410,8 @@ fail_table:
>>      g_free(new_table);
>>  fail_block:
>>      if (*refcount_block != NULL) {
>> -        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> +        block_cache_put(bs, s->refcount_block_cache,
>> +            (void **) refcount_block, BLOCK_TABLE_REF);
>>      }
>>      return ret;
>>  }
>> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>      }
>>
>>      if (addend < 0) {
>> -        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
>> -            s->l2_table_cache);
>> +        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
>> +            s->l2_table_cache, s->cluster_size);
>>      }
>>
>>      start = offset & ~(s->cluster_size - 1);
>> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>          /* Load the refcount block and allocate it if needed */
>>          if (table_index != old_table_index) {
>>              if (refcount_block) {
>> -                ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -                    (void**) &refcount_block);
>> +                ret = block_cache_put(bs, s->refcount_block_cache,
>> +                    (void **) &refcount_block, BLOCK_TABLE_REF);
>>                  if (ret < 0) {
>>                      goto fail;
>>                  }
>> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>          }
>>          old_table_index = table_index;
>>
>> -        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>> +        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>>
>>          /* we can update the count and save it */
>>          block_index = cluster_index &
>> @@ -486,8 +491,8 @@ fail:
>>      /* Write last changed block to disk */
>>      if (refcount_block) {
>>          int wret;
>> -        wret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -            (void**) &refcount_block);
>> +        wret = block_cache_put(bs, s->refcount_block_cache,
>> +            (void **) &refcount_block, BLOCK_TABLE_REF);
>>          if (wret < 0) {
>>              return ret < 0 ? ret : wret;
>>          }
>> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>              old_l2_offset = l2_offset;
>>              l2_offset &= L1E_OFFSET_MASK;
>>
>> -            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
>> -                (void**) &l2_table);
>> +            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> +                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
>>              if (ret < 0) {
>>                  goto fail;
>>              }
>> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>                      }
>>                      if (offset != old_offset) {
>>                          if (addend > 0) {
>> -                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> -                                s->refcount_block_cache);
>> +                            block_cache_set_dependency(bs, s->l2_table_cache,
>> +                                BLOCK_TABLE_L2, s->refcount_block_cache,
>> +                                s->cluster_size);
>>                          }
>>                          l2_table[j] = cpu_to_be64(offset);
>> -                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>                      }
>>                  }
>>              }
>>
>> -            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +            ret = block_cache_put(bs, s->l2_table_cache,
>> +                (void **) &l2_table, BLOCK_TABLE_L2);
>>              if (ret < 0) {
>>                  goto fail;
>>              }
>> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>      ret = 0;
>>  fail:
>>      if (l2_table) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>      }
>>
>>      /* Update L1 only if it isn't deleted anyway (addend = -1) */
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index fd5e214..b89d312 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -30,6 +30,7 @@
>>  #include "qemu-error.h"
>>  #include "qerror.h"
>>  #include "trace.h"
>> +#include "block-cache.h"
>>
>>  /*
>>    Differences with QCOW:
>> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>      }
>>
>>      /* alloc L2 table/refcount block cache */
>> -    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
>> -    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
>> +    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
>> +    s->refcount_block_cache =
>> +        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
>>
>>      s->cluster_cache = g_malloc(s->cluster_size);
>>      /* one more sector for decompressed data alignment */
>> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>      qcow2_refcount_close(bs);
>>      g_free(s->l1_table);
>>      if (s->l2_table_cache) {
>> -        qcow2_cache_destroy(bs, s->l2_table_cache);
>> +        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>>      }
>>      g_free(s->cluster_cache);
>>      qemu_vfree(s->cluster_data);
>> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
>>      BDRVQcowState *s = bs->opaque;
>>      g_free(s->l1_table);
>>
>> -    qcow2_cache_flush(bs, s->l2_table_cache);
>> -    qcow2_cache_flush(bs, s->refcount_block_cache);
>> -
>> +    block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>> +    block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      qcow2_mark_clean(bs);
>> -
>> -    qcow2_cache_destroy(bs, s->l2_table_cache);
>> -    qcow2_cache_destroy(bs, s->refcount_block_cache);
>> +    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>> +    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
>>
>>      g_free(s->unknown_header_fields);
>>      cleanup_unknown_header_ext(bs);
>> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
>>                      options->value.s);
>>                  return -EINVAL;
>>              }
>> -        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
>> -            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
>>          }
>>          options++;
>>      }
>> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>>      int ret;
>>
>>      qemu_co_mutex_lock(&s->lock);
>> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> +    ret = block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          qemu_co_mutex_unlock(&s->lock);
>>          return ret;
>>      }
>>
>> -    if (qcow2_need_accurate_refcounts(s)) {
>> -        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> -        if (ret < 0) {
>> -            qemu_co_mutex_unlock(&s->lock);
>> -            return ret;
>> -        }
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>> +    if (ret < 0) {
>> +        qemu_co_mutex_unlock(&s->lock);
>> +        return ret;
>>      }
>>      qemu_co_mutex_unlock(&s->lock);
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index b4eb654..cb6fd7a 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -27,6 +27,7 @@
>>
>>  #include "aes.h"
>>  #include "qemu-coroutine.h"
>> +#include "block-cache.h"
>>
>>  //#define DEBUG_ALLOC
>>  //#define DEBUG_ALLOC2
>> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
>>      uint64_t vm_clock_nsec;
>>  } QCowSnapshot;
>>
>> -struct Qcow2Cache;
>> -typedef struct Qcow2Cache Qcow2Cache;
>>
>>  typedef struct Qcow2UnknownHeaderExtension {
>>      uint32_t magic;
>> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
>>      uint64_t l1_table_offset;
>>      uint64_t *l1_table;
>>
>> -    Qcow2Cache* l2_table_cache;
>> -    Qcow2Cache* refcount_block_cache;
>> +    BlockCache *l2_table_cache;
>> +    BlockCache *refcount_block_cache;
>>
>>      uint8_t *cluster_cache;
>>      uint8_t *cluster_data;
>> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>
>>  void qcow2_free_snapshots(BlockDriverState *bs);
>>  int qcow2_read_snapshots(BlockDriverState *bs);
>> -
>> -/* qcow2-cache.c functions */
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> -    Qcow2Cache *dependency);
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table);
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table);
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>> -
>>  #endif
>> diff --git a/trace-events b/trace-events
>> index 6b12f83..52b6438 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
>>  qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
>>  qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
>>
>> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +# block/block-cache.c
>> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>>
>>  # block/qed-l2-cache.c
>>  qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
>> --
>> 1.7.1
>>
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-06 20:19   ` Michael Roth
@ 2012-09-10  2:25     ` Dong Xu Wang
  2012-09-11  9:44       ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  2:25 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block/Makefile.objs |    1 +
>>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block/add-cow.h     |   85 +++++++
>>  block_int.h         |    2 +
>>  4 files changed, 701 insertions(+), 0 deletions(-)
>>  create mode 100644 block/add-cow.c
>>  create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>>  block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> +    cpu->magic                      = le64_to_cpu(le->magic);
>> +    cpu->version                    = le32_to_cpu(le->version);
>> +
>> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
>> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
>> +
>> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
>> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
>> +
>> +    cpu->features                   = le64_to_cpu(le->features);
>> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
>> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> +    le->magic                       = cpu_to_le64(cpu->magic);
>> +    le->version                     = cpu_to_le32(cpu->version);
>> +
>> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
>> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
>> +
>> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
>> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
>> +
>> +    le->features                    = cpu_to_le64(cpu->features);
>> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
>> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> +        return 100;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> +    AddCowHeader header = {
>> +        .magic = ADD_COW_MAGIC,
>> +        .version = ADD_COW_VERSION,
>> +        .features = 0,
>> +        .optional_features = 0,
>> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> +    };
>> +    AddCowHeader le_header;
>> +    int64_t image_len = 0;
>> +    const char *backing_filename = NULL;
>> +    const char *backing_fmt = NULL;
>> +    const char *image_filename = NULL;
>> +    const char *image_format = NULL;
>> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> +    BlockDriver *drv = bdrv_find_format("add-cow");
>> +    BDRVAddCowState s;
>> +    int ret;
>> +
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +            image_len = options->value.n;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +            backing_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +            backing_fmt = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> +            image_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> +            image_format = options->value.s;
>> +        }
>> +        options++;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        header.backing_filename_offset = sizeof(header)
>> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> +        header.backing_filename_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> +                    | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            backing_fmt = bdrv_get_format_name(backing_bs);
>> +            bdrv_delete(backing_bs);
>> +        }
>> +    } else {
>> +        header.features |= ADD_COW_F_All_ALLOCATED;
>> +    }
>> +
>> +    if (image_filename) {
>> +        header.image_filename_offset =
>> +            sizeof(header) + sizeof(s.backing_file_format)
>> +                + sizeof(s.image_file_format) + header.backing_filename_size;
>> +        header.image_filename_size = strlen(image_filename);
>> +    } else {
>> +        error_report("Error: image_file should be given.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same backing file name as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!strcmp(filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same filename as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (header.image_filename_offset + header.image_filename_size
>> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> +        error_report("image_file name or backing_file name too long.");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    bdrv_delete(image_bs);
>> +
>> +    ret = bdrv_create_file(filename, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    add_cow_header_cpu_to_le(&header, &le_header);
>> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> +        backing_fmt ? strlen(backing_fmt) : 0);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> +        image_format ? image_format : "raw",
>> +        image_format ? strlen(image_format) : sizeof("raw"));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> +            backing_filename, header.backing_filename_size);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs);
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
>> +        image_filename, header.image_filename_size);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(bs, image_len);
>> +    bdrv_delete(bs);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> +    char                image_filename[ADD_COW_FILE_LEN];
>> +    char                tmp_name[ADD_COW_FILE_LEN];
>> +    BlockDriver         *image_drv = NULL;
>> +    int                 ret;
>> +    int                 sector_per_byte;
>> +    BDRVAddCowState     *s = bs->opaque;
>> +    AddCowHeader        le_header;
>> +
>> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> +    if (ret != sizeof(s->header)) {
>> +        goto fail;
>> +    }
>> +
>> +    add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.version != ADD_COW_VERSION) {
>> +        char version[64];
>> +        snprintf(version, sizeof(version), "ADD-COW version %d",
>> +            s->header.version);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", version);
>> +        ret = -ENOTSUP;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "%" PRIx64,
>> +            s->header.features & ~ADD_COW_FEATURE_MASK);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
>> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> +            sizeof(s->backing_file_format));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file,
>> +            sizeof(s->header) + sizeof(s->image_file_format),
>> +            sizeof(s->image_file_format) - 1, s->image_file_format,
>> +            sizeof(s->image_file_format));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> +                          s->header.backing_filename_size, bs->backing_file,
>> +                          sizeof(bs->backing_file));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> +                      s->header.image_filename_size, tmp_name,
>> +                      sizeof(tmp_name));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->image_hd = bdrv_new("");
>> +    if (path_has_protocol(image_filename)) {
>> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> +    } else {
>> +        path_combine(image_filename, sizeof(image_filename),
>> +                     bs->filename, tmp_name);
>> +    }
>> +
>> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(s->image_hd);
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    s->bitmap_size =
>> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> +    s->bitmap_cache =
>> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    qemu_co_mutex_init(&s->lock);
>> +    return 0;
>> +fail:
>> +    if (s->bitmap_cache) {
>> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    uint8_t *table      = NULL;
>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> +        & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int changed;
>> +
>> +    if (nb_sectors == 0) {
>> +        *num_same = 0;
>> +        return 0;
>> +    }
>> +
>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> +        *num_same = nb_sectors - 1;
>> +        return 1;
>> +    }
>> +    changed = is_allocated(bs, sector_num);
>> +
>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>> +            break;
>> +        }
>> +    }
>> +    return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> +                  int64_t sector_num, int nb_sectors)
>> +{
>> +    int n1;
>> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> +        return nb_sectors;
>> +    }
>> +    if (sector_num >= bs->total_sectors) {
>> +        n1 = 0;
>> +    } else {
>> +        n1 = bs->total_sectors - sector_num;
>> +    }
>> +
>> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> +    return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int cur_nr_sectors;
>> +    uint64_t bytes_done = 0;
>> +    QEMUIOVector hd_qiov;
>> +    int n, n1, ret = 0;
>> +
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    qemu_co_mutex_lock(&s->lock);
>> +    while (remaining_sectors != 0) {
>> +        cur_nr_sectors = remaining_sectors;
>> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> +            cur_nr_sectors = n;
>> +            qemu_iovec_reset(&hd_qiov);
>> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +            qemu_co_mutex_unlock(&s->lock);
>> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> +            qemu_co_mutex_lock(&s->lock);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        } else {
>> +            cur_nr_sectors = n;
>> +            if (bs->backing_hd) {
>> +                qemu_iovec_reset(&hd_qiov);
>> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> +                    sector_num, cur_nr_sectors);
>> +                if (n1 > 0) {
>> +                    qemu_co_mutex_unlock(&s->lock);
>> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> +                                        n, &hd_qiov);
>> +                    qemu_co_mutex_lock(&s->lock);
>> +                    if (ret < 0) {
>> +                        goto fail;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_iovec_memset(&hd_qiov, 0, 0,
>> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
>> +            }
>> +        }
>> +        remaining_sectors -= cur_nr_sectors;
>> +        sector_num += cur_nr_sectors;
>> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> +    }
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> +                                     int n_start, int n_end)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    QEMUIOVector qiov;
>> +    struct iovec iov;
>> +    int n, ret;
>> +
>> +    n = n_end - n_start;
>> +    if (n <= 0) {
>> +        return 0;
>> +    }
>> +
>> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = 0;
>> +out:
>> +    qemu_vfree(iov.iov_base);
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd,
>> +                     sector_num,
>> +                     remaining_sectors, qiov);
>
> alignment                   ^
>
> or even at ^ if you prefer and have done in some places, just need to be
> consistent about it for better readability.
>
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>
> Why do we avoid a COW when writing to the first sector of a cluster?

Because if it is the first sector, we need not use copy_sector, we
write it directly would be enough, it starts at the begening of one
cluster.

>
>> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> +                sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        for (i = sector_num / SECTORS_PER_CLUSTER;
>> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> +            i++) {
>> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> +            }
>> +        }
>> +    }
>> +    ret = 0;
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    int ret;
>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> +    int64_t bitmap_size =
>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> +        ADD_COW_CACHE_ENTRY_SIZE);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    return ret;
>> +}
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> +    {
>> +        .name = BLOCK_OPT_SIZE,
>> +        .type = OPT_SIZE,
>> +        .help = "Virtual disk size"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FMT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a image file"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FORMAT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the image file"
>> +    },
>> +    { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> +    .format_name                = "add-cow",
>> +    .instance_size              = sizeof(BDRVAddCowState),
>> +    .bdrv_probe                 = add_cow_probe,
>> +    .bdrv_open                  = add_cow_open,
>> +    .bdrv_close                 = add_cow_close,
>> +    .bdrv_create                = add_cow_create,
>> +    .bdrv_co_readv              = add_cow_co_readv,
>> +    .bdrv_co_writev             = add_cow_co_writev,
>> +    .bdrv_truncate              = bdrv_add_cow_truncate,
>> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
>> +
>> +    .create_options             = add_cow_create_options,
>> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> +    bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> +    ADD_COW_F_All_ALLOCATED     = 0X01,
>
> Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)

Okay.
>
> was searching your patch for how this was used and was scratching my
> head when I wasn't seeing any matches :)

It wil be used such as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow

while we need not read from backing_file any more.

>
>> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
>> +
>> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> +                    ((uint64_t)'W' << 8) | 0xFF),
>> +    ADD_COW_VERSION             = 1,
>> +    ADD_COW_FILE_LEN            = 1024,
>> +    ADD_COW_CACHE_SIZE          = 16,
>> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
>> +    ADD_COW_CLUSTER_SIZE        = 65536,
>> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> +    ADD_COW_PAGE_SIZE           = 4096,
>> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> +    uint64_t        magic;
>> +    uint32_t        version;
>> +
>> +    uint32_t        backing_filename_offset;
>> +    uint32_t        backing_filename_size;
>> +
>> +    uint32_t        image_filename_offset;
>> +    uint32_t        image_filename_size;
>> +
>> +    uint64_t        features;
>> +    uint64_t        optional_features;
>> +    uint32_t        header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> You should avoid using packed structures for image format headers.
> Instead, I would either:
>
> a) re-order the fields so that 32/64-bit fields, respectively, fall on
> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
> above features) like qed/qcow2 do, or
>
> b) read/write the fields individually rather than reading/writing directly
> into/from the header struct.
>
> The safest route is b). Adds a few lines of code, but you won't have to
> re-work things (or worry about introducing bugs) later if you were to add,
> say, a 32-bit value, and then a 64-bit value later.

While, Kevin's suggestion is using PACKED, so ..
>
>> +
>> +typedef struct BDRVAddCowState {
>> +    BlockDriverState    *image_hd;
>> +    CoMutex             lock;
>> +    int                 cluster_size;
>> +    BlockCache         *bitmap_cache;
>> +    uint64_t            bitmap_size;
>> +    AddCowHeader        header;
>> +    char                backing_file_format[16];
>> +    char                image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> +    return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> +    void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>> +#endif
>> diff --git a/block_int.h b/block_int.h
>> index 6c1d9ca..67954ec 100644
>> --- a/block_int.h
>> +++ b/block_int.h
>> @@ -53,6 +53,8 @@
>>  #define BLOCK_OPT_SUBFMT            "subformat"
>>  #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
>> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
>> +#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
>>
>>  typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>>
>> --
>> 1.7.1
>>
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
@ 2012-09-10 15:23   ` Kevin Wolf
  2012-09-11  2:12     ` Dong Xu Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-10 15:23 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Document for add-cow format, the usage and spec of add-cow are introduced.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 123 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
> +    1)  Create a raw image with the same size of ubuntu.img
> +            qemu-img create -f raw test.raw 8G
> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + |     Header    |   Reserved  |    COW bitmap   |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> +    Byte    0 -  7:     magic
> +                        add-cow magic string ("ADD_COW\xff").
> +
> +            8 -  11:    version
> +                        Version number (only valid value is 1 now).
> +
> +            12 - 15:    backing file name offset
> +                        Offset in the add-cow file at which the backing file
> +                        name is stored (NB: The string is not nul-terminated).
> +                        If backing file name does NOT exist, this field will be
> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
> +                        must be at least 1 byte).
> +
> +            16 - 19:    backing file name size
> +                        Length of the backing file name in bytes. It will be 0
> +                        if the backing file name offset is 0. If backing file
> +                        name offset is non-zero, then it must be non-zero. Must
> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            20 - 23:    image file name offset
> +                        Offset in the add-cow file at which the image file name
> +                        is stored (NB: The string is not null terminated). It
> +                        must be between 80 and [HEADER_SIZE - 2].
> +
> +            24 - 27:    image file name size
> +                        Length of the image file name in bytes.
> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            28 - 35:    features
> +                        Currently only 1 feature bit is used:

What happens when opening a file with an unknown bit set? How must
unknown bits be initialised?

> +                        Feature bits:
> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.

What does this flag mean, and is it required to be set on that
condition? Also, please use ALL_CAPS.

> +
> +            36 - 43:    optional features
> +                        Not used now. Reserved for future use. It must be set to 0.

And must be ignored when reading.

> +
> +            44 - 47:    header pages size
> +                        The header field is variable-sized. This field indicates
> +                        how many pages(4k) will be used to store add-cow header.
> +                        In add-cow v1, it is fixed to 1, so the header size will
> +                        be 4k * 1 = 4096 bytes.

Why arbitrarily defined "pages" instead of bytes or at least clusters?

> +
> +            48 - 63:    backing file format
> +                        format of backing file. It will be filled with 0 if
> +                        backing file name offset is 0. If backing file name
> +                        offset is non-zero, it must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.

Zero padded on the right, I guess?

Also defining that a string must be "non-zero" looks odd, should
probably be "non-empty".

> +
> +            64 - 79:    image file format
> +                        format of image file. It must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.

Same here.

> +
> +            80 - [HEADER_SIZE - 1]:
> +                        It is used to make sure COW bitmap field starts at the
> +                        HEADER_SIZE byte, backing file name and image file name
> +                        will be stored here. The bytes that is not pointing to
> +                        backing file and image file names will bet set to 0.

"will be set to 0" describes the behaviour of qemu. A spec should
describe the file format, not a specific implementation. Make it "must"
or "should".

> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes.

Should we make the cluster size configurable?

> the size of bitmap is
> +calculated according to virtual size of image file, and it also should be multipe

Typo: multiple

Sure you mean "should", or should it be "must"?

> +of 65536, the bits not used will be set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.

"must be written as 0 and must be ignored when reading" or something
like that.

> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.

What we do is irrelevant for a spec.

> +Image file and backing file are interpreted relative to the qcow2 file, not
> +to the current working directory of the process that opened the qcow2 file.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-09-10 15:23   ` Kevin Wolf
@ 2012-09-11  2:12     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-11  2:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Mon, Sep 10, 2012 at 11:23 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 123 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff").
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now).
>> +
>> +            12 - 15:    backing file name offset
>> +                        Offset in the add-cow file at which the backing file
>> +                        name is stored (NB: The string is not nul-terminated).
>> +                        If backing file name does NOT exist, this field will be
>> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> +                        must be at least 1 byte).
>> +
>> +            16 - 19:    backing file name size
>> +                        Length of the backing file name in bytes. It will be 0
>> +                        if the backing file name offset is 0. If backing file
>> +                        name offset is non-zero, then it must be non-zero. Must
>> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            20 - 23:    image file name offset
>> +                        Offset in the add-cow file at which the image file name
>> +                        is stored (NB: The string is not null terminated). It
>> +                        must be between 80 and [HEADER_SIZE - 2].
>> +
>> +            24 - 27:    image file name size
>> +                        Length of the image file name in bytes.
>> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            28 - 35:    features
>> +                        Currently only 1 feature bit is used:
>
> What happens when opening a file with an unknown bit set? How must
> unknown bits be initialised?

Okay, I will code as qcow2, report report_unsupported_feature error.
And I will update
the spec file.

>
>> +                        Feature bits:
>> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.
>
> What does this flag mean, and is it required to be set on that
> condition? Also, please use ALL_CAPS.

This feature bit will used as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow.

While creating add-cow and without backing_file, this feature can
avoid reading/updating
bitmap. I think it can let the code be more faster.

And also, maybe, I can implement add_cow_check, check if the feature
bit should be set.
How do you think, Kevin?

>
>> +
>> +            36 - 43:    optional features
>> +                        Not used now. Reserved for future use. It must be set to 0.
>
> And must be ignored when reading.
>
Okay.

>> +
>> +            44 - 47:    header pages size
>> +                        The header field is variable-sized. This field indicates
>> +                        how many pages(4k) will be used to store add-cow header.
>> +                        In add-cow v1, it is fixed to 1, so the header size will
>> +                        be 4k * 1 = 4096 bytes.
>
> Why arbitrarily defined "pages" instead of bytes or at least clusters?

Okay, next version I will just caclulate it by bytes.
>
>> +
>> +            48 - 63:    backing file format
>> +                        format of backing file. It will be filled with 0 if
>> +                        backing file name offset is 0. If backing file name
>> +                        offset is non-zero, it must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>
> Zero padded on the right, I guess?

Yes, will update.

>
> Also defining that a string must be "non-zero" looks odd, should
> probably be "non-empty".
>
Okay.

>> +
>> +            64 - 79:    image file format
>> +                        format of image file. It must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>
> Same here.
Okay.
>
>> +
>> +            80 - [HEADER_SIZE - 1]:
>> +                        It is used to make sure COW bitmap field starts at the
>> +                        HEADER_SIZE byte, backing file name and image file name
>> +                        will be stored here. The bytes that is not pointing to
>> +                        backing file and image file names will bet set to 0.
>
> "will be set to 0" describes the behaviour of qemu. A spec should
> describe the file format, not a specific implementation. Make it "must"
> or "should".
Okay.
>
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes.
>
> Should we make the cluster size configurable?
>
>> the size of bitmap is
>> +calculated according to virtual size of image file, and it also should be multipe
>
> Typo: multiple
>
> Sure you mean "should", or should it be "must"?
Okay.

>
>> +of 65536, the bits not used will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. Bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow will be 0.
>
> "must be written as 0 and must be ignored when reading" or something
> like that.

Okay.
>
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>
> What we do is irrelevant for a spec.

Okay.

>
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>> +to the current working directory of the process that opened the qcow2 file.
>
> Kevin
>

Thank you, Kevin.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
  2012-09-06 17:52   ` Michael Roth
@ 2012-09-11  8:41   ` Kevin Wolf
  1 sibling, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  8:41 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also
> are changed.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.h                |    3 +
>  block/Makefile.objs    |    3 +-
>  block/qcow2-cache.c    |  323 ------------------------------------------------
>  block/qcow2-cluster.c  |   66 ++++++----
>  block/qcow2-refcount.c |   66 ++++++-----
>  block/qcow2.c          |   36 +++---
>  block/qcow2.h          |   24 +---
>  trace-events           |   13 +-
>  8 files changed, 109 insertions(+), 425 deletions(-)
>  delete mode 100644 block/qcow2-cache.c
> 
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
>      BLKDBG_CLUSTER_ALLOC_BYTES,
>      BLKDBG_CLUSTER_FREE,
>  
> +    BLKDBG_ADD_COW_UPDATE,
> +    BLKDBG_ADD_COW_LOAD,
> +

I don't think you should add new events, the existing ones should be
generic enough that you can reuse them. It's somewhat hard to see
without block-cache.c, though.

Can you make sure to have one patch with pure code motion, and a
separate one with the changes needed to make it work with add-cow? It
will help reviewers a lot.

>      BLKDBG_EVENT_MAX,
>  } BlkDebugEvent;
>  
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  #include "trace.h"
> +#include "block-cache.h"
>  
>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>  {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>          return new_l1_table_offset;
>      }
>  
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);

I think its better to pass s->cluster_size to the cache initialisation
instead of in each call of the cache function.

For the blkdebug events I guess it's possible as well to move this to
the initialisation, but I'd have to see the block-cache.c code to say
something specific about this.

> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       * handled.
>       */
>      if (cow) {
> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
> +        block_cache_depends_on_flush(s->l2_table_cache);
>      }
>  
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                   s->refcount_block_cache);
> -    }
> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> +        s->refcount_block_cache, s->cluster_size);

What happened with lazy refcounting? Is this a mismerge or did you
intentionally remove the condition? (There's a second place where you do
the same)

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
  2012-09-06 20:19   ` Michael Roth
@ 2012-09-11  9:40   ` Kevin Wolf
  2012-09-12  7:28     ` Dong Xu Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:40 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow file format core code. It use block-cache.c as cache code.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block/Makefile.objs |    1 +
>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h     |   85 +++++++
>  block_int.h         |    2 +
>  4 files changed, 701 insertions(+), 0 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
>  block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> +    cpu->magic                      = le64_to_cpu(le->magic);
> +    cpu->version                    = le32_to_cpu(le->version);
> +
> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
> +
> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
> +
> +    cpu->features                   = le64_to_cpu(le->features);
> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> +    le->magic                       = cpu_to_le64(cpu->magic);
> +    le->version                     = cpu_to_le32(cpu->version);
> +
> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
> +
> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
> +
> +    le->features                    = cpu_to_le64(cpu->features);
> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
> +        return 100;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> +    AddCowHeader header = {
> +        .magic = ADD_COW_MAGIC,
> +        .version = ADD_COW_VERSION,
> +        .features = 0,
> +        .optional_features = 0,
> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> +    };
> +    AddCowHeader le_header;
> +    int64_t image_len = 0;
> +    const char *backing_filename = NULL;
> +    const char *backing_fmt = NULL;
> +    const char *image_filename = NULL;
> +    const char *image_format = NULL;
> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> +    BlockDriver *drv = bdrv_find_format("add-cow");
> +    BDRVAddCowState s;
> +    int ret;
> +
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            image_len = options->value.n;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> +            backing_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> +            backing_fmt = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> +            image_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> +            image_format = options->value.s;
> +        }
> +        options++;
> +    }
> +
> +    if (backing_filename) {
> +        header.backing_filename_offset = sizeof(header)
> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> +        header.backing_filename_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> +                    | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            backing_fmt = bdrv_get_format_name(backing_bs);
> +            bdrv_delete(backing_bs);
> +        }
> +    } else {
> +        header.features |= ADD_COW_F_All_ALLOCATED;
> +    }
> +
> +    if (image_filename) {
> +        header.image_filename_offset =
> +            sizeof(header) + sizeof(s.backing_file_format)
> +                + sizeof(s.image_file_format) + header.backing_filename_size;
> +        header.image_filename_size = strlen(image_filename);
> +    } else {
> +        error_report("Error: image_file should be given.");
> +        return -EINVAL;
> +    }
> +
> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same backing file name as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (!strcmp(filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same filename as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (header.image_filename_offset + header.image_filename_size
> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> +        error_report("image_file name or backing_file name too long.");
> +        return -ENOSPC;
> +    }
> +
> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    bdrv_delete(image_bs);
> +
> +    ret = bdrv_create_file(filename, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    add_cow_header_cpu_to_le(&header, &le_header);
> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> +        backing_fmt ? strlen(backing_fmt) : 0);

The spec requires zero padding, which you don't do here.

> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> +        image_format ? image_format : "raw",
> +        image_format ? strlen(image_format) : sizeof("raw"));

And here.

> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    if (backing_filename) {
> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
> +            backing_filename, header.backing_filename_size);
> +        if (ret < 0) {
> +            bdrv_delete(bs);
> +            return ret;
> +        }
> +    }
> +
> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
> +        image_filename, header.image_filename_size);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(bs, image_len);
> +    bdrv_delete(bs);
> +    return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> +    char                image_filename[ADD_COW_FILE_LEN];
> +    char                tmp_name[ADD_COW_FILE_LEN];
> +    BlockDriver         *image_drv = NULL;
> +    int                 ret;
> +    int                 sector_per_byte;
> +    BDRVAddCowState     *s = bs->opaque;
> +    AddCowHeader        le_header;
> +
> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> +    if (ret != sizeof(s->header)) {

if (ret < 0) would be more consistent with the rest of the code.

> +        goto fail;
> +    }
> +
> +    add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {

Isn't this one endianess conversion too much? s->header is already LE.

Did you test add-cow on a big endian host?

> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if (s->header.version != ADD_COW_VERSION) {
> +        char version[64];
> +        snprintf(version, sizeof(version), "ADD-COW version %d",
> +            s->header.version);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", version);
> +        ret = -ENOTSUP;
> +        goto fail;
> +    }
> +
> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "%" PRIx64,
> +            s->header.features & ~ADD_COW_FEATURE_MASK);

This message is a bit terse, most users will be confused with an error
message that only consists of a hex number. Maybe better "Feature flags:
%" PRIx64.

> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
> +            sizeof(s->backing_file_format));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }

Would be great if this was not only read into memory, but actually
used... It must end up in bs->backing_format in order take effect.

> +
> +    ret = bdrv_read_string(bs->file,
> +            sizeof(s->header) + sizeof(s->image_file_format),
> +            sizeof(s->image_file_format) - 1, s->image_file_format,
> +            sizeof(s->image_file_format));
> +    if (ret < 0) {
> +        goto fail;
> +    }

This one is unused, too.

> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> +                          s->header.backing_filename_size, bs->backing_file,
> +                          sizeof(bs->backing_file));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> +                      s->header.image_filename_size, tmp_name,
> +                      sizeof(tmp_name));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->image_hd = bdrv_new("");
> +    if (path_has_protocol(image_filename)) {
> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, tmp_name);
> +    }
> +
> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);

image_drv is always NULL.

> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    s->bitmap_size =
> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> +    s->bitmap_cache =
> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    qemu_co_mutex_init(&s->lock);
> +    return 0;
> +fail:
> +    if (s->bitmap_cache) {
> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    }
> +    return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    uint8_t *table      = NULL;
> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);

No matching block_cache_put?

> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> +        & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, int *num_same)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int changed;
> +
> +    if (nb_sectors == 0) {
> +        *num_same = 0;
> +        return 0;
> +    }
> +
> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> +        *num_same = nb_sectors - 1;

Why - 1?

> +        return 1;
> +    }
> +    changed = is_allocated(bs, sector_num);
> +
> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
> +            break;
> +        }
> +    }
> +    return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> +                  int64_t sector_num, int nb_sectors)
> +{
> +    int n1;
> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
> +        return nb_sectors;
> +    }
> +    if (sector_num >= bs->total_sectors) {
> +        n1 = 0;
> +    } else {
> +        n1 = bs->total_sectors - sector_num;
> +    }
> +
> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> +    return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int cur_nr_sectors;
> +    uint64_t bytes_done = 0;
> +    QEMUIOVector hd_qiov;
> +    int n, n1, ret = 0;
> +
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    qemu_co_mutex_lock(&s->lock);
> +    while (remaining_sectors != 0) {
> +        cur_nr_sectors = remaining_sectors;
> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> +            cur_nr_sectors = n;

One of n and cur_nr_sectors is redundant.

> +            qemu_iovec_reset(&hd_qiov);
> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> +            qemu_co_mutex_lock(&s->lock);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        } else {
> +            cur_nr_sectors = n;
> +            if (bs->backing_hd) {
> +                qemu_iovec_reset(&hd_qiov);
> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> +                    sector_num, cur_nr_sectors);
> +                if (n1 > 0) {
> +                    qemu_co_mutex_unlock(&s->lock);
> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
> +                                        n, &hd_qiov);
> +                    qemu_co_mutex_lock(&s->lock);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                }
> +            } else {
> +                qemu_iovec_memset(&hd_qiov, 0, 0,
> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
> +            }
> +        }
> +        remaining_sectors -= cur_nr_sectors;
> +        sector_num += cur_nr_sectors;
> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> +    }
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> +                                     int n_start, int n_end)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    QEMUIOVector qiov;
> +    struct iovec iov;
> +    int n, ret;
> +
> +    n = n_end - n_start;
> +    if (n <= 0) {
> +        return 0;
> +    }
> +
> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = 0;
> +out:
> +    qemu_vfree(iov.iov_base);
> +    return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd,
> +                     sector_num,
> +                     remaining_sectors, qiov);
> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> +                sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        for (i = sector_num / SECTORS_PER_CLUSTER;
> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> +            i++) {
> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));

The maths in this loop looks a bit confusing, but I think it's correct.

> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));
> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
> +            }

Missing block_cache_put again?

> +        }
> +    }
> +    ret = 0;
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    int ret;
> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> +    int64_t bitmap_size =
> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return 0;
> +}

So you don't truncate s->image_file? Does this work?

> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> +        ADD_COW_CACHE_ENTRY_SIZE);
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
> +}

What about flushing s->image_file?

> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FMT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a image file"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FORMAT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the image file"
> +    },
> +    { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +    .format_name                = "add-cow",
> +    .instance_size              = sizeof(BDRVAddCowState),
> +    .bdrv_probe                 = add_cow_probe,
> +    .bdrv_open                  = add_cow_open,
> +    .bdrv_close                 = add_cow_close,
> +    .bdrv_create                = add_cow_create,
> +    .bdrv_co_readv              = add_cow_co_readv,
> +    .bdrv_co_writev             = add_cow_co_writev,
> +    .bdrv_truncate              = bdrv_add_cow_truncate,
> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
> +
> +    .create_options             = add_cow_create_options,
> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> +    bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> +    ADD_COW_F_All_ALLOCATED     = 0X01,
> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
> +
> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> +                    ((uint64_t)'W' << 8) | 0xFF),
> +    ADD_COW_VERSION             = 1,
> +    ADD_COW_FILE_LEN            = 1024,
> +    ADD_COW_CACHE_SIZE          = 16,
> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
> +    ADD_COW_CLUSTER_SIZE        = 65536,
> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> +    ADD_COW_PAGE_SIZE           = 4096,
> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
> +};
> +
> +typedef struct AddCowHeader {
> +    uint64_t        magic;
> +    uint32_t        version;
> +
> +    uint32_t        backing_filename_offset;
> +    uint32_t        backing_filename_size;
> +
> +    uint32_t        image_filename_offset;
> +    uint32_t        image_filename_size;
> +
> +    uint64_t        features;
> +    uint64_t        optional_features;
> +    uint32_t        header_pages_size;
> +} QEMU_PACKED AddCowHeader;

Why aren't backing/image_file_format part of the header here? They are
in the spec. It would also simplify some offset calculation code.

> +
> +typedef struct BDRVAddCowState {
> +    BlockDriverState    *image_hd;
> +    CoMutex             lock;
> +    int                 cluster_size;
> +    BlockCache         *bitmap_cache;
> +    uint64_t            bitmap_size;
> +    AddCowHeader        header;
> +    char                backing_file_format[16];
> +    char                image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> +    return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> +    void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);

These functions don't really exist any more, do they?

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-10  2:25     ` Dong Xu Wang
@ 2012-09-11  9:44       ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:44 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: Michael Roth, qemu-devel

Am 10.09.2012 04:25, schrieb Dong Xu Wang:
> On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>>> +typedef struct AddCowHeader {
>>> +    uint64_t        magic;
>>> +    uint32_t        version;
>>> +
>>> +    uint32_t        backing_filename_offset;
>>> +    uint32_t        backing_filename_size;
>>> +
>>> +    uint32_t        image_filename_offset;
>>> +    uint32_t        image_filename_size;
>>> +
>>> +    uint64_t        features;
>>> +    uint64_t        optional_features;
>>> +    uint32_t        header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> You should avoid using packed structures for image format headers.
>> Instead, I would either:
>>
>> a) re-order the fields so that 32/64-bit fields, respectively, fall on
>> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
>> above features) like qed/qcow2 do, or
>>
>> b) read/write the fields individually rather than reading/writing directly
>> into/from the header struct.
>>
>> The safest route is b). Adds a few lines of code, but you won't have to
>> re-work things (or worry about introducing bugs) later if you were to add,
>> say, a 32-bit value, and then a 64-bit value later.
> 
> While, Kevin's suggestion is using PACKED, so ..

Yes, I think QEMU_PACKED is fine, and it's the safest version.

It would be nice to additionally do Michael's option a) if you like, but
I don't think the header is accessed too often, so the optimisation
isn't that important.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-09-11  9:55   ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:55 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Add qemu-iotests support for add-cow.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  tests/qemu-iotests/017       |    2 +-
>  tests/qemu-iotests/020       |    2 +-
>  tests/qemu-iotests/check     |    4 ++--
>  tests/qemu-iotests/common    |    6 ++++++
>  tests/qemu-iotests/common.rc |   19 +++++++++++++++++++
>  5 files changed, 29 insertions(+), 4 deletions(-)

> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 432732c..122267b 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -243,7 +243,7 @@ do
>  		echo " - no qualified output"
>  		err=true
>  	    else
> -		if diff -w $seq.out $tmp.out >/dev/null 2>&1
> +        if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
>  		then
>  		    echo ""
>  		    if $err
> @@ -255,7 +255,7 @@ do
>  		else
>  		    echo " - output mismatch (see $seq.out.bad)"
>  		    mv $tmp.out $seq.out.bad
> -		    $diff -w $seq.out $seq.out.bad
> +            $diff -w -I "^Formatting" $seq.out $seq.out.bad
>  		    err=true
>  		fi
>  	    fi

These two hunks don't look right. You probably want to amend the sed
command in _make_test_img().

> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> index 7782808..ec5afd7 100644
> --- a/tests/qemu-iotests/common.rc
> +++ b/tests/qemu-iotests/common.rc
> @@ -97,6 +97,18 @@ _make_test_img()
>      fi
>      if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
>          optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
> +    elif [ "$IMGFMT" = "add-cow" ]; then
> +        local BACKING="$TEST_IMG"".qcow2"
> +        local IMG="$TEST_IMG"".raw"
> +        if [ "$1" = "-b" ]; then
> +            IMG="$IMG"".b"
> +            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> +            extra_img_options="-o image_file=$IMG $extra_img_options"
> +        else
> +            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> +            $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
> +            extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
> +        fi

This looks a bit hackish... Doesn't it completely ignore the requested
backing file name? I'm not sure if this is a good idea.

Can't you just create the raw image file and then use _optstr_add to add
the right -o image_file=... option? It should automatically get the
backing file right.

>      fi
>  
>      if [ -n "$optstr" ]; then
> @@ -125,6 +137,13 @@ _cleanup_test_img()
>              rm -f $TEST_DIR/t.$IMGFMT
>              rm -f $TEST_DIR/t.$IMGFMT.orig
>              rm -f $TEST_DIR/t.$IMGFMT.base
> +            if [ "$IMGFMT" = "add-cow" ]; then
> +                rm -f $TEST_DIR/t.$IMGFMT.qcow2
> +                rm -f $TEST_DIR/t.$IMGFMT.raw
> +                rm -f $TEST_DIR/t.$IMGFMT.raw.b
> +                rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
> +                rm -f $TEST_DIR/t.$IMGFMT.ct.raw

What are the .ct files?

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-11  9:40   ` Kevin Wolf
@ 2012-09-12  7:28     ` Dong Xu Wang
  2012-09-12  7:50       ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-12  7:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Tue, Sep 11, 2012 at 5:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block/Makefile.objs |    1 +
>>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block/add-cow.h     |   85 +++++++
>>  block_int.h         |    2 +
>>  4 files changed, 701 insertions(+), 0 deletions(-)
>>  create mode 100644 block/add-cow.c
>>  create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>>  block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> +    cpu->magic                      = le64_to_cpu(le->magic);
>> +    cpu->version                    = le32_to_cpu(le->version);
>> +
>> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
>> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
>> +
>> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
>> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
>> +
>> +    cpu->features                   = le64_to_cpu(le->features);
>> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
>> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> +    le->magic                       = cpu_to_le64(cpu->magic);
>> +    le->version                     = cpu_to_le32(cpu->version);
>> +
>> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
>> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
>> +
>> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
>> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
>> +
>> +    le->features                    = cpu_to_le64(cpu->features);
>> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
>> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> +        return 100;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> +    AddCowHeader header = {
>> +        .magic = ADD_COW_MAGIC,
>> +        .version = ADD_COW_VERSION,
>> +        .features = 0,
>> +        .optional_features = 0,
>> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> +    };
>> +    AddCowHeader le_header;
>> +    int64_t image_len = 0;
>> +    const char *backing_filename = NULL;
>> +    const char *backing_fmt = NULL;
>> +    const char *image_filename = NULL;
>> +    const char *image_format = NULL;
>> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> +    BlockDriver *drv = bdrv_find_format("add-cow");
>> +    BDRVAddCowState s;
>> +    int ret;
>> +
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +            image_len = options->value.n;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +            backing_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +            backing_fmt = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> +            image_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> +            image_format = options->value.s;
>> +        }
>> +        options++;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        header.backing_filename_offset = sizeof(header)
>> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> +        header.backing_filename_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> +                    | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            backing_fmt = bdrv_get_format_name(backing_bs);
>> +            bdrv_delete(backing_bs);
>> +        }
>> +    } else {
>> +        header.features |= ADD_COW_F_All_ALLOCATED;
>> +    }
>> +
>> +    if (image_filename) {
>> +        header.image_filename_offset =
>> +            sizeof(header) + sizeof(s.backing_file_format)
>> +                + sizeof(s.image_file_format) + header.backing_filename_size;
>> +        header.image_filename_size = strlen(image_filename);
>> +    } else {
>> +        error_report("Error: image_file should be given.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same backing file name as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!strcmp(filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same filename as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (header.image_filename_offset + header.image_filename_size
>> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> +        error_report("image_file name or backing_file name too long.");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    bdrv_delete(image_bs);
>> +
>> +    ret = bdrv_create_file(filename, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    add_cow_header_cpu_to_le(&header, &le_header);
>> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> +        backing_fmt ? strlen(backing_fmt) : 0);
>
> The spec requires zero padding, which you don't do here.
Okay.
>
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> +        image_format ? image_format : "raw",
>> +        image_format ? strlen(image_format) : sizeof("raw"));
>
> And here.

Okay.

>
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> +            backing_filename, header.backing_filename_size);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs);
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
>> +        image_filename, header.image_filename_size);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(bs, image_len);
>> +    bdrv_delete(bs);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> +    char                image_filename[ADD_COW_FILE_LEN];
>> +    char                tmp_name[ADD_COW_FILE_LEN];
>> +    BlockDriver         *image_drv = NULL;
>> +    int                 ret;
>> +    int                 sector_per_byte;
>> +    BDRVAddCowState     *s = bs->opaque;
>> +    AddCowHeader        le_header;
>> +
>> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> +    if (ret != sizeof(s->header)) {
>
> if (ret < 0) would be more consistent with the rest of the code.
>

Okay.

>> +        goto fail;
>> +    }
>> +
>> +    add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>
> Isn't this one endianess conversion too much? s->header is already LE.
>
> Did you test add-cow on a big endian host?

My fault, will correct it in next version.

>
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.version != ADD_COW_VERSION) {
>> +        char version[64];
>> +        snprintf(version, sizeof(version), "ADD-COW version %d",
>> +            s->header.version);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", version);
>> +        ret = -ENOTSUP;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "%" PRIx64,
>> +            s->header.features & ~ADD_COW_FEATURE_MASK);
>
> This message is a bit terse, most users will be confused with an error
> message that only consists of a hex number. Maybe better "Feature flags:
> %" PRIx64.
>

Okay.

>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
>> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> +            sizeof(s->backing_file_format));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>
> Would be great if this was not only read into memory, but actually
> used... It must end up in bs->backing_format in order take effect.
>
>> +
>> +    ret = bdrv_read_string(bs->file,
>> +            sizeof(s->header) + sizeof(s->image_file_format),
>> +            sizeof(s->image_file_format) - 1, s->image_file_format,
>> +            sizeof(s->image_file_format));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>
> This one is unused, too.
>
Okay.

>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> +                          s->header.backing_filename_size, bs->backing_file,
>> +                          sizeof(bs->backing_file));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> +                      s->header.image_filename_size, tmp_name,
>> +                      sizeof(tmp_name));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->image_hd = bdrv_new("");
>> +    if (path_has_protocol(image_filename)) {
>> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> +    } else {
>> +        path_combine(image_filename, sizeof(image_filename),
>> +                     bs->filename, tmp_name);
>> +    }
>> +
>> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>
> image_drv is always NULL.
>
>> +    if (ret < 0) {
>> +        bdrv_delete(s->image_hd);
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    s->bitmap_size =
>> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> +    s->bitmap_cache =
>> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    qemu_co_mutex_init(&s->lock);
>> +    return 0;
>> +fail:
>> +    if (s->bitmap_cache) {
>> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    uint8_t *table      = NULL;
>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>
> No matching block_cache_put?
>
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> +        & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int changed;
>> +
>> +    if (nb_sectors == 0) {
>> +        *num_same = 0;
>> +        return 0;
>> +    }
>> +
>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> +        *num_same = nb_sectors - 1;
>
> Why - 1?
>
>> +        return 1;
>> +    }
>> +    changed = is_allocated(bs, sector_num);
>> +
>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>> +            break;
>> +        }
>> +    }
>> +    return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> +                  int64_t sector_num, int nb_sectors)
>> +{
>> +    int n1;
>> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> +        return nb_sectors;
>> +    }
>> +    if (sector_num >= bs->total_sectors) {
>> +        n1 = 0;
>> +    } else {
>> +        n1 = bs->total_sectors - sector_num;
>> +    }
>> +
>> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> +    return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int cur_nr_sectors;
>> +    uint64_t bytes_done = 0;
>> +    QEMUIOVector hd_qiov;
>> +    int n, n1, ret = 0;
>> +
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    qemu_co_mutex_lock(&s->lock);
>> +    while (remaining_sectors != 0) {
>> +        cur_nr_sectors = remaining_sectors;
>> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> +            cur_nr_sectors = n;
>
> One of n and cur_nr_sectors is redundant.
Okay.
>
>> +            qemu_iovec_reset(&hd_qiov);
>> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +            qemu_co_mutex_unlock(&s->lock);
>> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> +            qemu_co_mutex_lock(&s->lock);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        } else {
>> +            cur_nr_sectors = n;
>> +            if (bs->backing_hd) {
>> +                qemu_iovec_reset(&hd_qiov);
>> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> +                    sector_num, cur_nr_sectors);
>> +                if (n1 > 0) {
>> +                    qemu_co_mutex_unlock(&s->lock);
>> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> +                                        n, &hd_qiov);
>> +                    qemu_co_mutex_lock(&s->lock);
>> +                    if (ret < 0) {
>> +                        goto fail;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_iovec_memset(&hd_qiov, 0, 0,
>> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
>> +            }
>> +        }
>> +        remaining_sectors -= cur_nr_sectors;
>> +        sector_num += cur_nr_sectors;
>> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> +    }
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> +                                     int n_start, int n_end)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    QEMUIOVector qiov;
>> +    struct iovec iov;
>> +    int n, ret;
>> +
>> +    n = n_end - n_start;
>> +    if (n <= 0) {
>> +        return 0;
>> +    }
>> +
>> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = 0;
>> +out:
>> +    qemu_vfree(iov.iov_base);
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd,
>> +                     sector_num,
>> +                     remaining_sectors, qiov);
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> +                sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        for (i = sector_num / SECTORS_PER_CLUSTER;
>> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> +            i++) {
>> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>
> The maths in this loop looks a bit confusing, but I think it's correct.
>
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> +            }
>
> Missing block_cache_put again?
>
>> +        }
>> +    }
>> +    ret = 0;
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    int ret;
>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> +    int64_t bitmap_size =
>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>
> So you don't truncate s->image_file? Does this work?

s->image_file should be truncated? Image file can have a larger virtual size
than backing_file, my understanding is we should not truncate image file.

>
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> +        ADD_COW_CACHE_ENTRY_SIZE);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    return ret;
>> +}
>
> What about flushing s->image_file?
>
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> +    {
>> +        .name = BLOCK_OPT_SIZE,
>> +        .type = OPT_SIZE,
>> +        .help = "Virtual disk size"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FMT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a image file"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FORMAT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the image file"
>> +    },
>> +    { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> +    .format_name                = "add-cow",
>> +    .instance_size              = sizeof(BDRVAddCowState),
>> +    .bdrv_probe                 = add_cow_probe,
>> +    .bdrv_open                  = add_cow_open,
>> +    .bdrv_close                 = add_cow_close,
>> +    .bdrv_create                = add_cow_create,
>> +    .bdrv_co_readv              = add_cow_co_readv,
>> +    .bdrv_co_writev             = add_cow_co_writev,
>> +    .bdrv_truncate              = bdrv_add_cow_truncate,
>> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
>> +
>> +    .create_options             = add_cow_create_options,
>> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> +    bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> +    ADD_COW_F_All_ALLOCATED     = 0X01,
>> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
>> +
>> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> +                    ((uint64_t)'W' << 8) | 0xFF),
>> +    ADD_COW_VERSION             = 1,
>> +    ADD_COW_FILE_LEN            = 1024,
>> +    ADD_COW_CACHE_SIZE          = 16,
>> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
>> +    ADD_COW_CLUSTER_SIZE        = 65536,
>> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> +    ADD_COW_PAGE_SIZE           = 4096,
>> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> +    uint64_t        magic;
>> +    uint32_t        version;
>> +
>> +    uint32_t        backing_filename_offset;
>> +    uint32_t        backing_filename_size;
>> +
>> +    uint32_t        image_filename_offset;
>> +    uint32_t        image_filename_size;
>> +
>> +    uint64_t        features;
>> +    uint64_t        optional_features;
>> +    uint32_t        header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> Why aren't backing/image_file_format part of the header here? They are
> in the spec. It would also simplify some offset calculation code.
>

Anthony said "It's far better to shrink the size of the header and use
an offset/len
pointer to the backing file string.  Limiting backing files to 1023 is
unacceptable"

http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html

So I use offset  and length instead of using string directly.

>> +
>> +typedef struct BDRVAddCowState {
>> +    BlockDriverState    *image_hd;
>> +    CoMutex             lock;
>> +    int                 cluster_size;
>> +    BlockCache         *bitmap_cache;
>> +    uint64_t            bitmap_size;
>> +    AddCowHeader        header;
>> +    char                backing_file_format[16];
>> +    char                image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> +    return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> +    void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>
> These functions don't really exist any more, do they?

Right, sorry.

>
> Kevin
>

Thank you, Kevin.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-12  7:28     ` Dong Xu Wang
@ 2012-09-12  7:50       ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-12  7:50 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 12.09.2012 09:28, schrieb Dong Xu Wang:
>>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>>> +{
>>> +    BDRVAddCowState *s  = bs->opaque;
>>> +    BlockCache *c = s->bitmap_cache;
>>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>>> +    uint8_t *table      = NULL;
>>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>>
>> No matching block_cache_put?
>>
>>> +
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>>> +        & (1 << (cluster_num % 8));
>>> +}
>>> +
>>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>>> +        int64_t sector_num, int nb_sectors, int *num_same)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int changed;
>>> +
>>> +    if (nb_sectors == 0) {
>>> +        *num_same = 0;
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>>> +        *num_same = nb_sectors - 1;
>>
>> Why - 1?
>>
>>> +        return 1;
>>> +    }
>>> +    changed = is_allocated(bs, sector_num);
>>> +
>>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>>> +            break;
>>> +        }
>>> +    }
>>> +    return changed;
>>> +}

>>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>>> +    int ret;
>>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>>> +    int64_t bitmap_size =
>>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>>> +
>>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    return 0;
>>> +}
>>
>> So you don't truncate s->image_file? Does this work?
> 
> s->image_file should be truncated? Image file can have a larger virtual size
> than backing_file, my understanding is we should not truncate image file.

I'm talking about s->image_hd, not bs->backing_hd. You are right that
the backing file should not be changed. But the associated raw image
should be resized, shouldn't it?

>>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int ret;
>>> +
>>> +    qemu_co_mutex_lock(&s->lock);
>>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>>> +        ADD_COW_CACHE_ENTRY_SIZE);
>>> +    qemu_co_mutex_unlock(&s->lock);
>>> +    return ret;
>>> +}
>>
>> What about flushing s->image_file?

>>> +typedef struct AddCowHeader {
>>> +    uint64_t        magic;
>>> +    uint32_t        version;
>>> +
>>> +    uint32_t        backing_filename_offset;
>>> +    uint32_t        backing_filename_size;
>>> +
>>> +    uint32_t        image_filename_offset;
>>> +    uint32_t        image_filename_size;
>>> +
>>> +    uint64_t        features;
>>> +    uint64_t        optional_features;
>>> +    uint32_t        header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> Why aren't backing/image_file_format part of the header here? They are
>> in the spec. It would also simplify some offset calculation code.
>>
> 
> Anthony said "It's far better to shrink the size of the header and use
> an offset/len
> pointer to the backing file string.  Limiting backing files to 1023 is
> unacceptable"
> 
> http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html
> 
> So I use offset  and length instead of using string directly.

I'm talking about the format, not the path.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-09-12  7:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
2012-09-06 17:27   ` Michael Roth
2012-09-10  1:48     ` Dong Xu Wang
2012-09-10 15:23   ` Kevin Wolf
2012-09-11  2:12     ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
2012-09-06 17:27   ` Michael Roth
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
2012-09-06 17:32   ` Michael Roth
2012-09-10  1:49     ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2012-09-06 17:52   ` Michael Roth
2012-09-10  2:14     ` Dong Xu Wang
2012-09-11  8:41   ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
2012-09-06 20:19   ` Michael Roth
2012-09-10  2:25     ` Dong Xu Wang
2012-09-11  9:44       ` Kevin Wolf
2012-09-11  9:40   ` Kevin Wolf
2012-09-12  7:28     ` Dong Xu Wang
2012-09-12  7:50       ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
2012-09-11  9:55   ` Kevin Wolf
2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.