qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
@ 2016-01-26 10:38 Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
                   ` (16 more replies)
  0 siblings, 17 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Hi all,

This series introduces a simple format that makes block dirty bitmaps
persistent. Block dirty bitmaps are the tool for incremental backup, and
persisting them makes incremental backup possible across VM shutdowns, which
existing in-memory dirty bitmaps cannot survive.

When a user creates a "persisted" dirty bitmap, the QBM driver will create a
binary file and synchronize it with the existing in-memory block dirty bitmap
(BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
bits saved on disk, which will be loaded and used to initialize the in-memory
block dirty bitmap next time the guest is started.
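
The save-on-shutdown, load-on-start cycle described above can be pictured with
a minimal sketch. This is not the QBM driver's code: bitmap_save() and
bitmap_load() are hypothetical helpers, and the bitmap is modelled as a plain
byte array rather than a BdrvDirtyBitmap.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the whole in-memory bitmap to an open file.
 * Returns 0 on success, -1 on a short write or a failed flush. */
static int bitmap_save(FILE *f, const uint8_t *bits, size_t len)
{
    rewind(f);
    if (fwrite(bits, 1, len, f) != len) {
        return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the saved bits back to initialize the
 * in-memory bitmap. Returns 0 on success, -1 if the file is too short. */
static int bitmap_load(FILE *f, uint8_t *bits, size_t len)
{
    rewind(f);
    return fread(bits, 1, len, f) == len ? 0 : -1;
}
```

A real driver would also have to keep the file in sync while the guest is
running; the sketch only covers the two end points of the cycle.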

The idea of the format is to reuse as much existing infrastructure as possible
and avoid introducing complex data structures - it works with any image format,
by gluing together plain bitmap files with a json descriptor file. The
advantage of this approach over extending existing formats, such as qcow2, is
that the new feature is implemented by an orthogonal driver, in a format
agnostic way. This way, even raw images can have their persistent dirty
bitmaps.  (And you will notice in this series, with a little forging to the
spec, raw images can also have backing files through a QBM overlay!)

Rather than superseding it, this is intended to coexist with the qcow2 bitmap
extension that Vladimir is working on.  The block driver interface
changes in this series also try to be generic and compatible for both drivers.

The format's specification is added to docs/specs/, see patch 1.

Patches 2-7 are necessary block layer changes in order to be friendly to
persistent dirty bitmap drivers.

Patches 8, 9 and 11 extend the QMP interface to expose the added feature.

Patch 10 implements the driver. (todo: checksum extension for image/bitmap
integrity check)

Patches 12-16 are the tests I have for QBM so far. I'm sure more can be added
as questions emerge. :)

The series applies on top of my "qemu-img map" series and "meta bitmap" series:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04866.html
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03656.html

If you feel like playing with it, a git branch is also available at:

    https://github.com/famz/qemu qbm

Comments are welcome!

Fam


Fam Zheng (16):
  doc: Add QBM format specification
  block: Set dirty before doing write
  block: Allow .bdrv_close callback to release dirty bitmaps
  block: Move filename_decompose to block.c
  block: Make bdrv_get_cluster_size public
  block: Introduce bdrv_dirty_bitmap_set_persistent
  block: Only swap non-persistent dirty bitmaps
  qmp: Add optional parameter "persistent" in block-dirty-bitmap-add
  qmp: Add block-dirty-bitmap-set-persistent
  qbm: Implement format driver
  qapi: Add "qbm" as a generic cow format driver
  iotests: Add qbm format to 041
  iotests: Add qbm to case 097
  iotests: Add qbm to applicable test cases
  iotests: Add qbm specific test case 140
  iotests: Add persistent bitmap test case 141

 block.c                      |   54 +-
 block/Makefile.objs          |    1 +
 block/dirty-bitmap.c         |   63 ++
 block/io.c                   |    6 +-
 block/qbm.c                  | 1315 ++++++++++++++++++++++++++++++++++++++++++
 block/vmdk.c                 |   40 --
 blockdev.c                   |   28 +-
 docs/specs/qbm.md            |  118 ++++
 include/block/block.h        |    4 +-
 include/block/block_int.h    |    8 +
 include/block/dirty-bitmap.h |    5 +
 qapi/block-core.json         |   31 +-
 qmp-commands.hx              |   34 +-
 tests/qemu-iotests/004       |    2 +-
 tests/qemu-iotests/017       |    2 +-
 tests/qemu-iotests/018       |    2 +-
 tests/qemu-iotests/019       |    2 +-
 tests/qemu-iotests/020       |    2 +-
 tests/qemu-iotests/024       |    2 +-
 tests/qemu-iotests/025       |    2 +-
 tests/qemu-iotests/027       |    2 +-
 tests/qemu-iotests/028       |    2 +-
 tests/qemu-iotests/030       |    2 +-
 tests/qemu-iotests/034       |    2 +-
 tests/qemu-iotests/037       |    2 +-
 tests/qemu-iotests/038       |    2 +-
 tests/qemu-iotests/040       |    2 +-
 tests/qemu-iotests/041       |   18 +-
 tests/qemu-iotests/050       |    2 +-
 tests/qemu-iotests/055       |    2 +-
 tests/qemu-iotests/056       |    2 +-
 tests/qemu-iotests/069       |    2 +-
 tests/qemu-iotests/072       |    2 +-
 tests/qemu-iotests/086       |    2 +-
 tests/qemu-iotests/095       |    2 +-
 tests/qemu-iotests/096       |    2 +-
 tests/qemu-iotests/097       |    4 +-
 tests/qemu-iotests/099       |    2 +-
 tests/qemu-iotests/110       |    2 +-
 tests/qemu-iotests/129       |    2 +-
 tests/qemu-iotests/132       |    2 +-
 tests/qemu-iotests/139       |    2 +-
 tests/qemu-iotests/140       |   80 +++
 tests/qemu-iotests/140.out   |  145 +++++
 tests/qemu-iotests/141       |   62 ++
 tests/qemu-iotests/141.out   |    5 +
 tests/qemu-iotests/common    |    6 +
 tests/qemu-iotests/group     |    2 +
 48 files changed, 1998 insertions(+), 85 deletions(-)
 create mode 100644 block/qbm.c
 create mode 100644 docs/specs/qbm.md
 create mode 100755 tests/qemu-iotests/140
 create mode 100644 tests/qemu-iotests/140.out
 create mode 100644 tests/qemu-iotests/141
 create mode 100644 tests/qemu-iotests/141.out

-- 
2.4.3


* [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 17:51   ` Eric Blake
                     ` (3 more replies)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write Fam Zheng
                   ` (15 subsequent siblings)
  16 siblings, 4 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 docs/specs/qbm.md

diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
new file mode 100644
index 0000000..b91910b
--- /dev/null
+++ b/docs/specs/qbm.md
@@ -0,0 +1,118 @@
+QEMU Block Bitmap (QBM)
+=======================
+
+QBM is a multi-file disk format to allow storing persistent block bitmaps along
+with the tracked data image.  A QBM image includes one json descriptor file,
+one data image, and one or more bitmap files that describe the block dirty
+status of the data image.
+
+The json file describes the structure of the image. The structure of the json
+descriptor file is:
+
+    QBM-JSON-FILE := { "QBM": DESC-JSON }
+
+    DESC-JSON := { "version": 1,
+                   "image": IMAGE,
+                   "bitmaps": BITMAPS
+                 }
+
+Fields in the top level json dictionary are:
+
+@version: An integer which must be 1.
+@image: A dictionary in IMAGE schema, as described later. It provides the
+        information of the data image where user data is stored. Its format is
+        documented in the "IMAGE schema" section.
+@bitmaps: A dictionary that describes one or more bitmap files. The keys into
+          the dictionary are the names of the bitmaps, which must be strings, and
+          each value is a dictionary describing the information of the bitmap,
+          as documented below in the "BITMAP schema" section.
+
+=== IMAGE schema ===
+
+An IMAGE records the information of an image (such as a data image or a backing
+file). It has the following fields:
+
+@file: The file name string of the referenced image. If it's a relative path,
+       the file should be found relative to the descriptor file's
+       location.
+@format: The format string of the file.
+
+=== BITMAP schema ===
+
+A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
+or a block allocation status bitmap). It has the following mandatory fields:
+
+@file: The name of the bitmap file. The bitmap file is in little endian, both
+       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
+       corresponds to the first sectors.
+@granularity-bytes: How many bytes of data one bit in the bitmap tracks.
+                    This value must be a power of 2 and no less than 512.
+@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
+       supported.
+       "dirty" indicates a block dirty bitmap; "allocation" indicates an
+       allocation status bitmap. There must be at most one "allocation" bitmap.
+
+If the type of the bitmap is "allocation", an extra field "backing" is also
+accepted:
+
+@backing: a dictionary as specified in the IMAGE schema. It can be used to
+          add a backing file to a raw image.
+
+
+=== Extended fields ===
+
+Implementations are allowed to extend the format schema by inserting additional
+members into the above dictionaries, with key names that start with either
+an "ext-hard-" or an "ext-soft-" prefix.
+
+Extended fields prefixed with "ext-soft-" are optional and can be ignored by
+parsers that do not support them; fields starting with "ext-hard-" are
+mandatory and cannot be ignored: a parser should not proceed parsing the image
+if it does not support them.
+
+It is strongly recommended that the application name is also included in the
+extension name string, such as "ext-hard-qemu-", if the effect or
+interpretation of the field is local to a specific application.
+
+For example, QEMU can implement a "checksum" feature to make sure no files
+referred to by the json descriptor are modified inconsistently, by adding
+"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
+the json text found below.
+
+=== QBM descriptor file example ===
+
+This is the content of a QBM image's json descriptor file, which contains a
+data image (data.img), and three bitmaps, out of which the "allocation" bitmap
+associates a backing file to this image (base.img).
+
+{ "QBM": {
+    "version": 1,
+    "image": {
+        "file": "data.img",
+        "format": "raw",
+        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8"
+    },
+    "bitmaps": {
+        "0": {
+            "file": "bitmap0.bin",
+            "granularity-bytes": 512,
+            "type": "dirty"
+        },
+        "1": {
+            "file": "bitmap1.bin",
+            "granularity-bytes": 4096,
+            "type": "dirty"
+        },
+        "2": {
+            "file": "bitmap3.bin",
+            "granularity-bytes": 4096,
+            "type": "allocation",
+            "backing": {
+                "file": "base.img",
+                "format": "raw",
+                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
+            }
+        }
+    }
+} }
+
-- 
2.4.3
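
The bit layout given in the BITMAP schema of the specification above can be
sketched in a few lines of C. This is one reading of the text ("the LSB in the
byte 0 corresponds to the first sectors"), not code taken from the series; the
qbm_sketch_* names and qbm_granularity_valid() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* "granularity-bytes" must be a power of 2 and no less than 512. */
static bool qbm_granularity_valid(uint64_t granularity)
{
    return granularity >= 512 && (granularity & (granularity - 1)) == 0;
}

/* Set every bit whose tracked range overlaps [offset, offset + bytes).
 * The caller guarantees bytes > 0 and a buffer large enough for the range. */
static void qbm_sketch_set(uint8_t *bits, uint64_t granularity,
                           uint64_t offset, uint64_t bytes)
{
    uint64_t first = offset / granularity;
    uint64_t last = (offset + bytes - 1) / granularity;
    uint64_t i;

    for (i = first; i <= last; i++) {
        /* Assumed little endian bit order: bit i lives in byte i / 8, at
         * position i % 8 counted from the least significant bit. */
        bits[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Return whether the bit tracking the byte at offset is set. */
static bool qbm_sketch_test(const uint8_t *bits, uint64_t granularity,
                            uint64_t offset)
{
    uint64_t i = offset / granularity;

    return (bits[i / 8] >> (i % 8)) & 1;
}
```

With a granularity of 512, a write to the first sector sets bit 0 of byte 0,
and a write at offset 4096 sets bit 0 of byte 1.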


* [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 17:52   ` Eric Blake
  2016-02-09  0:11   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps Fam Zheng
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

So that the driver can write the dirty bits into persistent dirty bitmaps in
the write callback.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 343ff1f..b964e7e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1164,6 +1164,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
         }
     }
 
+    bdrv_set_dirty(bs, sector_num, nb_sectors);
+
     if (ret < 0) {
         /* Do nothing, write notifier decided to fail this request */
     } else if (flags & BDRV_REQ_ZERO_WRITE) {
@@ -1179,8 +1181,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
         ret = bdrv_co_flush(bs);
     }
 
-    bdrv_set_dirty(bs, sector_num, nb_sectors);
-
     if (bs->wr_highest_offset < offset + bytes) {
         bs->wr_highest_offset = offset + bytes;
     }
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 17:53   ` Eric Blake
  2016-02-09  0:23   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c Fam Zheng
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

If the driver owns some dirty bitmaps, this assertion will fail.

The correct place to release them is in bdrv_close, so move the
assertion one line down.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index afb71c0..fa6ad1d 100644
--- a/block.c
+++ b/block.c
@@ -2348,10 +2348,11 @@ static void bdrv_delete(BlockDriverState *bs)
     assert(!bs->job);
     assert(bdrv_op_blocker_is_empty(bs));
     assert(!bs->refcnt);
-    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
 
     bdrv_close(bs);
 
+    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+
     /* remove from list, if necessary */
     bdrv_make_anon(bs);
 
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (2 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-27 16:07   ` Eric Blake
  2016-02-09 20:56   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public Fam Zheng
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

With the return value decoupled from VMDK, it can be reused by other block
code.
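
For readers unfamiliar with the helper, here is a stand-alone approximation of
what it computes. decompose_sketch() is a hypothetical name; QEMU's pstrcpy()
and Error reporting are replaced by snprintf() and a plain -1 return, so this
is an illustration rather than the function being moved.

```c
#include <stdio.h>
#include <string.h>

/* Split filename into directory part, base name and extension.
 * All three output buffers are buf_len bytes long. */
static int decompose_sketch(const char *filename, char *path, char *prefix,
                            char *postfix, size_t buf_len)
{
    const char *p, *q;

    if (filename == NULL || filename[0] == '\0') {
        return -1;
    }
    /* Directory part: everything up to and including the last separator. */
    p = strrchr(filename, '/');
    if (p == NULL) {
        p = strrchr(filename, '\\');
    }
    if (p == NULL) {
        p = strrchr(filename, ':');
    }
    if (p != NULL) {
        p++;
        if ((size_t)(p - filename) >= buf_len) {
            return -1;
        }
        snprintf(path, buf_len, "%.*s", (int)(p - filename), filename);
    } else {
        p = filename;
        path[0] = '\0';
    }
    /* Split the base name at its last dot; the dot stays with the postfix. */
    q = strrchr(p, '.');
    if (q == NULL) {
        snprintf(prefix, buf_len, "%s", p);
        postfix[0] = '\0';
    } else {
        if ((size_t)(q - p) >= buf_len) {
            return -1;
        }
        snprintf(prefix, buf_len, "%.*s", (int)(q - p), p);
        snprintf(postfix, buf_len, "%s", q);
    }
    return 0;
}
```

For "/images/vm/disk.qbm" the sketch yields the path "/images/vm/", the prefix
"disk" and the postfix ".qbm".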

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
 block/vmdk.c          | 40 ----------------------------------------
 include/block/block.h |  2 ++
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block.c b/block.c
index fa6ad1d..78db342 100644
--- a/block.c
+++ b/block.c
@@ -144,6 +144,46 @@ int path_is_absolute(const char *path)
 #endif
 }
 
+int filename_decompose(const char *filename, char *path, char *prefix,
+                       char *postfix, size_t buf_len, Error **errp)
+{
+    const char *p, *q;
+
+    if (filename == NULL || !strlen(filename)) {
+        error_setg(errp, "No filename provided");
+        return -EINVAL;
+    }
+    p = strrchr(filename, '/');
+    if (p == NULL) {
+        p = strrchr(filename, '\\');
+    }
+    if (p == NULL) {
+        p = strrchr(filename, ':');
+    }
+    if (p != NULL) {
+        p++;
+        if (p - filename >= buf_len) {
+            return -EINVAL;
+        }
+        pstrcpy(path, p - filename + 1, filename);
+    } else {
+        p = filename;
+        path[0] = '\0';
+    }
+    q = strrchr(p, '.');
+    if (q == NULL) {
+        pstrcpy(prefix, buf_len, p);
+        postfix[0] = '\0';
+    } else {
+        if (q - p >= buf_len) {
+            return -EINVAL;
+        }
+        pstrcpy(prefix, q - p + 1, p);
+        pstrcpy(postfix, buf_len, q);
+    }
+    return 0;
+}
+
 /* if filename is absolute, just copy it to dest. Otherwise, build a
    path to it by considering it is relative to base_path. URL are
    supported. */
diff --git a/block/vmdk.c b/block/vmdk.c
index f8f7fcf..505e0c2 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1764,46 +1764,6 @@ exit:
     return ret;
 }
 
-static int filename_decompose(const char *filename, char *path, char *prefix,
-                              char *postfix, size_t buf_len, Error **errp)
-{
-    const char *p, *q;
-
-    if (filename == NULL || !strlen(filename)) {
-        error_setg(errp, "No filename provided");
-        return VMDK_ERROR;
-    }
-    p = strrchr(filename, '/');
-    if (p == NULL) {
-        p = strrchr(filename, '\\');
-    }
-    if (p == NULL) {
-        p = strrchr(filename, ':');
-    }
-    if (p != NULL) {
-        p++;
-        if (p - filename >= buf_len) {
-            return VMDK_ERROR;
-        }
-        pstrcpy(path, p - filename + 1, filename);
-    } else {
-        p = filename;
-        path[0] = '\0';
-    }
-    q = strrchr(p, '.');
-    if (q == NULL) {
-        pstrcpy(prefix, buf_len, p);
-        postfix[0] = '\0';
-    } else {
-        if (q - p >= buf_len) {
-            return VMDK_ERROR;
-        }
-        pstrcpy(prefix, q - p + 1, p);
-        pstrcpy(postfix, buf_len, q);
-    }
-    return VMDK_OK;
-}
-
 static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 {
     int idx = 0;
diff --git a/include/block/block.h b/include/block/block.h
index bfb76f8..b9b30cb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -449,6 +449,8 @@ int bdrv_is_snapshot(BlockDriverState *bs);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
+int filename_decompose(const char *filename, char *path, char *prefix,
+                       char *postfix, size_t buf_len, Error **errp);
 void path_combine(char *dest, int dest_size,
                   const char *base_path,
                   const char *filename);
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (3 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-27 16:08   ` Eric Blake
  2016-02-09 21:06   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent Fam Zheng
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c            | 2 +-
 include/block/block.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index b964e7e..15e461f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -425,7 +425,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
     }
 }
 
-static int bdrv_get_cluster_size(BlockDriverState *bs)
+int bdrv_get_cluster_size(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
     int ret;
diff --git a/include/block/block.h b/include/block/block.h
index b9b30cb..16b7845 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -435,7 +435,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
                             int64_t sector_num, int nb_sectors,
                             int64_t *cluster_sector_num,
                             int *cluster_nb_sectors);
-
+int bdrv_get_cluster_size(BlockDriverState *bs);
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
                                char *filename, int filename_size);
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (4 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-02-09 21:31   ` John Snow
  2016-02-09 22:04   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 07/16] block: Only swap non-persistent dirty bitmaps Fam Zheng
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

By implementing bdrv_dirty_bitmap_set_persistent, a driver can support
the persistent dirty bitmap feature.

Once a dirty bitmap is made persistent, the driver is responsible for saving
the dirty bitmap when appropriate, for example before close. If a persistent
bitmap is removed or made non-persistent, .bdrv_dirty_bitmap_set_persistent
will be called, and the driver should then remove the dirty bitmap from the
disk.

The block layer does not recurse this operation: a filter such as blkdebug
needs to implement the callback and explicitly pass it down to bs->file, etc.
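
The division of labour between the generic layer, a format driver and a filter
can be modelled with toy types. Everything below (Node, Driver and the
*_set_persistent functions) is hypothetical and much simpler than
BlockDriverState and BlockDriver; it only illustrates that the generic entry
point does not recurse and that a filter has to forward the call itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Toy stand-in for BlockDriver: only the callback discussed here. */
typedef struct Driver {
    int (*set_persistent)(Node *n, const char *bitmap_name, bool persistent);
} Driver;

/* Toy stand-in for BlockDriverState. */
struct Node {
    const Driver *drv;
    Node *file;             /* child node, like bs->file */
    int persistent_count;   /* bitmaps this node keeps on disk (toy state) */
};

/* Format driver: actually stores or removes the bitmap. */
static int format_set_persistent(Node *n, const char *name, bool persistent)
{
    (void)name;
    n->persistent_count += persistent ? 1 : -1;
    return 0;
}

/* Filter driver: stores nothing itself, explicitly passes the call down. */
static int filter_set_persistent(Node *n, const char *name, bool persistent)
{
    if (!n->file || !n->file->drv || !n->file->drv->set_persistent) {
        return -ENOTSUP;
    }
    return n->file->drv->set_persistent(n->file, name, persistent);
}

/* Generic layer: call the node's own driver, never recurse by itself. */
static int node_set_persistent(Node *n, const char *name, bool persistent)
{
    if (!n->drv || !n->drv->set_persistent) {
        return -ENOTSUP;
    }
    return n->drv->set_persistent(n, name, persistent);
}

static const Driver format_driver = { format_set_persistent };
static const Driver filter_driver = { filter_set_persistent };
static const Driver plain_driver = { NULL };
```

A node whose driver lacks the callback fails with -ENOTSUP, which mirrors the
"Not supported in this format." path in the patch below.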

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/dirty-bitmap.c         | 38 ++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h    |  8 ++++++++
 include/block/dirty-bitmap.h |  4 ++++
 3 files changed, 50 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 1aa7f76..882a0db 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -43,6 +43,7 @@ struct BdrvDirtyBitmap {
     int64_t size;               /* Size of the bitmap (Number of sectors) */
     bool disabled;              /* Bitmap is read-only */
     int active_iterators;       /* How many iterators are active */
+    bool persistent;            /* Whether this bitmap is persistent. */
     QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
@@ -71,6 +72,37 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
     bitmap->name = NULL;
 }
 
+int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
+                                     BdrvDirtyBitmap *bitmap,
+                                     bool persistent, bool flag_only,
+                                     Error **errp)
+{
+    int ret = 0;
+
+    if (!bitmap->name) {
+        error_setg(errp, "Cannot change the persistent status of an anonymous"
+                         " bitmap");
+        return -EINVAL;
+    }
+
+    if (persistent == bitmap->persistent) {
+        return 0;
+    }
+
+    if (!flag_only) {
+        if (!bs->drv || !bs->drv->bdrv_dirty_bitmap_set_persistent) {
+            error_setg(errp, "Not supported in this format.");
+            return -ENOTSUP;
+        }
+        ret = bs->drv->bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent,
+                                                        errp);
+    }
+    if (!ret) {
+        bitmap->persistent = persistent;
+    }
+    return ret;
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
                                           uint32_t granularity,
                                           const char *name,
@@ -194,6 +226,12 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
     uint64_t granularity;
     BdrvDirtyBitmap *child;
 
+    if (bitmap->persistent) {
+        error_setg(errp, "Cannot create a successor for a bitmap that is "
+                   "persistent");
+        return -1;
+    }
+
     if (bdrv_dirty_bitmap_frozen(bitmap)) {
         error_setg(errp, "Cannot create a successor for a bitmap that is "
                    "currently frozen");
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5fa58e8..fbc34af 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -305,6 +305,14 @@ struct BlockDriver {
      */
     void (*bdrv_drain)(BlockDriverState *bs);
 
+    /**
+     * Make the dirty bitmap persistent if persistent=true or transient
+     * otherwise.
+     */
+    int (*bdrv_dirty_bitmap_set_persistent)(BlockDriverState *bs,
+                                            BdrvDirtyBitmap *bitmap,
+                                            bool persistent, Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index d14d923..5885720 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -24,6 +24,10 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
                                         const char *name);
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
+int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
+                                     BdrvDirtyBitmap *bitmap,
+                                     bool persistent, bool flag_only,
+                                     Error **errp);
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 07/16] block: Only swap non-persistent dirty bitmaps
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (5 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add Fam Zheng
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Persistent dirty bitmaps are special because they're tightly associated
with, or even belong to, the driver, so swapping them doesn't make much
sense. Because this has nothing to do with backward compatibility, it's
okay to just let them stay with the old BDS.
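
The intended effect can be shown on a toy singly linked list. The Bitmap type
and the helpers below are hypothetical and do not use QEMU's QLIST macros; the
sketch only demonstrates that transient bitmaps change owner while persistent
ones stay put.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct Bitmap {
    const char *name;
    bool persistent;
    struct Bitmap *next;
} Bitmap;

/* Detach every non-persistent bitmap from *list and return them as a
 * separate list; persistent bitmaps stay where they are. */
static Bitmap *take_transient(Bitmap **list)
{
    Bitmap *taken = NULL;
    Bitmap **link = list;

    while (*link) {
        Bitmap *bm = *link;
        if (bm->persistent) {
            link = &bm->next;
        } else {
            *link = bm->next;
            bm->next = taken;
            taken = bm;
        }
    }
    return taken;
}

/* Push every bitmap of src onto the head of *dst. */
static void give_all(Bitmap **dst, Bitmap *src)
{
    while (src) {
        Bitmap *next = src->next;
        src->next = *dst;
        *dst = src;
        src = next;
    }
}

/* Swap only the non-persistent bitmaps of two lists. */
static void swap_transient(Bitmap **a, Bitmap **b)
{
    Bitmap *from_a = take_transient(a);
    Bitmap *from_b = take_transient(b);

    give_all(a, from_b);
    give_all(b, from_a);
}

/* Return whether a bitmap with the given name is on the list. */
static bool list_has(const Bitmap *list, const char *name)
{
    for (; list; list = list->next) {
        if (strcmp(list->name, name) == 0) {
            return true;
        }
    }
    return false;
}
```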

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c                      | 11 +++++------
 block/dirty-bitmap.c         | 25 +++++++++++++++++++++++++
 include/block/dirty-bitmap.h |  1 +
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 78db342..3a29de2 100644
--- a/block.c
+++ b/block.c
@@ -2274,9 +2274,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->copy_on_read       = bs_src->copy_on_read;
 
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
-
-    /* dirty bitmap */
-    bs_dest->dirty_bitmaps      = bs_src->dirty_bitmaps;
 }
 
 static void change_parent_backing_link(BlockDriverState *from,
@@ -2302,10 +2299,12 @@ static void change_parent_backing_link(BlockDriverState *from,
 }
 
 static void swap_feature_fields(BlockDriverState *bs_top,
-                                BlockDriverState *bs_new)
+                                BlockDriverState *bs_new,
+                                Error **errp)
 {
     BlockDriverState tmp;
 
+    bdrv_dirty_bitmap_swap(bs_top, bs_new);
     bdrv_move_feature_fields(&tmp, bs_top);
     bdrv_move_feature_fields(bs_top, bs_new);
     bdrv_move_feature_fields(bs_new, &tmp);
@@ -2343,7 +2342,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
     change_parent_backing_link(bs_top, bs_new);
 
     /* Some fields always stay on top of the backing file chain */
-    swap_feature_fields(bs_top, bs_new);
+    swap_feature_fields(bs_top, bs_new, NULL);
 
     bdrv_set_backing_hd(bs_new, bs_top);
     bdrv_unref(bs_top);
@@ -2368,7 +2367,7 @@ void bdrv_replace_in_backing_chain(BlockDriverState *old, BlockDriverState *new)
          * swap instead so that pointers aren't duplicated and cause trouble.
          * (Also, bdrv_swap() used to do the same.) */
         assert(!new->blk);
-        swap_feature_fields(old, new);
+        swap_feature_fields(old, new, NULL);
     }
     change_parent_backing_link(old, new);
 
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 882a0db..a3a401f 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -65,6 +65,31 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
     return NULL;
 }
 
+/* Swap non-persistent dirty bitmaps. */
+void bdrv_dirty_bitmap_swap(BlockDriverState *bs1, BlockDriverState *bs2)
+{
+    BdrvDirtyBitmap *bm, *next;
+    QLIST_HEAD(, BdrvDirtyBitmap) tmp = QLIST_HEAD_INITIALIZER(&tmp);
+
+    QLIST_FOREACH_SAFE(bm, &bs1->dirty_bitmaps, list, next) {
+        if (bm->persistent) {
+            continue;
+        }
+        QLIST_REMOVE(bm, list);
+        QLIST_INSERT_HEAD(&tmp, bm, list);
+    }
+    QLIST_FOREACH_SAFE(bm, &bs2->dirty_bitmaps, list, next) {
+        if (bm->persistent) {
+            continue;
+        }
+        QLIST_REMOVE(bm, list);
+        QLIST_INSERT_HEAD(&bs1->dirty_bitmaps, bm, list);
+    }
+    QLIST_FOREACH_SAFE(bm, &tmp, list, next) {
+        QLIST_INSERT_HEAD(&bs2->dirty_bitmaps, bm, list);
+    }
+}
+
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
 {
     assert(!bdrv_dirty_bitmap_frozen(bitmap));
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 5885720..a4de9c7 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -23,6 +23,7 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
                                            Error **errp);
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
                                         const char *name);
+void bdrv_dirty_bitmap_swap(BlockDriverState *bs1, BlockDriverState *bs2);
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
 int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
                                      BdrvDirtyBitmap *bitmap,
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (6 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 07/16] block: Only swap non-persistent dirty bitmaps Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-02-09 22:05   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent Fam Zheng
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

When omitted it defaults to false with unchanged behavior.

When set to true, the created dirty bitmap is made persistent. This requires
support from the active image format; otherwise an error is returned.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c           | 8 +++++++-
 qapi/block-core.json | 6 +++++-
 qmp-commands.hx      | 3 ++-
 3 files changed, 14 insertions(+), 3 deletions(-)
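
The semantics described in the commit message can be sketched as a small
standalone model. Everything here is hypothetical (SketchNode, SketchBitmap,
sketch_bitmap_add() and the error string are not QEMU names); it only mirrors
the order of operations in the blockdev.c hunk of this patch: create the
bitmap, then try to make it persistent if the caller asked for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in QEMU. */
typedef struct SketchBitmap {
    bool persistent;
} SketchBitmap;

typedef struct SketchNode {
    bool format_supports_persistence;   /* assumed true for a QBM overlay */
    bool has_bitmap;
    SketchBitmap bitmap;
} SketchNode;

/*
 * Create a bitmap on the node, then make it persistent only when requested.
 * Returns 0 on success, -1 on error with *err pointing to a static message.
 */
int sketch_bitmap_add(SketchNode *node, bool has_persistent, bool persistent,
                      const char **err)
{
    assert(node && err);
    *err = NULL;
    node->bitmap.persistent = false;
    node->has_bitmap = true;

    if (!has_persistent || !persistent) {
        return 0;   /* omitted or false: behaviour is unchanged */
    }
    if (!node->format_supports_persistence) {
        *err = "format does not support persistent bitmaps";
        return -1;
    }
    node->bitmap.persistent = true;
    return 0;
}
```

Like the hunk it models, the sketch does not roll back the newly created
bitmap when making it persistent fails.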

diff --git a/blockdev.c b/blockdev.c
index 07cfe25..08236f2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1997,6 +1997,7 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
     /* AIO context taken and released within qmp_block_dirty_bitmap_add */
     qmp_block_dirty_bitmap_add(action->node, action->name,
                                action->has_granularity, action->granularity,
+                               action->has_persistent, action->persistent,
                                &local_err);
 
     if (!local_err) {
@@ -2640,10 +2641,12 @@ out:
 
 void qmp_block_dirty_bitmap_add(const char *node, const char *name,
                                 bool has_granularity, uint32_t granularity,
+                                bool has_persistent, bool persistent,
                                 Error **errp)
 {
     AioContext *aio_context;
     BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
 
     if (!name || name[0] == '\0') {
         error_setg(errp, "Bitmap name cannot be empty");
@@ -2669,7 +2672,10 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
         granularity = bdrv_get_default_bitmap_granularity(bs);
     }
 
-    bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+    bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+    if (bitmap && has_persistent && persistent) {
+        bdrv_dirty_bitmap_set_persistent(bs, bitmap, true, false, errp);
+    }
 
  out:
     aio_context_release(aio_context);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 30c2e5f..0ac107c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1162,10 +1162,14 @@
 # @granularity: #optional the bitmap granularity, default is 64k for
 #               block-dirty-bitmap-add
 #
+# @persistent: #optional whether to make the bitmap persistent, default is false.
+#              (Since 2.6)
+#
 # Since 2.4
 ##
 { 'struct': 'BlockDirtyBitmapAdd',
-  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
+  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32',
+            '*persistent': 'bool' } }
 
 ##
 # @block-dirty-bitmap-add
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db072a6..bd4428e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1373,7 +1373,7 @@ EQMP
 
     {
         .name       = "block-dirty-bitmap-add",
-        .args_type  = "node:B,name:s,granularity:i?",
+        .args_type  = "node:B,name:s,granularity:i?,persistent:b?",
         .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_add,
     },
 
@@ -1390,6 +1390,7 @@ Arguments:
 - "node": device/node on which to create dirty bitmap (json-string)
 - "name": name of the new dirty bitmap (json-string)
 - "granularity": granularity to track writes with (int, optional)
+- "persistent": whether the bitmap is persistent (bool, optional, default to no)
 
 Example:
 
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (7 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-02-09 22:49   ` John Snow
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver Fam Zheng
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c           | 20 ++++++++++++++++++++
 qapi/block-core.json | 22 ++++++++++++++++++++++
 qmp-commands.hx      | 31 +++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 08236f2..a9d6617 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2699,6 +2699,9 @@ void qmp_block_dirty_bitmap_remove(const char *node, const char *name,
                    name);
         goto out;
     }
+    if (bdrv_dirty_bitmap_set_persistent(bs, bitmap, false, false, errp)) {
+        goto out;
+    }
     bdrv_dirty_bitmap_make_anon(bitmap);
     bdrv_release_dirty_bitmap(bs, bitmap);
 
@@ -2740,6 +2743,23 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
     aio_context_release(aio_context);
 }
 
+void qmp_block_dirty_bitmap_set_persistent(const char *node, const char *name,
+                                           bool persistent, Error **errp)
+{
+    AioContext *aio_context;
+    BdrvDirtyBitmap *bitmap;
+    BlockDriverState *bs;
+
+    bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
+    if (!bitmap || !bs) {
+        return;
+    }
+
+    bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent, false, errp);
+
+    aio_context_release(aio_context);
+}
+
 void hmp_drive_del(Monitor *mon, const QDict *qdict)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0ac107c..52689ed 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1263,6 +1263,28 @@
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
+# @block-dirty-bitmap-set-persistent
+#
+# Update a dirty bitmap's persistent state on the device
+#
+# @node: name of device/node which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# @persistent: whether to make the bitmap persistent
+#
+# Returns: nothing on success
+#          If @node is not a valid block device, DeviceNotFound
+#          If @name is not found, GenericError with an explanation
+#          If an error happens when setting the persistent state, GenericError
+#          with an explanation
+#
+# Since 2.6
+##
+{ 'command': 'block-dirty-bitmap-set-persistent',
+  'data': { 'node': 'str', 'name': 'str', 'persistent': 'bool' } }
+
+##
 # @block_set_io_throttle:
 #
 # Change I/O throttle limits for a block drive.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index bd4428e..e37cf09 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1458,6 +1458,37 @@ Example:
 EQMP
 
     {
+        .name       = "block-dirty-bitmap-set-persistent",
+        .args_type  = "node:B,name:s,persistent:b",
+        .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_set_persistent,
+    },
+
+SQMP
+
+block-dirty-bitmap-set-persistent
+---------------------------------
+Since 2.6
+
+Update the persistent state of a dirty bitmap. Format driver support is
+required.
+
+Arguments:
+
+- "node": device/node on which to update the dirty bitmap (json-string)
+- "name": name of the dirty bitmap to update (json-string)
+- "persistent": the state to update to. (json-bool)
+
+Example:
+
+-> { "execute": "block-dirty-bitmap-set-persistent",
+                "arguments": { "node": "drive0",
+                               "name": "bitmap0",
+                               "persistent": true } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "blockdev-snapshot-sync",
         .args_type  = "device:s?,node-name:s?,snapshot-file:s,snapshot-node-name:s?,format:s?,mode:s?",
         .mhandler.cmd_new = qmp_marshal_blockdev_snapshot_sync,
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (8 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-02-17 13:30   ` Vladimir Sementsov-Ogievskiy
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 11/16] qapi: Add "qbm" as a generic cow " Fam Zheng
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/Makefile.objs |    1 +
 block/qbm.c         | 1315 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1316 insertions(+)
 create mode 100644 block/qbm.c
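
As a reading aid for qbm_probe() and qbm_token_consume(), here is a
standalone sketch of the check they appear to be aiming for: find a "QBM" key
(double or single quoted), then require ':' and '{' after optional whitespace.
The names sketch_expect() and sketch_looks_like_qbm() are hypothetical, and
the sketch assumes a NUL-terminated buffer, which the probe interface itself
does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip leading whitespace, then require `token`. Returns the position just
 * past the token, or NULL if the token is not there. */
static const char *sketch_expect(const char *p, const char *token)
{
    size_t len = strlen(token);

    if (!p) {
        return NULL;
    }
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
        p++;
    }
    return strncmp(p, token, len) == 0 ? p + len : NULL;
}

/* True when `text` contains "QBM" or 'QBM' followed by ':' and '{', with
 * at least one more byte after the brace. */
bool sketch_looks_like_qbm(const char *text)
{
    const char *p;

    assert(text);
    p = strstr(text, "\"QBM\"");
    if (!p) {
        p = strstr(text, "'QBM'");
    }
    if (!p) {
        return false;
    }
    p += strlen("\"QBM\"");     /* both spellings are five bytes long */
    p = sketch_expect(p, ":");
    p = sketch_expect(p, "{");
    return p && *p;
}
```

A descriptor such as { "QBM": { "version": 1 } } passes, while a JSON file
that merely mentions "QBM" as a scalar value does not.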

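Two pieces of arithmetic in the driver are easy to check in isolation: the
granularity validation in qbm_bitmap_iter() and the sectors_per_bitmap_sector
value in qbm_write_bitmap(). The sketch below restates both with hypothetical
names (sketch_granularity_valid(), sketch_guest_sectors_per_bitmap_sector())
and assumes a 512-byte sector, as BDRV_SECTOR_SIZE is in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE   512    /* assumed equal to BDRV_SECTOR_SIZE */
#define SKETCH_BITS_PER_BYTE 8

/* Reject non-positive values, non powers of two, and anything smaller than
 * one sector. */
bool sketch_granularity_valid(int64_t granularity)
{
    if (granularity <= 0) {
        return false;
    }
    if (granularity & (granularity - 1)) {
        return false;
    }
    return granularity >= SKETCH_SECTOR_SIZE;
}

/* One sector of the bitmap file holds 512 * 8 bits and each bit tracks
 * `granularity` bytes of guest data. Dividing by the sector size gives the
 * number of guest sectors covered, i.e. 8 * granularity. */
int64_t sketch_guest_sectors_per_bitmap_sector(int64_t granularity)
{
    int64_t bits = (int64_t)SKETCH_SECTOR_SIZE * SKETCH_BITS_PER_BYTE;

    assert(sketch_granularity_valid(granularity));
    return bits * granularity / SKETCH_SECTOR_SIZE;
}
```

With the default 64k granularity, one bitmap sector therefore covers 524288
guest sectors (256 MiB).
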
diff --git a/block/Makefile.objs b/block/Makefile.objs
index cdd8655..1111ba7 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -5,6 +5,7 @@ block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
+block-obj-y += qbm.o
 block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/qbm.c b/block/qbm.c
new file mode 100644
index 0000000..91e129f
--- /dev/null
+++ b/block/qbm.c
@@ -0,0 +1,1315 @@
+/*
+ * Block driver for the QBM format
+ *
+ * Copyright (c) 2016 Red Hat Inc.
+ *
+ * Authors:
+ *     Fam Zheng <famz@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "migration/migration.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qjson.h"
+
+#define QBM_BUF_SIZE_MAX (32 << 20)
+
+typedef enum QBMBitmapType {
+    QBM_TYPE_DIRTY,
+    QBM_TYPE_ALLOC,
+} QBMBitmapType;
+
+typedef struct QBMBitmap {
+    BdrvDirtyBitmap *bitmap;
+    BdrvChild *file;
+    char *name;
+    QBMBitmapType type;
+} QBMBitmap;
+
+typedef struct BDRVQBMState {
+    BdrvChild *image;
+    BdrvDirtyBitmap *alloc_bitmap;
+    QDict *desc;
+    QDict *backing_dict;
+    QBMBitmap *bitmaps;
+    int num_bitmaps;
+} BDRVQBMState;
+
+static const char *qbm_token_consume(const char *p, const char *token)
+{
+    size_t len = strlen(token);
+
+    if (!p) {
+        return NULL;
+    }
+    while (*p && (*p == ' ' ||
+                  *p == '\t' ||
+                  *p == '\n' ||
+                  *p == '\r')) {
+        p++;
+    }
+    if (!strncmp(p, token, len)) {
+        return p + len;
+    }
+    return NULL;
+}
+
+static int qbm_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const char *p;
+    p = strstr((const char *)buf, "\"QBM\"");
+    if (!p) {
+        p = strstr((const char *)buf, "'QBM'");
+    }
+    if (!p) {
+        return 0;
+    }
+    p = qbm_token_consume(p + strlen("\"QBM\""), ":");
+    p = qbm_token_consume(p, "{");
+    if (p && *p) {
+        return 100;
+    }
+    return 0;
+}
+
+static void qbm_load_bitmap(BlockDriverState *bs, QBMBitmap *bm, Error **errp)
+{
+    int r;
+    BDRVQBMState *s = bs->opaque;
+    int64_t bitmap_file_size;
+    int64_t bitmap_size;
+    uint8_t *buf = NULL;
+    BlockDriverState *file = bm->file->bs;
+    int64_t image_size = bdrv_getlength(s->image->bs);
+
+    if (image_size < 0) {
+        error_setg(errp, "Cannot get image size: %s", s->image->bs->filename);
+        return;
+    }
+    bitmap_size = bdrv_dirty_bitmap_serialization_size(bm->bitmap, 0,
+                        bdrv_dirty_bitmap_size(bm->bitmap));
+    if (bitmap_size > QBM_BUF_SIZE_MAX) {
+        error_setg(errp, "Bitmap too big");
+        return;
+    }
+    bitmap_file_size = bdrv_getlength(file);
+    if (bitmap_file_size < bitmap_size) {
+        error_setg(errp,
+                   "Bitmap \"%s\" file too small (expecting at least %"
+                   PRId64 " bytes but got %" PRId64 " bytes): %s",
+                   bm->name, bitmap_size, bitmap_file_size, file->filename);
+        goto out;
+    }
+    buf = qemu_blockalign(file, bitmap_size);
+    r = bdrv_pread(file, 0, buf, bitmap_size);
+    if (r < 0) {
+        error_setg(errp, "Failed to read bitmap file \"%s\"",
+                   file->filename);
+        goto out;
+    }
+    bdrv_dirty_bitmap_deserialize_part(bm->bitmap, buf, 0, bs->total_sectors,
+                                       true);
+
+out:
+    qemu_vfree(buf);
+}
+
+static int qbm_reopen_prepare(BDRVReopenState *state,
+                              BlockReopenQueue *queue, Error **errp)
+{
+    return 0;
+}
+
+static void qbm_get_fullname(BlockDriverState *bs, char *dest, size_t sz,
+                             const char *filename)
+{
+    const char *base, *p;
+
+    base = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+
+    if (strstart(base, "json:", NULL)) {
+        /* There is not much we can do with a json: file name, try bs->file and
+         * cross our fingers. */
+        if (bs->file) {
+            qbm_get_fullname(bs->file->bs, dest, sz, filename);
+        } else {
+            pstrcpy(dest, sz, filename);
+        }
+        return;
+    }
+
+    p = strrchr(base, '/');
+
+    assert(sz > 0);
+    if (path_has_protocol(filename) || path_is_absolute(filename)) {
+        pstrcpy(dest, sz, filename);
+        return;
+    }
+
+    if (p) {
+        pstrcpy(dest, MIN(sz, p - base + 2), base);
+    } else {
+        dest[0] = '\0';
+    }
+    pstrcat(dest, sz, filename);
+}
+
+static BdrvChild *qbm_open_image(BlockDriverState *bs,
+                                 QDict *image, QDict *options,
+                                 Error **errp)
+{
+    BdrvChild *child;
+    const char *filename = qdict_get_try_str(image, "file");
+    const char *fmt = qdict_get_try_str(image, "format");
+    const char *checksum = qdict_get_try_str(image, "checksum");
+    char fullname[PATH_MAX];
+
+    if (!filename) {
+        error_setg(errp, "Image missing 'file' field");
+        return NULL;
+    }
+    if (!fmt) {
+        error_setg(errp, "Image missing 'format' field");
+        return NULL;
+    }
+    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
+    qdict_put(options, "image.driver", qstring_from_str(fmt));
+    child = bdrv_open_child(fullname, options, "image", bs, &child_file, false,
+                            errp);
+    if (!child) {
+        goto out;
+    }
+    if (checksum) {
+        /* TODO: compare checksum when we support this */
+        error_setg(errp, "Checksum not supported");
+    }
+out:
+    return child;
+}
+
+/* Open and load the persistent bitmap and return the created QBMBitmap object.
+ * If reuse_bitmap is not NULL, we skip bdrv_create_dirty_bitmap and reuse it.
+ **/
+static QBMBitmap *qbm_open_bitmap(BlockDriverState *bs,
+                                  const char *name,
+                                  const char *filename, int granularity,
+                                  QBMBitmapType type,
+                                  BdrvDirtyBitmap *reuse_bitmap,
+                                  Error **errp)
+{
+    BDRVQBMState *s = bs->opaque;
+    QBMBitmap *bm;
+    BdrvChild *file;
+    BdrvDirtyBitmap *bdrv_bitmap;
+    char *key;
+    QDict *options;
+    char fullname[PATH_MAX];
+    Error *local_err = NULL;
+
+    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+    s->bitmaps = g_realloc_n(s->bitmaps, s->num_bitmaps + 1,
+                             sizeof(QBMBitmap));
+
+    /* Create options for the bitmap child BDS */
+    options = qdict_new();
+    key = g_strdup_printf("bitmap-%s.driver", name);
+    qdict_put(options, key, qstring_from_str("raw"));
+    g_free(key);
+
+    /* Open the child as plain "file" */
+    key = g_strdup_printf("bitmap-%s", name);
+    file = bdrv_open_child(fullname, options, key, bs, &child_file, false,
+                           errp);
+    g_free(key);
+    QDECREF(options);
+    if (!file) {
+        return NULL;
+    }
+
+    if (reuse_bitmap) {
+        bdrv_bitmap = reuse_bitmap;
+    } else {
+        bdrv_bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+        if (!bdrv_bitmap) {
+            bdrv_unref_child(bs, file);
+            return NULL;
+        }
+        bdrv_dirty_bitmap_set_persistent(bs, bdrv_bitmap, true, true, NULL);
+    }
+    bdrv_create_meta_dirty_bitmap(bdrv_bitmap, BDRV_SECTOR_SIZE);
+
+    bm = &s->bitmaps[s->num_bitmaps++];
+    bm->file = file;
+    bm->name = g_strdup(name);
+    bm->type = type;
+    bm->bitmap = bdrv_bitmap;
+    if (type == QBM_TYPE_ALLOC) {
+        assert(!s->alloc_bitmap);
+        s->alloc_bitmap = bdrv_bitmap;
+        /* Align the request to granularity so the block layer will take care
+         * of RMW for partial writes. */
+        bs->request_alignment = granularity;
+    }
+    return bm;
+}
+
+typedef struct QBMIterState {
+    QDict *options;
+    BlockDriverState *bs;
+    Error *err;
+    bool has_backing;
+} QBMIterState;
+
+static void qbm_bitmap_iter(const char *key, QObject *obj, void *opaque)
+{
+    QDict *dict;
+    const char *filename, *typename;
+    QBMBitmapType type;
+    int granularity;
+    QBMIterState *state = opaque;
+    BDRVQBMState *s = state->bs->opaque;
+    QBMBitmap *bm;
+
+    if (state->err) {
+        return;
+    }
+    dict = qobject_to_qdict(obj);
+    if (!dict) {
+        error_setg(&state->err, "'%s' is not a dictionary", key);
+        return;
+    }
+    filename = qdict_get_try_str(dict, "file");
+    if (!filename) {
+        error_setg(&state->err, "\"file\" is missing in bitmap \"%s\"", key);
+        return;
+    }
+    typename = qdict_get_try_str(dict, "type");
+    if (!typename) {
+        error_setg(&state->err, "\"type\" is missing in bitmap \"%s\"", key);
+        return;
+    } else if (!strcmp(typename, "dirty")) {
+        type = QBM_TYPE_DIRTY;
+    } else if (!strcmp(typename, "allocation")) {
+        QDict *backing_dict = qdict_get_qdict(dict, "backing");
+        type = QBM_TYPE_ALLOC;
+        if (backing_dict) {
+            if (state->has_backing) {
+                error_setg(&state->err, "Multiple backing is not supported");
+                return;
+            }
+            state->has_backing = true;
+            pstrcpy(state->bs->backing_file, PATH_MAX,
+                    qdict_get_try_str(backing_dict, "file"));
+            if (qdict_haskey(backing_dict, "format")) {
+                pstrcpy(state->bs->backing_format,
+                        sizeof(state->bs->backing_format),
+                        qdict_get_try_str(backing_dict, "format"));
+            }
+            s->backing_dict = backing_dict;
+            if (!strlen(state->bs->backing_file)) {
+                error_setg(&state->err, "Backing file name not specified");
+                return;
+            }
+        }
+    } else {
+        error_setg(&state->err, "\"type\" is invalid in bitmap \"%s\"", key);
+        return;
+    }
+    granularity = qdict_get_try_int(dict, "granularity-bytes", -1);
+    if (granularity == -1) {
+        error_setg(&state->err, "\"granularity\" is missing in bitmap \"%s\"",
+                   key);
+        return;
+    } else if (granularity & (granularity - 1)) {
+        error_setg(&state->err, "\"granularity\" must be power of two");
+        return;
+    } else if (granularity < 512) {
+        error_setg(&state->err, "\"granularity\" too small");
+        return;
+    }
+
+    bm = qbm_open_bitmap(state->bs, key, filename, granularity,
+                         type, NULL, &state->err);
+    if (!bm) {
+        return;
+    }
+    qbm_load_bitmap(state->bs, bm, &state->err);
+}
+
+static void qbm_release_bitmap(BlockDriverState *bs, QBMBitmap *bm)
+{
+    bdrv_release_meta_dirty_bitmap(bm->bitmap);
+    bdrv_release_dirty_bitmap(bs, bm->bitmap);
+    bdrv_unref_child(bs, bm->file);
+}
+
+static void qbm_release_bitmaps(BlockDriverState *bs)
+{
+    int i;
+    BDRVQBMState *s = bs->opaque;
+
+    for (i = 0; i < s->num_bitmaps; i++) {
+        QBMBitmap *bm = &s->bitmaps[i];
+        bdrv_flush(bm->file->bs);
+        qbm_release_bitmap(bs, bm);
+        g_free(bm->name);
+    }
+}
+
+static int qbm_open_bitmaps(BlockDriverState *bs, QDict *bitmaps,
+                            QDict *options, Error **errp)
+{
+    QBMIterState state = (QBMIterState) {
+        .bs = bs,
+        .options = options,
+    };
+    qdict_iter(bitmaps, qbm_bitmap_iter, &state);
+    if (state.err) {
+        qbm_release_bitmaps(bs);
+        error_propagate(errp, state.err);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int qbm_open(BlockDriverState *bs, QDict *options, int flags,
+                    Error **errp)
+{
+    BDRVQBMState *s = bs->opaque;
+    int ret;
+    int64_t len;
+    char *desc;
+    QDict *dict, *image_dict, *bitmaps;
+
+    len = bdrv_getlength(bs->file->bs);
+    if (len > QBM_BUF_SIZE_MAX) {
+        error_setg(errp, "QBM description file too big.");
+        return -ENOMEM;
+    } else if (len < 0) {
+        error_setg(errp, "Failed to get descriptor file size");
+        return len;
+    } else if (!len) {
+        error_setg(errp, "Empty file");
+        return -EINVAL;
+    }
+
+    desc = qemu_blockalign(bs->file->bs, len);
+    ret = bdrv_pread(bs->file->bs, 0, desc, len);
+    if (ret < 0) {
+        goto out;
+    }
+    dict = qobject_to_qdict(qobject_from_json(desc));
+    if (!dict || !qdict_haskey(dict, "QBM")) {
+        error_setg(errp, "Failed to parse json from file");
+        ret = -EINVAL;
+        goto out;
+    }
+    s->desc = qdict_get_qdict(dict, "QBM");
+    if (!s->desc) {
+        error_setg(errp, "Json doesn't have key \"QBM\"");
+        ret = -EINVAL;
+        goto out;
+    }
+    if (qdict_get_try_int(s->desc, "version", -1) != 1) {
+        error_setg(errp, "Invalid version of json file");
+        ret = -EINVAL;
+        goto out;
+    }
+    if (!qdict_haskey(s->desc, "image")) {
+        error_setg(errp, "Key \"image\" not found in json file");
+        ret = -EINVAL;
+        goto out;
+    }
+    image_dict = qdict_get_qdict(s->desc, "image");
+    if (!image_dict) {
+        error_setg(errp, "\"image\" information invalid");
+        ret = -EINVAL;
+        goto out;
+    }
+
+    s->image = qbm_open_image(bs, image_dict, options, errp);
+    if (!s->image) {
+        ret = -EIO;
+        goto out;
+    }
+    bs->total_sectors = bdrv_nb_sectors(s->image->bs);
+    if (bs->total_sectors < 0) {
+        error_setg(errp, "Failed to get image size");
+        ret = -EINVAL;
+        goto out;
+    }
+
+    bitmaps = qdict_get_qdict(s->desc, "bitmaps");
+    if (!bitmaps) {
+        error_setg(errp, "\"bitmaps\" not found");
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = qbm_open_bitmaps(bs, bitmaps, options, errp);
+
+out:
+    qemu_vfree(desc);
+    return ret;
+}
+
+
+static void qbm_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BDRVQBMState *s = bs->opaque;
+    Error *local_err = NULL;
+
+    bdrv_refresh_limits(s->image->bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    bs->bl.min_mem_alignment = s->image->bs->bl.min_mem_alignment;
+    bs->bl.opt_mem_alignment = s->image->bs->bl.opt_mem_alignment;
+    bs->bl.write_zeroes_alignment = bdrv_get_cluster_size(bs);
+}
+
+static int64_t coroutine_fn qbm_co_get_block_status(BlockDriverState *bs,
+                                                    int64_t sector_num,
+                                                    int nb_sectors,
+                                                    int *pnum,
+                                                    BlockDriverState **file)
+{
+    bool alloc = true;
+    int64_t next;
+    int cluster_sectors;
+    BDRVQBMState *s = bs->opaque;
+    int64_t ret = BDRV_BLOCK_OFFSET_VALID;
+
+    if (!s->alloc_bitmap) {
+        return bdrv_get_block_status(s->image->bs, sector_num, nb_sectors,
+                                     pnum, file);
+    }
+
+    ret |= BDRV_BLOCK_OFFSET_MASK & (sector_num << BDRV_SECTOR_BITS);
+    next = sector_num;
+    cluster_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
+                            >> BDRV_SECTOR_BITS;
+    while (next < sector_num + nb_sectors) {
+        if (next == sector_num) {
+            alloc = bdrv_get_dirty(bs, s->alloc_bitmap, next);
+        } else if (bdrv_get_dirty(bs, s->alloc_bitmap, next) != alloc) {
+            break;
+        }
+        next += cluster_sectors - next % cluster_sectors;
+    }
+    *pnum = MIN(next - sector_num, nb_sectors);
+
+    ret |= alloc ? BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED : 0;
+    *file = alloc ? s->image->bs : NULL;
+
+    return ret;
+}
+
+static int qbm_save_desc(BlockDriverState *desc_file, QDict *desc)
+{
+    int ret;
+    const char *str;
+    size_t len;
+    QString *json_str = NULL;
+    QDict *td;
+
+    ret = bdrv_truncate(desc_file, 0);
+    if (ret) {
+        return ret;
+    }
+
+    td = qdict_new();
+    /* Grab an extra reference so it doesn't get freed with td */
+    QINCREF(desc);
+    qdict_put(td, "QBM", desc);
+
+    json_str = qobject_to_json_pretty(QOBJECT(td));
+    str = qstring_get_str(json_str);
+    len = strlen(str);
+    ret = bdrv_pwrite(desc_file, 0, str, len);
+    /* End the json file with a new line, doesn't hurt if it fails. */
+    bdrv_pwrite(desc_file, len, "\n", 1);
+    /* bdrv_pwrite writes padding zeros to align to sector, we don't need that
+     * for a text file */
+    bdrv_truncate(desc_file, len + 1);
+    QDECREF(json_str);
+    QDECREF(td);
+    return ret == len ? 0 : -EIO;
+}
+
+static coroutine_fn int qbm_co_readv(BlockDriverState *bs, int64_t sector_num,
+                                     int nb_sectors, QEMUIOVector *qiov)
+{
+    QEMUIOVector local_qiov;
+    BDRVQBMState *s = bs->opaque;
+    BdrvDirtyBitmapIter *iter;
+    int done_sectors = 0;
+    int ret;
+    int64_t next_allocated;
+    int64_t cur_sector = sector_num;
+    int granularity_sectors;
+
+    if (!s->alloc_bitmap) {
+        return bdrv_co_readv(s->image->bs, sector_num, nb_sectors, qiov);
+    }
+    granularity_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
+                            >> BDRV_SECTOR_BITS;
+    iter = bdrv_dirty_iter_new(s->alloc_bitmap, sector_num);
+    qemu_iovec_init(&local_qiov, qiov->niov);
+    do {
+        int64_t n;
+        int64_t consective_end;
+        next_allocated = bdrv_dirty_iter_next(iter);
+        if (next_allocated < 0) {
+            next_allocated = sector_num + nb_sectors;
+        } else {
+            next_allocated = MIN(next_allocated, sector_num + nb_sectors);
+        }
+        if (next_allocated > cur_sector) {
+            /* Read [cur_sector, next_allocated) from backing */
+            n = next_allocated - cur_sector;
+            qemu_iovec_reset(&local_qiov);
+            qemu_iovec_concat(&local_qiov, qiov,
+                              done_sectors << BDRV_SECTOR_BITS,
+                              n << BDRV_SECTOR_BITS);
+            ret = bdrv_co_readv(bs->backing->bs, cur_sector, n, &local_qiov);
+            if (ret) {
+                goto out;
+            }
+            done_sectors += n;
+            cur_sector += n;
+            if (done_sectors == nb_sectors) {
+                break;
+            }
+        }
+        consective_end = next_allocated;
+        /* Find consecutive allocated sectors */
+        while (consective_end < sector_num + nb_sectors) {
+            int64_t next = bdrv_dirty_iter_next(iter);
+            if (next < 0 || next - consective_end > granularity_sectors) {
+                /* No more consecutive sectors */
+                consective_end += granularity_sectors
+                                  - consective_end % granularity_sectors;
+                break;
+            }
+            consective_end = next;
+        }
+        consective_end = MIN(consective_end, sector_num + nb_sectors);
+        n = consective_end - cur_sector;
+        assert(n > 0);
+        /* Read [cur_sector, consective_end) from image */
+        qemu_iovec_reset(&local_qiov);
+        qemu_iovec_concat(&local_qiov, qiov,
+                          done_sectors << BDRV_SECTOR_BITS,
+                          n << BDRV_SECTOR_BITS);
+        ret = bdrv_co_readv(s->image->bs, cur_sector, n, &local_qiov);
+        if (ret) {
+            goto out;
+        }
+        done_sectors += n;
+        cur_sector += n;
+    } while (done_sectors < nb_sectors);
+out:
+    qemu_iovec_destroy(&local_qiov);
+    bdrv_dirty_iter_free(iter);
+    return ret;
+}
+
+static inline void qbm_check_alignment(BDRVQBMState *s, int64_t sector_num,
+                                       int nb_sectors)
+{
+    if (s->alloc_bitmap) {
+        int cluster_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
+                                >> BDRV_SECTOR_BITS;
+        assert(sector_num % cluster_sectors == 0);
+        assert(nb_sectors % cluster_sectors == 0);
+    }
+}
+
+typedef struct {
+    int inflight;
+    Coroutine *co;
+    int ret;
+} QBMBitmapWriteTracker;
+
+typedef struct {
+    QEMUIOVector qiov;
+    uint8_t *buf;
+    QBMBitmapWriteTracker *tracker;
+    BlockDriverState *bs;
+    QBMBitmap *bitmap;
+    int64_t sector_num;
+    int nb_sectors;
+} QBMBitmapWriteData;
+
+static void qbm_write_bitmap_cb(void *opaque, int ret)
+{
+    QBMBitmapWriteData *data = opaque;
+    QBMBitmapWriteTracker *tracker = data->tracker;
+
+    qemu_iovec_destroy(&data->qiov);
+    qemu_vfree(data->buf);
+    if (!ret) {
+        bdrv_dirty_bitmap_reset_meta(data->bs,
+                                     data->bitmap->bitmap,
+                                     data->sector_num, data->nb_sectors);
+    }
+    g_free(data);
+    tracker->ret = tracker->ret ? : ret;
+    if (!--tracker->inflight) {
+        qemu_coroutine_enter(tracker->co, NULL);
+    }
+}
+
+static int qbm_write_bitmap(BlockDriverState *bs, QBMBitmap *bm,
+                            int64_t sector_num, int nb_sectors,
+                            QBMBitmapWriteTracker *tracker)
+{
+    QBMBitmapWriteData *data;
+    int64_t start, end;
+    int64_t file_sector_num;
+    int file_nb_sectors;
+    size_t buf_size;
+    /* Each bit in the bitmap tracks bdrv_dirty_bitmap_granularity(bm->bitmap)
+     * bytes of guest data, so each sector of the bitmap file tracks
+     * (bdrv_dirty_bitmap_granularity(bm->bitmap) * BDRV_SECTOR_SIZE *
+     * BITS_PER_BYTE) bytes of guest data. Expressed in guest sectors: */
+    int64_t sectors_per_bitmap_sector =
+        BITS_PER_BYTE * bdrv_dirty_bitmap_granularity(bm->bitmap);
+    int align = MAX(bdrv_dirty_bitmap_serialization_align(bm->bitmap),
+                    sectors_per_bitmap_sector);
+
+    /* The start sector that is being marked dirty. */
+    start = QEMU_ALIGN_DOWN(sector_num, align);
+    /* The end sector that is being marked dirty. */
+    end = MIN(QEMU_ALIGN_UP(sector_num + nb_sectors, align),
+              bs->total_sectors);
+
+    if (!bdrv_dirty_bitmap_get_meta(bs, bm->bitmap, sector_num, nb_sectors)) {
+        return 0;
+    }
+
+    file_sector_num = start / sectors_per_bitmap_sector;
+    buf_size = bdrv_dirty_bitmap_serialization_size(bm->bitmap, start,
+                                                    end - start);
+    buf_size = QEMU_ALIGN_UP(buf_size, BDRV_SECTOR_SIZE);
+    file_nb_sectors = buf_size >> BDRV_SECTOR_BITS;
+
+    data = g_new(QBMBitmapWriteData, 1);
+    data->buf = qemu_blockalign0(bm->file->bs, buf_size);
+    bdrv_dirty_bitmap_serialize_part(bm->bitmap, data->buf, start,
+                                     end - start);
+    qemu_iovec_init(&data->qiov, 1);
+    qemu_iovec_add(&data->qiov, data->buf, buf_size);
+    data->tracker = tracker;
+    data->sector_num = start;
+    data->nb_sectors = end - start;
+    data->bs = bm->file->bs;
+    data->bitmap = bm;
+    bdrv_aio_writev(bm->file->bs, file_sector_num, &data->qiov,
+                    file_nb_sectors, qbm_write_bitmap_cb,
+                    data);
+    return -EINPROGRESS;
+}
+
+static int qbm_write_bitmaps(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors)
+{
+    int i;
+    BDRVQBMState *s = bs->opaque;
+    QBMBitmapWriteTracker tracker = (QBMBitmapWriteTracker) {
+        .inflight = 1, /* So that no aio completion will call
+                          qemu_coroutine_enter before we yield. */
+        .co = qemu_coroutine_self(),
+    };
+
+    for (i = 0; i < s->num_bitmaps; i++) {
+        int ret = qbm_write_bitmap(bs, &s->bitmaps[i],
+                                   sector_num, nb_sectors, &tracker);
+        if (ret == -EINPROGRESS) {
+            tracker.inflight++;
+        } else if (ret < 0) {
+            tracker.ret = ret;
+            break;
+        }
+    }
+    tracker.inflight--;
+    if (tracker.inflight) {
+        /* At least one aio is submitted, wait. */
+        qemu_coroutine_yield();
+    }
+    return tracker.ret;
+}
+
+static coroutine_fn int qbm_co_writev(BlockDriverState *bs, int64_t sector_num,
+                                      int nb_sectors, QEMUIOVector *qiov)
+{
+    int ret;
+    BDRVQBMState *s = bs->opaque;
+
+    qbm_check_alignment(s, sector_num, nb_sectors);
+    ret = bdrv_co_writev(s->image->bs, sector_num, nb_sectors, qiov);
+    if (ret) {
+        return ret;
+    }
+    return qbm_write_bitmaps(bs, sector_num, nb_sectors);
+}
+
+static int coroutine_fn qbm_co_write_zeroes(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            int nb_sectors,
+                                            BdrvRequestFlags flags)
+{
+    int ret;
+    BDRVQBMState *s = bs->opaque;
+
+    qbm_check_alignment(s, sector_num, nb_sectors);
+    ret = bdrv_co_write_zeroes(s->image->bs, sector_num, nb_sectors, flags);
+    if (ret) {
+        return ret;
+    }
+    return qbm_write_bitmaps(bs, sector_num, nb_sectors);
+}
+
+static coroutine_fn int qbm_co_discard(BlockDriverState *bs,
+                                       int64_t sector_num,
+                                       int nb_sectors)
+{
+    int ret;
+    BDRVQBMState *s = bs->opaque;
+
+    ret = bdrv_co_discard(s->image->bs, sector_num, nb_sectors);
+    if (ret) {
+        return ret;
+    }
+    return qbm_write_bitmaps(bs, sector_num, nb_sectors);
+}
+
+static int qbm_make_empty(BlockDriverState *bs)
+{
+    BDRVQBMState *s = bs->opaque;
+    BlockDriverState *image_bs = s->image->bs;
+    int ret = 0;
+
+    if (image_bs->drv->bdrv_make_empty) {
+        ret = image_bs->drv->bdrv_make_empty(s->image->bs);
+        if (ret) {
+            return ret;
+        }
+    } else if (!s->alloc_bitmap) {
+        return -ENOTSUP;
+    }
+    if (s->alloc_bitmap) {
+        int i;
+        bdrv_clear_dirty_bitmap(s->alloc_bitmap, NULL);
+        for (i = 0; i < s->num_bitmaps; i++) {
+            QBMBitmap *bm = &s->bitmaps[i];
+            if (bm->bitmap != s->alloc_bitmap) {
+                continue;
+            }
+            ret = bdrv_write_zeroes(bm->file->bs, 0,
+                                    DIV_ROUND_UP(bdrv_getlength(bm->file->bs),
+                                                 BDRV_SECTOR_SIZE),
+                                    BDRV_REQ_MAY_UNMAP);
+        }
+    }
+    return ret;
+}
+
+/* Create a file with given size, and return the relative path. */
+static char *qbm_create_file(BlockDriverState *bs, const char *name,
+                             const char *ext,
+                             int64_t size, Error **errp)
+{
+    char *filename = NULL;
+    Error *local_err = NULL;
+    char fullname[PATH_MAX];
+    char path[PATH_MAX];
+    char prefix[PATH_MAX];
+    char postfix[PATH_MAX];
+
+    filename_decompose(bs->filename, path, prefix,
+                       postfix, PATH_MAX, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+    filename = g_strdup_printf("%s-%s.%s%s", prefix, name, ext, postfix);
+    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
+
+    bdrv_img_create(fullname, "raw", NULL, NULL, NULL, size, 0,
+                    &local_err, true);
+    if (local_err) {
+        g_free(filename);
+        filename = NULL;
+        error_propagate(errp, local_err);
+    }
+    return filename;
+}
+
+static QDict *qbm_create_image_dict(BlockDriverState *bs,
+                                    const char *image_name,
+                                    const char *format,
+                                    Error **errp)
+{
+    QDict *dict;
+    char fullname[PATH_MAX];
+
+    qbm_get_fullname(bs, fullname, sizeof(fullname), image_name);
+    dict = qdict_new();
+    qdict_put(dict, "file", qstring_from_str(image_name));
+    qdict_put(dict, "format", qstring_from_str(format ? : ""));
+    /* TODO: Set checksum when we support it. */
+
+    return dict;
+}
+
+static inline QDict *qbm_make_bitmap_dict(const char *filename,
+                                          int granularity,
+                                          QBMBitmapType type)
+{
+    QDict *d = qdict_new();
+    qdict_put(d, "file", qstring_from_str(filename));
+    qdict_put(d, "granularity-bytes", qint_from_int(granularity));
+    switch (type) {
+    case QBM_TYPE_DIRTY:
+        qdict_put(d, "type", qstring_from_str("dirty"));
+        break;
+    case QBM_TYPE_ALLOC:
+        qdict_put(d, "type", qstring_from_str("allocation"));
+        break;
+    default:
+        abort();
+    }
+    return d;
+}
+
+static QDict *qbm_create_dirty_bitmaps(BlockDriverState *bs,
+                                       uint64_t image_size,
+                                       int granularity,
+                                       int n, Error **errp)
+{
+    int i;
+    QDict *dict = qdict_new();
+    int64_t bitmap_size = DIV_ROUND_UP(image_size, granularity * BITS_PER_BYTE);
+
+    for (i = 0; i < n; i++) {
+        char *bitmap_filename;
+        char *key = g_strdup_printf("dirty.%d", i);
+
+        bitmap_filename = qbm_create_file(bs, key, "bitmap", bitmap_size,
+                                          errp);
+        if (!bitmap_filename) {
+            g_free(key);
+            QDECREF(dict);
+            dict = NULL;
+            goto out;
+        }
+        qdict_put(dict, key,
+                  qbm_make_bitmap_dict(bitmap_filename, granularity,
+                                       QBM_TYPE_DIRTY));
+        g_free(key);
+    }
+out:
+    return dict;
+}
+
+static QDict *qbm_create_allocation(BlockDriverState *bs,
+                                    uint64_t image_size,
+                                    int granularity,
+                                    const char *backing_file,
+                                    const char *format,
+                                    Error **errp)
+{
+    char *bitmap_filename;
+    QDict *ret, *backing;
+    int64_t bitmap_size = DIV_ROUND_UP(image_size, granularity * BITS_PER_BYTE);
+
+    bitmap_filename = qbm_create_file(bs, "allocation", "bitmap",
+                                      bitmap_size,
+                                      errp);
+    if (!bitmap_filename) {
+        return NULL;
+    }
+
+    ret = qdict_new();
+
+    qdict_put(ret, "file", qstring_from_str(bitmap_filename));
+    if (format) {
+        qdict_put(ret, "format", qstring_from_str(format));
+    }
+    qdict_put(ret, "type", qstring_from_str("allocation"));
+    qdict_put(ret, "granularity-bytes", qint_from_int(granularity));
+
+    backing = qbm_create_image_dict(bs, backing_file, format, errp);
+    if (!backing) {
+        QDECREF(ret);
+        ret = NULL;
+        goto out;
+    }
+    qdict_put(ret, "backing", backing);
+
+out:
+    g_free(bitmap_filename);
+    return ret;
+}
+
+static int qbm_create(const char *filename, QemuOpts *opts, Error **errp)
+{
+    char *backing_file;
+    const char *image_filename;
+    int granularity, dirty_bitmaps;
+    int64_t image_size;
+    int ret;
+    QDict *dict = NULL, *bitmaps, *image;
+    BlockDriverState *bs = NULL, *image_bs = NULL;
+    char fullname[PATH_MAX];
+
+    ret = bdrv_create_file(filename, NULL, errp);
+    if (ret) {
+        return ret;
+    }
+    ret = bdrv_open(&bs, filename, NULL, NULL,
+                    BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
+    if (ret) {
+        return ret;
+    }
+
+    image_filename = qemu_opt_get_del(opts, "image");
+    if (!image_filename) {
+        /* Try to create one */
+        int64_t size = qemu_opt_get_size_del(opts, "size", -1);
+        if (size == -1) {
+            error_setg(errp, "Invalid size specified for data image");
+            ret = -EINVAL;
+            goto out;
+        }
+        image_filename = qbm_create_file(bs, "data", "img", size, errp);
+        if (!image_filename) {
+            ret = -EIO;
+            goto out;
+        }
+    }
+
+    granularity = qemu_opt_get_number(opts, "granularity", 65536);
+    dirty_bitmaps = qemu_opt_get_number(opts, "dirty-bitmaps", 0);
+
+    qbm_get_fullname(bs, fullname, sizeof(fullname), image_filename);
+    ret = bdrv_open(&image_bs, fullname, NULL, NULL, 0, errp);
+    if (ret) {
+        goto out;
+    }
+    image_size = bdrv_getlength(image_bs);
+
+    dict = qdict_new();
+    bitmaps = qbm_create_dirty_bitmaps(bs, image_size, granularity,
+                                       dirty_bitmaps, errp);
+    image = qbm_create_image_dict(bs, image_filename,
+                                  bdrv_get_format_name(image_bs), errp);
+    if (!image) {
+        goto out;
+    }
+
+    qdict_put(dict, "version", qint_from_int(1));
+    qdict_put(dict, "creator", qstring_from_str("QEMU"));
+    qdict_put(dict, "bitmaps", bitmaps);
+    qdict_put(dict, "image", image);
+
+    backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
+    if (backing_file) {
+        char *backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
+        QDict *alloc = qbm_create_allocation(bs, image_size,
+                                             granularity, backing_file,
+                                             backing_fmt, errp);
+        if (!alloc) {
+            ret = -EIO;
+            goto out;
+        }
+        /* Create "allocation" bitmap. */
+        qdict_put(bitmaps, "allocation", alloc);
+        g_free(backing_file);
+        backing_file = NULL;
+        g_free(backing_fmt);
+    }
+
+    ret = qbm_save_desc(bs, dict);
+
+out:
+    bdrv_unref(image_bs);
+    bdrv_unref(bs);
+    QDECREF(dict);
+    return ret;
+}
+
+static int64_t qbm_getlength(BlockDriverState *bs)
+{
+    BDRVQBMState *s = bs->opaque;
+    return bdrv_getlength(s->image->bs);
+}
+
+static void qbm_close(BlockDriverState *bs)
+{
+    BDRVQBMState *s = bs->opaque;
+
+    qbm_release_bitmaps(bs);
+    bdrv_unref(s->image->bs);
+    g_free(s->bitmaps);
+    QDECREF(s->desc);
+}
+
+static int qbm_truncate(BlockDriverState *bs, int64_t offset)
+{
+    BDRVQBMState *s = bs->opaque;
+    /* Truncate the image only; the bitmaps' sizes will be corrected when
+     * saving. */
+    return bdrv_truncate(s->image->bs, offset);
+}
+
+static coroutine_fn int qbm_co_flush(BlockDriverState *bs)
+{
+    int ret;
+    int i;
+    BDRVQBMState *s = bs->opaque;
+
+    ret = bdrv_flush(s->image->bs);
+    for (i = 0; ret >= 0 && i < s->num_bitmaps; i++) {
+        ret = bdrv_flush(s->bitmaps[i].file->bs);
+    }
+    return ret;
+}
+
+static int qbm_change_backing_file(BlockDriverState *bs,
+                                   const char *backing_file,
+                                   const char *backing_fmt)
+{
+    BDRVQBMState *s = bs->opaque;
+    if (!s->backing_dict) {
+        return -ENOTSUP;
+    }
+    if (backing_file) {
+        qdict_put(s->backing_dict, "file", qstring_from_str(backing_file));
+        qdict_put(s->backing_dict, "format",
+                  qstring_from_str(backing_fmt ? : ""));
+    } else {
+        int i;
+        QDict *bitmaps = qdict_get_qdict(s->desc, "bitmaps");
+
+        assert(bitmaps);
+        if (!qdict_haskey(bitmaps, "allocation")) {
+            return 0;
+        }
+        qdict_del(bitmaps, "allocation");
+        for (i = 0; i < s->num_bitmaps; i++) {
+            if (s->bitmaps[i].type == QBM_TYPE_ALLOC) {
+                qbm_release_bitmap(bs, &s->bitmaps[i]);
+                s->bitmaps[i] = s->bitmaps[--s->num_bitmaps];
+                break;
+            }
+        }
+        s->alloc_bitmap = NULL;
+        s->backing_dict = NULL;
+    }
+    return qbm_save_desc(bs->file->bs, s->desc);
+}
+
+static int64_t qbm_get_allocated_file_size(BlockDriverState *bs)
+{
+    BDRVQBMState *s = bs->opaque;
+    /* Take the file sizes of descriptor and bitmap files into account? */
+    return bdrv_get_allocated_file_size(s->image->bs);
+}
+
+static int qbm_has_zero_init(BlockDriverState *bs)
+{
+    return 1;
+}
+
+static int qbm_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVQBMState *s = bs->opaque;
+
+    bdi->unallocated_blocks_are_zero = true;
+    bdi->can_write_zeroes_with_unmap = true;
+    if (s->alloc_bitmap) {
+        bdi->cluster_size = bdrv_dirty_bitmap_granularity(s->alloc_bitmap);
+    } else {
+        bdi->cluster_size = bdrv_get_cluster_size(s->image->bs);
+    }
+    return 0;
+}
+
+static int qbm_check(BlockDriverState *bs, BdrvCheckResult *result,
+                     BdrvCheckMode fix)
+{
+    /* TODO: checksum verification and bitmap size checks? */
+    return 0;
+}
+
+static void qbm_detach_aio_context(BlockDriverState *bs)
+{
+    int i;
+    BDRVQBMState *s = bs->opaque;
+
+    bdrv_detach_aio_context(s->image->bs);
+    for (i = 0; i < s->num_bitmaps; i++) {
+        bdrv_detach_aio_context(s->bitmaps[i].file->bs);
+    }
+}
+
+static void qbm_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+    int i;
+    BDRVQBMState *s = bs->opaque;
+
+    bdrv_attach_aio_context(s->image->bs, new_context);
+    for (i = 0; i < s->num_bitmaps; i++) {
+        bdrv_attach_aio_context(s->bitmaps[i].file->bs, new_context);
+    }
+}
+
+static int qbm_bitmap_set_persistent(BlockDriverState *bs,
+                                     BdrvDirtyBitmap *bitmap,
+                                     bool persistent, Error **errp)
+{
+    BDRVQBMState *s = bs->opaque;
+    int ret = 0;
+    QBMBitmap *bm;
+    char *filename;
+    const char *name = bdrv_dirty_bitmap_name(bitmap);
+    int granularity = bdrv_dirty_bitmap_granularity(bitmap);
+    QDict *bitmaps = qdict_get_qdict(s->desc, "bitmaps");
+
+    if (persistent) {
+        filename = qbm_create_file(bs, name, "bin",
+                                   bdrv_dirty_bitmap_size(bitmap), errp);
+        if (!filename) {
+            return -EIO;
+        }
+
+        bm = qbm_open_bitmap(bs, name, filename, granularity,
+                             QBM_TYPE_DIRTY, bitmap, errp);
+        if (!bm) {
+            ret = -EIO;
+        }
+        qdict_put(bitmaps, name, qbm_make_bitmap_dict(filename, granularity,
+                                                      QBM_TYPE_DIRTY));
+        g_free(filename);
+    } else {
+        if (!qdict_haskey(bitmaps, name)) {
+            error_setg(errp, "No persistent bitmap with name '%s'", name);
+            return -ENOENT;
+        }
+        qdict_del(bitmaps, name);
+    }
+    ret = qbm_save_desc(bs->file->bs, s->desc);
+    if (ret) {
+        error_setg(errp, "Failed to save json description to file");
+    }
+    return ret;
+}
+
+static QemuOptsList qbm_create_opts = {
+    .name = "qbm-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(qbm_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        {
+            .name = "image",
+            .type = QEMU_OPT_STRING,
+            .help = "The file name of the referenced image, if not specified, "
+                    "one will be created automatically",
+        },
+        {
+            .name = BLOCK_OPT_BACKING_FILE,
+            .type = QEMU_OPT_STRING,
+            .help = "File name of a base image"
+        },
+        {
+            .name = BLOCK_OPT_BACKING_FMT,
+            .type = QEMU_OPT_STRING,
+            .help = "Image format of the base image"
+        },
+        {
+            .name = "granularity",
+            .type = QEMU_OPT_NUMBER,
+            .help = "Bitmap granularity in bytes"
+        },
+        {
+            .name = "dirty-bitmaps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "The number of dirty bitmaps to create"
+        },
+        { /* end of list */ }
+    }
+};
+
+static BlockDriver bdrv_qbm = {
+    .format_name                  = "qbm",
+    .protocol_name                = "qbm",
+    .instance_size                = sizeof(BDRVQBMState),
+    .bdrv_probe                   = qbm_probe,
+    .bdrv_open                    = qbm_open,
+    .bdrv_reopen_prepare          = qbm_reopen_prepare,
+    .bdrv_co_readv                = qbm_co_readv,
+    .bdrv_co_writev               = qbm_co_writev,
+    .bdrv_co_write_zeroes         = qbm_co_write_zeroes,
+    .bdrv_co_discard              = qbm_co_discard,
+    .bdrv_make_empty              = qbm_make_empty,
+    .bdrv_close                   = qbm_close,
+    .bdrv_getlength               = qbm_getlength,
+    .bdrv_create                  = qbm_create,
+    .bdrv_co_flush_to_disk        = qbm_co_flush,
+    .bdrv_truncate                = qbm_truncate,
+    .bdrv_co_get_block_status     = qbm_co_get_block_status,
+    .bdrv_get_allocated_file_size = qbm_get_allocated_file_size,
+    .bdrv_has_zero_init           = qbm_has_zero_init,
+    .bdrv_refresh_limits          = qbm_refresh_limits,
+    .bdrv_get_info                = qbm_get_info,
+    .bdrv_check                   = qbm_check,
+    .bdrv_detach_aio_context      = qbm_detach_aio_context,
+    .bdrv_attach_aio_context      = qbm_attach_aio_context,
+    .bdrv_dirty_bitmap_set_persistent
+                                  = qbm_bitmap_set_persistent,
+    .bdrv_change_backing_file     = qbm_change_backing_file,
+    .supports_backing             = true,
+    .create_opts                  = &qbm_create_opts,
+};
+
+static void bdrv_qbm_init(void)
+{
+    bdrv_register(&bdrv_qbm);
+}
+
+block_init(bdrv_qbm_init);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [RFC PATCH 11/16] qapi: Add "qbm" as a generic cow format driver
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (9 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 12/16] iotests: Add qbm format to 041 Fam Zheng
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 52689ed..97dc0cd 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1599,7 +1599,7 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
             'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
-            'http', 'https', 'null-aio', 'null-co', 'parallels',
+            'http', 'https', 'null-aio', 'null-co', 'parallels', 'qbm',
             'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
             'vmdk', 'vpc', 'vvfat' ] }
 
@@ -2058,6 +2058,7 @@
       'null-aio':   'BlockdevOptionsNull',
       'null-co':    'BlockdevOptionsNull',
       'parallels':  'BlockdevOptionsGenericFormat',
+      'qbm':        'BlockdevOptionsGenericCOWFormat',
       'qcow2':      'BlockdevOptionsQcow2',
       'qcow':       'BlockdevOptionsGenericCOWFormat',
       'qed':        'BlockdevOptionsGenericCOWFormat',
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [RFC PATCH 12/16] iotests: Add qbm format to 041
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (10 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 11/16] qapi: Add "qbm" as a generic cow " Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 13/16] iotests: Add qbm to case 097 Fam Zheng
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Though a number of test cases don't apply because of cluster size and
blkdebug limitations, mirroring qbm can be covered by all the other cases.
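
As an illustration only (not part of this patch): the hunks below skip the
inapplicable cases with an early "return", so those cases report as passed.
A minimal sketch of the same guard written with the standard
unittest.skipIf decorator, which would report them as skipped instead.
IMGFMT is a hypothetical stand-in for iotests.imgfmt, and the test bodies
are placeholders rather than the real 041 cases:

```python
import unittest

# Hypothetical stand-in for iotests.imgfmt; the real value comes from
# the iotests harness.
IMGFMT = "qbm"


class TestSingleDrive(unittest.TestCase):
    # Placeholder body: the real case creates images with a cluster_size
    # option, which the commit message says does not apply to qbm.
    @unittest.skipIf(IMGFMT == "qbm",
                     "not applicable to qbm (cluster size limitation)")
    def test_large_cluster(self):
        self.fail("not reached when IMGFMT is qbm")

    # A case that still runs for every format.
    def test_complete(self):
        self.assertTrue(True)


def run_suite():
    """Run the sketch above and return the unittest.TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestSingleDrive)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this form the skipped cases and their reasons show up in the test
report, whereas an early "return" is indistinguishable from a pass.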

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/041 | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c7da95d..3712aca 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -139,6 +139,8 @@ class TestSingleDrive(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_small_buffer2(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,size=%d'
@@ -155,6 +157,8 @@ class TestSingleDrive(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_large_cluster(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
@@ -265,9 +269,9 @@ class TestMirrorNoBacking(iotests.QMPTestCase):
         os.remove(backing_img)
         try:
             os.remove(target_backing_img)
+            os.remove(target_img)
         except:
             pass
-        os.remove(target_img)
 
     def test_complete(self):
         self.assert_no_active_block_jobs()
@@ -300,6 +304,8 @@ class TestMirrorNoBacking(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_large_cluster(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         # qemu-img create fails if the image is not there
@@ -461,6 +467,8 @@ new_state = "1"
         self.vm.shutdown()
 
     def test_large_cluster(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         # Test COW into the target image.  The first half of the
@@ -568,6 +576,8 @@ new_state = "1"
         os.remove(self.blkdebug_file)
 
     def test_report_write(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -595,6 +605,8 @@ new_state = "1"
         self.vm.shutdown()
 
     def test_ignore_write(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -612,6 +624,8 @@ new_state = "1"
         self.vm.shutdown()
 
     def test_stop_write(self):
+        if iotests.imgfmt == "qbm":
+            return
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -981,4 +995,4 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.vm.shutdown()
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2', 'qed'])
+    iotests.main(supported_fmts=['qcow2', 'qed', 'qbm'])
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [RFC PATCH 13/16] iotests: Add qbm to case 097
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (11 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 12/16] iotests: Add qbm format to 041 Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 14/16] iotests: Add qbm to applicable test cases Fam Zheng
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

The output of "qemu-img map" will be slightly different for qbm because
the data image paths are not $TEST_IMG, but the pattern is predictable
enough that we can just filter it out.
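
As an illustration only: the sed expression used in the 097 hunk below,
's/.data.img//', removes the first match on each line, and its unescaped
dots match any character. A rough Python equivalent follows. The sample
paths in the usage note are made up; the real data image name depends on
how the qbm driver derives it, which is assumed here:

```python
import re


def strip_data_img(line):
    """Mimic sed -e 's/.data.img//' on a single line.

    The dots are left unescaped on purpose, as in the sed expression,
    so they match any single character (for example '-' or '.').
    Only the first match on the line is removed.
    """
    return re.sub(r".data.img", "", line, count=1)


def filter_map_output(text):
    """Apply the filter to every line of a qemu-img map style listing."""
    return "\n".join(strip_data_img(line) for line in text.split("\n"))
```

For a hypothetical path such as "TEST_DIR/t-data.img.IMGFMT" the filter
leaves "TEST_DIR/t.IMGFMT", while lines without the pattern pass through
unchanged.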

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/095 | 2 +-
 tests/qemu-iotests/097 | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/095 b/tests/qemu-iotests/095
index dad04b9..2f68953 100755
--- a/tests/qemu-iotests/095
+++ b/tests/qemu-iotests/095
@@ -43,7 +43,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.qemu
 
-_supported_fmt qcow2
+_supported_fmt qcow2 qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/097 b/tests/qemu-iotests/097
index c7a613b..2252d62 100755
--- a/tests/qemu-iotests/097
+++ b/tests/qemu-iotests/097
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files and bdrv_make_empty
-_supported_fmt qcow qcow2
+_supported_fmt qcow qcow2 qbm
 _supported_proto file
 _supported_os Linux
 
@@ -109,9 +109,11 @@ else
     # Both top and intermediate should be unchanged
 fi
 
+{
 $QEMU_IMG map "$TEST_IMG.base" | _filter_qemu_img_map
 $QEMU_IMG map "$TEST_IMG.itmd" | _filter_qemu_img_map
 $QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map
+} | sed -e 's/.data.img//'
 
 done
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [RFC PATCH 14/16] iotests: Add qbm to applicable test cases
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (12 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 13/16] iotests: Add qbm to case 097 Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 15/16] iotests: Add qbm specific test case 140 Fam Zheng
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/004 | 2 +-
 tests/qemu-iotests/017 | 2 +-
 tests/qemu-iotests/018 | 2 +-
 tests/qemu-iotests/019 | 2 +-
 tests/qemu-iotests/020 | 2 +-
 tests/qemu-iotests/024 | 2 +-
 tests/qemu-iotests/025 | 2 +-
 tests/qemu-iotests/027 | 2 +-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/030 | 2 +-
 tests/qemu-iotests/034 | 2 +-
 tests/qemu-iotests/037 | 2 +-
 tests/qemu-iotests/038 | 2 +-
 tests/qemu-iotests/040 | 2 +-
 tests/qemu-iotests/050 | 2 +-
 tests/qemu-iotests/055 | 2 +-
 tests/qemu-iotests/056 | 2 +-
 tests/qemu-iotests/069 | 2 +-
 tests/qemu-iotests/072 | 2 +-
 tests/qemu-iotests/086 | 2 +-
 tests/qemu-iotests/096 | 2 +-
 tests/qemu-iotests/099 | 2 +-
 tests/qemu-iotests/110 | 2 +-
 tests/qemu-iotests/129 | 2 +-
 tests/qemu-iotests/132 | 2 +-
 tests/qemu-iotests/139 | 2 +-
 26 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/tests/qemu-iotests/004 b/tests/qemu-iotests/004
index 2ad77ed..a67882e 100755
--- a/tests/qemu-iotests/004
+++ b/tests/qemu-iotests/004
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx
+_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx qbm
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 3af3cdf..220d26f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto generic
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/018 b/tests/qemu-iotests/018
index 07b2de9..185c617 100755
--- a/tests/qemu-iotests/018
+++ b/tests/qemu-iotests/018
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/019 b/tests/qemu-iotests/019
index 0937b5c..6354a7d 100755
--- a/tests/qemu-iotests/019
+++ b/tests/qemu-iotests/019
@@ -44,7 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 6625b55..187739b 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto generic
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 2c2d148..844bb11 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Currently only qcow2 and qed support rebasing
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index 467a4b7..6a7e592 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.pattern
 
-_supported_fmt raw qcow2 qed
+_supported_fmt raw qcow2 qed qbm
 _supported_proto file sheepdog rbd nfs archipelago
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/027 b/tests/qemu-iotests/027
index 3fa81b8..be97963 100755
--- a/tests/qemu-iotests/027
+++ b/tests/qemu-iotests/027
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt vmdk qcow qcow2 qed
+_supported_fmt vmdk qcow qcow2 qed qbm
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 4909b9b..43d3f0a 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -46,7 +46,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Any format supporting backing files except vmdk and qcow which do not support
 # smaller backing files.
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 32469ef..fe5ad4a 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -467,4 +467,4 @@ class TestSetSpeed(iotests.QMPTestCase):
         self.cancel_and_wait()
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2', 'qed'])
+    iotests.main(supported_fmts=['qcow2', 'qed', 'qbm'])
diff --git a/tests/qemu-iotests/034 b/tests/qemu-iotests/034
index c769dd8..12f9cc3 100755
--- a/tests/qemu-iotests/034
+++ b/tests/qemu-iotests/034
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/037 b/tests/qemu-iotests/037
index 5862451..7264783 100755
--- a/tests/qemu-iotests/037
+++ b/tests/qemu-iotests/037
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/038 b/tests/qemu-iotests/038
index 34fe698..a9dcd8b 100755
--- a/tests/qemu-iotests/038
+++ b/tests/qemu-iotests/038
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 5bdaf3d..0130849 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -282,4 +282,4 @@ class TestReopenOverlay(ImageCommitTestCase):
         self.run_commit_test(self.img1, self.img0)
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2', 'qed'])
+    iotests.main(supported_fmts=['qcow2', 'qed', 'qbm'])
diff --git a/tests/qemu-iotests/050 b/tests/qemu-iotests/050
index 13006dd..e3a0111 100755
--- a/tests/qemu-iotests/050
+++ b/tests/qemu-iotests/050
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index c8e3578..3487e25 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -452,4 +452,4 @@ class TestSingleTransaction(iotests.QMPTestCase):
         self.assert_no_active_block_jobs()
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['raw', 'qcow2'])
+    iotests.main(supported_fmts=['raw', 'qcow2', 'qbm'])
diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
index 04f2c3c..84aef62 100755
--- a/tests/qemu-iotests/056
+++ b/tests/qemu-iotests/056
@@ -109,4 +109,4 @@ class TestBeforeWriteNotifier(iotests.QMPTestCase):
         self.assert_qmp(event, 'data/type', 'backup')
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2', 'qed'])
+    iotests.main(supported_fmts=['qcow2', 'qed', 'qbm'])
diff --git a/tests/qemu-iotests/069 b/tests/qemu-iotests/069
index ce9e054..b973081 100755
--- a/tests/qemu-iotests/069
+++ b/tests/qemu-iotests/069
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qed qcow qcow2 vmdk
+_supported_fmt qed qcow qcow2 vmdk qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/072 b/tests/qemu-iotests/072
index e4a723d..aac233a 100755
--- a/tests/qemu-iotests/072
+++ b/tests/qemu-iotests/072
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt vpc vmdk vhdx vdi qed qcow2 qcow
+_supported_fmt vpc vmdk vhdx vdi qed qcow2 qcow qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/086 b/tests/qemu-iotests/086
index 5527e86..1baa436 100755
--- a/tests/qemu-iotests/086
+++ b/tests/qemu-iotests/086
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow2 raw
+_supported_fmt qcow2 raw qbm
 _supported_proto file nfs
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/096 b/tests/qemu-iotests/096
index e34204b..7a45c19 100644
--- a/tests/qemu-iotests/096
+++ b/tests/qemu-iotests/096
@@ -66,4 +66,4 @@ class TestLiveSnapshot(iotests.QMPTestCase):
         self.checkConfig('target')
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2'])
+    iotests.main(supported_fmts=['qcow2', 'qbm'])
diff --git a/tests/qemu-iotests/099 b/tests/qemu-iotests/099
index 80f3d9a..aeb5a8f 100755
--- a/tests/qemu-iotests/099
+++ b/tests/qemu-iotests/099
@@ -41,7 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Basically all formats, but "raw" has issues with _filter_imgfmt regarding the
 # raw comparison image for blkverify; also, all images have to support creation
-_supported_fmt qcow qcow2 qed vdi vhdx vmdk vpc
+_supported_fmt qcow qcow2 qed vdi vhdx vmdk vpc qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat" \
diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index a687f95..630f773 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 
 # Any format supporting backing files
-_supported_fmt qed qcow qcow2 vmdk
+_supported_fmt qed qcow qcow2 vmdk qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/129 b/tests/qemu-iotests/129
index 9e87e1c..3da2e07 100644
--- a/tests/qemu-iotests/129
+++ b/tests/qemu-iotests/129
@@ -83,4 +83,4 @@ class TestStopWithBlockJob(iotests.QMPTestCase):
         self.do_test_stop("block-commit", device="drive0")
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=["qcow2"])
+    iotests.main(supported_fmts=["qcow2", "qbm"])
diff --git a/tests/qemu-iotests/132 b/tests/qemu-iotests/132
index f53ef6e..2d16f3e 100644
--- a/tests/qemu-iotests/132
+++ b/tests/qemu-iotests/132
@@ -56,4 +56,4 @@ class TestSingleDrive(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['raw', 'qcow2'])
+    iotests.main(supported_fmts=['raw', 'qcow2', 'qbm'])
diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139
index a4b9694..716f6dc 100644
--- a/tests/qemu-iotests/139
+++ b/tests/qemu-iotests/139
@@ -413,4 +413,4 @@ class TestBlockdevDel(iotests.QMPTestCase):
 
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=["qcow2"])
+    iotests.main(supported_fmts=["qcow2", "qbm"])
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 15/16] iotests: Add qbm specific test case 140
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (13 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 14/16] iotests: Add qbm to applicable test cases Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 16/16] iotests: Add persistent bitmap test case 141 Fam Zheng
  2016-02-22 14:24 ` [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Kevin Wolf
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/140     |  80 +++++++++++++++++++++++++
 tests/qemu-iotests/140.out | 145 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/common  |   6 ++
 tests/qemu-iotests/group   |   1 +
 4 files changed, 232 insertions(+)
 create mode 100755 tests/qemu-iotests/140
 create mode 100644 tests/qemu-iotests/140.out

diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
new file mode 100755
index 0000000..e5c3c56
--- /dev/null
+++ b/tests/qemu-iotests/140
@@ -0,0 +1,80 @@
+#!/bin/bash
+#
+# General tests for QBM format
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=famz@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+    return
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qbm
+_supported_proto file
+_supported_os Linux
+
+size=128M
+
+echo
+echo "=== Create a QBM file with no option ==="
+_make_test_img $size
+_img_info
+cat $TEST_IMG
+
+for n in 0 1 3; do
+    echo
+    echo "=== Create a QBM file with $n dirty bitmap(s) ==="
+    echo
+    _make_test_img -o dirty-bitmaps=$n $size
+    _img_info
+    cat $TEST_IMG
+done
+
+
+$QEMU_IMG map $TEST_IMG
+
+echo
+echo "=== Create a QBM file with raw backing image ==="
+IMGFMT=raw TEST_IMG=$TEST_IMG.base _make_test_img $size
+$QEMU_IO_PROG -f raw $TEST_IMG.base -c "write 0 $size" | _filter_qemu_io
+_make_test_img -o dirty-bitmaps=1 -b $TEST_IMG.base
+cat $TEST_IMG
+_img_info
+
+$QEMU_IO $TEST_IMG -c "write 130560 131072" | _filter_qemu_io
+$QEMU_IMG map $TEST_IMG
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/140.out b/tests/qemu-iotests/140.out
new file mode 100644
index 0000000..1d9cefb
--- /dev/null
+++ b/tests/qemu-iotests/140.out
@@ -0,0 +1,145 @@
+QA output created by 140
+
+=== Create a QBM file with no option ===
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+    "QBM": {
+        "bitmaps": {
+        },
+        "image": {
+            "format": "raw",
+            "file": "t-data.img.qbm"
+        },
+        "version": 1,
+        "creator": "QEMU"
+    }
+}
+
+=== Create a QBM file with 0 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=0
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+    "QBM": {
+        "bitmaps": {
+        },
+        "image": {
+            "format": "raw",
+            "file": "t-data.img.qbm"
+        },
+        "version": 1,
+        "creator": "QEMU"
+    }
+}
+
+=== Create a QBM file with 1 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=1
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+    "QBM": {
+        "bitmaps": {
+            "dirty.0": {
+                "granularity-bytes": 65536,
+                "type": "dirty",
+                "file": "t-dirty.0.bitmap.qbm"
+            }
+        },
+        "image": {
+            "format": "raw",
+            "file": "t-data.img.qbm"
+        },
+        "version": 1,
+        "creator": "QEMU"
+    }
+}
+
+=== Create a QBM file with 3 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=3
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+    "QBM": {
+        "bitmaps": {
+            "dirty.1": {
+                "granularity-bytes": 65536,
+                "type": "dirty",
+                "file": "t-dirty.1.bitmap.qbm"
+            },
+            "dirty.2": {
+                "granularity-bytes": 65536,
+                "type": "dirty",
+                "file": "t-dirty.2.bitmap.qbm"
+            },
+            "dirty.0": {
+                "granularity-bytes": 65536,
+                "type": "dirty",
+                "file": "t-dirty.0.bitmap.qbm"
+            }
+        },
+        "image": {
+            "format": "raw",
+            "file": "t-data.img.qbm"
+        },
+        "version": 1,
+        "creator": "QEMU"
+    }
+}
+Offset          Length          Mapped to       File
+
+=== Create a QBM file with raw backing image ===
+Formatting 'TEST_DIR/t.qbm.base', fmt=IMGFMT size=134217728
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base dirty-bitmaps=1
+{
+    "QBM": {
+        "bitmaps": {
+            "allocation": {
+                "granularity-bytes": 65536,
+                "backing": {
+                    "format": "",
+                    "file": "/home/fam/build/last/tests/qemu-iotests/scratch/t.qbm.base"
+                },
+                "type": "allocation",
+                "file": "t-allocation.bitmap.qbm"
+            },
+            "dirty.0": {
+                "granularity-bytes": 65536,
+                "type": "dirty",
+                "file": "t-dirty.0.bitmap.qbm"
+            }
+        },
+        "image": {
+            "format": "raw",
+            "file": "t-data.img.qbm"
+        },
+        "version": 1,
+        "creator": "QEMU"
+    }
+}
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/t.IMGFMT.base
+wrote 131072/131072 bytes at offset 130560
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          Mapped to       File
+0               0x10000         0               /home/fam/build/last/tests/qemu-iotests/scratch/t.qbm.base
+0x10000         0x30000         0x10000         /home/fam/build/last/tests/qemu-iotests/scratch/t-data.img.qbm
+0x40000         0x7fc0000       0x40000         /home/fam/build/last/tests/qemu-iotests/scratch/t.qbm.base
+*** done
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index ff84f4b..868a5d1 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -148,6 +148,7 @@ check options
     -vpc                test vpc
     -vhdx               test vhdx
     -vmdk               test vmdk
+    -qbm                test qbm
     -file               test file (default)
     -rbd                test rbd
     -sheepdog           test sheepdog
@@ -221,6 +222,11 @@ testlist options
             xpand=false
             ;;
 
+        -qbm)
+            IMGFMT=qbm
+            xpand=false
+            ;;
+
         -vpc)
             IMGFMT=vpc
             xpand=false
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d6e9219..e220a00 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -141,4 +141,5 @@
 137 rw auto
 138 rw auto quick
 139 rw auto quick
+140 rw auto quick
 142 auto
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 16/16] iotests: Add persistent bitmap test case 141
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (14 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 15/16] iotests: Add qbm specific test case 140 Fam Zheng
@ 2016-01-26 10:38 ` Fam Zheng
  2016-02-22 14:24 ` [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Kevin Wolf
  16 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-01-26 10:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, jsnow, Markus Armbruster,
	mreitz, vsementsov, Stefan Hajnoczi

For now it merely invokes block-dirty-bitmap-{add,set-persistent}.
Verification of the bitmap data and user data will be added in the future.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/141     | 62 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/141.out |  5 ++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 68 insertions(+)
 create mode 100644 tests/qemu-iotests/141
 create mode 100644 tests/qemu-iotests/141.out

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
new file mode 100644
index 0000000..434c7ce
--- /dev/null
+++ b/tests/qemu-iotests/141
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+#
+# Tests for persistent dirty bitmap
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+class TestPersistentDirtyBitmap(iotests.QMPTestCase):
+    image_len = 64 * 1024 * 1024 # 64 MB
+    def setUp(self):
+        # Create an empty test image and boot a VM with it attached
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(self.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+
+    def do_test_create(self, n):
+        def make_range(k):
+            return (k * 65536, 512)
+        r = range(n)
+        for i in r:
+            result = self.vm.qmp('block-dirty-bitmap-add', node='drive0',
+                            name='bitmap-%d' % i,
+                            persistent=True)
+            self.assert_qmp(result, 'return', {})
+            self.vm.hmp_qemu_io('drive0', 'write -P %d %d %d' % ((i % 255,) + make_range(i)))
+        for i in r:
+            result = self.vm.qmp('block-dirty-bitmap-set-persistent',
+                                 node='drive0', name='bitmap-%d' % i,
+                                 persistent=False)
+            self.assert_qmp(result, 'return', {})
+
+    def test_simple_one(self):
+        self.do_test_create(1)
+
+    def test_simple_multiple(self):
+        self.do_test_create(10)
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qbm'])
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
new file mode 100644
index 0000000..fbc63e6
--- /dev/null
+++ b/tests/qemu-iotests/141.out
@@ -0,0 +1,5 @@
+..
+----------------------------------------------------------------------
+Ran 2 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index e220a00..877bdbb 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -142,4 +142,5 @@
 138 rw auto quick
 139 rw auto quick
 140 rw auto quick
+141 rw auto quick
 142 auto
-- 
2.4.3


* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
@ 2016-01-26 17:51   ` Eric Blake
  2016-02-09  0:05     ` John Snow
  2016-02-23  8:35     ` Markus Armbruster
  2016-02-08 23:51   ` John Snow
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 42+ messages in thread
From: Eric Blake @ 2016-01-26 17:51 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

On 01/26/2016 03:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
>  create mode 100644 docs/specs/qbm.md
> 
> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
> new file mode 100644
> index 0000000..b91910b
> --- /dev/null
> +++ b/docs/specs/qbm.md
> @@ -0,0 +1,118 @@
> +QEMU Block Bitmap (QBM)
> +=======================

No explicit copyright mention means that this document is GPLv2+ by
default.  I don't know if any 3rd-party implementation trying to use
this spec would object to that, or if a looser license is desirable here
(I don't personally care, but just raising the point).

> +
> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
> +with the tracked data image.  A QBM image includes one json descriptor file,
> +one data image, one or more bitmap files that describe the block dirty status

s/one or more/and one or more/

Must it have block dirty status bitmaps, or can you have a QBM image
with just an allocation bitmap?

> +of the data image.
> +
> +The json file describes the structure of the image. The structure of the json
> +descriptor file is:

Please mention that the file must be valid JSON per RFC 7159.  Probably
also worth requiring that the file is a text file (ends in a newline).

> +
> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
> +
> +    DESC-JSON := { "version": 1,
> +                   "image": IMAGE,
> +                   "BITMAPS": BITMAPS

s/"BITMAPS"/"bitmaps"/

See my thoughts below on whether this is the ideal top-level structure.

> +                 }
> +
> +Fields in the top level json dictionary are:
> +
> +@version: An integer which must be 1.
> +@image: A dictionary in IMAGE schema, as described later. It provides the
> +        information of the data image where user data is stored. Its format is
> +        documented in the "IMAGE schema" section.
> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
> +          the dictionary are the names of bitmap, which must be strings, and
> +          each value is a dictionary describing the information of the bitmap,
> +          as documented below in the "BITMAP schema" section.

Making 'bitmaps' be a dictionary means that we now have keys that might
not be an identifier.  Although this is valid JSON, it may trip up some
tools.  Would it be better to make 'bitmaps' be a list of dictionaries,
where each dictionary has a 'name':'value' member, so that bitmap names
that are not a (C, Python, whatever) identifier are still valid?
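
To illustrate the list form, here is a minimal Python sketch that converts
the dictionary layout into a list of dictionaries carrying a "name" member.
The function name bitmaps_dict_to_list is hypothetical and not part of any
patch in this series.

```python
def bitmaps_dict_to_list(bitmaps):
    """Convert {"<name>": {...}} into [{"name": "<name>", ...}].

    Hypothetical helper: the bitmap name moves from the dictionary key
    into a "name" member, so it no longer has to be an identifier.
    """
    return [{"name": name, **desc} for name, desc in bitmaps.items()]


if __name__ == "__main__":
    old_style = {
        "my bitmap #1": {
            "file": "bitmap0.bin",
            "granularity-bytes": 512,
            "type": "dirty",
        },
    }
    print(bitmaps_dict_to_list(old_style))
```

With this layout a name such as "my bitmap #1" is just a string value, so
tools that map JSON keys to identifiers never see it as a key.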

> +
> +=== IMAGE schema ===
> +
> +An IMAGE records the information of an image (such as a data image or a backing
> +file). It has following fields:

I liked how you showed DESC-JSON := for the top level; should you do the
same here for IMAGE?

> +
> +@file: The file name string of the referenced image. If it's a relative path,
> +       the file should be found relative to the descriptor file's
> +       location.

Does that mean we'll have to use 'json:...' encoding for representing
network resources?  Should we instead reuse some of the
qapi/block-core.json representation of a block device?

> +@format: The format string of the file.

Nice that format is mandatory.  Do we want to call out a finite list of
supported formats, or leave it open-ended in this spec?

> +
> +=== BITMAP schema ===
> +
> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
> +or a block allocation status bitmap). It has following mandatory fields:
> +
> +@file: The name of the bitmap file. The bitmap file is in little endian, both

s/in //

> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
> +       corresponds to the first sectors.

s/the byte 0/the first byte/

Again, should we be reusing something from qapi/block-core.json, to
allow network devices with more structure than just 'json:...' naming?
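
To make sure I read the bit ordering correctly, here is a minimal Python
sketch of a lookup under my interpretation: bit N of the bitmap lives in
byte N / 8, at bit position N % 8 counted from the LSB. The helper name
is_bit_set is hypothetical, and the granularity check mirrors the
"granularity-bytes" constraint in the spec text quoted below.

```python
def is_bit_set(bitmap: bytes, offset: int, granularity: int) -> bool:
    """Return True if the bit tracking byte `offset` of the image is set.

    Assumes LSB-first bit order within each byte and that byte 0 of the
    bitmap file covers the start of the image.
    """
    # The spec requires a power of 2 that is no less than 512.
    if granularity < 512 or granularity & (granularity - 1):
        raise ValueError("granularity must be a power of 2 and >= 512")
    index = offset // granularity      # which bit tracks this offset
    byte, bit = divmod(index, 8)       # byte in the file, bit in that byte
    return bool((bitmap[byte] >> bit) & 1)


if __name__ == "__main__":
    data = bytes([0b00000101, 0b10000000])
    print(is_bit_set(data, 0, 512))      # first 512 bytes of the image
    print(is_bit_set(data, 512, 512))    # second 512-byte chunk
```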

> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
> +                    This value must be a power of 2 and no less than 512.
> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
> +       supported.
> +       "dirty" indicates a block dirty bitmap; "allocation" indicates a
> +       allocation status bitmap. There must be at most one "allocation" bitmap.
> +
> +If the type of the bitmap is "allocation", an extra field "backing" is also
> +accepted:
> +
> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
> +          adding a backing file to raw image.

s/adding/add/

As promised above, would an alternative representation be any better?
That is, I'm trying to see if I could write qapi to describe the
structure you've presented here, and I fell short when it comes to
naming a particular bitmap.  Also, since there is at most one
'type':'allocation' member of 'bitmaps', I wonder if separating it out
would make it easier to locate.

Here's my proposal for an alternative schema, written in qapi:

{ 'struct': 'Other', 'data': { ...however we describe a network file... } }
{ 'alternate': 'File', 'data': { 'file': 'str', 'struct': 'Other' } }
{ 'struct': 'Bitmap', 'data': {
   'name':'str', 'file': 'File',
   'granularity-bytes':'int' } }
{ 'struct': 'AllocationBitmap', 'base': 'Bitmap', 'data': {
  '*backing': 'Image' } }
{ 'struct': 'Image', 'data': {
  'file': 'File',
  'format': 'str' # would an enum be better?
} }
{ 'struct': 'Desc', 'data': {
  'version': 'int', 'image': 'Image',
  '*allocation': 'AllocationBitmap',
  'dirty': [ 'Bitmap' ]
} }
{ 'struct': 'QBM', 'data': { 'QBM': 'Desc' } }

where the json description file must consist of a single 'QBM' qapi
struct, and where the use of a QAPI alternate type 'File' allows us to
specify either a file name or a formal structure for describing a
network resource.  Below, I'll rewrite your example with my schema...

> +
> +
> +=== Extended fields ===
> +
> +Implementations are allowed to extend the format schema by inserting additinoal

s/additinoal/additional/

> +members into above dictionaries, with key names that starts with either
> +an "ext-hard-" or an "ext-soft-" prefix.

Should we be more like qcow2 and have a third category of auto-clear
keys (if you don't recognize the key, remove it upon editing the file,
but reading is okay)?  Feature negotiation via this approach requires
reading every member of 'bitmaps' (well, we have to do that anyway to
parse the full JSON structure); would it be any better to have an
up-front section in the top level that describes what features are in
use, rather than requiring all new features to use the 'ext-' namespace?

Should we require hard failure on any key whose name is not recognized
(other than the weirdness of your proposal having the keys of 'bitmaps'
be user-supplied names)?

> +
> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
> +parsers if they do not support it; fields starting with "ext-hard-" are
> +mandatory and cannot be ignored, a parser should not proceed parsing the image
> +if it does not support it.

Is the entire image really invalidated if an unsupported extension is tied
to a particular bitmap, or only that bitmap?
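
For discussion, here is a minimal Python sketch of how a parser might treat
the two prefixes on a single dictionary. The function name and the
known_keys parameter are hypothetical. Rejecting unknown keys that carry no
"ext-" prefix is my assumption; the spec as posted does not say what to do
with them.

```python
def check_extensions(desc, known_keys):
    """Classify the keys of one descriptor dictionary.

    Returns the list of "ext-soft-" keys that were skipped.  Raises
    ValueError for an unsupported "ext-hard-" key, and (as an assumption
    of this sketch) for any other unrecognized key.
    """
    ignored = []
    for key in desc:
        if key in known_keys:
            continue
        if key.startswith("ext-soft-"):
            ignored.append(key)        # optional: safe to skip
        elif key.startswith("ext-hard-"):
            raise ValueError("unsupported mandatory extension: " + key)
        else:
            raise ValueError("unrecognized key: " + key)
    return ignored


if __name__ == "__main__":
    image = {
        "file": "data.img",
        "format": "raw",
        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
    }
    print(check_extensions(image, {"file", "format"}))
```

Calling this per dictionary (image, each bitmap, each backing) rather than
once for the whole file is what would let a parser reject only the affected
bitmap.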

> +
> +It is strongly recommended that the application names are also included in the
> +extention name string, such as "ext-hard-qemu-", if the effect or

s/extention/extension/

> +interpretation of the field is local to a specific application.
> +
> +For example, QEMU can implement a "checksum" feature to make sure no files
> +referred to by the json descriptor are modified inconsistently, by adding
> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
> +the json text found below.

If an extension proves to be useful, how do we standardize it later?
Will it always have to carry the 'ext-' prefix?

You said that soft extensions can be ignored on parse - but if we write
to the file, couldn't we possibly be invalidating the contents of the
extension field, and not leaving a breadcrumb for the future reader that
understands the extension to know that we messed it up?  I think an
auto-clear feature would be useful (preserve the checksum field, but
clear the associated auto-clear so that the newer reader knows that the
checksum has to be checked).

Should we be thinking about a write lock extension (similar to the
current thread on qcow2 write locks), to make it less likely that two
writers will be modifying the descriptor, image file, and/or bitmaps at
the same time?

> +
> +=== QBM descriptor file example ===
> +
> +This is the content of a QBM image's json descriptor file, which contains a
> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
> +associates a backing file to this image (base.img).
> +
> +{ "QBM": {
> +    "version": 1,
> +    "image": {
> +        "file": "data.img",
> +        "format": "raw"
> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",

Comma on wrong row.

> +    },
> +    "bitmaps": {
> +        "0": {
> +            "file": "bitmap0.bin",
> +            "granularity-bytes": 512,
> +            "type": "dirty"
> +        },
> +        "1": {
> +            "file": "bitmap1.bin",
> +            "granularity-bytes": 4096,
> +            "type": "dirty"
> +        },
> +        "2": {
> +            "file": "bitmap3.bin",
> +            "granularity-bytes": 4096,
> +            "type": "allocation"

Missing comma

> +            "backing": {
> +                "file": "base.img",
> +                "format": "raw"
> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7",

Comma on wrong row

> +            },

No trailing commas in JSON

Your approach means that the same bitmap cannot be used for both 'dirty'
and 'allocation' (because all entries in the 'bitmaps' dictionary must
have distinct names).  (Although nothing stops two bitmap entries from
naming the same 'file'.)

> +        }
> +    }
> +} }

So, rewriting to my schema above, this would be:

{ "QBM": {
    "version": 1,
    "image": {
        "file": "data.img",
        "format": "raw",
        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8"
    },
    "allocation": {
        "name": "2",
        "file": "bitmap3.bin",
        "granularity-bytes": 4096,
        "backing": {
            "file": "base.img",
            "format": "raw",
            "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
        }
    },
    "bitmaps": [
        {
            "name": "0",
            "file": "bitmap0.bin",
            "granularity-bytes": 512
        },
        {
            "name": "1",
            "file": "bitmap1.bin",
            "granularity-bytes": 4096
        }
    ]
} }

> +
> 

Awkward to end in a blank line.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write Fam Zheng
@ 2016-01-26 17:52   ` Eric Blake
  2016-02-09  0:11   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-01-26 17:52 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

On 01/26/2016 03:38 AM, Fam Zheng wrote:
> So that driver can write the dirty bits into persistent dirty bitmaps in
> the write callback.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/io.c b/block/io.c
> index 343ff1f..b964e7e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1164,6 +1164,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>          }
>      }
>  
> +    bdrv_set_dirty(bs, sector_num, nb_sectors);
> +
>      if (ret < 0) {
>          /* Do nothing, write notifier decided to fail this request */

This sets the dirty bit even on failure, but I guess that doesn't hurt
(it's better to mark too much dirty than it is to not mark enough).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps Fam Zheng
@ 2016-01-26 17:53   ` Eric Blake
  2016-02-09  0:23   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-01-26 17:53 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

On 01/26/2016 03:38 AM, Fam Zheng wrote:
> If the driver owns some dirty bitmaps, this assertion will fail.
> 
> The correct place to release them is in bdrv_close, so move the
> assertion one line down.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c Fam Zheng
@ 2016-01-27 16:07   ` Eric Blake
  2016-02-09 20:56   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-01-27 16:07 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

On 01/26/2016 03:38 AM, Fam Zheng wrote:
> With the return value decoupled from VMDK, it can be reused by other block
> code.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>  block/vmdk.c          | 40 ----------------------------------------
>  include/block/block.h |  2 ++
>  3 files changed, 42 insertions(+), 40 deletions(-)
> 

> +++ b/block.c
> @@ -144,6 +144,46 @@ int path_is_absolute(const char *path)
>  #endif
>  }
>  
> +int filename_decompose(const char *filename, char *path, char *prefix,
> +                       char *postfix, size_t buf_len, Error **errp)
> +{
> +    const char *p, *q;
> +
> +    if (filename == NULL || !strlen(filename)) {
> +        error_setg(errp, "No filename provided");
> +        return -EINVAL;
> +    }
> +    p = strrchr(filename, '/');
> +    if (p == NULL) {
> +        p = strrchr(filename, '\\');
> +    }

I know this is just code motion, but it feels like it does the wrong
thing on Unix boxes (trying too hard to appease Windows boxes).  Is that
something that needs to be independently addressed?
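
As a rough sketch of what a stricter variant could look like (purely
hypothetical, not what the moved code does): treat '\\' as a separator only
when building for Windows, and take whichever separator occurs last. The
helper name last_path_separator is invented here, and the sketch leaves out
the ':' fallback that the original function also has.

```c
#include <assert.h>
#include <string.h>

/*
 * Return a pointer to the last path separator in filename, or NULL if
 * there is none.  '/' is always a separator; '\\' only counts on Windows
 * builds, so a backslash inside a Unix file name is left alone.
 */
static const char *last_path_separator(const char *filename)
{
    const char *p;

    assert(filename != NULL);
    p = strrchr(filename, '/');
#ifdef _WIN32
    {
        const char *q = strrchr(filename, '\\');
        /* Prefer whichever separator appears later in the string. */
        if (q != NULL && (p == NULL || q > p)) {
            p = q;
        }
    }
#endif
    return p;
}
```

With this, "dir/sub/file.img" yields a pointer to the final '/', and a bare
"file.img" yields NULL on any platform.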

But as for this patch, the code motion is fine.
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public Fam Zheng
@ 2016-01-27 16:08   ` Eric Blake
  2016-02-09 21:06   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-01-27 16:08 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

On 01/26/2016 03:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c            | 2 +-
>  include/block/block.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/io.c b/block/io.c
> index b964e7e..15e461f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -425,7 +425,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
>      }
>  }
>  
> -static int bdrv_get_cluster_size(BlockDriverState *bs)
> +int bdrv_get_cluster_size(BlockDriverState *bs)
>  {

Worth adding a doc comment while touching it?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
  2016-01-26 17:51   ` Eric Blake
@ 2016-02-08 23:51   ` John Snow
  2016-02-17 11:48   ` Vladimir Sementsov-Ogievskiy
  2016-02-17 16:30   ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-08 23:51 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
>  create mode 100644 docs/specs/qbm.md
> 
> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
> new file mode 100644
> index 0000000..b91910b
> --- /dev/null
> +++ b/docs/specs/qbm.md
> @@ -0,0 +1,118 @@
> +QEMU Block Bitmap (QBM)
> +=======================
> +
> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
> +with the tracked data image.  A QBM image includes one json descriptor file,
> +one data image, one or more bitmap files that describe the block dirty status
> +of the data image.
> +
> +The json file describes the structure of the image. The structure of the json
> +descriptor file is:
> +
> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
> +
> +    DESC-JSON := { "version": 1,
> +                   "image": IMAGE,
> +                   "BITMAPS": BITMAPS
> +                 }
> +
> +Fields in the top level json dictionary are:
> +
> +@version: An integer which must be 1.
> +@image: A dictionary in IMAGE schema, as described later. It provides the
> +        information of the data image where user data is stored. Its format is
> +        documented in the "IMAGE schema" section.
> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
                                             ^or

> +          the dictionary are the names of bitmap, which must be strings, and
> +          each value is a dictionary describing the information of the bitmap,
> +          as documented below in the "BITMAP schema" section.
> +
> +=== IMAGE schema ===
> +
> +An IMAGE records the information of an image (such as a data image or a backing
> +file). It has following fields:
> +
> +@file: The file name string of the referenced image. If it's a relative path,
> +       the file should be found relative to the descriptor file's
> +       location.
> +@format: The format string of the file.

Are these codified in any spec, or do they only exist as QEMU API? Is it
important to document what namespace these format strings belong to?

> +
> +=== BITMAP schema ===
> +
> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
> +or a block allocation status bitmap). It has following mandatory fields:
> +
> +@file: The name of the bitmap file. The bitmap file is in little endian, both
> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
> +       corresponds to the first sectors.
> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
> +                    This value must be a power of 2 and no less than 512.
> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
> +       supported.
> +       "dirty" indicates a block dirty bitmap; "allocation" indicates a
> +       allocation status bitmap. There must be at most one "allocation" bitmap.

Might be worth syncing with Vladimir to create consistent terminology
for these, e.g. a "Dirty Tracking Bitmap."
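
For readers trying to picture the on-disk layout, here is a small sketch of
how I read the @file and @granularity-bytes text: bit N of the bitmap tracks
bytes [N * granularity, (N + 1) * granularity) of the data image, and bit N
lives in byte N / 8 at bit position N % 8, counting from the LSB. The
function name qbm_test_bit and its signature are assumptions for
illustration only, not anything in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Report whether the bit covering byte `offset` of the data image is set
 * in an in-memory copy of a bitmap file (buf, buf_len bytes long).
 * granularity must be a power of 2 and no less than 512, as the spec says.
 */
static bool qbm_test_bit(const uint8_t *buf, size_t buf_len,
                         uint64_t offset, uint64_t granularity)
{
    uint64_t bit = offset / granularity;   /* index of the tracking bit */
    uint64_t byte = bit / 8;               /* byte that holds that bit */

    assert(granularity >= 512 && (granularity & (granularity - 1)) == 0);
    assert(byte < buf_len);
    /* Little-endian bit order: bit 0 is the LSB of byte 0. */
    return (buf[byte] >> (bit % 8)) & 1;
}
```

For example, with a 512-byte granularity a bitmap file starting with the
bytes 0x01 0x80 marks offset 0 (bit 0) and offset 7680 (bit 15) as set.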

> +
> +If the type of the bitmap is "allocation", an extra field "backing" is also
> +accepted:
> +
> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
> +          adding a backing file to raw image.
> +
> +
> +=== Extended fields ===
> +
> +Implementations are allowed to extend the format schema by inserting additinoal
........................................................................^additional
> +members into above dictionaries, with key names that starts with either
> +an "ext-hard-" or an "ext-soft-" prefix.
> +
> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
> +parsers if they do not support it; fields starting with "ext-hard-" are
> +mandatory and cannot be ignored, a parser should not proceed parsing the image
> +if it does not support it.
> +
> +It is strongly recommended that the application names are also included in the
> +extention name string, such as "ext-hard-qemu-", if the effect or

extension

> +interpretation of the field is local to a specific application.
> +
> +For example, QEMU can implement a "checksum" feature to make sure no files
> +referred to by the json descriptor are modified inconsistently, by adding
> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
> +the json text found below.
> +
> +=== QBM descriptor file example ===
> +
> +This is the content of a QBM image's json descriptor file, which contains a
> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
> +associates a backing file to this image (base.img).
> +
> +{ "QBM": {
> +    "version": 1,
> +    "image": {
> +        "file": "data.img",
> +        "format": "raw"
> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
> +    },
> +    "bitmaps": {
> +        "0": {
> +            "file": "bitmap0.bin",
> +            "granularity-bytes": 512,
> +            "type": "dirty"
> +        },
> +        "1": {
> +            "file": "bitmap1.bin",
> +            "granularity-bytes": 4096,
> +            "type": "dirty"
> +        },
> +        "2": {
> +            "file": "bitmap3.bin",
> +            "granularity-bytes": 4096,
> +            "type": "allocation"
> +            "backing": {
> +                "file": "base.img",
> +                "format": "raw"
> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7",
> +            },
> +        }
> +    }
> +} }
> +
> 

I think I agree with Eric that it might be best to use name:value pairs,
unless there is a strong motivator otherwise -- I seem to recall that
this was a way to prevent duplicate named bitmaps within the JSON schema
itself, which is reasonably compelling.

* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 17:51   ` Eric Blake
@ 2016-02-09  0:05     ` John Snow
  2016-02-23  8:35     ` Markus Armbruster
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09  0:05 UTC (permalink / raw)
  To: Eric Blake, Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 12:51 PM, Eric Blake wrote:
> On 01/26/2016 03:38 AM, Fam Zheng wrote:
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>> ---
>>  docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 118 insertions(+)
>>  create mode 100644 docs/specs/qbm.md
>>
>> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
>> new file mode 100644
>> index 0000000..b91910b
>> --- /dev/null
>> +++ b/docs/specs/qbm.md
>> @@ -0,0 +1,118 @@
>> +QEMU Block Bitmap (QBM)
>> +=======================
> 
> No explicit copyright mention means that this document is GPLv2+ by
> default.  I don't know if any 3rd-party implementation trying to use
> this spec would object to that, or if a looser license is desirable here
> (I don't personally care, but just raising the point).
> 
>> +
>> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
>> +with the tracked data image.  A QBM image includes one json descriptor file,
>> +one data image, one or more bitmap files that describe the block dirty status
> 
> s/one or more/and one or more/
> 
> Must it have block dirty status bitmaps, or can you have a QBM image
> with just an allocation bitmap?
> 

Intent seems to be one or more of any kind.

>> +of the data image.
>> +
>> +The json file describes the structure of the image. The structure of the json
>> +descriptor file is:
> 
> Please mention that the file must be valid JSON per RFC 7159.  Probably
> also worth requiring that the file is a text file (ends in a newline).
> 
>> +
>> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
>> +
>> +    DESC-JSON := { "version": 1,
>> +                   "image": IMAGE,
>> +                   "BITMAPS": BITMAPS
> 
> s/"BITMAPS"/"bitmaps"/
> 
> See my thoughts below on whether this is the ideal top-level structure.
> 
>> +                 }
>> +
>> +Fields in the top level json dictionary are:
>> +
>> +@version: An integer which must be 1.
>> +@image: A dictionary in IMAGE schema, as described later. It provides the
>> +        information of the data image where user data is stored. Its format is
>> +        documented in the "IMAGE schema" section.
>> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
>> +          the dictionary are the names of bitmap, which must be strings, and
>> +          each value is a dictionary describing the information of the bitmap,
>> +          as documented below in the "BITMAP schema" section.
> 
> Making 'bitmaps' be a dictionary means that we now have keys that might
> not be an identifier.  Although this is valid JSON, it may trip up some
> tools.  Would it be better to make 'bitmaps' be a list of dictionaries,
> where each dictionary has a 'name':'value' member, so that bitmap names
> that are not a (C, Python, whatever) identifier are still valid?
> 
>> +
>> +=== IMAGE schema ===
>> +
>> +An IMAGE records the information of an image (such as a data image or a backing
>> +file). It has following fields:
> 
> I liked how you showed DESC-JSON := for the top level; should you do the
> same here for IMAGE?
> 
>> +
>> +@file: The file name string of the referenced image. If it's a relative path,
>> +       the file should be found relative to the descriptor file's
>> +       location.
> 
> Does that mean we'll have to use 'json:...' encoding for representing
> network resources?  Should we instead reuse some of the
> qapi/block-core.json representation of a block device?
> 

This is an interesting thought: If the design of QBM is such that it is
externally useful to other VM storage utilities, it might be best to
avoid baking in QEMU-specific 'isms for the storage specifier.

Of course, that still leaves us up a creek without a paddle to describe
more complicated image formats, so perhaps we should spell out that the
"filename" can be any kind of identifier (including more json) that
describes the resource. For QEMU, this means our block device json format.

Other utilities will have to be prepared for the notion that this might
not be a straight filename, and they may not know how to interpret it.

Is that acceptable?

>> +@format: The format string of the file.
> 
> Nice that format is mandatory.  Do we want to call out a finite list of
> supported formats, or leave it open-ended in this spec?
> 

Right, QEMU's usage of "qcow2" "raw" etc are not necessarily following
some global standard (unless -- are we?) and we should either spell out
the expected values or point to some namespace.

>> +
>> +=== BITMAP schema ===
>> +
>> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
>> +or a block allocation status bitmap). It has following mandatory fields:
>> +
>> +@file: The name of the bitmap file. The bitmap file is in little endian, both
> 
> s/in //
> 
>> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
>> +       corresponds to the first sectors.
> 
> s/the byte 0/the first byte/
> 
> Again, should we be reusing something from qapi/block-core.json, to
> allow network devices with more structure than just 'json:...' naming?
> 
>> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
>> +                    This value must be a power of 2 and no less than 512.
>> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
>> +       supported.
>> +       "dirty" indicates a block dirty bitmap; "allocation" indicates a
>> +       allocation status bitmap. There must be at most one "allocation" bitmap.
>> +
>> +If the type of the bitmap is "allocation", an extra field "backing" is also
>> +accepted:
>> +
>> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
>> +          adding a backing file to raw image.
> 
> s/adding/add/
> 
> As promised above, would an alternative representation be any better?
> That is, I'm trying to see if I could write qapi to describe the
> structure you've presented here, and I fell short when it comes to
> naming a particular bitmap.  Also, since there is at most one
> 'type':'allocation' member of 'bitmaps', I wonder if separating it out
> would make it easier to locate.
> 
> Here's my proposal for an alternative schema, written in qapi:
> 

Hesitant to use QEMUisms for describing data in a spec.

> { 'struct': 'Other', 'data': { ...however we describe a network file... } }
> { 'alternate': 'File', 'data': { 'file': 'str', 'struct': 'Other' } }
> { 'struct': 'Bitmap', 'data': {
>    'name':'str', 'file': 'File',
>    'granularity-bytes':'int' } }
> { 'struct': 'AllocationBitmap', 'base': 'Bitmap', 'data': {
>   '*backing': 'Image' } }
> { 'struct': 'Image', 'data': {
>   'file': 'File',
>   'format': 'str' # would an enum be better?
> } }
> { 'struct': 'Desc', 'data': {
>   'version': 'int', 'image': 'Image',
>   '*allocation': 'AllocationBitmap',
>   'dirty': [ 'AllocationBitmap' ]
> } }
> { 'struct': 'QBM', 'data': { 'QBM': 'Desc' } }
> 
> where the json description file must consist of a single 'QBM' qapi
> struct, and where the use of a QAPI alternate type 'File' allows us to
> specify either a file name or a formal structure for describing a
> network resource.  Below, I'll rewrite your example with my schema...
> 
>> +
>> +
>> +=== Extended fields ===
>> +
>> +Implementations are allowed to extend the format schema by inserting additinoal
> 
> s/additinoal/additional/
> 
>> +members into above dictionaries, with key names that starts with either
>> +an "ext-hard-" or an "ext-soft-" prefix.
> 
> Should we be more like qcow2 and have a third category of auto-clear
> keys (if you don't recognize the key, remove it upon editing the file,
> but reading is okay)?  Feature negotiation via this approach requires
> reading every member of 'bitmaps' (well, we have to do that anyway to
> parse the full JSON structure); would it be any better to have an
> up-front section in the top level that describes what features are in
> use, rather than requiring all new features to use the 'ext-' namespace?
> 
> Should we require hard failure on any key whose name is not recognized
> (other than the weirdness of your proposal having the keys of 'bitmaps'
> be user-supplied names)?
> 

Sounds appropriate, if we use name:value pairs.

>> +
>> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
>> +parsers if they do not support it; fields starting with "ext-hard-" are
>> +mandatory and cannot be ignored, a parser should not proceed parsing the image
>> +if it does not support it.
> 
> Is it really the entire image invalidated if an extension is tied to a
> particular bitmap, or only that bitmap?
> 

Going to go ahead and say "yes." If we tie allocation bitmaps to images,
it's a bit rude to reach in and fiddle with the image if just this QBM
descriptor is valid.

>> +
>> +It is strongly recommended that the application names are also included in the
>> +extention name string, such as "ext-hard-qemu-", if the effect or
> 
> s/extention/extension/
> 
>> +interpretation of the field is local to a specific application.
>> +
>> +For example, QEMU can implement a "checksum" feature to make sure no files
>> +referred to by the json descriptor are modified inconsistently, by adding
>> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
>> +the json text found below.
> 
> If an extension proves to be useful, how do we standardize it later?
> Will it always have to carry the 'ext-' prefix?
> 

Interesting question. Perhaps future revisions of the spec will have to
specify aliases for ext-** properties that get canonicalized.

> You said that soft extensions can be ignored on parse - but if we write
> to the file, couldn't we possibly be invalidating the contents of the
> extension field, and not leaving a breadcrumb for the future reader that
> understands the extension to know that we messed it up?  I think an
> auto-clear feature would be useful (preserve the checksum field, but
> clear the associated auto-clear so that the newer reader knows that the
> checksum has to be checked).
> 

Shame on the tool that used "soft" for a hard requirement, then. Point
taken, though, the checksum would be a prime candidate for an autoclear
field.

Or, at the very least, adding a breadcrumb field that's essentially "A
tool modified this without understanding all of the soft fields" sounds
nice.

> Should we be thinking about a write lock extension (similar to the
> current thread on qcow2 write locks), to make it less likely that two
> writers will be modifying the descriptor, image file, and/or bitmaps at
> the same time?
> 

Sounds "nice to have" but not necessary for V1. I think it's well
understood that opening a file in two places will lead to heartbreak and
ruin.

>> +
>> +=== QBM descriptor file example ===
>> +
>> +This is the content of a QBM image's json descriptor file, which contains a
>> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
>> +associates a backing file to this image (base.img).
>> +
>> +{ "QBM": {
>> +    "version": 1,
>> +    "image": {
>> +        "file": "data.img",
>> +        "format": "raw"
>> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
> 
> Comma on wrong row.
> 
>> +    },
>> +    "bitmaps": {
>> +        "0": {
>> +            "file": "bitmap0.bin",
>> +            "granularity-bytes": 512,
>> +            "type": "dirty"
>> +        },
>> +        "1": {
>> +            "file": "bitmap1.bin",
>> +            "granularity-bytes": 4096,
>> +            "type": "dirty"
>> +        },
>> +        "2": {
>> +            "file": "bitmap3.bin",
>> +            "granularity-bytes": 4096,
>> +            "type": "allocation"
> 
> Missing comma
> 
>> +            "backing": {
>> +                "file": "base.img",
>> +                "format": "raw"
>> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7",
> 
> Comma on wrong row
> 
>> +            },
> 
> No trailing commas in JSON
> 
> Your approach means that the same bitmap cannot be used for both 'dirty'
> and 'allocation' (because all entries in the 'bitmaps' dictionary must
> have distinct names).  (Although nothing stops two bitmap entries from
> naming the same 'file'.)
> 

I think that's fine, how often will the allocation bitmap align with the
dirty tracking bitmap? It will at first, but it's likely to diverge
fairly often.

>> +        }
>> +    }
>> +} }
> 
> So, rewriting to my schema above, this would be:
> 
> { "QBM": {
>     "version": 1,
>     "image": {
>         "file": "data.img",
>         "format": "raw",
>         "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8"
>     },
>     "allocation": {
>         "name": "2",
>         "file": "bitmap3.bin",
>         "granularity-bytes": 4096,
>         "backing": {
>             "file": "base.img",
>             "format": "raw"
>             "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
>         }
>     },
>     "bitmaps": [
>         {
>             "name": "0",
>             "file": "bitmap0.bin",
>             "granularity-bytes": 512
>         },
>         {
>             "name": "1",
>             "file": "bitmap1.bin",
>             "granularity-bytes": 4096
>         }
>     ]
> } }
> 
>> +
>>
> 
> Awkward to end in a blank line.
> 

* Re: [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write Fam Zheng
  2016-01-26 17:52   ` Eric Blake
@ 2016-02-09  0:11   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09  0:11 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> So that driver can write the dirty bits into persistent dirty bitmaps in
> the write callback.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 343ff1f..b964e7e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1164,6 +1164,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>          }
>      }
>  
> +    bdrv_set_dirty(bs, sector_num, nb_sectors);
> +
>      if (ret < 0) {
>          /* Do nothing, write notifier decided to fail this request */
>      } else if (flags & BDRV_REQ_ZERO_WRITE) {
> @@ -1179,8 +1181,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>          ret = bdrv_co_flush(bs);
>      }
>  
> -    bdrv_set_dirty(bs, sector_num, nb_sectors);
> -
>      if (bs->wr_highest_offset < offset + bytes) {
>          bs->wr_highest_offset = offset + bytes;
>      }
> 

Might want a comment here acknowledging that the dirty bit gets set even
on write failure, but that for $reasons it needs to be above anyway -- so
someone doesn't helpfully try to move it back below.

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps Fam Zheng
  2016-01-26 17:53   ` Eric Blake
@ 2016-02-09  0:23   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09  0:23 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> If the driver owns some dirty bitmaps, this assertion will fail.
> 
> The correct place to release them is in bdrv_close, so move the
> assertion one line down.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index afb71c0..fa6ad1d 100644
> --- a/block.c
> +++ b/block.c
> @@ -2348,10 +2348,11 @@ static void bdrv_delete(BlockDriverState *bs)
>      assert(!bs->job);
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
> -    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
>  
>      bdrv_close(bs);
>  
> +    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
> +
>      /* remove from list, if necessary */
>      bdrv_make_anon(bs);
>  
> 

I think now is where we need to begin distinguishing internally owned
bitmaps from qmp/monitor created ones. I suppose this is just an assert
so it isn't changing much, but there are different ideas at play here...

- Bitmaps created internally for various reasons (backups, migration, etc)
- Bitmaps created explicitly by the user (transient bitmaps)
- Bitmaps autoloaded from qcow2, qbm, etc (persistent bitmaps)

We should still make sure we don't have any of the first two types when
we go to close the bitmap, and making sure we have none of the third is
reasonable after the close.
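
To make the distinction concrete, here's a rough sketch of what tagging
bitmaps by owner could look like. None of this exists in the tree: the enum,
the struct and both helpers are hypothetical names, and a plain array stands
in for the bs->dirty_bitmaps list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ownership classes, matching the three bullets above. */
typedef enum {
    SKETCH_BITMAP_INTERNAL,   /* backups, migration, etc. */
    SKETCH_BITMAP_USER,       /* transient, created via qmp/monitor */
    SKETCH_BITMAP_PERSISTENT, /* autoloaded from qcow2, qbm, etc. */
} SketchBitmapOwner;

typedef struct {
    const char *name;
    SketchBitmapOwner owner;
} SketchBitmap;

/* Before bdrv_close(): only driver-owned persistent bitmaps may remain. */
static bool sketch_ok_before_close(const SketchBitmap *b, size_t n)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].owner != SKETCH_BITMAP_PERSISTENT) {
            return false;
        }
    }
    return true;
}

/* After bdrv_close(): the driver should have released everything. */
static bool sketch_ok_after_close(size_t n_remaining)
{
    return n_remaining == 0;
}
```

The assert before bdrv_close() would then become the first check, and the
one moved below it the second.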


* Re: [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c Fam Zheng
  2016-01-27 16:07   ` Eric Blake
@ 2016-02-09 20:56   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 20:56 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> With the return value decoupled from VMDK, it can be reused by other block
> code.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>  block/vmdk.c          | 40 ----------------------------------------
>  include/block/block.h |  2 ++
>  3 files changed, 42 insertions(+), 40 deletions(-)
> 
> diff --git a/block.c b/block.c
> index fa6ad1d..78db342 100644
> --- a/block.c
> +++ b/block.c
> @@ -144,6 +144,46 @@ int path_is_absolute(const char *path)
>  #endif
>  }
>  
> +int filename_decompose(const char *filename, char *path, char *prefix,
> +                       char *postfix, size_t buf_len, Error **errp)
> +{
> +    const char *p, *q;
> +
> +    if (filename == NULL || !strlen(filename)) {
> +        error_setg(errp, "No filename provided");
> +        return -EINVAL;
> +    }
> +    p = strrchr(filename, '/');
> +    if (p == NULL) {
> +        p = strrchr(filename, '\\');
> +    }
> +    if (p == NULL) {
> +        p = strrchr(filename, ':');
> +    }
> +    if (p != NULL) {
> +        p++;
> +        if (p - filename >= buf_len) {
> +            return -EINVAL;
> +        }
> +        pstrcpy(path, p - filename + 1, filename);
> +    } else {
> +        p = filename;
> +        path[0] = '\0';
> +    }
> +    q = strrchr(p, '.');
> +    if (q == NULL) {
> +        pstrcpy(prefix, buf_len, p);
> +        postfix[0] = '\0';
> +    } else {
> +        if (q - p >= buf_len) {
> +            return -EINVAL;
> +        }
> +        pstrcpy(prefix, q - p + 1, p);
> +        pstrcpy(postfix, buf_len, q);
> +    }
> +    return 0;
> +}
> +
>  /* if filename is absolute, just copy it to dest. Otherwise, build a
>     path to it by considering it is relative to base_path. URL are
>     supported. */
> diff --git a/block/vmdk.c b/block/vmdk.c
> index f8f7fcf..505e0c2 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1764,46 +1764,6 @@ exit:
>      return ret;
>  }
>  
> -static int filename_decompose(const char *filename, char *path, char *prefix,
> -                              char *postfix, size_t buf_len, Error **errp)
> -{
> -    const char *p, *q;
> -
> -    if (filename == NULL || !strlen(filename)) {
> -        error_setg(errp, "No filename provided");
> -        return VMDK_ERROR;
> -    }
> -    p = strrchr(filename, '/');
> -    if (p == NULL) {
> -        p = strrchr(filename, '\\');
> -    }
> -    if (p == NULL) {
> -        p = strrchr(filename, ':');
> -    }
> -    if (p != NULL) {
> -        p++;
> -        if (p - filename >= buf_len) {
> -            return VMDK_ERROR;
> -        }
> -        pstrcpy(path, p - filename + 1, filename);
> -    } else {
> -        p = filename;
> -        path[0] = '\0';
> -    }
> -    q = strrchr(p, '.');
> -    if (q == NULL) {
> -        pstrcpy(prefix, buf_len, p);
> -        postfix[0] = '\0';
> -    } else {
> -        if (q - p >= buf_len) {
> -            return VMDK_ERROR;
> -        }
> -        pstrcpy(prefix, q - p + 1, p);
> -        pstrcpy(postfix, buf_len, q);
> -    }
> -    return VMDK_OK;
> -}
> -
>  static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
>  {
>      int idx = 0;
> diff --git a/include/block/block.h b/include/block/block.h
> index bfb76f8..b9b30cb 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -449,6 +449,8 @@ int bdrv_is_snapshot(BlockDriverState *bs);
>  
>  int path_has_protocol(const char *path);
>  int path_is_absolute(const char *path);
> +int filename_decompose(const char *filename, char *path, char *prefix,
> +                       char *postfix, size_t buf_len, Error **errp);
>  void path_combine(char *dest, int dest_size,
>                    const char *base_path,
>                    const char *filename);
> 

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public Fam Zheng
  2016-01-27 16:08   ` Eric Blake
@ 2016-02-09 21:06   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 21:06 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c            | 2 +-
>  include/block/block.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index b964e7e..15e461f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -425,7 +425,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
>      }
>  }
>  
> -static int bdrv_get_cluster_size(BlockDriverState *bs)
> +int bdrv_get_cluster_size(BlockDriverState *bs)
>  {
>      BlockDriverInfo bdi;
>      int ret;
> diff --git a/include/block/block.h b/include/block/block.h
> index b9b30cb..16b7845 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -435,7 +435,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
>                              int64_t sector_num, int nb_sectors,
>                              int64_t *cluster_sector_num,
>                              int *cluster_nb_sectors);
> -
> +int bdrv_get_cluster_size(BlockDriverState *bs);
>  const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
>  void bdrv_get_backing_filename(BlockDriverState *bs,
>                                 char *filename, int filename_size);
> 

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent Fam Zheng
@ 2016-02-09 21:31   ` John Snow
  2016-02-09 22:04   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 21:31 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> By implementing bdrv_dirty_bitmap_set_persistent, a driver can support
> the persistent dirty bitmap feature.
> 
> Once a dirty bitmap is made persistent, the driver is responsible for saving
> the dirty bitmap when appropriate, for example before close; if a persistent
> bitmap is removed or made non-persistent, .bdrv_dirty_bitmap_set_persistent
> will be called, the driver should then remove the dirty bitmap from the disk.
> 
> This operation is not recursed in block layer, a filter such as blkdebug needs
> to implement the callback and explicitly pass down to bs->file, etc.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/dirty-bitmap.c         | 38 ++++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h    |  8 ++++++++
>  include/block/dirty-bitmap.h |  4 ++++
>  3 files changed, 50 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 1aa7f76..882a0db 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -43,6 +43,7 @@ struct BdrvDirtyBitmap {
>      int64_t size;               /* Size of the bitmap (Number of sectors) */
>      bool disabled;              /* Bitmap is read-only */
>      int active_iterators;       /* How many iterators are active */
> +    bool persistent;            /* Whether this bitmap is persistent. */
>      QLIST_ENTRY(BdrvDirtyBitmap) list;
>  };
>  
> @@ -71,6 +72,37 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>      bitmap->name = NULL;
>  }
>  
> +int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
> +                                     BdrvDirtyBitmap *bitmap,
> +                                     bool persistent, bool flag_only,
> +                                     Error **errp)
> +{
> +    int ret = 0;
> +
> +    if (!bitmap->name) {
> +        error_setg(errp, "Cannot change the persistent status of an anonymous"
> +                         "bitmap");
> +        return -EINVAL;
> +    }
> +
> +    if (persistent == bitmap->persistent) {
> +        return 0;
> +    }
> +
> +    if (!flag_only) {
> +        if (!bs->drv || !bs->drv->bdrv_dirty_bitmap_set_persistent) {
> +            error_setg(errp, "Not supported in this format.");
> +            return -ENOTSUP;
> +        }
> +        ret = bs->drv->bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent,
> +                                                        errp);
> +    }
> +    if (!ret) {
> +        bitmap->persistent = persistent;
> +    }
> +    return ret;
> +}
> +
>  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>                                            uint32_t granularity,
>                                            const char *name,
> @@ -194,6 +226,12 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
>      uint64_t granularity;
>      BdrvDirtyBitmap *child;
>  
> +    if (bitmap->persistent) {
> +        error_setg(errp, "Cannot create a successor for a bitmap that is "
> +                   "persistent");
> +        return -1;
> +    }
> +

Oh? so we can't make backups with persistent bitmaps?

(I'll keep reading forward in the series...)

>      if (bdrv_dirty_bitmap_frozen(bitmap)) {
>          error_setg(errp, "Cannot create a successor for a bitmap that is "
>                     "currently frozen");
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5fa58e8..fbc34af 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -305,6 +305,14 @@ struct BlockDriver {
>       */
>      void (*bdrv_drain)(BlockDriverState *bs);
>  
> +    /**
> +     * Make the dirty bitmap persistent if persistent=true or transient
> +     * otherwise.
> +     */
> +    int (*bdrv_dirty_bitmap_set_persistent)(BlockDriverState *bs,
> +                                            BdrvDirtyBitmap *bitmap,
> +                                            bool persistent, Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index d14d923..5885720 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -24,6 +24,10 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>  BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
>                                          const char *name);
>  void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
> +int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
> +                                     BdrvDirtyBitmap *bitmap,
> +                                     bool persistent, bool flag_only,
> +                                     Error **errp);
>  void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>  void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
>  void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
> 

-- 
—js


* Re: [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent Fam Zheng
  2016-02-09 21:31   ` John Snow
@ 2016-02-09 22:04   ` John Snow
  1 sibling, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 22:04 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> By implementing bdrv_dirty_bitmap_set_persistent, a driver can support
> the persistent dirty bitmap feature.
> 
> Once a dirty bitmap is made persistent, the driver is responsible for saving
> the dirty bitmap when appropriate, for example before close; if a persistent
> bitmap is removed or made non-persistent, .bdrv_dirty_bitmap_set_persistent
> will be called, the driver should then remove the dirty bitmap from the disk.
> 
> This operation is not recursed in block layer, a filter such as blkdebug needs
> to implement the callback and explicitly pass down to bs->file, etc.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/dirty-bitmap.c         | 38 ++++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h    |  8 ++++++++
>  include/block/dirty-bitmap.h |  4 ++++
>  3 files changed, 50 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 1aa7f76..882a0db 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -43,6 +43,7 @@ struct BdrvDirtyBitmap {
>      int64_t size;               /* Size of the bitmap (Number of sectors) */
>      bool disabled;              /* Bitmap is read-only */
>      int active_iterators;       /* How many iterators are active */
> +    bool persistent;            /* Whether this bitmap is persistent. */
>      QLIST_ENTRY(BdrvDirtyBitmap) list;
>  };
>  
> @@ -71,6 +72,37 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>      bitmap->name = NULL;
>  }
>  

What's the intended usage of flag_only? (/keeps reading/...)
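
Going only by the hunk below, flag_only=true seems to flip
bitmap->persistent without calling into the driver. My guess (not something
the patch states) is that this is for a driver marking bitmaps it has just
loaded itself. A standalone model of that reading follows; the Sketch* names
and the bare callback are mine, standing in for bs->drv.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchBitmap {
    const char *name;
    bool persistent;
} SketchBitmap;

/* Hypothetical driver hook; NULL means the format has no support. */
typedef int (*SketchSetPersistentFn)(SketchBitmap *bitmap, bool persistent);

static int sketch_set_persistent(SketchSetPersistentFn drv_cb,
                                 SketchBitmap *bitmap,
                                 bool persistent, bool flag_only)
{
    int ret = 0;

    assert(bitmap != NULL);
    if (!bitmap->name) {
        return -EINVAL;         /* anonymous bitmaps are rejected */
    }
    if (persistent == bitmap->persistent) {
        return 0;               /* nothing to change */
    }
    if (!flag_only) {
        if (!drv_cb) {
            return -ENOTSUP;    /* needs driver support */
        }
        ret = drv_cb(bitmap, persistent);
    }
    if (!ret) {
        /* With flag_only set, this is the only thing that happens. */
        bitmap->persistent = persistent;
    }
    return ret;
}
```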

> +int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
> +                                     BdrvDirtyBitmap *bitmap,
> +                                     bool persistent, bool flag_only,
> +                                     Error **errp)
> +{
> +    int ret = 0;
> +
> +    if (!bitmap->name) {
> +        error_setg(errp, "Cannot change the persistent status of an anonymous"
> +                         "bitmap");
> +        return -EINVAL;
> +    }
> +
> +    if (persistent == bitmap->persistent) {
> +        return 0;
> +    }
> +
> +    if (!flag_only) {
> +        if (!bs->drv || !bs->drv->bdrv_dirty_bitmap_set_persistent) {
> +            error_setg(errp, "Not supported in this format.");
> +            return -ENOTSUP;
> +        }
> +        ret = bs->drv->bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent,
> +                                                        errp);
> +    }
> +    if (!ret) {
> +        bitmap->persistent = persistent;
> +    }
> +    return ret;
> +}
> +
>  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>                                            uint32_t granularity,
>                                            const char *name,
> @@ -194,6 +226,12 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
>      uint64_t granularity;
>      BdrvDirtyBitmap *child;
>  
> +    if (bitmap->persistent) {
> +        error_setg(errp, "Cannot create a successor for a bitmap that is "
> +                   "persistent");
> +        return -1;
> +    }
> +
>      if (bdrv_dirty_bitmap_frozen(bitmap)) {
>          error_setg(errp, "Cannot create a successor for a bitmap that is "
>                     "currently frozen");
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5fa58e8..fbc34af 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -305,6 +305,14 @@ struct BlockDriver {
>       */
>      void (*bdrv_drain)(BlockDriverState *bs);
>  
> +    /**
> +     * Make the dirty bitmap persistent if persistent=true or transient
> +     * otherwise.
> +     */
> +    int (*bdrv_dirty_bitmap_set_persistent)(BlockDriverState *bs,
> +                                            BdrvDirtyBitmap *bitmap,
> +                                            bool persistent, Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index d14d923..5885720 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -24,6 +24,10 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>  BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
>                                          const char *name);
>  void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
> +int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
> +                                     BdrvDirtyBitmap *bitmap,
> +                                     bool persistent, bool flag_only,
> +                                     Error **errp);
>  void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>  void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
>  void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
> 


* Re: [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add Fam Zheng
@ 2016-02-09 22:05   ` John Snow
  0 siblings, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 22:05 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> When omitted it defaults to false with unchanged behavior.
> 
> When set to true, the created dirty bitmap is made persistent if supported, it
> requires support from the active image format. Otherwise an error is returned.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockdev.c           | 8 +++++++-
>  qapi/block-core.json | 6 +++++-
>  qmp-commands.hx      | 3 ++-
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 07cfe25..08236f2 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1997,6 +1997,7 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>      qmp_block_dirty_bitmap_add(action->node, action->name,
>                                 action->has_granularity, action->granularity,
> +                               action->has_persistent, action->persistent,
>                                 &local_err);
>  
>      if (!local_err) {
> @@ -2640,10 +2641,12 @@ out:
>  
>  void qmp_block_dirty_bitmap_add(const char *node, const char *name,
>                                  bool has_granularity, uint32_t granularity,
> +                                bool has_persistent, bool persistent,
>                                  Error **errp)
>  {
>      AioContext *aio_context;
>      BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
>  
>      if (!name || name[0] == '\0') {
>          error_setg(errp, "Bitmap name cannot be empty");
> @@ -2669,7 +2672,10 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
>          granularity = bdrv_get_default_bitmap_granularity(bs);
>      }
>  
> -    bdrv_create_dirty_bitmap(bs, granularity, name, errp);
> +    bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
> +    if (bitmap && has_persistent && persistent) {
> +        bdrv_dirty_bitmap_set_persistent(bs, bitmap, true, false, errp);
> +    }
>  
>   out:
>      aio_context_release(aio_context);
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 30c2e5f..0ac107c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1162,10 +1162,14 @@
>  # @granularity: #optional the bitmap granularity, default is 64k for
>  #               block-dirty-bitmap-add
>  #
> +# @persistent: #optinal whether to make the bitmap persistent, default is false.
> +#              (Since 2.6)
> +#

optional

>  # Since 2.4
>  ##
>  { 'struct': 'BlockDirtyBitmapAdd',
> -  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
> +  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32',
> +            '*persistent': 'bool' } }
>  
>  ##
>  # @block-dirty-bitmap-add
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index db072a6..bd4428e 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1373,7 +1373,7 @@ EQMP
>  
>      {
>          .name       = "block-dirty-bitmap-add",
> -        .args_type  = "node:B,name:s,granularity:i?",
> +        .args_type  = "node:B,name:s,granularity:i?,persistent:b?",
>          .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_add,
>      },
>  
> @@ -1390,6 +1390,7 @@ Arguments:
>  - "node": device/node on which to create dirty bitmap (json-string)
>  - "name": name of the new dirty bitmap (json-string)
>  - "granularity": granularity to track writes with (int, optional)
> +- "persistent": whether the bitmap is persistent (bool, optional, default to no)
>  
>  Example:
>  
>

With typo fixed:

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent Fam Zheng
@ 2016-02-09 22:49   ` John Snow
  0 siblings, 0 replies; 42+ messages in thread
From: John Snow @ 2016-02-09 22:49 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, mreitz, vsementsov,
	Stefan Hajnoczi



On 01/26/2016 05:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockdev.c           | 20 ++++++++++++++++++++
>  qapi/block-core.json | 22 ++++++++++++++++++++++
>  qmp-commands.hx      | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 73 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 08236f2..a9d6617 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2699,6 +2699,9 @@ void qmp_block_dirty_bitmap_remove(const char *node, const char *name,
>                     name);
>          goto out;
>      }
> +    if (bdrv_dirty_bitmap_set_persistent(bs, bitmap, false, false, errp)) {
> +        goto out;
> +    }

Does this belong in Patch #6?

>      bdrv_dirty_bitmap_make_anon(bitmap);
>      bdrv_release_dirty_bitmap(bs, bitmap);
>  
> @@ -2740,6 +2743,23 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
>      aio_context_release(aio_context);
>  }
>  
> +void qmp_block_dirty_bitmap_set_persistent(const char *node, const char *name,
> +                                           bool persistent, Error **errp)
> +{
> +    AioContext *aio_context;
> +    BdrvDirtyBitmap *bitmap;
> +    BlockDriverState *bs;
> +
> +    bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
> +    if (!bitmap || !bs) {
> +        return;
> +    }
> +
> +    bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent, false, errp);
> +
> +    aio_context_release(aio_context);
> +}
> +
>  void hmp_drive_del(Monitor *mon, const QDict *qdict)
>  {
>      const char *id = qdict_get_str(qdict, "id");
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0ac107c..52689ed 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1263,6 +1263,28 @@
>              '*on-target-error': 'BlockdevOnError' } }
>  
>  ##
> +# @block-dirty-bitmap-set-persistent
> +#
> +# Update a dirty bitmap's persistent state on the device
> +#
> +# @node: name of device/node which the bitmap is tracking
> +#
> +# @name: name of the dirty bitmap
> +#
> +# @persistent: #optinal whether to make the bitmap persistent, default is false
> +#

optional :)

> +# Returns: nothing on success
> +#          If @node is not a valid block device, DeviceNotFound
> +#          If @name is not found, GenericError with an explanation
> +#          If an error happens when setting the persistent state, GenericError
> +#          with an explanation
> +#
> +# Since 2.6
> +##
> +{ 'command': 'block-dirty-bitmap-set-persistent',
> +  'data': { 'node': 'str', 'name': 'str', 'persistent': 'bool' } }
> +
> +##
>  # @block_set_io_throttle:
>  #
>  # Change I/O throttle limits for a block drive.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index bd4428e..e37cf09 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1458,6 +1458,37 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "block-dirty-bitmap-set-persistent",
> +        .args_type  = "node:B,name:s,persistent:b",
> +        .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_set_persistent,
> +    },
> +
> +SQMP
> +
> +block-dirty-bitmap-set-persistent
> +---------------------------------
> +Since 2.6
> +
> +Update the persistent state of a dirty bitmap. Format driver support is
> +required.
> +

TODO: Mention supported drivers, perhaps. I know as of right now that's
none, but just a reminder.

> +Arguments:
> +
> +- "node": device/node on which to update the dirty bitmap (json-string)
> +- "name": name of the dirty bitmap to update (json-string)
> +- "persistent": the state to update to. (json-bool)
> +
> +Example:
> +
> +-> { "execute": "block-dirty-bitmap-set-persistent",
> +                "arguments": { "node": "drive0",
> +                               "name": "bitmap0",
> +                               "persistent": true } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "blockdev-snapshot-sync",
>          .args_type  = "device:s?,node-name:s?,snapshot-file:s,snapshot-node-name:s?,format:s?,mode:s?",
>          .mhandler.cmd_new = qmp_marshal_blockdev_snapshot_sync,
> 

Looks good.


* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
  2016-01-26 17:51   ` Eric Blake
  2016-02-08 23:51   ` John Snow
@ 2016-02-17 11:48   ` Vladimir Sementsov-Ogievskiy
  2016-02-17 16:30   ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 42+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-02-17 11:48 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jsnow, Markus Armbruster, mreitz,
	vsementsov, Stefan Hajnoczi

On 26.01.2016 13:38, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 118 insertions(+)
>   create mode 100644 docs/specs/qbm.md
>
> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
> new file mode 100644
> index 0000000..b91910b
> --- /dev/null
> +++ b/docs/specs/qbm.md
> @@ -0,0 +1,118 @@
> +QEMU Block Bitmap (QBM)
> +=======================
> +
> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
> +with the tracked data image.  A QBM image includes one json descriptor file,
> +one data image, one or more bitmap files that describe the block dirty status
> +of the data image.
> +
> +The json file describes the structure of the image. The structure of the json
> +descriptor file is:
> +
> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
> +
> +    DESC-JSON := { "version": 1,
> +                   "image": IMAGE,
> +                   "BITMAPS": BITMAPS
> +                 }
> +
> +Fields in the top level json dictionary are:
> +
> +@version: An integer which must be 1.
> +@image: A dictionary in IMAGE schema, as described later. It provides the
> +        information of the data image where user data is stored. Its format is
> +        documented in the "IMAGE schema" section.

I think either ', as described later' or the entire third sentence can be
dropped, since they duplicate each other.

> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
> +          the dictionary are the names of bitmap, which must be strings, and
> +          each value is a dictionary describing the information of the bitmap,
> +          as documented below in the "BITMAP schema" section.

I agree with the others that it should be an array. That seems to be the
more common practice.

> +
> +=== IMAGE schema ===
> +
> +An IMAGE records the information of an image (such as a data image or a backing
> +file). It has following fields:
> +
> +@file: The file name string of the referenced image. If it's a relative path,
> +       the file should be found relative to the descriptor file's
> +       location.
> +@format: The format string of the file.
> +
> +=== BITMAP schema ===
> +
> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
> +or a block allocation status bitmap). It has following mandatory fields:
> +
> +@file: The name of the bitmap file. The bitmap file is in little endian, both
> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
> +       corresponds to the first sectors.

What about reusing my bitmap-table approach (one indexing table, like L1)
to save space? Or add a @format field, so that this can be added later as a
native option, without ext-hard-.
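
For context on the ext-hard- remark: under the "Extended fields" rules quoted
further down, a parser that does not know an ext-hard- key has to give up on
the whole image, while an unknown ext-soft- key can be skipped. A sketch of
that rule as I read it (the function names are made up, and refusing an
unknown key with no prefix is my assumption, the spec does not say):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SKETCH_FIELD_STANDARD,  /* no ext- prefix: a field defined by the spec */
    SKETCH_FIELD_EXT_SOFT,  /* may be ignored if not understood */
    SKETCH_FIELD_EXT_HARD,  /* must be understood, else stop parsing */
} SketchFieldKind;

static SketchFieldKind sketch_classify_key(const char *key)
{
    assert(key != NULL);
    if (strncmp(key, "ext-hard-", strlen("ext-hard-")) == 0) {
        return SKETCH_FIELD_EXT_HARD;
    }
    if (strncmp(key, "ext-soft-", strlen("ext-soft-")) == 0) {
        return SKETCH_FIELD_EXT_SOFT;
    }
    return SKETCH_FIELD_STANDARD;
}

/* True if a parser that does not recognize @key may carry on regardless. */
static bool sketch_can_ignore_unknown(const char *key)
{
    return sketch_classify_key(key) == SKETCH_FIELD_EXT_SOFT;
}
```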

> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
> +                    This value must be a power of 2 and no less than 512.
> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
> +       supported.
> +       "dirty" indicates a block dirty bitmap; "allocation" indicates a
> +       allocation status bitmap. There must be at most one "allocation" bitmap.
> +
> +If the type of the bitmap is "allocation", an extra field "backing" is also
> +accepted:
> +
> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
> +          adding a backing file to raw image.
> +
> +
> +=== Extended fields ===
> +
> +Implementations are allowed to extend the format schema by inserting additinoal
> +members into above dictionaries, with key names that starts with either
> +an "ext-hard-" or an "ext-soft-" prefix.
> +
> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
> +parsers if they do not support it; fields starting with "ext-hard-" are
> +mandatory and cannot be ignored, a parser should not proceed parsing the image
> +if it does not support it.
> +
> +It is strongly recommended that the application names are also included in the
> +extention name string, such as "ext-hard-qemu-", if the effect or
> +interpretation of the field is local to a specific application.
> +
> +For example, QEMU can implement a "checksum" feature to make sure no files
> +referred to by the json descriptor are modified inconsistently, by adding
> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
> +the json text found below.
> +
> +=== QBM descriptor file example ===
> +
> +This is the content of a QBM image's json descriptor file, which contains a
> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
> +associates a backing file to this image (base.img).
> +
> +{ "QBM": {
> +    "version": 1,
> +    "image": {
> +        "file": "data.img",
> +        "format": "raw"
> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
> +    },
> +    "bitmaps": {
> +        "0": {
> +            "file": "bitmap0.bin",
> +            "granularity-bytes": 512,
> +            "type": "dirty"
> +        },
> +        "1": {
> +            "file": "bitmap1.bin",
> +            "granularity-bytes": 4096,
> +            "type": "dirty"
> +        },
> +        "2": {
> +            "file": "bitmap3.bin",
> +            "granularity-bytes": 4096,
> +            "type": "allocation",
> +            "backing": {
> +                "file": "base.img",
> +                "format": "raw",
> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
> +            }
> +        }
> +    }
> +} }
> +


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver Fam Zheng
@ 2016-02-17 13:30   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 42+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-02-17 13:30 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jsnow, Markus Armbruster, mreitz,
	vsementsov, Stefan Hajnoczi

As I understand it, these are the differences between our driver interfaces:

Fam:
   methods:
     bdrv_dirty_bitmap_set_persistent
   All persistent bitmaps are loaded, and all of them are enabled.

Me:
   methods:
     bdrv_dirty_bitmap_load
     bdrv_dirty_bitmap_load_check
       Bitmaps are loaded on demand, by name (for example, from the command
       line at qemu start), so there may be disabled bitmaps in qcow2 which
       are not loaded. I'm not sure this feature is necessary; it should be
       discussed.
     bdrv_dirty_bitmap_store
       Called from bdrv_close; exists as the mirror of the _load method.

The other difference is the sync policy:

Fam:
    - uses meta bitmaps
    - syncs (aio write) after each write to disk. We have discussed this a
      lot, and I can't see a real application for it (will aio requests be
      lost on a qemu crash?). On the other hand, I'm afraid that performance
      will suffer. However, it can now be tested easily.

Me:
    - no meta bitmaps, so I have to save whole bitmaps on close
    - bitmaps are saved only on close
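
To make the two policies concrete, here is a toy model in plain C. It is only
a sketch: ToyBitmap and the toy_* functions are hypothetical and use none of
the QEMU bitmap APIs, and the 512-byte chunk size is an assumption chosen to
match one sector. The "meta" array plays the role of a meta bitmap by
recording which chunks of the serialized bitmap changed since the last sync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CHUNK_BYTES  512   /* bitmap bytes covered by one meta entry */
#define TOY_BITMAP_BYTES 4096  /* serialized size of the toy bitmap */
#define TOY_NUM_CHUNKS   (TOY_BITMAP_BYTES / TOY_CHUNK_BYTES)

typedef struct {
    uint8_t bits[TOY_BITMAP_BYTES]; /* the dirty bitmap itself */
    uint8_t meta[TOY_NUM_CHUNKS];   /* nonzero: chunk changed since last sync */
} ToyBitmap;

static void toy_init(ToyBitmap *b)
{
    memset(b, 0, sizeof(*b));
}

/* Mark one bit dirty and remember which chunk of the bitmap changed. */
static void toy_set(ToyBitmap *b, unsigned bit)
{
    unsigned byte = bit / 8;

    assert(byte < TOY_BITMAP_BYTES);
    b->bits[byte] |= (uint8_t)(1u << (bit % 8));
    b->meta[byte / TOY_CHUNK_BYTES] = 1;
}

/* With meta tracking: count the bytes of the changed chunks only, then
 * clear the meta entries as if those chunks had been written out. */
static size_t toy_sync_changed(ToyBitmap *b)
{
    size_t bytes = 0;
    int i;

    for (i = 0; i < TOY_NUM_CHUNKS; i++) {
        if (b->meta[i]) {
            bytes += TOY_CHUNK_BYTES;
            b->meta[i] = 0;
        }
    }
    return bytes;
}

/* Without meta tracking: the whole bitmap has to be stored on close. */
static size_t toy_store_all(const ToyBitmap *b)
{
    return sizeof(b->bits);
}
```

The trade-off is the one described above: the first policy writes little but
often, the second writes everything but only once.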

On 26.01.2016 13:38, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   block/Makefile.objs |    1 +
>   block/qbm.c         | 1315 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 1316 insertions(+)
>   create mode 100644 block/qbm.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index cdd8655..1111ba7 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -5,6 +5,7 @@ block-obj-y += qed-check.o
>   block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
>   block-obj-y += quorum.o
>   block-obj-y += parallels.o blkdebug.o blkverify.o
> +block-obj-y += qbm.o
>   block-obj-y += block-backend.o snapshot.o qapi.o
>   block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
>   block-obj-$(CONFIG_POSIX) += raw-posix.o
> diff --git a/block/qbm.c b/block/qbm.c
> new file mode 100644
> index 0000000..91e129f
> --- /dev/null
> +++ b/block/qbm.c
> @@ -0,0 +1,1315 @@
> +/*
> + * Block driver for the QBM format
> + *
> + * Copyright (c) 2016 Red Hat Inc.
> + *
> + * Authors:
> + *     Fam Zheng <famz@redhat.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qemu/error-report.h"
> +#include "qemu/module.h"
> +#include "migration/migration.h"
> +#include "qapi/qmp/qint.h"
> +#include "qapi/qmp/qjson.h"
> +
> +#define QBM_BUF_SIZE_MAX (32 << 20)
> +
> +typedef enum QBMBitmapType {
> +    QBM_TYPE_DIRTY,
> +    QBM_TYPE_ALLOC,
> +} QBMBitmapType;
> +
> +typedef struct QBMBitmap {
> +    BdrvDirtyBitmap *bitmap;
> +    BdrvChild *file;
> +    char *name;
> +    QBMBitmapType type;
> +} QBMBitmap;
> +
> +typedef struct BDRVQBMState {
> +    BdrvChild *image;
> +    BdrvDirtyBitmap *alloc_bitmap;
> +    QDict *desc;
> +    QDict *backing_dict;
> +    QBMBitmap *bitmaps;
> +    int num_bitmaps;
> +} BDRVQBMState;
> +
> +static const char *qbm_token_consume(const char *p, const char *token)
> +{
> +    size_t len = strlen(token);
> +
> +    if (!p) {
> +        return NULL;
> +    }
> +    while (*p && (*p == ' ' ||
> +                  *p == '\t' ||
> +                  *p == '\n' ||
> +                  *p == '\r')) {
> +        p++;
> +    }
> +    if (!strncmp(p, token, len)) {
> +        return p + len;
> +    }
> +    return NULL;
> +}
> +
> +static int qbm_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const char *p;
> +    p = strstr((const char *)buf, "\"QBM\"");
> +    if (!p) {
> +        p = strstr((const char *)buf, "'QBM'");
> +    }
> +    if (!p) {
> +        return 0;
> +    }
> +    /* Skip the matched key; both spellings have the same length. */
> +    p += strlen("\"QBM\"");
> +    p = qbm_token_consume(p, ":");
> +    p = qbm_token_consume(p, "{");
> +    if (p && *p) {
> +        return 100;
> +    }
> +    return 0;
> +}
> +
> +static void qbm_load_bitmap(BlockDriverState *bs, QBMBitmap *bm, Error **errp)
> +{
> +    int r;
> +    BDRVQBMState *s = bs->opaque;
> +    int64_t bitmap_file_size;
> +    int64_t bitmap_size;
> +    uint8_t *buf = NULL;
> +    BlockDriverState *file = bm->file->bs;
> +    int64_t image_size = bdrv_getlength(s->image->bs);
> +
> +    if (image_size < 0) {
> +        error_setg(errp, "Cannot get image size: %s", s->image->bs->filename);
> +        return;
> +    }
> +    bitmap_size = bdrv_dirty_bitmap_serialization_size(bm->bitmap, 0,
> +                        bdrv_dirty_bitmap_size(bm->bitmap));
> +    if (bitmap_size > QBM_BUF_SIZE_MAX) {
> +        error_setg(errp, "Bitmap too big");
> +        return;
> +    }
> +    bitmap_file_size = bdrv_getlength(file);
> +    if (bitmap_file_size < bitmap_size) {
> +        error_setg(errp,
> +                   "Bitmap \"%s\" file too small "
> +                   "(expecting at least %" PRId64 " bytes but got %" PRId64
> +                   " bytes): %s",
> +                   bm->name, bitmap_size, bitmap_file_size, file->filename);
> +        goto out;
> +    }
> +    buf = qemu_blockalign(file, bitmap_size);
> +    r = bdrv_pread(file, 0, buf, bitmap_size);
> +    if (r < 0) {
> +        error_setg(errp, "Failed to read bitmap file \"%s\"",
> +                   file->filename);
> +        goto out;
> +    }
> +    bdrv_dirty_bitmap_deserialize_part(bm->bitmap, buf, 0, bs->total_sectors,
> +                                       true);

I now think that, to reduce copying, it is better to implement
hbitmap_{load,store}. Please wait for my v4 for qcow2 bitmaps.

> +
> +out:
> +    g_free(buf);
> +}
> +
> +static int qbm_reopen_prepare(BDRVReopenState *state,
> +                              BlockReopenQueue *queue, Error **errp)
> +{
> +    return 0;
> +}
> +
> +static void qbm_get_fullname(BlockDriverState *bs, char *dest, size_t sz,
> +                             const char *filename)
> +{
> +    const char *base, *p;
> +
> +    base = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
> +
> +    if (strstart(base, "json:", NULL)) {
> +        /* There is not much we can do with a json: file name, try bs->file and
> +         * cross our fingers. */
> +        if (bs->file) {
> +            qbm_get_fullname(bs->file->bs, dest, sz, filename);
> +        } else {
> +            pstrcpy(dest, sz, filename);
> +        }
> +        return;
> +    }
> +
> +    p = strrchr(base, '/');
> +
> +    assert(sz > 0);
> +    if (path_has_protocol(filename) || path_is_absolute(filename)) {
> +        pstrcpy(dest, sz, filename);
> +        return;
> +    }
> +
> +    if (p) {
> +        pstrcpy(dest, MIN(sz, p - base + 2), base);
> +    } else {
> +        dest[0] = '\0';
> +    }
> +    pstrcat(dest, sz, filename);
> +}
> +
> +static BdrvChild *qbm_open_image(BlockDriverState *bs,
> +                                 QDict *image, QDict *options,
> +                                 Error **errp)
> +{
> +    BdrvChild *child;
> +    const char *filename = qdict_get_try_str(image, "file");
> +    const char *fmt = qdict_get_try_str(image, "format");
> +    const char *checksum = qdict_get_try_str(image, "checksum");
> +    char fullname[PATH_MAX];
> +
> +    if (!filename) {
> +        error_setg(errp, "Image missing 'file' field");
> +        return NULL;
> +    }
> +    if (!fmt) {
> +        error_setg(errp, "Image missing 'format' field");
> +        return NULL;
> +    }
> +    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
> +    qdict_put(options, "image.driver", qstring_from_str(fmt));
> +    child = bdrv_open_child(fullname, options, "image", bs, &child_file, false,
> +                            errp);
> +    if (!child) {
> +        goto out;
> +    }
> +    if (checksum) {
> +        /* TODO: compare checksum when we support this */
> +        error_setg(errp, "Checksum not supported");
> +    }
> +out:
> +    return child;
> +}
> +
> +/* Open and load the persistent bitmap and return the created QBMBitmap object.
> + * If reuse_bitmap is not NULL, we skip bdrv_create_dirty_bitmap and reuse it.
> + **/
> +static QBMBitmap *qbm_open_bitmap(BlockDriverState *bs,
> +                                  const char *name,
> +                                  const char *filename, int granularity,
> +                                  QBMBitmapType type,
> +                                  BdrvDirtyBitmap *reuse_bitmap,
> +                                  Error **errp)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    QBMBitmap *bm;
> +    BdrvChild *file;
> +    BdrvDirtyBitmap *bdrv_bitmap;
> +    char *key;
> +    QDict *options;
> +    char fullname[PATH_MAX];
> +    Error *local_err = NULL;
> +
> +    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +    s->bitmaps = g_realloc_n(s->bitmaps, s->num_bitmaps + 1,
> +                             sizeof(QBMBitmap));
> +
> +    /* Create options for the bitmap child BDS */
> +    options = qdict_new();
> +    key = g_strdup_printf("bitmap-%s.driver", name);
> +    qdict_put(options, key, qstring_from_str("raw"));
> +    g_free(key);
> +
> +    /* Open the child as plain "file" */
> +    key = g_strdup_printf("bitmap-%s", name);
> +    file = bdrv_open_child(fullname, options, key, bs, &child_file, false,
> +                           errp);
> +    g_free(key);
> +    QDECREF(options);
> +    if (!file) {
> +        return NULL;
> +    }
> +
> +    if (reuse_bitmap) {
> +        bdrv_bitmap = reuse_bitmap;
> +    } else {
> +        bdrv_bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
> +        if (!bdrv_bitmap) {
> +            bdrv_unref_child(bs, file);
> +            return NULL;
> +        }
> +        bdrv_dirty_bitmap_set_persistent(bs, bdrv_bitmap, true, true, NULL);
> +    }
> +    bdrv_create_meta_dirty_bitmap(bdrv_bitmap, BDRV_SECTOR_SIZE);
> +
> +    bm = &s->bitmaps[s->num_bitmaps++];
> +    bm->file = file;
> +    bm->name = g_strdup(name);
> +    bm->type = type;
> +    bm->bitmap = bdrv_bitmap;
> +    if (type == QBM_TYPE_ALLOC) {
> +        assert(!s->alloc_bitmap);
> +        s->alloc_bitmap = bdrv_bitmap;
> +        /* Align the request to granularity so the block layer will take care
> +         * of RMW for partial writes. */
> +        bs->request_alignment = granularity;
> +    }
> +    return bm;
> +}
> +
> +typedef struct QBMIterState {
> +    QDict *options;
> +    BlockDriverState *bs;
> +    Error *err;
> +    bool has_backing;
> +} QBMIterState;
> +
> +static void qbm_bitmap_iter(const char *key, QObject *obj, void *opaque)
> +{
> +    QDict *dict;
> +    const char *filename, *typename;
> +    QBMBitmapType type;
> +    int granularity;
> +    QBMIterState *state = opaque;
> +    BDRVQBMState *s = state->bs->opaque;
> +    QBMBitmap *bm;
> +
> +    if (state->err) {
> +        return;
> +    }
> +    dict = qobject_to_qdict(obj);
> +    if (!dict) {
> +        error_setg(&state->err, "'%s' is not a dictionary", key);
> +        return;
> +    }
> +    filename = qdict_get_try_str(dict, "file");
> +    if (!filename) {
> +        error_setg(&state->err, "\"file\" is missing in bitmap \"%s\"", key);
> +        return;
> +    }
> +    typename = qdict_get_try_str(dict, "type");
> +    if (!typename) {
> +        error_setg(&state->err, "\"type\" is missing in bitmap \"%s\"", key);
> +        return;
> +    } else if (!strcmp(typename, "dirty")) {
> +        type = QBM_TYPE_DIRTY;
> +    } else if (!strcmp(typename, "allocation")) {
> +        QDict *backing_dict = qdict_get_qdict(dict, "backing");
> +        type = QBM_TYPE_ALLOC;
> +        if (backing_dict) {
> +            if (state->has_backing) {
> +                error_setg(&state->err, "Multiple backing is not supported");
> +                return;
> +            }
> +            state->has_backing = true;
> +            pstrcpy(state->bs->backing_file, PATH_MAX,
> +                    qdict_get_try_str(backing_dict, "file"));
> +            if (qdict_haskey(backing_dict, "format")) {
> +                pstrcpy(state->bs->backing_format,
> +                        sizeof(state->bs->backing_format),
> +                        qdict_get_try_str(backing_dict, "format"));
> +                }
> +            s->backing_dict = backing_dict;
> +            if (!strlen(state->bs->backing_file)) {
> +                error_setg(&state->err, "Backing file name not specified");
> +                return;
> +            }
> +        }
> +    } else {
> +        error_setg(&state->err, "Unknown \"type\" in bitmap \"%s\"", key);
> +        return;
> +    }
> +    granularity = qdict_get_try_int(dict, "granularity-bytes", -1);
> +    if (granularity == -1) {
> +        error_setg(&state->err,
> +                   "\"granularity-bytes\" is missing in bitmap \"%s\"", key);
> +        return;
> +    } else if (granularity & (granularity - 1)) {
> +        error_setg(&state->err, "\"granularity-bytes\" must be a power of two");
> +        return;
> +    } else if (granularity < 512) {
> +        error_setg(&state->err, "\"granularity-bytes\" too small");
> +        return;
> +    }
> +
> +    bm = qbm_open_bitmap(state->bs, key, filename, granularity,
> +                         type, NULL, &state->err);
> +    if (!bm) {
> +        return;
> +    }
> +    qbm_load_bitmap(state->bs, bm, &state->err);
> +}
> +
> +static void qbm_release_bitmap(BlockDriverState *bs, QBMBitmap *bm)
> +{
> +    bdrv_release_meta_dirty_bitmap(bm->bitmap);
> +    bdrv_release_dirty_bitmap(bs, bm->bitmap);
> +    bdrv_unref_child(bs, bm->file);
> +}
> +
> +static void qbm_release_bitmaps(BlockDriverState *bs)
> +{
> +    int i;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    for (i = 0; i < s->num_bitmaps; i++) {
> +        QBMBitmap *bm = &s->bitmaps[i];
> +        bdrv_flush(bm->file->bs);
> +        qbm_release_bitmap(bs, bm);
> +        g_free(bm->name);
> +    }
> +}
> +
> +static int qbm_open_bitmaps(BlockDriverState *bs, QDict *bitmaps,
> +                            QDict *options, Error **errp)
> +{
> +    QBMIterState state = (QBMIterState) {
> +        .bs = bs,
> +        .options = options,
> +    };
> +    qdict_iter(bitmaps, qbm_bitmap_iter, &state);
> +    if (state.err) {
> +        qbm_release_bitmaps(bs);
> +        error_propagate(errp, state.err);
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
> +static int qbm_open(BlockDriverState *bs, QDict *options, int flags,
> +                    Error **errp)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    int ret;
> +    int64_t len;
> +    char *desc;
> +    QDict *dict, *image_dict, *bitmaps;
> +
> +    len = bdrv_getlength(bs->file->bs);
> +    if (len > QBM_BUF_SIZE_MAX) {
> +        error_setg(errp, "QBM description file too big.");
> +        return -ENOMEM;
> +    } else if (len < 0) {
> +        error_setg(errp, "Failed to get descriptor file size");
> +        return len;
> +    } else if (!len) {
> +        error_setg(errp, "Empty file");
> +        return -EINVAL;
> +    }
> +
> +    desc = qemu_blockalign(bs->file->bs, len);
> +    ret = bdrv_pread(bs->file->bs, 0, desc, len);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    dict = qobject_to_qdict(qobject_from_json(desc));
> +    if (!dict || !qdict_haskey(dict, "QBM")) {
> +        error_setg(errp, "Failed to parse json from file");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +    s->desc = qdict_get_qdict(dict, "QBM");
> +    if (!s->desc) {
> +        error_setg(errp, "Json doesn't have key \"QBM\"");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +    if (qdict_get_try_int(s->desc, "version", -1) != 1) {
> +        error_setg(errp, "Invalid version of json file");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +    if (!qdict_haskey(s->desc, "image")) {
> +        error_setg(errp, "Key \"image\" not found in json file");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +    image_dict = qdict_get_qdict(s->desc, "image");
> +    if (!image_dict) {
> +        error_setg(errp, "\"image\" information invalid");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    s->image = qbm_open_image(bs, image_dict, options, errp);
> +    if (!s->image) {
> +        ret = -EIO;
> +        goto out;
> +    }
> +    bs->total_sectors = bdrv_nb_sectors(s->image->bs);
> +    if (bs->total_sectors < 0) {
> +        error_setg(errp, "Failed to get image size");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    bitmaps = qdict_get_qdict(s->desc, "bitmaps");
> +    if (!bitmaps) {
> +        error_setg(errp, "\"bitmaps\" not found");
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ret = qbm_open_bitmaps(bs, bitmaps, options, errp);
> +
> +out:
> +    g_free(desc);
> +    return ret;
> +}
> +
> +
> +static void qbm_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    Error *local_err = NULL;
> +
> +    bdrv_refresh_limits(s->image->bs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +    bs->bl.min_mem_alignment = s->image->bs->bl.min_mem_alignment;
> +    bs->bl.opt_mem_alignment = s->image->bs->bl.opt_mem_alignment;
> +    bs->bl.write_zeroes_alignment = bdrv_get_cluster_size(bs);
> +}
> +
> +static int64_t coroutine_fn qbm_co_get_block_status(BlockDriverState *bs,
> +                                                    int64_t sector_num,
> +                                                    int nb_sectors,
> +                                                    int *pnum,
> +                                                    BlockDriverState **file)
> +{
> +    bool alloc = true;
> +    int64_t next;
> +    int cluster_sectors;
> +    BDRVQBMState *s = bs->opaque;
> +    int64_t ret = BDRV_BLOCK_OFFSET_VALID;
> +
> +    if (!s->alloc_bitmap) {
> +        return bdrv_get_block_status(s->image->bs, sector_num, nb_sectors,
> +                                     pnum, file);
> +    }
> +
> +    ret |= BDRV_BLOCK_OFFSET_MASK & (sector_num << BDRV_SECTOR_BITS);
> +    next = sector_num;
> +    cluster_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
> +                            >> BDRV_SECTOR_BITS;
> +    while (next < sector_num + nb_sectors) {
> +        if (next == sector_num) {
> +            alloc = bdrv_get_dirty(bs, s->alloc_bitmap, next);
> +        } else if (bdrv_get_dirty(bs, s->alloc_bitmap, next) != alloc) {
> +            break;
> +        }
> +        next += cluster_sectors - next % cluster_sectors;
> +    }
> +    *pnum = MIN(next - sector_num, nb_sectors);
> +
> +    ret |= alloc ? BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED : 0;
> +    *file = alloc ? s->image->bs : NULL;
> +
> +    return ret;
> +}
> +
> +static int qbm_save_desc(BlockDriverState *desc_file, QDict *desc)
> +{
> +    int ret;
> +    const char *str;
> +    size_t len;
> +    QString *json_str = NULL;
> +    QDict *td;
> +
> +    ret = bdrv_truncate(desc_file, 0);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    td = qdict_new();
> +    /* Grab an extra reference so it doesn't get freed with td */
> +    QINCREF(desc);
> +    qdict_put(td, "QBM", desc);
> +
> +    json_str = qobject_to_json_pretty(QOBJECT(td));
> +    str = qstring_get_str(json_str);
> +    len = strlen(str);
> +    ret = bdrv_pwrite(desc_file, 0, str, len);
> +    /* End the json file with a new line, doesn't hurt if it fails. */
> +    bdrv_pwrite(desc_file, len, "\n", 1);
> +    /* bdrv_pwrite writes padding zeros to align to sector, we don't need that
> +     * for a text file */
> +    bdrv_truncate(desc_file, len + 1);
> +    QDECREF(json_str);
> +    QDECREF(td);
> +    return ret == len ? 0 : -EIO;
> +}
> +
> +static coroutine_fn int qbm_co_readv(BlockDriverState *bs, int64_t sector_num,
> +                                     int nb_sectors, QEMUIOVector *qiov)
> +{
> +    QEMUIOVector local_qiov;
> +    BDRVQBMState *s = bs->opaque;
> +    BdrvDirtyBitmapIter *iter;
> +    int done_sectors = 0;
> +    int ret;
> +    int64_t next_allocated;
> +    int64_t cur_sector = sector_num;
> +    int granularity_sectors;
> +
> +    if (!s->alloc_bitmap) {
> +        return bdrv_co_readv(s->image->bs, sector_num, nb_sectors, qiov);
> +    }
> +    granularity_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
> +                            >> BDRV_SECTOR_BITS;
> +    iter = bdrv_dirty_iter_new(s->alloc_bitmap, sector_num);
> +    qemu_iovec_init(&local_qiov, qiov->niov);
> +    do {
> +        int64_t n;
> +        int64_t consective_end;
> +        next_allocated = bdrv_dirty_iter_next(iter);
> +        if (next_allocated < 0) {
> +            next_allocated = sector_num + nb_sectors;
> +        } else {
> +            next_allocated = MIN(next_allocated, sector_num + nb_sectors);
> +        }
> +        if (next_allocated > cur_sector) {
> +            /* Read [cur_sector, next_allocated) from backing */
> +            n = next_allocated - cur_sector;
> +            qemu_iovec_reset(&local_qiov);
> +            qemu_iovec_concat(&local_qiov, qiov,
> +                              done_sectors << BDRV_SECTOR_BITS,
> +                              n << BDRV_SECTOR_BITS);
> +            ret = bdrv_co_readv(bs->backing->bs, cur_sector, n, &local_qiov);
> +            if (ret) {
> +                goto out;
> +            }
> +            done_sectors += n;
> +            cur_sector += n;
> +            if (done_sectors == nb_sectors) {
> +                break;
> +            }
> +        }
> +        consective_end = next_allocated;
> +        /* Find consecutive allocated sectors */
> +        while (consective_end < sector_num + nb_sectors) {
> +            int64_t next = bdrv_dirty_iter_next(iter);
> +            if (next < 0 || next - consective_end > granularity_sectors) {
> +                /* No more consecutive sectors */
> +                consective_end += granularity_sectors
> +                                  - consective_end % granularity_sectors;
> +                break;
> +            }
> +            consective_end = next;
> +        }
> +        consective_end = MIN(consective_end, sector_num + nb_sectors);
> +        n = consective_end - cur_sector;
> +        assert(n > 0);
> +        /* Read [cur_sector, consective_end) from image */
> +        qemu_iovec_reset(&local_qiov);
> +        qemu_iovec_concat(&local_qiov, qiov,
> +                          done_sectors << BDRV_SECTOR_BITS,
> +                          n << BDRV_SECTOR_BITS);
> +        ret = bdrv_co_readv(s->image->bs, cur_sector, n, &local_qiov);
> +        if (ret) {
> +            goto out;
> +        }
> +        done_sectors += n;
> +        cur_sector += n;
> +    } while (done_sectors < nb_sectors);
> +out:
> +    qemu_iovec_destroy(&local_qiov);
> +    bdrv_dirty_iter_free(iter);
> +    return ret;
> +}
> +
> +static inline void qbm_check_alignment(BDRVQBMState *s, int64_t sector_num,
> +                                       int nb_sectors)
> +{
> +    if (s->alloc_bitmap) {
> +        int cluster_sectors = bdrv_dirty_bitmap_granularity(s->alloc_bitmap)
> +                                >> BDRV_SECTOR_BITS;
> +        assert(sector_num % cluster_sectors == 0);
> +        assert(nb_sectors % cluster_sectors == 0);
> +    }
> +}
> +
> +typedef struct {
> +    int inflight;
> +    Coroutine *co;
> +    int ret;
> +} QBMBitmapWriteTracker;
> +
> +typedef struct {
> +    QEMUIOVector qiov;
> +    uint8_t *buf;
> +    QBMBitmapWriteTracker *tracker;
> +    BlockDriverState *bs;
> +    QBMBitmap *bitmap;
> +    int64_t sector_num;
> +    int nb_sectors;
> +} QBMBitmapWriteData;
> +
> +static void qbm_write_bitmap_cb(void *opaque, int ret)
> +{
> +    QBMBitmapWriteData *data = opaque;
> +    QBMBitmapWriteTracker *tracker = data->tracker;
> +
> +    qemu_iovec_destroy(&data->qiov);
> +    qemu_vfree(data->buf);
> +    if (!ret) {
> +        bdrv_dirty_bitmap_reset_meta(data->bs,
> +                                     data->bitmap->bitmap,
> +                                     data->sector_num, data->nb_sectors);
> +    }
> +    g_free(data);
> +    tracker->ret = tracker->ret ? : ret;
> +    if (!--tracker->inflight) {
> +        qemu_coroutine_enter(tracker->co, NULL);
> +    }
> +}
> +
> +static int qbm_write_bitmap(BlockDriverState *bs, QBMBitmap *bm,
> +                            int64_t sector_num, int nb_sectors,
> +                            QBMBitmapWriteTracker *tracker)
> +{
> +    QBMBitmapWriteData *data;
> +    int64_t start, end;
> +    int64_t file_sector_num;
> +    int file_nb_sectors;
> +    size_t buf_size;
> +    /* Each bit in the bitmap tracks bdrv_dirty_bitmap_granularity(bm->bitmap)
> +     * bytes of guest data, so each sector in the bitmap tracks
> +     * (bdrv_dirty_bitmap_granularity(bm->bitmap) * BDRV_SECTOR_SIZE *
> +     * BITS_PER_BYTE) bytes of guest data. Expressed in sectors, that is: */
> +    int64_t sectors_per_bitmap_sector =
> +        BITS_PER_BYTE * bdrv_dirty_bitmap_granularity(bm->bitmap);
> +    int align = MAX(bdrv_dirty_bitmap_serialization_align(bm->bitmap),
> +                    sectors_per_bitmap_sector);
> +
> +    /* The start sector that is being marked dirty. */
> +    start = QEMU_ALIGN_DOWN(sector_num, align);
> +    /* The end sector that is being marked dirty. */
> +    end = MIN(QEMU_ALIGN_UP(sector_num + nb_sectors, align),
> +              bs->total_sectors);
> +
> +    if (!bdrv_dirty_bitmap_get_meta(bs, bm->bitmap, sector_num, nb_sectors)) {
> +        return 0;
> +    }
> +
> +    file_sector_num = start / sectors_per_bitmap_sector;
> +    buf_size = bdrv_dirty_bitmap_serialization_size(bm->bitmap, start,
> +                                                    end - start);
> +    buf_size = QEMU_ALIGN_UP(buf_size, BDRV_SECTOR_SIZE);
> +    file_nb_sectors = buf_size >> BDRV_SECTOR_BITS;
> +
> +    data = g_new(QBMBitmapWriteData, 1);
> +    data->buf = qemu_blockalign0(bm->file->bs, buf_size);
> +    bdrv_dirty_bitmap_serialize_part(bm->bitmap, data->buf, start,
> +                                     end - start);
> +    qemu_iovec_init(&data->qiov, 1);
> +    qemu_iovec_add(&data->qiov, data->buf, buf_size);
> +    data->tracker = tracker;
> +    data->sector_num = start;
> +    data->nb_sectors = end - start;
> +    data->bs = bm->file->bs;
> +    data->bitmap = bm;
> +    bdrv_aio_writev(bm->file->bs, file_sector_num, &data->qiov,
> +                    file_nb_sectors, qbm_write_bitmap_cb,
> +                    data);
> +    return -EINPROGRESS;
> +}
> +
> +static int qbm_write_bitmaps(BlockDriverState *bs, int64_t sector_num,
> +                             int nb_sectors)
> +{
> +    int i;
> +    BDRVQBMState *s = bs->opaque;
> +    QBMBitmapWriteTracker tracker = (QBMBitmapWriteTracker) {
> +        .inflight = 1, /* So that no aio completion will call
> +                          qemu_coroutine_enter before we yield. */
> +        .co = qemu_coroutine_self(),
> +    };
> +
> +    for (i = 0; i < s->num_bitmaps; i++) {
> +        int ret = qbm_write_bitmap(bs, &s->bitmaps[i],
> +                                   sector_num, nb_sectors, &tracker);
> +        if (ret == -EINPROGRESS) {
> +            tracker.inflight++;
> +        } else if (ret < 0) {
> +            tracker.ret = ret;
> +            break;
> +        }
> +    }
> +    tracker.inflight--;
> +    if (tracker.inflight) {
> +        /* At least one aio is submitted, wait. */
> +        qemu_coroutine_yield();
> +    }
> +    return tracker.ret;
> +}
> +
> +static coroutine_fn int qbm_co_writev(BlockDriverState *bs, int64_t sector_num,
> +                                      int nb_sectors, QEMUIOVector *qiov)
> +{
> +    int ret;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    qbm_check_alignment(s, sector_num, nb_sectors);
> +    ret = bdrv_co_writev(s->image->bs, sector_num, nb_sectors, qiov);
> +    if (ret) {
> +        return ret;
> +    }
> +    return qbm_write_bitmaps(bs, sector_num, nb_sectors);

So you create an aio write request for each bitmap on every write to disk.
Won't that hurt performance?
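
For a rough sense of scale, the arithmetic from the comment in
qbm_write_bitmap() can be worked out as below. This is a sketch under stated
assumptions, not code from the patch: it assumes one bit per granularity-sized
chunk, packed densely into 512-byte sectors, and it ignores the extra
serialization alignment the patch applies. The function names are made up.
With the granularities from the example descriptor, one bitmap sector covers
2 MiB of guest data at 512 bytes and 16 MiB at 4096 bytes. Note that the
quoted code skips the request when bdrv_dirty_bitmap_get_meta() reports no
change, so the cost is paid only when bits actually flip.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE   512
#define TOY_BITS_PER_BYTE 8

/* Guest bytes covered by one 512-byte sector of the bitmap file. */
static uint64_t guest_bytes_per_bitmap_sector(uint32_t granularity)
{
    return (uint64_t)granularity * TOY_BITS_PER_BYTE * TOY_SECTOR_SIZE;
}

/* Number of bitmap-file sectors whose bits cover the guest byte range
 * [offset, offset + len). */
static uint64_t bitmap_sectors_touched(uint64_t offset, uint64_t len,
                                       uint32_t granularity)
{
    uint64_t span = guest_bytes_per_bitmap_sector(granularity);

    assert(len > 0);
    return (offset + len - 1) / span - offset / span + 1;
}
```

So a small guest write that dirties new bits costs at least one extra
512-byte write per bitmap file, and two if it crosses a coverage boundary.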

> +}
> +
> +static int coroutine_fn qbm_co_write_zeroes(BlockDriverState *bs,
> +                                            int64_t sector_num,
> +                                            int nb_sectors,
> +                                            BdrvRequestFlags flags)
> +{
> +    int ret;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    qbm_check_alignment(s, sector_num, nb_sectors);
> +    ret = bdrv_co_write_zeroes(s->image->bs, sector_num, nb_sectors, flags);
> +    if (ret) {
> +        return ret;
> +    }
> +    return qbm_write_bitmaps(bs, sector_num, nb_sectors);
> +}
> +
> +static coroutine_fn int qbm_co_discard(BlockDriverState *bs,
> +                                       int64_t sector_num,
> +                                       int nb_sectors)
> +{
> +    int ret;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    ret = bdrv_co_discard(s->image->bs, sector_num, nb_sectors);
> +    if (ret) {
> +        return ret;
> +    }
> +    return qbm_write_bitmaps(bs, sector_num, nb_sectors);
> +}
> +
> +static int qbm_make_empty(BlockDriverState *bs)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    BlockDriverState *image_bs = s->image->bs;
> +    int ret = 0;
> +
> +    if (image_bs->drv->bdrv_make_empty) {
> +        ret = image_bs->drv->bdrv_make_empty(s->image->bs);
> +        if (ret) {
> +            return ret;
> +        }
> +    } else if (!s->alloc_bitmap) {
> +        return -ENOTSUP;
> +    }
> +    if (s->alloc_bitmap) {
> +        int i;
> +        bdrv_clear_dirty_bitmap(s->alloc_bitmap, NULL);
> +        for (i = 0; i < s->num_bitmaps; i++) {
> +            QBMBitmap *bm = &s->bitmaps[i];
> +            if (bm->bitmap != s->alloc_bitmap) {
> +                continue;
> +            }
> +            ret = bdrv_write_zeroes(bm->file->bs, 0,
> +                                    DIV_ROUND_UP(bdrv_getlength(bm->file->bs),
> +                                                 BDRV_SECTOR_SIZE),
> +                                    BDRV_REQ_MAY_UNMAP);
> +        }
> +    }
> +    return ret;
> +}
> +
> +/* Create a file with given size, and return the relative path. */
> +static char *qbm_create_file(BlockDriverState *bs, const char *name,
> +                             const char *ext,
> +                             int64_t size, Error **errp)
> +{
> +    char *filename = NULL;
> +    Error *local_err = NULL;
> +    char fullname[PATH_MAX];
> +    char path[PATH_MAX];
> +    char prefix[PATH_MAX];
> +    char postfix[PATH_MAX];
> +
> +    filename_decompose(bs->filename, path, prefix,
> +                       postfix, PATH_MAX, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +    filename = g_strdup_printf("%s-%s.%s%s", prefix, name, ext, postfix);
> +    qbm_get_fullname(bs, fullname, sizeof(fullname), filename);
> +
> +    bdrv_img_create(fullname, "raw", NULL, NULL, NULL, size, 0,
> +                    &local_err, true);
> +    if (local_err) {
> +        g_free(filename);
> +        filename = NULL;
> +        error_propagate(errp, local_err);
> +    }
> +    return filename;
> +}
> +
> +static QDict *qbm_create_image_dict(BlockDriverState *bs,
> +                                    const char *image_name,
> +                                    const char *format,
> +                                    Error **errp)
> +{
> +    QDict *dict;
> +    char fullname[PATH_MAX];
> +
> +    qbm_get_fullname(bs, fullname, sizeof(fullname), image_name);
> +    dict = qdict_new();
> +    qdict_put(dict, "file", qstring_from_str(image_name));
> +    qdict_put(dict, "format", qstring_from_str(format ? : ""));
> +    /* TODO: Set checksum when we support it. */
> +
> +    return dict;
> +}
> +
> +static inline QDict *qbm_make_bitmap_dict(const char *filename,
> +                                          int granularity,
> +                                          QBMBitmapType type)
> +{
> +    QDict *d = qdict_new();
> +    qdict_put(d, "file", qstring_from_str(filename));
> +    qdict_put(d, "granularity-bytes", qint_from_int(granularity));
> +    switch (type) {
> +    case QBM_TYPE_DIRTY:
> +        qdict_put(d, "type", qstring_from_str("dirty"));
> +        break;
> +    case QBM_TYPE_ALLOC:
> +        qdict_put(d, "type", qstring_from_str("allocation"));
> +        break;
> +    default:
> +        abort();
> +    }
> +    return d;
> +}
> +
> +static QDict *qbm_create_dirty_bitmaps(BlockDriverState *bs,
> +                                       uint64_t image_size,
> +                                       int granularity,
> +                                       int n, Error **errp)
> +{
> +    int i;
> +    QDict *dict = qdict_new();
> +    int64_t bitmap_size = DIV_ROUND_UP(image_size, granularity * BITS_PER_BYTE);
> +
> +    for (i = 0; i < n; i++) {
> +        char *bitmap_filename;
> +        char *key = g_strdup_printf("dirty.%d", i);
> +
> +        bitmap_filename = qbm_create_file(bs, key, "bitmap", bitmap_size,
> +                                          errp);
> +        if (!bitmap_filename) {
> +            g_free(key);
> +            QDECREF(dict);
> +            dict = NULL;
> +            goto out;
> +        }
> +        qdict_put(dict, key,
> +                  qbm_make_bitmap_dict(bitmap_filename, granularity,
> +                                       QBM_TYPE_DIRTY));
> +        g_free(key);
> +    }
> +out:
> +    return dict;
> +}
> +
> +static QDict *qbm_create_allocation(BlockDriverState *bs,
> +                                    uint64_t image_size,
> +                                    int granularity,
> +                                    const char *backing_file,
> +                                    const char *format,
> +                                    Error **errp)
> +{
> +    char *bitmap_filename;
> +    QDict *ret, *backing;
> +    int64_t bitmap_size = DIV_ROUND_UP(image_size, granularity * BITS_PER_BYTE);
> +
> +    bitmap_filename = qbm_create_file(bs, "allocation", "bitmap",
> +                                      bitmap_size,
> +                                      errp);
> +    if (!bitmap_filename) {
> +        return NULL;
> +    }
> +
> +    ret = qdict_new();
> +
> +    qdict_put(ret, "file", qstring_from_str(bitmap_filename));
> +    if (format) {
> +        qdict_put(ret, "format", qstring_from_str(format));
> +    }
> +    qdict_put(ret, "type", qstring_from_str("allocation"));
> +    qdict_put(ret, "granularity-bytes", qint_from_int(granularity));
> +
> +    backing = qbm_create_image_dict(bs, backing_file, format, errp);
> +    if (!backing) {
> +        QDECREF(ret);
> +        ret = NULL;
> +        goto out;
> +    }
> +    qdict_put(ret, "backing", backing);
> +
> +out:
> +    g_free(bitmap_filename);
> +    return ret;
> +}
> +
> +static int qbm_create(const char *filename, QemuOpts *opts, Error **errp)
> +{
> +    char *backing_file;
> +    const char *image_filename;
> +    int granularity, dirty_bitmaps;
> +    int64_t image_size;
> +    int ret;
> +    QDict *dict = NULL, *bitmaps, *image;
> +    BlockDriverState *bs = NULL, *image_bs = NULL;
> +    char fullname[PATH_MAX];
> +
> +    ret = bdrv_create_file(filename, NULL, errp);
> +    if (ret) {
> +        return ret;
> +    }
> +    ret = bdrv_open(&bs, filename, NULL, NULL,
> +                    BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    image_filename = qemu_opt_get_del(opts, "image");
> +    if (!image_filename) {
> +        /* Try to create one */
> +        int64_t size = qemu_opt_get_size_del(opts, "size", -1);
> +        if (size == -1) {
> +            error_setg(errp, "Invalid size specified for data image");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        image_filename = qbm_create_file(bs, "data", "img", size, errp);
> +        if (!image_filename) {
> +            ret = -EIO;
> +            goto out;
> +        }
> +    }
> +
> +    granularity = qemu_opt_get_number(opts, "granularity", 65536);
> +    dirty_bitmaps = qemu_opt_get_number(opts, "dirty-bitmaps", 0);
> +
> +    qbm_get_fullname(bs, fullname, sizeof(fullname), image_filename);
> +    ret = bdrv_open(&image_bs, fullname, NULL, NULL, 0, errp);
> +    if (ret) {
> +        goto out;
> +    }
> +    image_size = bdrv_getlength(image_bs);
> +
> +    dict = qdict_new();
> +    bitmaps = qbm_create_dirty_bitmaps(bs, image_size, granularity,
> +                                       dirty_bitmaps, errp);
> +    image = qbm_create_image_dict(bs, image_filename,
> +                                  bdrv_get_format_name(image_bs), errp);
> +    if (!image) {
> +        goto out;
> +    }
> +
> +    qdict_put(dict, "version", qint_from_int(1));
> +    qdict_put(dict, "creator", qstring_from_str("QEMU"));
> +    qdict_put(dict, "bitmaps", bitmaps);
> +    qdict_put(dict, "image", image);
> +
> +    backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
> +    if (backing_file) {
> +        char *backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
> +        QDict *alloc = qbm_create_allocation(bs, image_size,
> +                                             granularity, backing_file,
> +                                             backing_fmt, errp);
> +        if (!alloc) {
> +            ret = -EIO;
> +            goto out;
> +        }
> +        /* Create "allocation" bitmap. */
> +        qdict_put(bitmaps, "allocation", alloc);
> +        g_free(backing_file);
> +        backing_file = NULL;
> +        g_free(backing_fmt);
> +    }
> +
> +    ret = qbm_save_desc(bs, dict);
> +
> +out:
> +    bdrv_unref(image_bs);
> +    bdrv_unref(bs);
> +    QDECREF(dict);
> +    return ret;
> +}
> +
> +static int64_t qbm_getlength(BlockDriverState *bs)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    return bdrv_getlength(s->image->bs);
> +}
> +
> +static void qbm_close(BlockDriverState *bs)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +
> +    qbm_release_bitmaps(bs);
> +    bdrv_unref(s->image->bs);
> +    g_free(s->bitmaps);
> +    QDECREF(s->desc);
> +}
> +
> +static int qbm_truncate(BlockDriverState *bs, int64_t offset)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    /* Truncate the image only; the bitmaps' sizes will be made correct when
> +     * saving. */
> +    return bdrv_truncate(s->image->bs, offset);
> +}
> +
> +static coroutine_fn int qbm_co_flush(BlockDriverState *bs)
> +{
> +    int ret;
> +    int i;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    ret = bdrv_flush(s->image->bs);
> +    for (i = 0; ret >= 0 && i < s->num_bitmaps; i++) {
> +        ret = bdrv_flush(s->bitmaps[i].file->bs);
> +    }
> +    return ret;
> +}
> +
> +static int qbm_change_backing_file(BlockDriverState *bs,
> +                                   const char *backing_file,
> +                                   const char *backing_fmt)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    if (!s->backing_dict) {
> +        return -ENOTSUP;
> +    }
> +    if (backing_file) {
> +        qdict_put(s->backing_dict, "file", qstring_from_str(backing_file));
> +        qdict_put(s->backing_dict, "format",
> +                  qstring_from_str(backing_fmt ? : ""));
> +    } else {
> +        int i;
> +        QDict *bitmaps = qdict_get_qdict(s->desc, "bitmaps");
> +
> +        assert(bitmaps);
> +        if (!qdict_haskey(bitmaps, "allocation")) {
> +            return 0;
> +        }
> +        qdict_del(bitmaps, "allocation");
> +        for (i = 0; i < s->num_bitmaps; i++) {
> +            if (s->bitmaps[i].type == QBM_TYPE_ALLOC) {
> +                qbm_release_bitmap(bs, &s->bitmaps[i]);
> +                s->bitmaps[i] = s->bitmaps[--s->num_bitmaps];
> +                break;
> +            }
> +        }
> +        s->alloc_bitmap = NULL;
> +        s->backing_dict = NULL;
> +    }
> +    return qbm_save_desc(bs->file->bs, s->desc);
> +}
> +
> +static int64_t qbm_get_allocated_file_size(BlockDriverState *bs)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    /* Take the file sizes of descriptor and bitmap files into account? */
> +    return bdrv_get_allocated_file_size(s->image->bs);
> +}
> +
> +static int qbm_has_zero_init(BlockDriverState *bs)
> +{
> +    return 1;
> +}
> +
> +static int qbm_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +
> +    bdi->unallocated_blocks_are_zero = true;
> +    bdi->can_write_zeroes_with_unmap = true;
> +    if (s->alloc_bitmap) {
> +        bdi->cluster_size = bdrv_dirty_bitmap_granularity(s->alloc_bitmap);
> +    } else {
> +        bdi->cluster_size = bdrv_get_cluster_size(s->image->bs);
> +    }
> +    return 0;
> +}
> +
> +static int qbm_check(BlockDriverState *bs, BdrvCheckResult *result,
> +                     BdrvCheckMode fix)
> +{
> +    /* TODO: checksum verification and bitmap size checks? */
> +    return 0;
> +}
> +
> +static void qbm_detach_aio_context(BlockDriverState *bs)
> +{
> +    int i;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    bdrv_detach_aio_context(s->image->bs);
> +    for (i = 0; i < s->num_bitmaps; i++) {
> +        bdrv_detach_aio_context(s->bitmaps[i].file->bs);
> +    }
> +}
> +
> +static void qbm_attach_aio_context(BlockDriverState *bs,
> +                                   AioContext *new_context)
> +{
> +    int i;
> +    BDRVQBMState *s = bs->opaque;
> +
> +    bdrv_attach_aio_context(s->image->bs, new_context);
> +    for (i = 0; i < s->num_bitmaps; i++) {
> +        bdrv_attach_aio_context(s->bitmaps[i].file->bs, new_context);
> +    }
> +}
> +
> +static int qbm_bitmap_set_persistent(BlockDriverState *bs,
> +                                     BdrvDirtyBitmap *bitmap,
> +                                     bool persistent, Error **errp)
> +{
> +    BDRVQBMState *s = bs->opaque;
> +    int ret = 0;
> +    QBMBitmap *bm;
> +    char *filename;
> +    const char *name = bdrv_dirty_bitmap_name(bitmap);
> +    int granularity = bdrv_dirty_bitmap_granularity(bitmap);
> +    QDict *bitmaps = qdict_get_qdict(s->desc, "bitmaps");
> +
> +    if (persistent) {
> +        filename = qbm_create_file(bs, name, "bin",
> +                                   bdrv_dirty_bitmap_size(bitmap), errp);
> +        if (!filename) {
> +            return -EIO;
> +        }
> +
> +        bm = qbm_open_bitmap(bs, name, filename, granularity,
> +                             QBM_TYPE_DIRTY, bitmap, errp);
> +        if (!bm) {
> +            ret = -EIO;
> +        }
> +        qdict_put(bitmaps, name, qbm_make_bitmap_dict(filename, granularity,
> +                                                      QBM_TYPE_DIRTY));
> +        g_free(filename);
> +    } else {
> +        if (!qdict_haskey(bitmaps, name)) {
> +            error_setg(errp, "No persistent bitmap with name '%s'", name);
> +            return -ENOENT;
> +        }
> +        qdict_del(bitmaps, name);
> +    }
> +    ret = qbm_save_desc(bs->file->bs, s->desc);
> +    if (ret) {
> +        error_setg(errp, "Failed to save json description to file");
> +    }
> +    return ret;
> +}
> +
> +static QemuOptsList qbm_create_opts = {
> +    .name = "qbm-create-opts",
> +    .head = QTAILQ_HEAD_INITIALIZER(qbm_create_opts.head),
> +    .desc = {
> +        {
> +            .name = BLOCK_OPT_SIZE,
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Virtual disk size"
> +        },
> +        {
> +            .name = "image",
> +            .type = QEMU_OPT_STRING,
> +            .help = "The file name of the referenced image, if not specified, "
> +                    "one will be created automatically",
> +        },
> +        {
> +            .name = BLOCK_OPT_BACKING_FILE,
> +            .type = QEMU_OPT_STRING,
> +            .help = "File name of a base image"
> +        },
> +        {
> +            .name = BLOCK_OPT_BACKING_FMT,
> +            .type = QEMU_OPT_STRING,
> +            .help = "Image format of the base image"
> +        },
> +        {
> +            .name = "granularity",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Bitmap granularity in bytes"
> +        },
> +        {
> +            .name = "dirty-bitmaps",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "The number of dirty bitmaps to create"
> +        },
> +        { /* end of list */ }
> +    }
> +};
> +
> +static BlockDriver bdrv_qbm = {
> +    .format_name                  = "qbm",
> +    .protocol_name                = "qbm",
> +    .instance_size                = sizeof(BDRVQBMState),
> +    .bdrv_probe                   = qbm_probe,
> +    .bdrv_open                    = qbm_open,
> +    .bdrv_reopen_prepare          = qbm_reopen_prepare,
> +    .bdrv_co_readv                = qbm_co_readv,
> +    .bdrv_co_writev               = qbm_co_writev,
> +    .bdrv_co_write_zeroes         = qbm_co_write_zeroes,
> +    .bdrv_co_discard              = qbm_co_discard,
> +    .bdrv_make_empty              = qbm_make_empty,
> +    .bdrv_close                   = qbm_close,
> +    .bdrv_getlength               = qbm_getlength,
> +    .bdrv_create                  = qbm_create,
> +    .bdrv_co_flush_to_disk        = qbm_co_flush,
> +    .bdrv_truncate                = qbm_truncate,
> +    .bdrv_co_get_block_status     = qbm_co_get_block_status,
> +    .bdrv_get_allocated_file_size = qbm_get_allocated_file_size,
> +    .bdrv_has_zero_init           = qbm_has_zero_init,
> +    .bdrv_refresh_limits          = qbm_refresh_limits,
> +    .bdrv_get_info                = qbm_get_info,
> +    .bdrv_check                   = qbm_check,
> +    .bdrv_detach_aio_context      = qbm_detach_aio_context,
> +    .bdrv_attach_aio_context      = qbm_attach_aio_context,
> +    .bdrv_dirty_bitmap_set_persistent
> +                                  = qbm_bitmap_set_persistent,
> +    .bdrv_change_backing_file     = qbm_change_backing_file,
> +    .supports_backing             = true,
> +    .create_opts                  = &qbm_create_opts,
> +};
> +
> +static void bdrv_qbm_init(void)
> +{
> +    bdrv_register(&bdrv_qbm);
> +}
> +
> +block_init(bdrv_qbm_init);


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
                     ` (2 preceding siblings ...)
  2016-02-17 11:48   ` Vladimir Sementsov-Ogievskiy
@ 2016-02-17 16:30   ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 42+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-02-17 16:30 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jsnow, Markus Armbruster, mreitz,
	vsementsov, Stefan Hajnoczi

On 26.01.2016 13:38, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 118 insertions(+)
>   create mode 100644 docs/specs/qbm.md
>
> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
> new file mode 100644
> index 0000000..b91910b
> --- /dev/null
> +++ b/docs/specs/qbm.md
> @@ -0,0 +1,118 @@
> +QEMU Block Bitmap (QBM)
> +=======================
> +
> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
> +with the tracked data image.  A QBM image includes one json descriptor file,
> +one data image, one or more bitmap files that describe the block dirty status
> +of the data image.
> +
> +The json file describes the structure of the image. The structure of the json
> +descriptor file is:
> +
> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
> +
> +    DESC-JSON := { "version": 1,
> +                   "image": IMAGE,
> +                   "bitmaps": BITMAPS
> +                 }
> +
> +Fields in the top level json dictionary are:
> +
> +@version: An integer which must be 1.
> +@image: A dictionary in IMAGE schema, as described later. It provides the
> +        information of the data image where user data is stored. Its format is
> +        documented in the "IMAGE schema" section.
> +@bitmaps: A dictionary that describes one or more bitmap files. The keys into
> +          the dictionary are the names of the bitmaps, which must be strings,
> +          and each value is a dictionary describing the bitmap, as documented
> +          below in the "BITMAP schema" section.
> +
> +=== IMAGE schema ===
> +
> +An IMAGE records the information of an image (such as a data image or a backing
> +file). It has the following fields:
> +
> +@file: The file name string of the referenced image. If it's a relative path,
> +       the file should be found relative to the descriptor file's
> +       location.
> +@format: The format string of the file.
> +
> +=== BITMAP schema ===
> +
> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
> +or a block allocation status bitmap). It has the following mandatory fields:
> +
> +@file: The name of the bitmap file. The bitmap file is in little endian, both
> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
> +       corresponds to the first sectors.
> +@granularity-bytes: The number of bytes of data that one bit in the bitmap
> +                    tracks. This value must be a power of 2 and no less than
> +                    512.
> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
> +       supported.
> +       "dirty" indicates a block dirty bitmap; "allocation" indicates an
> +       allocation status bitmap. There must be at most one "allocation" bitmap.
> +
> +If the type of the bitmap is "allocation", an extra field "backing" is also
> +accepted:
> +
> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
> +          add a backing file to a raw image.
> +
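
To check my reading of the @file and @granularity-bytes descriptions above, here is a minimal sketch of how a guest byte offset would map to a bit in the bitmap file, and how large the file needs to be. The names QbmBitPos, qbm_sketch_bit_pos and qbm_sketch_bitmap_bytes are made up for illustration and are not part of the patch; the size calculation is meant to mirror the DIV_ROUND_UP(image_size, granularity * BITS_PER_BYTE) used in qbm_create_dirty_bitmaps().

```c
#include <assert.h>
#include <stdint.h>

/* Position of one tracking bit inside a bitmap file (hypothetical type). */
typedef struct {
    uint64_t byte_index;  /* offset of the byte within the bitmap file */
    uint8_t mask;         /* single-bit mask within that byte, LSB first */
} QbmBitPos;

/* Map a byte offset in the data image to its bit, given granularity-bytes. */
static QbmBitPos qbm_sketch_bit_pos(uint64_t offset, uint64_t granularity)
{
    QbmBitPos pos;
    uint64_t bit;

    /* The spec text requires a power of 2 that is no less than 512. */
    assert(granularity >= 512 && (granularity & (granularity - 1)) == 0);

    bit = offset / granularity;             /* index of the tracked chunk */
    pos.byte_index = bit / 8;               /* little endian byte order */
    pos.mask = (uint8_t)(1u << (bit % 8));  /* LSB of byte 0 is chunk 0 */
    return pos;
}

/* Bytes needed for a bitmap that covers image_size bytes, rounded up. */
static uint64_t qbm_sketch_bitmap_bytes(uint64_t image_size,
                                        uint64_t granularity)
{
    uint64_t data_per_bitmap_byte = granularity * 8;

    return (image_size + data_per_bitmap_byte - 1) / data_per_bitmap_byte;
}
```

With the default granularity of 65536 from qbm_create(), a 1 GiB image would then need a 2048 byte bitmap file, if I read the code correctly.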

All bitmaps are auto-loaded. Don't you want to add an 'auto=true|false'
parameter, as with qcow2 bitmaps?
Also, what about an 'in_use' flag or something similar, to indicate that a
dirty bitmap may be inconsistent after a qemu crash?
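
To make the 'in_use' idea concrete, a rough sketch of the semantics I have in mind follows. Everything here is hypothetical: the SketchBitmapMeta type and both functions are not in the patch or the spec, and marking every bit dirty on an unclean shutdown is only one conservative recovery policy (it would force a full backup); an implementation might prefer to refuse the bitmap instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-bitmap metadata; it would live in the json descriptor. */
typedef struct {
    bool in_use;  /* true while some writer has the bitmap open */
} SketchBitmapMeta;

/*
 * Open a bitmap for tracking. Returns true if the previous user never
 * cleared in_use, i.e. the saved bits cannot be trusted. In that case
 * every bit is set, so nothing that was written can be missed (assumed
 * recovery policy, see above).
 */
static bool sketch_bitmap_open(SketchBitmapMeta *meta,
                               unsigned char *bits, size_t len)
{
    bool unclean = meta->in_use;

    if (unclean) {
        memset(bits, 0xff, len);
    }
    meta->in_use = true;  /* would be written to disk before any guest I/O */
    return unclean;
}

/* Clean shutdown: the bits on disk are complete, so clear the flag. */
static void sketch_bitmap_close(SketchBitmapMeta *meta)
{
    meta->in_use = false;
}
```

The point is only the ordering: the flag is set before the first guest write and cleared after the last bitmap flush, so a crash in between is always detectable.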

> +
> +=== Extended fields ===
> +
> +Implementations are allowed to extend the format schema by inserting additional
> +members into the above dictionaries, with key names that start with either
> +an "ext-hard-" or an "ext-soft-" prefix.
> +
> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
> +parsers that do not support them; fields starting with "ext-hard-" are
> +mandatory and cannot be ignored: a parser should not proceed with parsing the
> +image if it does not support them.
> +
> +It is strongly recommended that the application name is also included in the
> +extension name string, such as "ext-hard-qemu-", if the effect or
> +interpretation of the field is local to a specific application.
> +
> +For example, QEMU can implement a "checksum" feature to make sure no files
> +referred to by the json descriptor are modified inconsistently, by adding
> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
> +the json text found below.
> +
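
For what it's worth, this is how I would expect a parser to apply the prefix rule to a key it does not implement. It is only a sketch: the enum and function names are invented here, and treating an unknown key with neither prefix as "cannot be ignored" is my assumption, since the text above does not cover that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Classification of a descriptor key by its prefix (hypothetical enum). */
typedef enum {
    QBM_SKETCH_KEY_STANDARD,  /* no extension prefix */
    QBM_SKETCH_KEY_EXT_SOFT,  /* "ext-soft-...": optional */
    QBM_SKETCH_KEY_EXT_HARD   /* "ext-hard-...": mandatory */
} QbmSketchKeyClass;

static QbmSketchKeyClass qbm_sketch_classify_key(const char *key)
{
    assert(key != NULL);
    if (strncmp(key, "ext-soft-", strlen("ext-soft-")) == 0) {
        return QBM_SKETCH_KEY_EXT_SOFT;
    }
    if (strncmp(key, "ext-hard-", strlen("ext-hard-")) == 0) {
        return QBM_SKETCH_KEY_EXT_HARD;
    }
    return QBM_SKETCH_KEY_STANDARD;
}

/*
 * May a parser that does not implement this key skip it and carry on?
 * Only "ext-soft-" keys qualify; an unsupported "ext-hard-" key means the
 * parser should stop. Unknown keys without a prefix are rejected too,
 * which is an assumption of this sketch.
 */
static bool qbm_sketch_can_ignore(const char *key)
{
    return qbm_sketch_classify_key(key) == QBM_SKETCH_KEY_EXT_SOFT;
}
```

So an old parser could skip "ext-soft-qemu-checksum" from the example, but would have to stop on any "ext-hard-qemu-" key it does not know.
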
> +=== QBM descriptor file example ===
> +
> +This is the content of a QBM image's json descriptor file, which contains a
> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
> +associates a backing file to this image (base.img).
> +
> +{ "QBM": {
> +    "version": 1,
> +    "image": {
> +        "file": "data.img",
> +        "format": "raw",
> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8"
> +    },
> +    "bitmaps": {
> +        "0": {
> +            "file": "bitmap0.bin",
> +            "granularity-bytes": 512,
> +            "type": "dirty"
> +        },
> +        "1": {
> +            "file": "bitmap1.bin",
> +            "granularity-bytes": 4096,
> +            "type": "dirty"
> +        },
> +        "2": {
> +            "file": "bitmap3.bin",
> +            "granularity-bytes": 4096,
> +            "type": "allocation",
> +            "backing": {
> +                "file": "base.img",
> +                "format": "raw",
> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
> +            }
> +        }
> +    }
> +} }
> +


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
                   ` (15 preceding siblings ...)
  2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 16/16] iotests: Add persistent bitmap test case 141 Fam Zheng
@ 2016-02-22 14:24 ` Kevin Wolf
  2016-02-23  3:40   ` Fam Zheng
  2016-02-23  9:14   ` Markus Armbruster
  16 siblings, 2 replies; 42+ messages in thread
From: Kevin Wolf @ 2016-02-22 14:24 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, jsnow, qemu-devel, Markus Armbruster, vsementsov,
	Stefan Hajnoczi, mreitz

Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
> This series introduces a simple format to enable support of persistence of
> block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> backup, and persistence of block dirty bitmap makes incremental backup possible
> across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> 
> When user creates a "persisted" dirty bitmap, the QBM driver will create a
> binary file and synchronize it with the existing in-memory block dirty bitmap
> (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> bits saved on disk, which will be loaded and used to initialize the in-memory
> block dirty bitmap next time the guest is started.
> 
> The idea of the format is to reuse as much existing infrastructure as possible
> and avoid introducing complex data structures - it works with any image format,
> by gluing together plain bitmap files with a json descriptor file. The
> advantage of this approach over extending existing formats, such as qcow2, is
> that the new feature is implemented by an orthogonal driver, in a format
> agnostic way. This way, even raw images can have their persistent dirty
> bitmaps.  (And you will notice in this series, with a little forging to the
> spec, raw images can also have backing files through a QBM overlay!)
> 
> Rather than superseding it, this intends to be coexistent in parallel with the
> qcow2 bitmap extension that Vladimir is working on.  The block driver interface
> changes in this series also try to be generic and compatible for both drivers.

So as I already told Fam last week, before we discuss any technical
details here, we first need to discuss whether this is even the right
thing to do. Currently I'm doubtful, as this is another attempt to
introduce a new native image format in qemu.

Let's recap the image formats and what we tell users about them today:

* qcow2: This is the default choice for disk images. It gives you access
  to all of the features in qemu at a good performance. If it doesn't
  perform well in your case, we'll fix it.

* raw: Use this when you need absolute performance and don't need any
  features from an image format, so you want to get any complexity just
  out of the way and pass requests as directly as possible from the
  guest device to the host kernel.

* Anything else: Only use them to convert into raw or qcow2.

Now using bitmaps is clearly on the "features" side, which suggests that
qcow2 is the format of choice for this. If you want to introduce a new
format, you need to justify it with evidence that...

1. there is a relevant use case that qcow2 doesn't cover
2. qcow2 can't be fixed/enhanced to cover the use case

The one thing that people have claimed in the past that qcow2 can't
provide is enough performance. This is where QED tried to come in and
promised a compromise between performance (then a bit faster than qcow2)
and features (almost none, but supports backing files). We all know that
it was a failure: you had to sacrifice features, and the idea that qcow2
couldn't be fixed still turned out to be wrong, so today we have a QED
driver that is much slower than qcow2 despite having fewer features.

Now for QBM. First, let's have a look at the image format that it can be
used with. qcow2 doesn't need it if we continue with Vladimir's
extension. Other non-raw formats are only supposed to be used for
conversion. The only thing that's really left is raw. Now adding a
feature only for raw, as a compromise between features and performance,
looks an awful lot like what QED tried. We don't want to go there.

Even if we wanted to support persistent dirty bitmaps with raw images
(which has to be discussed based on use cases), it's still questionable
whether we need a new image format with JSON descriptor files instead of
just raw bitmaps that can be added with a QMP command.


tl;dr: Where is the justification for a new image format? You need a
good one.

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-02-22 14:24 ` [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Kevin Wolf
@ 2016-02-23  3:40   ` Fam Zheng
  2016-02-23 17:43     ` Kevin Wolf
  2016-02-23  9:14   ` Markus Armbruster
  1 sibling, 1 reply; 42+ messages in thread
From: Fam Zheng @ 2016-02-23  3:40 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Alberto Garcia, qemu-block, jsnow, Peter Lieven, qemu-devel,
	Markus Armbruster, vsementsov, Stefan Hajnoczi, Denis V. Lunev,
	pbonzini, mreitz

(I'm Cc'ing a few more people here just in case they have different visions
about raw image use cases.)

On Mon, 02/22 15:24, Kevin Wolf wrote:
> Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
> > This series introduces a simple format to enable support of persistence of
> > block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> > backup, and persistence of block dirty bitmap makes incremental backup possible
> > across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> > 
> > When user creates a "persisted" dirty bitmap, the QBM driver will create a
> > binary file and synchronize it with the existing in-memory block dirty bitmap
> > (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> > bits saved on disk, which will be loaded and used to initialize the in-memory
> > block dirty bitmap next time the guest is started.
> > 
> > The idea of the format is to reuse as much existing infrastructure as possible
> > and avoid introducing complex data structures - it works with any image format,
> > by gluing together plain bitmap files with a json descriptor file. The
> > advantage of this approach over extending existing formats, such as qcow2, is
> > that the new feature is implemented by an orthogonal driver, in a format
> > agnostic way. This way, even raw images can have their persistent dirty
> > bitmaps.  (And you will notice in this series, with a little forging to the
> > spec, raw images can also have backing files through a QBM overlay!)
> > 
> > Rather than superseding it, this intends to be coexistent in parallel with the
> > qcow2 bitmap extension that Vladimir is working on.  The block driver interface
> > changes in this series also try to be generic and compatible for both drivers.
> 
> So as I already told Fam last week, before we discuss any technical
> details here, we first need to discuss whether this is even the right
> thing to do. Currently I'm doubtful, as this is another attempt to
> introduce a new native image format in qemu.
> 
> Let's recap the image formats and what we tell users about them today:
> 
> * qcow2: This is the default choice for disk images. It gives you access
>   to all of the features in qemu at a good performance. If it doesn't
>   perform well in your case, we'll fix it.
> 
> * raw: Use this when you need absolute performance and don't need any
>   features from an image format, so you want to get any complexity just
>   out of the way and pass requests as directly as possible from the
>   guest device to the host kernel.
> 
> * Anything else: Only use them to convert into raw or qcow2.
> 
> Now using bitmaps is clearly on the "features" side, which suggests that
> qcow2 is the format of choice for this. If you want to introduce a new
> format, you need to justify it with evidence that...
> 
> 1. there is a relevant use case that qcow2 doesn't cover
> 2. qcow2 can't be fixed/enhanced to cover the use case
> 
> The one thing that people have claimed in the past that qcow2 can't
> provide is enough performance. This is where QED tried to come in and
> promised a compromise between performance (then a bit faster than qcow2)
> and features (almost none, but supports backing files). We all know that
> it was a failure because you had to sacrifice features and still the
> idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> driver that is much slower than qcow2 despite having less features.
> 
> Now for QBM. First, let's have a look at the image format that it can be
> used with. qcow2 doesn't need it if we continue with Vladimir's
> extension. Other non-raw formats are only supposed to be used for
> conversion. The only thing that's really left is raw.

Yes, I agree with this point.

> Now adding a
> feature only for raw, as a compromise between features and performance,
> looks an awful lot like what QED tried. We don't want to go there.
> 
> Even if we wanted to support persistent dirty bitmaps with raw images
> (which has to be discussed based on use cases), it's still questionable
> whether we need a new image format with JSON descriptor files instead of
> just raw bitmaps that can be added with a QMP command.
> 

I don't think a QMP interface alone is enough. In the persistent backup use
case, when starting a guest, a command line interface is more appropriate for
continuing the dirty tracking that was enabled at shutdown.

I'd justify in two parts, one is "why" and the other is "how".

So to answer why.  The reason I worked on QBM is that I feel it is wrong to
leave raw behind. Ceph and LVM users use the raw format. You could technically
use qcow2 with ceph but that is discouraged[1] or even refused by openstack[2].
We've seen qcow2 on top of LVs but that is not the dominant setup.

The scope of "features" for which we tell users they have to use qcow2 should
be those that are format specific, not "block features" in general.  Backing
file, internal/external snapshot, thin provisioning, compression and encryption
are all great examples of format features, whereas things including throttling,
statistics, migration, mirroring and backing up are IMHO not.  Actually we
already support snapshotting a raw image, with a qcow2 overlay.  We've even
implemented non-persistent incremental backup for raw today, through
drive-backup.  If we decide that qcow2 is the only format that can do
persistent backup, I'm not really a huge fan of that.

Then "how"?

Actually, I thought we could do it in a way similar to quorum. The quorum
driver is configured by specifying tediously long options. A snippet from
qemu-iotests to build a quorum driver with 3 children looks like this:

    quorum="driver=raw,file.driver=quorum,file.vote-threshold=2"
    quorum="$quorum,file.children.0.file.filename=$TEST_DIR/1.raw"
    quorum="$quorum,file.children.1.file.filename=$TEST_DIR/2.raw"
    quorum="$quorum,file.children.2.file.filename=$TEST_DIR/3.raw"
    quorum="$quorum,file.children.0.driver=raw"
    quorum="$quorum,file.children.1.driver=raw"
    quorum="$quorum,file.children.2.driver=raw"

Though very repetitive, it is also very simple: all children are almost
symmetrical (identical in user data). The only thing the user/management tool
has to make sure of is that the images have the same data.

Unfortunately the logic is more complicated in a persistent incremental backup
scenario. Manual users will have to specify bitmap file names and
granularities, which they may no longer remember two weeks after they created
the bitmap, and which they can get wrong.  Management seems a must in this
case, but the interface we provide to it still feels way too low level. Anyway,
I do think we can consider a "banana" (dummy name) driver for persistent bitmap
management which is structured like quorum:

    banana="driver=raw,file.driver=banana,file.mode=synchronous"
    banana="$banana,file.image.file.filename=$TEST_IMG"
    banana="$banana,file.bitmaps.0.file.filename=$TEST_DIR/bm0.raw"
    banana="$banana,file.bitmaps.0.granularity=65536"
    banana="$banana,file.bitmaps.0.name=bm0"
    banana="$banana,file.bitmaps.1.file.filename=$TEST_DIR/bm1.raw"
    banana="$banana,file.bitmaps.1.granularity=1048576"
    banana="$banana,file.bitmaps.1.name=bm1"
    ...
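
To make the repetition concrete, here is a minimal Python sketch (not part of
this series) of what a management tool would have to do to assemble such a
string. The "banana" driver and its option names are the made-up ones from
above, and banana_options() and its dictionary keys are hypothetical too:

```python
def banana_options(image, bitmaps, mode="synchronous"):
    """Build the option string for the hypothetical "banana" driver.

    `bitmaps` is a list of dicts with the keys "name", "file" and
    "granularity" (names chosen for this sketch only).
    Values containing commas would need escaping; that is ignored here.
    """
    opts = [
        "driver=raw",
        "file.driver=banana",
        f"file.mode={mode}",
        f"file.image.file.filename={image}",
    ]
    for i, bm in enumerate(bitmaps):
        prefix = f"file.bitmaps.{i}"
        opts.append(f"{prefix}.file.filename={bm['file']}")
        opts.append(f"{prefix}.granularity={bm['granularity']}")
        opts.append(f"{prefix}.name={bm['name']}")
    return ",".join(opts)


if __name__ == "__main__":
    print(banana_options("test.img", [
        {"name": "bm0", "file": "bm0.raw", "granularity": 65536},
        {"name": "bm1", "file": "bm1.raw", "granularity": 1048576},
    ]))
```

Every caller has to remember the file name, granularity and name of every
bitmap, which is exactly the bookkeeping a descriptor file would carry.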

But we're merely inlining the information from the QBM JSON format into the
command line. IMO the two are only one step apart. Compared to the command
line building blocks, QBM has:

Pros:
  - Better data integrity as all bitmaps are tied together by the descriptor
    file.
  - Ease of use because of qemu-img support.
  - Management or user can rely on the QBM file to store the bitmap information
    and query easily.
  - If any information needs to be transactionally stored together with the
    bitmap data, it is easier to do with file writes than with QMP events and
    commands exchanged with whoever is storing it. For example, think
    about implementing the in_use feature like qcow2 bitmaps.

Cons:
  - Maintenance burden of another format specification.
  - Redundancy with qcow2 internal bitmap format (unavoidable if we agree on
    raw).

Thus the QBM format makes sense to me.

Fam

[1]: http://docs.ceph.com/docs/hammer/rbd/qemu-rbd/
[2]: http://docs.ceph.com/docs/master/rbd/rbd-openstack/


* Re: [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification
  2016-01-26 17:51   ` Eric Blake
  2016-02-09  0:05     ` John Snow
@ 2016-02-23  8:35     ` Markus Armbruster
  1 sibling, 0 replies; 42+ messages in thread
From: Markus Armbruster @ 2016-02-23  8:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, mreitz,
	vsementsov, Stefan Hajnoczi, jsnow

Eric Blake <eblake@redhat.com> writes:

> On 01/26/2016 03:38 AM, Fam Zheng wrote:
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>> ---
>>  docs/specs/qbm.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 118 insertions(+)
>>  create mode 100644 docs/specs/qbm.md
>> 
>> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
>> new file mode 100644
>> index 0000000..b91910b
>> --- /dev/null
>> +++ b/docs/specs/qbm.md
>> @@ -0,0 +1,118 @@
>> +QEMU Block Bitmap (QBM)
>> +=======================
>
> No explicit copyright mention means that this document is GPLv2+ by
> default.  I don't know if any 3rd-party implementation trying to use
> this spec would object to that, or if a looser license is desirable here
> (I don't personally care, but just raising the point).

IANAL, but protecting the spec with a strong copyleft shouldn't preclude
differently licensed implementations.  It would preclude including the
spec in whatever they distribute.

>> +
>> +QBM is a multi-file disk format to allow storing persistent block bitmaps along
>> +with the tracked data image.  A QBM image includes one json descriptor file,
>> +one data image, one or more bitmap files that describe the block dirty status
>
> s/one or more/and one or more/
>
> Must it have block dirty status bitmaps, or can you have a QBM image
> with just an allocation bitmap?
>
>> +of the data image.
>> +
>> +The json file describes the structure of the image. The structure of the json
>> +descriptor file is:
>
> Please mention that the file must be valid JSON per RFC 7159.  Probably
> also worth requiring that the file is a text file (ends in a newline).
>
>> +
>> +    QBM-JSON-FILE := { "QBM": DESC-JSON }
>> +
>> +    DESC-JSON := { "version": 1,
>> +                   "image": IMAGE,
>> +                   "BITMAPS": BITMAPS
>
> s/"BITMAPS"/"bitmaps"/
>
> See my thoughts below on whether this is the ideal top-level structure.
>
>> +                 }
>> +
>> +Fields in the top level json dictionary are:
>> +
>> +@version: An integer which must be 1.
>> +@image: A dictionary in IMAGE schema, as described later. It provides the
>> +        information of the data image where user data is stored. Its format is
>> +        documented in the "IMAGE schema" section.
>> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
>> +          the dictionary are the names of bitmap, which must be strings, and
>> +          each value is a dictionary describing the information of the bitmap,
>> +          as documented below in the "BITMAP schema" section.
>
> Making 'bitmaps' be a dictionary means that we now have keys that might
> not be an identifier.  Although this is valid JSON, it may trip up some
> tools.  Would it be better to make 'bitmaps' be a list of dictionaries,
> where each dictionary has a 'name':'value' member, so that bitmap names
> that are not a (C, Python, whatever) identifier are still valid?

We need to make up our mind whether to use JSON or the intersection of
JSON with other languages.  I'd stick to JSON and the more natural
schema Fam proposed.
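
For what it's worth, a small Python sketch of why I don't see the dictionary
form as a problem for tools: any JSON parser accepts arbitrary strings as
keys, and converting to the list-of-dictionaries shape is a one-liner. The
bitmap name "nightly backup" is made up for illustration:

```python
import json

# A "bitmaps" dictionary in the proposed form; the second key is
# deliberately not a valid identifier in C or Python.
text = """
{ "bitmaps": {
    "0": { "file": "bitmap0.bin", "granularity-bytes": 512,
           "type": "dirty" },
    "nightly backup": { "file": "bitmap1.bin", "granularity-bytes": 4096,
                        "type": "dirty" }
} }
"""

bitmaps = json.loads(text)["bitmaps"]

# The list form Eric suggests, derived from the dictionary form.
as_list = [dict(desc, name=name) for name, desc in bitmaps.items()]

for entry in as_list:
    print(entry["name"], entry["file"], entry["granularity-bytes"])
```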

>> +
>> +=== IMAGE schema ===
>> +
>> +An IMAGE records the information of an image (such as a data image or a backing
>> +file). It has following fields:
>
> I liked how you showed DESC-JSON := for the top level; should you do the
> same here for IMAGE?
>
>> +
>> +@file: The file name string of the referenced image. If it's a relative path,
>> +       the file should be found relative to the descriptor file's
>> +       location.
>
> Does that mean we'll have to use 'json:...' encoding for representing
> network resources?  Should we instead reuse some of the
> qapi/block-core.json representation of a block device?

This isn't QMP, where encoding data in strings is anathema, but the same
argument applies: instead of inventing ad hoc syntaxes, just use the
JSON syntax, it should be expressive enough.

On the other hand, doing things like we do in QCOW2 is also a valid
argument.

>> +@format: The format string of the file.
>
> Nice that format is mandatory.  Do we want to call out a finite list of
> supported formats, or leave it open-ended in this spec?

I'd expect QEMU's implementation of QBM to recognize all formats QEMU
recognizes elsewhere.  Nailing it down here would add yet another place
to update when we change the list of formats QEMU recognizes.  I'd
rather not do that.

If you want to get fancy, do something like 'implementations shall
implement format "raw", and may implement additional formats.'

>> +
>> +=== BITMAP schema ===
>> +
>> +A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
>> +or a block allocation status bitmap). It has following mandatory fields:
>> +
>> +@file: The name of the bitmap file. The bitmap file is in little endian, both
>
> s/in //
>
>> +       byte-order-wise and bit-order-wise, which means the LSB in the byte 0
>> +       corresponds to the first sectors.
>
> s/the byte 0/the first byte/
>
> Again, should we be reusing something from qapi/block-core.json, to
> allow network devices with more structure than just 'json:...' naming?
>
>> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
>> +                    This value must be a power of 2 and no less than 512.
>> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
>> +       supported.
>> +       "dirty" indicates a block dirty bitmap; "allocation" indicates a
>> +       allocation status bitmap. There must be at most one "allocation" bitmap.
>> +
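
To check my reading of the bit layout and granularity, a minimal Python
sketch, assuming that "little endian bit order" means bit i of the bitmap
lives in byte i / 8 at bit position i % 8 (LSB first). The helper names are
mine, not from the patch:

```python
def bit_position(offset, granularity):
    """Map a byte offset in the data image to (byte index, bit index)."""
    # The spec requires a power of 2 that is no less than 512.
    if granularity < 512 or granularity & (granularity - 1):
        raise ValueError("granularity must be a power of 2 and >= 512")
    index = offset // granularity
    return index // 8, index % 8


def is_set(bitmap, offset, granularity):
    """Return True if the chunk containing `offset` is marked."""
    byte, bit = bit_position(offset, granularity)
    return bool((bitmap[byte] >> bit) & 1)


def set_range(bitmap, offset, length, granularity):
    """Mark every chunk touched by [offset, offset + length)."""
    first, _ = divmod(offset, granularity)
    last = (offset + length - 1) // granularity
    for index in range(first, last + 1):
        bitmap[index // 8] |= 1 << (index % 8)


if __name__ == "__main__":
    bm = bytearray(2)                # tracks 16 chunks
    set_range(bm, 4096, 1, 512)      # chunk 8: byte 1, bit 0
    print(bm.hex(), is_set(bm, 4096, 512))
```
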
>> +If the type of the bitmap is "allocation", an extra field "backing" is also
>> +accepted:
>> +
>> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
>> +          adding a backing file to raw image.
>
> s/adding/add/
>
> As promised above, would an alternative representation be any better?
> That is, I'm trying to see if I could write qapi to describe the
> structure you've presented here, and I fell short when it comes to
> naming a particular bitmap.  Also, since there is at most one
> 'type':'allocation' member of 'bitmaps', I wonder if separating it out
> would make it easier to locate.
>
> Here's my proposal for an alternative schema, written in qapi:
>
> { 'struct': 'Other', 'data': { ...however we describe a network file... } }
> { 'alternate': 'File', 'data': { 'file': 'str', 'struct': 'Other' } }
> { 'struct': 'Bitmap', 'data': {
>    'name':'str', 'file': 'File',
>    'granularity-bytes':'int' } }
> { 'struct': 'AllocationBitmap', 'base': 'Bitmap', 'data': {
>   '*backing': 'Image' } }
> { 'struct': 'Image', 'data': {
>   'file': 'File',
>   'format': 'str' # would an enum be better?
> } }
> { 'struct': 'Desc', 'data': {
>   'version': 'int', 'image': 'Image',
>   '*allocation': 'AllocationBitmap',
>   'dirty': [ 'AllocationBitmap' ]
> } }
> { 'struct': 'QBM', 'data': { 'QBM': 'Desc' } }
>
> where the json description file must consist of a single 'QBM' qapi
> struct, and where the use of a QAPI alternate type 'File' allows us to
> specify either a file name or a formal structure for describing a
> network resource.  Below, I'll rewrite your example with my schema...
>
>> +
>> +
>> +=== Extended fields ===
>> +
>> +Implementations are allowed to extend the format schema by inserting additinoal
>
> s/additinoal/additional/
>
>> +members into above dictionaries, with key names that starts with either
>> +an "ext-hard-" or an "ext-soft-" prefix.

Elsewhere in QEMU, we use "x-" prefixes.

> Should we be more like qcow2 and have a third category of auto-clear
> keys (if you don't recognize the key, remove it upon editing the file,
> but reading is okay)?  Feature negotiation via this approach requires
> reading every member of 'bitmaps' (well, we have to do that anyway to
> parse the full JSON structure); would it be any better to have an
> up-front section in the top level that describes what features are in
> use, rather than requiring all new features to use the 'ext-' namespace?
>
> Should we require hard failure on any key whose name is not recognized
> (other than the weirdness of your proposal having the keys of 'bitmaps'
> be user-supplied names)?
>
>> +
>> +Extended fields prefixed with "ext-soft-" are optional and can be ignored by
>> +parsers if they do not support it; fields starting with "ext-hard-" are
>> +mandatory and cannot be ignored, a parser should not proceed parsing the image
>> +if it does not support it.
>
> Is it really the entire image invalidated if an extension is tied to a
> particular bitmap, or only that bitmap?
>
>> +
>> +It is strongly recommended that the application names are also included in the
>> +extention name string, such as "ext-hard-qemu-", if the effect or
>
> s/extention/extension/
>
>> +interpretation of the field is local to a specific application.
>> +
>> +For example, QEMU can implement a "checksum" feature to make sure no files
>> +referred to by the json descriptor are modified inconsistently, by adding
>> +"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
>> +the json text found below.
>
> If an extension proves to be useful, how do we standardize it later?
> Will it always have to carry the 'ext-' prefix?

I'd say no.  By definition above, "ext-" is for implementation's
extensions.  Using it for standard features as well would be bad taste
and confusing.
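
A rough Python sketch of how I read the soft/hard rule as currently worded,
for a parser walking an already-decoded descriptor. check_extensions() and
UnsupportedExtension are hypothetical names, and the sketch ignores the wart
that user-supplied bitmap names are keys too and could collide with the
prefixes:

```python
class UnsupportedExtension(Exception):
    """Raised for an "ext-hard-" key the parser does not implement."""


def check_extensions(descriptor, supported=frozenset()):
    """Return the ignored "ext-soft-" keys; raise on unknown "ext-hard-".

    `supported` is the set of extension keys this parser implements.
    """
    ignored = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key not in supported:
                    if key.startswith("ext-hard-"):
                        # Mandatory extension we don't know: stop parsing.
                        raise UnsupportedExtension(key)
                    if key.startswith("ext-soft-"):
                        # Optional extension we don't know: skip it.
                        ignored.append(key)
                        continue
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(descriptor)
    return ignored
```

Note this refuses the whole image on any unknown "ext-hard-" key, which is
what the text above says; whether a per-bitmap failure would be preferable is
a separate question.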

> You said that soft extensions can be ignored on parse - but if we write
> to the file, couldn't we possibly be invalidating the contents of the
> extension field, and not leaving a breadcrumb for the future reader that
> understands the extension to know that we messed it up?  I think an
> auto-clear feature would be useful (preserve the checksum field, but
> clear the associated auto-clear so that the newer reader knows that the
> checksum has to be checked).
>
> Should we be thinking about a write lock extension (similar to the
> current thread on qcow2 write locks), to make it less likely that two
> writers will be modifying the descriptor, image file, and/or bitmaps at
> the same time?
>
>> +
>> +=== QBM descriptor file example ===
>> +
>> +This is the content of a QBM image's json descriptor file, which contains a
>> +data image (data.img), and three bitmaps, out of which the "allocation" bitmap
>> +associates a backing file to this image (base.img).
>> +
>> +{ "QBM": {
>> +    "version": 1,
>> +    "image": {
>> +        "file": "data.img",
>> +        "format": "raw"
>> +        "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
>
> Comma on wrong row.
>
>> +    },
>> +    "bitmaps": {
>> +        "0": {
>> +            "file": "bitmap0.bin",
>> +            "granularity-bytes": 512,
>> +            "type": "dirty"
>> +        },
>> +        "1": {
>> +            "file": "bitmap1.bin",
>> +            "granularity-bytes": 4096,
>> +            "type": "dirty"
>> +        },
>> +        "2": {
>> +            "file": "bitmap3.bin",
>> +            "granularity-bytes": 4096,
>> +            "type": "allocation"
>
> Missing comma
>
>> +            "backing": {
>> +                "file": "base.img",
>> +                "format": "raw"
>> +                "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7",
>
> Comma on wrong row
>
>> +            },
>
> No trailing commas in JSON
>
> Your approach means that the same bitmap cannot be used for both 'dirty'
> and 'allocation' (because all entries in the 'bitmaps' dictionary must
> have distinct names).  (Although nothing stops two bitmap entries from
> naming the same 'file'.)
>
>> +        }
>> +    }
>> +} }
>
> So, rewriting to my schema above, this would be:
>
> { "QBM": {
>     "version": 1,
>     "image": {
>         "file": "data.img",
>         "format": "raw",
>         "ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8"
>     },
>     "allocation": {
>         "name": "2",
>         "file": "bitmap3.bin",
>         "granularity-bytes": 4096,
>         "backing": {
>             "file": "base.img",
>             "format": "raw",
>             "ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7"
>         }
>     },
>     "bitmaps": [
>         {
>             "name": "0",
>             "file": "bitmap0.bin",
>             "granularity-bytes": 512
>         },
>         {
>             "name": "1",
>             "file": "bitmap1.bin",
>             "granularity-bytes": 4096
>         }
>     ]
> } }
>
>> +
>> 
>
> Awkward to end in a blank line.


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-02-22 14:24 ` [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Kevin Wolf
  2016-02-23  3:40   ` Fam Zheng
@ 2016-02-23  9:14   ` Markus Armbruster
  2016-02-23 11:28     ` Kevin Wolf
  1 sibling, 1 reply; 42+ messages in thread
From: Markus Armbruster @ 2016-02-23  9:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-block, qemu-devel, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

Kevin Wolf <kwolf@redhat.com> writes:

> Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
>> This series introduces a simple format to enable support of persistence of
>> block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
>> backup, and persistence of block dirty bitmap makes incremental backup possible
>> across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
>> 
>> When user creates a "persisted" dirty bitmap, the QBM driver will create a
>> binary file and synchronize it with the existing in-memory block dirty bitmap
>> (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
>> bits saved on disk, which will be loaded and used to initialize the in-memory
>> block dirty bitmap next time the guest is started.
>> 
>> The idea of the format is to reuse as much existing infrastructure as possible
>> and avoid introducing complex data structures - it works with any image format,
>> by gluing it together plain bitmap files with a json descriptor file. The
>> advantage of this approach over extending existing formats, such as qcow2, is
>> that the new feature is implemented by an orthogonal driver, in a format
>> agnostic way. This way, even raw images can have their persistent dirty
>> bitmaps.  (And you will notice in this series, with a little forging to the
>> spec, raw images can also have backing files through a QBM overlay!)
>> 
>> Rather than superseding it, this intends to be coexistent in parallel with the
>> qcow2 bitmap extension that Vladimir is working on.  The block driver interface
>> changes in this series also try to be generic and compatible for both drivers.
>
> So as I already told Fam last week, before we discuss any technical
> details here, we first need to discuss whether this is even the right
> thing to do.

Yes, this must come first.

>              Currently I'm doubtful, as this is another attempt to
> introduce a new native image format in qemu.
>
> Let's recap the image formats and what we tell users about them today:
>
> * qcow2: This is the default choice for disk images. It gives you access
>   to all of the features in qemu at a good performance. If it doesn't
>   perform well in your case, we'll fix it.

Rather: we'll fix it if we can.

> * raw: Use this when you need absolute performance and don't need any
>   features from an image format, so you want to get any complexity just
>   out of the way and pass requests as directly as possible from the
>   guest device to the host kernel.
>
> * Anything else: Only use them to convert into raw or qcow2.
>
> Now using bitmaps is clearly on the "features" side, which suggests that
> qcow2 is the format of choice for this.

I'd agree with a general "extra feature suggests QCOW2" maxim, with
stress on "suggests".

However, the "extraness" of bitmaps is perhaps less clear than for other
features.  Bitmap-like things occur not just in formats: sparse files,
thinly provisioned SCSI devices, ...

>                                         If you want to introduce a new
> format, you need to justify it with evidence that...
>
> 1. there is a relevant use case that qcow2 doesn't cover
> 2. qcow2 can't be fixed/enhanced to cover the use case
>
> The one thing that people have claimed in the past that qcow2 can't
> provide is enough performance. This is where QED tried to come in and
> promised a compromise between performance (then a bit faster than qcow2)
> and features (almost none, but supports backing files). We all know that
> it was a failure because you had to sacrifice features and still the
> idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> driver that is much slower than qcow2 despite having less features.

Yes.  We thought QCOW2 could not be made to perform[*], until you did.

New storage hardware will bring back performance pressure with a
vengeance, though.

> Now for QBM. First, let's have a look at the image format that it can be
> used with. qcow2 doesn't need it if we continue with Vladimir's
> extension. Other non-raw formats are only supposed to be used for
> conversion. The only thing that's really left is raw. Now adding a
> feature only for raw, as a compromise between features and performance,
> looks an awful lot like what QED tried. We don't want to go there.

A possible difference: complexity.

Adding another QEMU-native format in QCOW2's complexity class would be
highly problematic.  We tried with QED, because we thought we'd need it
to support different tradeoffs, but it turned out to be a dead end.

Doesn't mean there's absolutely no space for a *simple* format to
support different tradeoffs.  Is QBM simple enough?  Will it stay simple
enough?

> Even if we wanted to support persistent dirty bitmaps with raw images
> (which has to be discussed based on use cases), it's still questionable
> whether we need a new image format with JSON descriptor files instead of
> just raw bitmaps that can be added with a QMP command.
>
>
> tl;dr: Where is the justification for a new image format? You need a
> good one.

Yes.

> Kevin


[*] Mostly because we thought QCOW2 could not be hacked.


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-02-23  9:14   ` Markus Armbruster
@ 2016-02-23 11:28     ` Kevin Wolf
  0 siblings, 0 replies; 42+ messages in thread
From: Kevin Wolf @ 2016-02-23 11:28 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Fam Zheng, qemu-block, qemu-devel, mreitz, vsementsov,
	Stefan Hajnoczi, jsnow

Am 23.02.2016 um 10:14 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
> >> This series introduces a simple format to enable support of persistence of
> >> block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> >> backup, and persistence of block dirty bitmap makes incremental backup possible
> >> across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> >> 
> >> When user creates a "persisted" dirty bitmap, the QBM driver will create a
> >> binary file and synchronize it with the existing in-memory block dirty bitmap
> >> (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> >> bits saved on disk, which will be loaded and used to initialize the in-memory
> >> block dirty bitmap next time the guest is started.
> >> 
> >> The idea of the format is to reuse as much existing infrastructure as possible
> >> and avoid introducing complex data structures - it works with any image format,
> >> by gluing it together plain bitmap files with a json descriptor file. The
> >> advantage of this approach over extending existing formats, such as qcow2, is
> >> that the new feature is implemented by an orthogonal driver, in a format
> >> agnostic way. This way, even raw images can have their persistent dirty
> >> bitmaps.  (And you will notice in this series, with a little forging to the
> >> spec, raw images can also have backing files through a QBM overlay!)
> >> 
> >> Rather than superseding it, this intends to be coexistent in parallel with the
> >> qcow2 bitmap extension that Vladimir is working on.  The block driver interface
> >> changes in this series also try to be generic and compatible for both drivers.
> >
> > So as I already told Fam last week, before we discuss any technical
> > details here, we first need to discuss whether this is even the right
> > thing to do.
> 
> Yes, this must come first.
> 
> >              Currently I'm doubtful, as this is another attempt to
> > introduce a new native image format in qemu.
> >
> > Let's recap the image formats and what we tell users about them today:
> >
> > * qcow2: This is the default choice for disk images. It gives you access
> >   to all of the features in qemu at a good performance. If it doesn't
> >   perform well in your case, we'll fix it.
> 
> Rather: we'll fix it if we can.

Right. The assumption so far is that we generally can. If it turns out
at some point that we can't improve it sufficiently and a new format
could improve it, then the whole approach of having only raw and qcow2
is indeed in question.

If you look at how different the various VMDK subformats are, though, it
seems that we still have more than enough maneuvering room. I think
having optional features in the format is preferable to multiple
user-visible formats. For most people, VMDK is just VMDK, and there is
no reason why they should know about the subformats. Similarly, it would
be good if qemu users didn't have to know about qcow2, qed, fvd, qbm...

> > * raw: Use this when you need absolute performance and don't need any
> >   features from an image format, so you want to get any complexity just
> >   out of the way and pass requests as directly as possible from the
> >   guest device to the host kernel.
> >
> > * Anything else: Only use them to convert into raw or qcow2.
> >
> > Now using bitmaps is clearly on the "features" side, which suggests that
> > qcow2 is the format of choice for this.
> 
> I'd agree with a general "extra feature suggests QCOW2" maxim, with
> stress on "suggests".
> 
> However, the "extraness" of bitmaps is perhaps less clear than for other
> features.  Bitmap-like things occur not just in formats: sparse files,
> thinly provisioned SCSI devices, ...
> 
> >                                         If you want to introduce a new
> > format, you need to justify it with evidence that...
> >
> > 1. there is a relevant use case that qcow2 doesn't cover
> > 2. qcow2 can't be fixed/enhanced to cover the use case
> >
> > The one thing that people have claimed in the past that qcow2 can't
> > provide is enough performance. This is where QED tried to come in and
> > promised a compromise between performance (then a bit faster than qcow2)
> > and features (almost none, but supports backing files). We all know that
> > it was a failure because you had to sacrifice features and still the
> > idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> > driver that is much slower than qcow2 despite having less features.
> 
> Yes.  We thought QCOW2 could not be made to perform[*], until you did.
> 
> New storage hardware will bring back performance pressure with a
> vengeance, though.

Then we'll have to address it. Not only for dirty bitmaps, but also
for snapshots and all the other features you only get with an image
format.

> > Now for QBM. First, let's have a look at the image format that it can be
> > used with. qcow2 doesn't need it if we continue with Vladimir's
> > extension. Other non-raw formats are only supposed to be used for
> > conversion. The only thing that's really left is raw. Now adding a
> > feature only for raw, as a compromise between features and performance,
> > looks an awful lot like what QED tried. We don't want to go there.
> 
> A possible difference: complexity.
> 
> Adding another QEMU-native format in QCOW2's complexity class would be
> highly problematic.  We tried with QED, because we thought we'd need it
> to support different tradeoffs, but it turned out to be a dead end.
> 
> Doesn't mean there's absolutely no space for a *simple* format to
> support different tradeoffs.  Is QBM simple enough?  Will it stay simple
> enough?

Good questions. I'd put a different one first, though: Can the different
tradeoffs be accommodated with (new options to) an existing format (i.e.
qcow2)? If it can, no matter how simple we think the driver would end
up, we probably don't want the duplication.

That was assuming for a moment that this tradeoff is even relevant,
which hasn't been shown. I'm actually not completely sure yet that
we're not discussing a solution in search of a problem here.

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-02-23  3:40   ` Fam Zheng
@ 2016-02-23 17:43     ` Kevin Wolf
  2016-02-24  0:49       ` Fam Zheng
  0 siblings, 1 reply; 42+ messages in thread
From: Kevin Wolf @ 2016-02-23 17:43 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Alberto Garcia, qemu-block, jsnow, Peter Lieven, qemu-devel,
	Markus Armbruster, vsementsov, Stefan Hajnoczi, Denis V. Lunev,
	pbonzini, mreitz

Am 23.02.2016 um 04:40 hat Fam Zheng geschrieben:
> (I'm Cc'ing a few more people here just in case they have different visions
> about raw image use cases.)
> 
> On Mon, 02/22 15:24, Kevin Wolf wrote:
> > Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
> > > This series introduces a simple format to enable support of persistence of
> > > block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> > > > backup, and persistence of block dirty bitmap makes incremental backup possible
> > > across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> > > 
> > > When user creates a "persisted" dirty bitmap, the QBM driver will create a
> > > binary file and synchronize it with the existing in-memory block dirty bitmap
> > > (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> > > bits saved on disk, which will be loaded and used to initialize the in-memory
> > > block dirty bitmap next time the guest is started.
> > > 
> > > The idea of the format is to reuse as much existing infrastructure as possible
> > > and avoid introducing complex data structures - it works with any image format,
> > > by gluing it together plain bitmap files with a json descriptor file. The
> > > advantage of this approach over extending existing formats, such as qcow2, is
> > > that the new feature is implemented by an orthogonal driver, in a format
> > > agnostic way. This way, even raw images can have their persistent dirty
> > > bitmaps.  (And you will notice in this series, with a little forging to the
> > > spec, raw images can also have backing files through a QBM overlay!)
> > > 
> > > Rather than superseding it, this intends to be coexistent in parallel with the
> > > qcow2 bitmap extension that Vladimir is working on.  The block driver interface
> > > changes in this series also try to be generic and compatible for both drivers.
> > 
> > So as I already told Fam last week, before we discuss any technical
> > details here, we first need to discuss whether this is even the right
> > thing to do. Currently I'm doubtful, as this is another attempt to
> > introduce a new native image format in qemu.
> > 
> > Let's recap the image formats and what we tell users about them today:
> > 
> > * qcow2: This is the default choice for disk images. It gives you access
> >   to all of the features in qemu at a good performance. If it doesn't
> >   perform well in your case, we'll fix it.
> > 
> > * raw: Use this when you need absolute performance and don't need any
> >   features from an image format, so you want to get any complexity just
> >   out of the way and pass requests as directly as possible from the
> >   guest device to the host kernel.
> > 
> > * Anything else: Only use them to convert into raw or qcow2.
> > 
> > Now using bitmaps is clearly on the "features" side, which suggests that
> > qcow2 is the format of choice for this. If you want to introduce a new
> > format, you need to justify it with evidence that...
> > 
> > 1. there is a relevant use case that qcow2 doesn't cover
> > 2. qcow2 can't be fixed/enhanced to cover the use case
> > 
> > The one thing that people have claimed in the past that qcow2 can't
> > provide is enough performance. This is where QED tried to come in and
> > promised a compromise between performance (then a bit faster than qcow2)
> > and features (almost none, but supports backing files). We all know that
> > it was a failure because you had to sacrifice features and still the
> > idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> > driver that is much slower than qcow2 despite having fewer features.
> > 
> > Now for QBM. First, let's have a look at the image format that it can be
> > used with. qcow2 doesn't need it if we continue with Vladimir's
> > extension. Other non-raw formats are only supposed to be used for
> > conversion. The only thing that's really left is raw.
> 
> Yes, I agree with this point.
> 
> > Now adding a
> > feature only for raw, as a compromise between features and performance,
> > looks an awful lot like what QED tried. We don't want to go there.
> > 
> > Even if we wanted to support persistent dirty bitmaps with raw images
> > (which has to be discussed based on use cases), it's still questionable
> > whether we need a new image format with JSON descriptor files instead of
> > just raw bitmaps that can be added with a QMP command.
> > 
> 
> I don't think a QMP interface alone is enough. In the persistent backup use
> case, a command line interface is more appropriate when starting a guest, so
> that dirty tracking that was enabled at shutdown can continue.

Yes, I was sloppy. Maybe s/QMP command/runtime option/ gets closer.

> I'd justify in two parts, one is "why" and the other is "how".
> 
> So to answer why.  I worked on QBM because I feel it is wrong to leave raw
> behind. Ceph and LVM users use the raw format. You could technically use
> qcow2 with ceph but that is discouraged[1] or even refused by openstack[2].
> We've seen qcow2 on top of LVs but that is not the dominant setup.

Ceph is definitely a valid point. I think we agree that qcow2 can't
provide what we need there today.

The question is whether qcow2 can be extended to provide it. As we
discussed last week internally and today on the call, a possible idea
would be to extend qcow2 to act as the filter driver here, where all I/O
is redirected to the backing file and only the bitmaps remain in the
qcow2 layer.

> The scope of "features" for which we tell users they have to use qcow2 should
> be those that are format specific, not "block features" in general.  Backing
> file, internal/external snapshot, thin provisioning, compression and
> encryption are all great examples of format features, whereas things including
> throttling, statistics, migration, mirroring and backing up are IMHO not.
> Actually we already support snapshotting a raw image, with a qcow2 overlay.
> We've even implemented non-persistent incremental backup for raw today,
> through drive-backup.  If we decide qcow2 is the only possible format that can
> do persistent backup, I'm not really a huge fan of it.

Yeah, but that's just a feeling, not a use case.

> Then "how"?
> 
> Actually, I thought we could do it in a way similar to quorum. The way quorum
> driver works is by specifying tediously long options. A snippet from
> qemu-iotests to build a quorum driver with 3 children is like this:
> 
>     quorum="driver=raw,file.driver=quorum,file.vote-threshold=2"
>     quorum="$quorum,file.children.0.file.filename=$TEST_DIR/1.raw"
>     quorum="$quorum,file.children.1.file.filename=$TEST_DIR/2.raw"
>     quorum="$quorum,file.children.2.file.filename=$TEST_DIR/3.raw"
>     quorum="$quorum,file.children.0.driver=raw"
>     quorum="$quorum,file.children.1.driver=raw"
>     quorum="$quorum,file.children.2.driver=raw"
> 
> Though very repetitive, it is also very simple: all children are almost
> symmetrical (identical in user data). The only thing the user/management tool
> has to make sure of is that the images have the same data.

By the way, the repetitiveness would be greatly reduced if the test case
were using the json: pseudo-protocol.
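
As a rough, untested sketch of what that could look like: the same three-child
quorum written as a single json: filename whose nesting mirrors the dotted
option names. The /tmp default for TEST_DIR and the child helper are
assumptions for illustration only, not taken from qemu-iotests.

```shell
# Sketch only: the three-child quorum from the quoted snippet, written as
# one json: filename instead of seven dotted option strings.
# TEST_DIR defaults to /tmp here purely for illustration.
TEST_DIR=${TEST_DIR:-/tmp}

# Every child has the same shape, so build it with a small helper.
# "child" is a hypothetical name, not something qemu-iotests provides.
# Note: no JSON escaping is done, so TEST_DIR must not contain quotes
# or backslashes.
child() {
    printf '{"driver": "raw", "file": {"filename": "%s/%s.raw"}}' "$TEST_DIR" "$1"
}

# Nested objects replace the "file." prefixes; the array replaces
# the "children.0", "children.1", "children.2" indices.
quorum="json:{\"driver\": \"raw\", \"file\": {\"driver\": \"quorum\", \"vote-threshold\": 2, \"children\": [$(child 1), $(child 2), $(child 3)]}}"

echo "$quorum"
```

The resulting string is meant to be used wherever the dotted form was used as
a filename; whether it drops into the test unchanged is not something I have
checked.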

> Unfortunately the logic is more complicated in a persistent incremental backup
> scenario. Manual users will have to specify bitmap file names and
> granularities, which they may no longer remember two weeks after they created
> the bitmap, and which they can get wrong.  Management seems a must in this
> case, but the interface we provide to them still feels way too low level.
> Anyway, I do think we can consider a "banana" (dummy name) driver for
> persistent bitmap management which is structured like quorum:
> 
>     banana="driver=raw,file.driver=banana,file.mode=synchronous"
>     banana="$banana,file.image.file.filename=$TEST_IMG"
>     banana="$banana,file.bitmaps.0.file.filename=$TEST_DIR/bm0.raw"
>     banana="$banana,file.bitmaps.0.granularity=65536"
>     banana="$banana,file.bitmaps.0.name=bm0"
>     banana="$banana,file.bitmaps.1.file.filename=$TEST_DIR/bm1.raw"
>     banana="$banana,file.bitmaps.1.granularity=1048576"
>     banana="$banana,file.bitmaps.1.name=bm1"
>     ...
> 
> But we're merely inlining the information from the QBM JSON format into the
> command line. IMO the two are only one step apart.

It wasn't as clear to me before I read this explanation, but is the QBM
on-disk file format really just reinventing qemu config files then?
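
For illustration only, here are the quoted "banana" options restated as a
json: filename. "banana" is the dummy name from the quoted mail and no such
driver exists; the keys simply mirror the dotted options above, and the /tmp
defaults for TEST_IMG and TEST_DIR are assumptions.

```shell
# Sketch only: "banana" is the dummy driver name from the quoted mail; no
# such driver exists. The keys below just restate the quoted dotted options.
TEST_IMG=${TEST_IMG:-/tmp/test.img}
TEST_DIR=${TEST_DIR:-/tmp}

# Written out as a multi-line document first, to show how close the option
# tree is to a descriptor file.
banana_json=$(cat <<EOF
{
  "driver": "raw",
  "file": {
    "driver": "banana",
    "mode": "synchronous",
    "image": { "file": { "filename": "$TEST_IMG" } },
    "bitmaps": [
      { "name": "bm0", "granularity": 65536,
        "file": { "filename": "$TEST_DIR/bm0.raw" } },
      { "name": "bm1", "granularity": 1048576,
        "file": { "filename": "$TEST_DIR/bm1.raw" } }
    ]
  }
}
EOF
)

# A json: filename is passed as a single string, so drop the newlines.
banana="json:$(printf '%s' "$banana_json" | tr -d '\n')"
echo "$banana"
```

Laid out this way, the option tree and a descriptor file carry the same
information: an image reference plus a name, granularity and file per bitmap.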

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
  2016-02-23 17:43     ` Kevin Wolf
@ 2016-02-24  0:49       ` Fam Zheng
  0 siblings, 0 replies; 42+ messages in thread
From: Fam Zheng @ 2016-02-24  0:49 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Alberto Garcia, qemu-block, jsnow, Peter Lieven, qemu-devel,
	Markus Armbruster, vsementsov, Stefan Hajnoczi, Denis V. Lunev,
	pbonzini, mreitz

On Tue, 02/23 18:43, Kevin Wolf wrote:
> Am 23.02.2016 um 04:40 hat Fam Zheng geschrieben:
> > (I'm Cc'ing a few more people here just in case they have different visions
> > about raw image use cases.)
> > 
> > On Mon, 02/22 15:24, Kevin Wolf wrote:
> > > Am 26.01.2016 um 11:38 hat Fam Zheng geschrieben:
> > > > This series introduces a simple format to enable support of persistence of
> > > > block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> > > > backup, and persistence of block dirty bitmap makes incremental backup possible
> > > > across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> > > > 
> > > > When user creates a "persisted" dirty bitmap, the QBM driver will create a
> > > > binary file and synchronize it with the existing in-memory block dirty bitmap
> > > > (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> > > > bits saved on disk, which will be loaded and used to initialize the in-memory
> > > > block dirty bitmap next time the guest is started.
> > > > 
> > > > The idea of the format is to reuse as much existing infrastructure as possible
> > > > and avoid introducing complex data structures - it works with any image format,
> > > > by gluing together plain bitmap files with a json descriptor file. The
> > > > advantage of this approach over extending existing formats, such as qcow2, is
> > > > that the new feature is implemented by an orthogonal driver, in a format
> > > > agnostic way. This way, even raw images can have their persistent dirty
> > > > bitmaps.  (And you will notice in this series, with a little forging to the
> > > > spec, raw images can also have backing files through a QBM overlay!)
> > > > 
> > > > Rather than superseding it, this intends to be coexistent in parallel with the
> > > > qcow2 bitmap extension that Vladimir is working on.  The block driver interface
> > > > changes in this series also try to be generic and compatible for both drivers.
> > > 
> > > So as I already told Fam last week, before we discuss any technical
> > > details here, we first need to discuss whether this is even the right
> > > thing to do. Currently I'm doubtful, as this is another attempt to
> > > introduce a new native image format in qemu.
> > > 
> > > Let's recap the image formats and what we tell users about them today:
> > > 
> > > * qcow2: This is the default choice for disk images. It gives you access
> > >   to all of the features in qemu at a good performance. If it doesn't
> > >   perform well in your case, we'll fix it.
> > > 
> > > * raw: Use this when you need absolute performance and don't need any
> > >   features from an image format, so you want to get any complexity just
> > >   out of the way and pass requests as directly as possible from the
> > >   guest device to the host kernel.
> > > 
> > > * Anything else: Only use them to convert into raw or qcow2.
> > > 
> > > Now using bitmaps is clearly on the "features" side, which suggests that
> > > qcow2 is the format of choice for this. If you want to introduce a new
> > > format, you need to justify it with evidence that...
> > > 
> > > 1. there is a relevant use case that qcow2 doesn't cover
> > > 2. qcow2 can't be fixed/enhanced to cover the use case
> > > 
> > > The one thing that people have claimed in the past that qcow2 can't
> > > provide is enough performance. This is where QED tried to come in and
> > > promised a compromise between performance (then a bit faster than qcow2)
> > > and features (almost none, but supports backing files). We all know that
> > > it was a failure because you had to sacrifice features and still the
> > > idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> > > driver that is much slower than qcow2 despite having fewer features.
> > > 
> > > Now for QBM. First, let's have a look at the image format that it can be
> > > used with. qcow2 doesn't need it if we continue with Vladimir's
> > > extension. Other non-raw formats are only supposed to be used for
> > > conversion. The only thing that's really left is raw.
> > 
> > Yes, I agree with this point.
> > 
> > > Now adding a
> > > feature only for raw, as a compromise between features and performance,
> > > looks an awful lot like what QED tried. We don't want to go there.
> > > 
> > > Even if we wanted to support persistent dirty bitmaps with raw images
> > > (which has to be discussed based on use cases), it's still questionable
> > > whether we need a new image format with JSON descriptor files instead of
> > > just raw bitmaps that can be added with a QMP command.
> > > 
> > 
> > I don't think a QMP interface alone is enough. In the persistent backup use
> > case, a command line interface is more appropriate when starting a guest, so
> > that dirty tracking that was enabled at shutdown can continue.
> 
> Yes, I was sloppy. Maybe s/QMP command/runtime option/ gets closer.
> 
> > I'd justify in two parts, one is "why" and the other is "how".
> > 
> > So to answer why.  I worked on QBM because I feel it is wrong to leave raw
> > behind. Ceph and LVM users use the raw format. You could technically use
> > qcow2 with ceph but that is discouraged[1] or even refused by openstack[2].
> > We've seen qcow2 on top of LVs but that is not the dominant setup.
> 
> Ceph is definitely a valid point. I think we agree that qcow2 can't
> provide what we need there today.
> 
> The question is whether qcow2 can be extended to provide it. As we
> discussed last week internally and today on the call, a possible idea
> would be to extend qcow2 to act as the filter driver here, where all I/O
> is redirected to the backing file and only the bitmaps remain in the
> qcow2 layer.
> 
> > The scope of "features" for which we tell users they have to use qcow2 should
> > be those that are format specific, not "block features" in general.  Backing
> > file, internal/external snapshot, thin provisioning, compression and
> > encryption are all great examples of format features, whereas things including
> > throttling, statistics, migration, mirroring and backing up are IMHO not.
> > Actually we already support snapshotting a raw image, with a qcow2 overlay.
> > We've even implemented non-persistent incremental backup for raw today,
> > through drive-backup.  If we decide qcow2 is the only possible format that can
> > do persistent backup, I'm not really a huge fan of it.
> 
> Yeah, but that's just a feeling, not a use case.
> 
> > Then "how"?
> > 
> > Actually, I thought we could do it in a way similar to quorum. The way quorum
> > driver works is by specifying tediously long options. A snippet from
> > qemu-iotests to build a quorum driver with 3 children is like this:
> > 
> >     quorum="driver=raw,file.driver=quorum,file.vote-threshold=2"
> >     quorum="$quorum,file.children.0.file.filename=$TEST_DIR/1.raw"
> >     quorum="$quorum,file.children.1.file.filename=$TEST_DIR/2.raw"
> >     quorum="$quorum,file.children.2.file.filename=$TEST_DIR/3.raw"
> >     quorum="$quorum,file.children.0.driver=raw"
> >     quorum="$quorum,file.children.1.driver=raw"
> >     quorum="$quorum,file.children.2.driver=raw"
> > 
> > Though very repetitive, it is also very simple: all children are almost
> > symmetrical (identical in user data). The only thing the user/management tool
> > has to make sure of is that the images have the same data.
> 
> By the way, the repetitiveness would be greatly reduced if the test case
> were using the json: pseudo-protocol.
> 
> > Unfortunately the logic is more complicated in a persistent incremental backup
> > scenario. Manual users will have to specify bitmap file names and
> > granularities, which they may no longer remember two weeks after they created
> > the bitmap, and which they can get wrong.  Management seems a must in this
> > case, but the interface we provide to them still feels way too low level.
> > Anyway, I do think we can consider a "banana" (dummy name) driver for
> > persistent bitmap management which is structured like quorum:
> > 
> >     banana="driver=raw,file.driver=banana,file.mode=synchronous"
> >     banana="$banana,file.image.file.filename=$TEST_IMG"
> >     banana="$banana,file.bitmaps.0.file.filename=$TEST_DIR/bm0.raw"
> >     banana="$banana,file.bitmaps.0.granularity=65536"
> >     banana="$banana,file.bitmaps.0.name=bm0"
> >     banana="$banana,file.bitmaps.1.file.filename=$TEST_DIR/bm1.raw"
> >     banana="$banana,file.bitmaps.1.granularity=1048576"
> >     banana="$banana,file.bitmaps.1.name=bm1"
> >     ...
> > 
> > But we're merely inlining the information from the QBM JSON format into the
> > command line. IMO the two are only one step apart.
> 
> It wasn't as clear to me before I read this explanation, but is the QBM
> on-disk file format really just reinventing qemu config files then?

I agree they look alike on the surface, but are qemu config files updated by
QEMU? An image is both read and, more importantly, written by the driver
following a definite format specification. I think that is fundamentally
different.

Fam

Thread overview: 42+ messages
2016-01-26 10:38 [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 01/16] doc: Add QBM format specification Fam Zheng
2016-01-26 17:51   ` Eric Blake
2016-02-09  0:05     ` John Snow
2016-02-23  8:35     ` Markus Armbruster
2016-02-08 23:51   ` John Snow
2016-02-17 11:48   ` Vladimir Sementsov-Ogievskiy
2016-02-17 16:30   ` Vladimir Sementsov-Ogievskiy
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 02/16] block: Set dirty before doing write Fam Zheng
2016-01-26 17:52   ` Eric Blake
2016-02-09  0:11   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps Fam Zheng
2016-01-26 17:53   ` Eric Blake
2016-02-09  0:23   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 04/16] block: Move filename_decompose to block.c Fam Zheng
2016-01-27 16:07   ` Eric Blake
2016-02-09 20:56   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public Fam Zheng
2016-01-27 16:08   ` Eric Blake
2016-02-09 21:06   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent Fam Zheng
2016-02-09 21:31   ` John Snow
2016-02-09 22:04   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 07/16] block: Only swap non-persistent dirty bitmaps Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add Fam Zheng
2016-02-09 22:05   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent Fam Zheng
2016-02-09 22:49   ` John Snow
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 10/16] qbm: Implement format driver Fam Zheng
2016-02-17 13:30   ` Vladimir Sementsov-Ogievskiy
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 11/16] qapi: Add "qbm" as a generic cow " Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 12/16] iotests: Add qbm format to 041 Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 13/16] iotests: Add qbm to case 097 Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 14/16] iotests: Add qbm to applicable test cases Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 15/16] iotests: Add qbm specific test case 140 Fam Zheng
2016-01-26 10:38 ` [Qemu-devel] [RFC PATCH 16/16] iotests: Add persistent bitmap test case 141 Fam Zheng
2016-02-22 14:24 ` [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap Kevin Wolf
2016-02-23  3:40   ` Fam Zheng
2016-02-23 17:43     ` Kevin Wolf
2016-02-24  0:49       ` Fam Zheng
2016-02-23  9:14   ` Markus Armbruster
2016-02-23 11:28     ` Kevin Wolf